No hype Claude Opus 4.8 review—my real experience
Summary
Claude Opus 4.8 excels at one-shot feature building and scores 69.2% on SWE-bench Pro (5 points above Opus 4.7). However, it struggles with edge cases, hallucinations, and contextual understanding in existing codebases, which reveals a critical gap between initial capability and production-ready code.
Key Takeaways
- Opus 4.8 costs $5 per million input tokens and $25 per million output tokens (significantly higher than its predecessors) and defaults to "high effort" mode. Teams should calculate ROI carefully before deploying it in production agentic workflows.
- One-shot feature generation works well: the model autonomously coded a feature in about 20 minutes, and that feature shipped live. However, it consistently fails on the "last 10%": debugging, edge-case handling, and iterative refinement of existing code.
- Verified hallucination risk: the model invents solutions from hypotheses rather than data, particularly in scoped follow-ups and bug hunting, even in high-effort reasoning mode. This is a critical consideration for autonomous agents.
- Context blindness with existing codebases: the model struggles to grasp the right level and scope of a change when asked to rebase branches or integrate work into live projects. It needs multiple correction cycles instead of respecting the boundaries of the existing code structure.
- Ambition gap in creative and exploratory tasks: the model produces functional but conservative solutions and lacks the lateral thinking that novel use cases need. This is a useful signal for product teams evaluating the limits of agentic autonomy.
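The ROI point in the first takeaway comes down to simple arithmetic. Below is a minimal sketch. It assumes the quoted $5 / $25 prices are USD per million input and output tokens. The helper names and the token counts in the example run are hypothetical and were chosen only for illustration.

```python
# Rough cost model for an agentic workflow.
# ASSUMPTION: the quoted $5 / $25 prices are USD per million input/output tokens.
INPUT_USD_PER_MTOK = 5.00
OUTPUT_USD_PER_MTOK = 25.00


def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of a single model call (hypothetical helper)."""
    return (input_tokens * INPUT_USD_PER_MTOK
            + output_tokens * OUTPUT_USD_PER_MTOK) / 1_000_000


def workflow_cost(calls: list[tuple[int, int]]) -> float:
    """Sum the cost of every (input_tokens, output_tokens) call in one agent run."""
    return sum(call_cost(inp, out) for inp, out in calls)


if __name__ == "__main__":
    # Hypothetical agent run: 40 calls, each re-reading ~50k tokens of context
    # and writing ~4k tokens of output.
    baseline = [(50_000, 4_000)] * 40
    # Hypothetical variant in which each call writes twice as many output tokens.
    more_output = [(50_000, 8_000)] * 40
    print(f"baseline run:   ${workflow_cost(baseline):.2f}")
    print(f"doubled output: ${workflow_cost(more_output):.2f}")
```

Under these assumptions, the baseline run costs about $14.00 and the doubled-output variant costs about $18.00. Output tokens are priced five times higher than input tokens, so any setting that makes the model write more per call (such as a higher effort level) is worth measuring on real traces before committing to a workflow.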
Transcript Excerpt
Welcome back to How I AI. I'm Claire Vo, product leader and AI obsessive, here on a mission to help you build better with these new tools. Today we have a very special mini episode, because Anthropic just dropped Opus 4.8, their latest state-of-the-art coding model, and I got a few hours of early access. I'm here to share my very early thoughts about where this model is intended to perform well, where it did a great job and totally impressed me, and where there's still a little further to go. Let's get to it. As you can tell, I am not in my regular How I AI studio. That's because I was so excited to give you my early thoughts on Opus 4.8 that I couldn't wait between meetings to share what I thought. So to get started, I want to talk about what this model is, what Anthropic has told us abo…