Why Opus 4.8 Pulled Me Back to Claude

By Every

Categories: Startup, Product, AI

Summary

Opus 4.8 marks Anthropic's comeback with a score of 63 on a senior-engineer benchmark, 30 points higher than Opus 4.7 and nearly matching GPT-5.5. But the Claude app is still fragmented into separate chat, code, and cowork tabs, and its UX still loses to a competitor's smoother experience, so users keep switching between platforms despite Claude having the superior model.

Key Takeaways

Opus 4.8 scores 63 on senior-engineer benchmarks, 30 points higher than 4.7 and nearly equal to GPT-5.5. This represents a major capability jump across coding, writing, and knowledge work simultaneously.
Model performance depends heavily on reasoning settings: extra-high and high reasoning deliver significantly better results than medium or low settings, especially for complex programming and critical writing tasks.
Claude app UX fragmentation (separate chat, code, and cowork tabs managed by different teams) creates friction despite the superior model. Harness quality now matters as much as model capability for user adoption.
Knowledge work tasks like automated slide deck generation show significant improvement: Opus 4.8 produces depth and styling comparable to a human's first-pass work, unlike thin auto-generated alternatives.
Competitive dynamics are shifting: Claude users were defecting to GPT-5.5 during 4.7's weakness, and Opus 4.8's strength is pulling them back, demonstrating that model quality directly affects enterprise switching costs.

Transcript Excerpt

It's model release day. Opus 4.8 drops today. But honestly, they could have called it Opus 5, because this is a really great model. Anthropic, I know you're trying to underpromise, but you are overdelivering. We have been testing it internally for about a week here at Every, and here is your day-zero vibe check. Before we get into it, what is Every? Every is the only subscription you need to stay at the edge of AI. You can kind of think of us like an applied AI lab for the future of work. We're about 30 people, and we're all early adopters of these tools. We write about all the new models and all the ways we use them for coding, writing, design, company building, and more. We have a suite of products that we build for ourselves to help us work better with AI. And we also do a lot …