Since the Claude 3 generation launched back in 2024, Anthropic has sold Sonnet as its Goldilocks model. Opus is the smart one—expensive and slow. Haiku is the fast, cheap one you don’t trust with real reasoning. Sonnet is the one in between: smart enough for complex work, cheap and quick enough for daily use.
The catch is that “just right” only holds if you’re not too expensive, too slow, or too dumb next to the alternatives, and Sonnet 5 is all three. On cost, Sonnet 5 at its highest effort setting only matches Opus 4.8 running at low-to-medium effort, while costing more per task. The cheap option isn’t cheap. On speed, any gains over Opus are swallowed up by the time you spend correcting its missteps. The fast option isn’t fast. On intelligence, Kieran Klaassen watched it spend three hours on a broken build, stuck in loops and burning tokens without making progress. The smart option isn’t smart enough.
In trying to be the Goldilocks model, Sonnet 5 ends up both too hot and too cold compared to the alternatives.
Sonnet 5 is a decent model with the bad luck to arrive after better options were already available. We get into all of it below.
What Anthropic is saying
Anthropic is pitching Sonnet 5 as the most agentic Sonnet yet, delivering Opus 4.8-level results at a fraction of the price with fewer of Sonnet 4.6’s rough edges. In our testing, the coding and agentic gains were modest, and the two strongest marketing claims (that it’s cheaper, and that it’s close to Opus) didn’t survive contact with our workflows.
“The most agentic Sonnet model yet”
Mostly true, with caveats. It makes more decisions on its own than older Sonnet models, though not always better ones. Sometimes it skips clarifying questions and decides too early that it has enough context.
“Performance close to that of Opus 4.8,” at lower cost
Shaky. To get near Opus 4.8, Sonnet 5 appears to need higher effort settings, where the price advantage starts to disappear.
A “substantial improvement” over Sonnet 4.6
Some improvement, yes, but “substantial” feels overstated. In our testing, it wasn’t obviously faster or cheaper than its predecessor, and it wasn’t good enough to replace Opus 4.8 for essential tasks.
Lower hallucination, less sycophancy, and more reliable refusals of unsafe tasks
Anthropic says Sonnet 5 hallucinates less than Sonnet 4.6, but we still caught it hallucinating or misreading source material in ways that made it hard to trust. The sycophancy and refusal claims hold up better, although it can read as obstinate or adversarial if you’re used to a friendlier model.
The Reach Test

“I ran the whole LFGBench suite on it, and I don’t see the fit. Its Rubber Duck store kinda works, but there’s no reason to reach for it over Opus 4.8 or Fable—it’s not faster, it’s not cheaper, it’s just worse. On the agentic stuff it keeps getting stuck in loops and running forever without the intelligence to actually solve anything. It’s not a model for coding—it thinks it’s smart enough to keep running in circles, but it’s not.”

“I had a look at the benchmarks and it’s kind of meh: not cheaper, not faster, not better, and I’m not sure why they bumped it to 5 when it feels like a 4.8. It did fine on our PowerPoint template and didn’t make any major mistakes, and I’d actually consider it for AskRally [an audience simulation tool that I built], but the pricing is a little out versus Gemini, which is my current default at $2 per 1 million [tokens] against Sonnet’s $3. Not fast or cheap enough to compete with open-source, not smart enough to compete with GPT-5.5. There’s a use for it if the numbers line up on a specific job, but nothing here made me want to switch a daily workflow over.”

“I wanted Sonnet 5 to be useful for fast, practical go-to-market work: follow-up emails, campaign drafts, the stuff where you want a model to get the shape right quickly so you can polish instead of rebuild. I gave it the same Codex Power User Camp follow-up email prompt as Opus 4.8. Opus asked clarifying questions first and came back close to sendable. Sonnet 5 skipped that step and gave me the shell of an email I might send, but one I’d have to rewrite line by line. It just did the thing, but worse.”

“I had high hopes for Sonnet 5 as a faster, more iterative writing partner. But it’s too stubborn, too opinionated, and not smart enough to meet that need when GPT-5.5 is sitting right there as an alternative. I don’t have a category where it becomes my default: I wouldn’t use it for coding or workflow development over Opus 4.8 or Fable, and I don’t trust it enough on writing to hand it the parts of the process where I lean on models most.”

“I really enjoyed Sonnet 5 at first. It seemed to be able to handle much of the collaborative UI/UX polish work that the previous Sonnet couldn’t—and that Opus is too slow for. Its responses are brusque without being robotic; it’s the Claude model with the most, for lack of a better word, attitude. But its ability to pull the right context for editing tasks is hit-or-miss. Sometimes it checks facts and fetches links from the Every archive without being asked, other times it doesn’t (whereas GPT-5.5 reliably does both). Once Fable 5 came back online and I could run UI work through its speedier low effort setting, I found myself wondering when I’d use Sonnet 5 at all. Maybe next week, when Fable leaves the Claude Max plan.”
Coding: Too weak for hard work
Writing: Competent prose with shaky judgment
Knowledge work: Opus 4.5 work in an Opus 4.8 world
Agent behavior: Agency that can’t be trusted
The verdict: A capable model that arrived a generation late
Katie Parrott is a staff writer at Every. You can read more of her work in her newsletter.