Our verdict: Anthropic's latest model, Opus 4.6, is the best AI coder we've tested. It also sometimes makes maddening errors.
It solved a real iOS coding task that stumped both GPT-5.3 Codex and Opus 4.5. It is more thorough, explores context more carefully, and is smarter than Opus 4.5. But this power has trade-offs. It's slower and a bit more verbose, and it still falls prey to classic Claudisms: it sometimes makes changes you didn't expect and doesn't always know its own skills and capabilities.
When it comes to coding, it feels like Anthropic looked at Codex's strong points—thorough, precise work on tough tasks—and tried to incorporate them into this release.
On writing and editing, the drafting experience is more fluid than with Opus 4.5. It applies editorial rules more consistently and translates technical concepts into accessible prose more naturally. But in a blind test, the team preferred Opus 4.5's prose: Opus 4.6 seems more prone than its predecessor to AI-isms like “X not Y” constructions.
What Anthropic told us
Best-in-class for coding and professional work
Built to power agents that handle whole categories of real-world work, excelling across the entire software development lifecycle.
Its most agentic model yet
It drives tasks forward with less handholding—parallelizing work, gathering more context, and taking smarter autonomous actions.
“Adaptive Thinking” replaces “Extended Thinking”
The model adjusts how much it thinks based on the difficulty of your question. On easy tasks, it skips the deep reasoning step entirely. This is on by default everywhere.
The Reach Test
“I shipped a merged pull request on a codebase I've never touched—it researched an unsolved iOS problem and wrote a working fix that left me stunned. I also like its default parallelization for knowledge-work tasks. It raises the ceiling for what's possible with AI coding. But I also found it to be sometimes unreliable, and to require closer management than Codex does.”
“It's better at understanding real codebases and doing more work longer than Opus 4.5. For vibe coders starting fresh, it might not be a super big jump, but it's a really nice step up for day-to-day coders with bigger projects and a great refinement from what was missing. The model's medium thinking option, a setting to make the model think less, is a good, faster alternative.”
“On the morning I got access to the new model, I canceled my Anthropic subscription because I wasn't using it anymore. That afternoon, I got access to this new model, and I might resubscribe because of it. It's good at thinking and figuring out gnarly issues, such as how to start Monologue keyboard dictation without forcing the user to switch back manually to the original app, or new features that are complicated to implement. I'm quite shocked that Dan, with Opus 4.6, was able to push a pull request to the Monologue iOS app.”
“The drafting experience is so much more fluid and responsive than Opus 4.5—I feel like I'm collaborating rather than wrestling. What I'm really loving, though, is the agentic-ness of the chat experience. The resourcefulness and adaptation to the needs of your request are noticeable. I almost thought I was in Claude Code. One caveat: In the blind writing test, I was the outlier—the team preferred Opus 4.5. But I'm confident the AI ‘smell’ will resolve with time and better prompting.”
The headline findings
Two big stories emerged from our testing.