Vibe Check: Opus 4.6—The Best Coding Model We’ve Tested (With Some Maddening Habits)

Our verdict: Anthropic's latest model, Opus 4.6, is the best AI coder we've tested. It also sometimes makes maddening errors.

It solved a real iOS coding task that stumped both GPT-5.3 Codex and Opus 4.5. It is more thorough, explores context more carefully, and is smarter than Opus 4.5. But this power has trade-offs. It's slower and a bit more verbose, and it still falls prey to classic Claudisms: It sometimes makes changes you didn't expect and doesn't always know its own skills and capabilities.

When it comes to coding, it feels like Anthropic looked at Codex's strong points—thorough, precise work on tough tasks—and tried to incorporate them into this release.

On writing and editing, the drafting experience is more fluid than with Opus 4.5. It applies editorial rules more consistently and translates technical concepts into accessible prose more naturally. But in a blind test, the team preferred Opus 4.5's prose. Opus 4.6 seems more prone than its predecessor to AI-isms like “X not Y” constructions.

What Anthropic told us

Best-in-class for coding and professional work

Built to power agents that handle whole categories of real-world work, excelling across the entire software development lifecycle.

Its most agentic model yet

It drives tasks forward with less handholding—parallelizing work, gathering more context, and taking smarter autonomous actions.

“Adaptive Thinking” replaces “Extended Thinking”

The model adjusts how much it thinks based on the difficulty of your question. On easy tasks, it skips the deep reasoning step entirely. This is on by default everywhere.

The Reach Test

“I shipped a merged pull request on a codebase I've never touched—it researched an unsolved iOS problem and wrote a working fix that left me stunned. I also like its default parallelization for knowledge-work tasks. It raises the ceiling for what's possible with AI coding. But I also found it to be sometimes unreliable, and to require closer management than Codex does.”

Dan Shipper, the multi-threaded CEO

“It's better at understanding real code bases and doing more work longer than Opus 4.5. For vibe coders starting fresh, it might not be a super big jump, but it's a really nice step up for day-to-day coders with bigger projects and a great refinement from what was missing. The model's medium thinking option, a setting to make the model think less, is a good, faster alternative.”

Kieran Klaassen, the Rails-pilled master of Claude Code

“On the morning I got access to the new model, I canceled my Anthropic subscription because I wasn't using it anymore. That afternoon, I got access to this new model, and I might resubscribe because of it. It's good at thinking and figuring out gnarly issues, such as how to start Monologue keyboard dictation without forcing the user to switch back manually to the original app, or new features that are complicated to implement. I'm quite shocked that Dan, with Opus 4.6, was able to push a pull request to the Monologue iOS app.”

Naveen Naidu, graduate of IIT Bombay (the MIT of India)

“The drafting experience is so much more fluid and responsive than Opus 4.5—I feel like I'm collaborating rather than wrestling. What I'm really loving, though, is the agentic-ness of the chat experience. The resourcefulness and adaptation to the needs of your request are noticeable. I almost thought I was in Claude Code. One caveat: In the blind writing test, I was the outlier—the team preferred Opus 4.5. But I'm confident the AI ‘smell’ will resolve with time and better prompting.”

Katie Parrott, AI-pilled writer by day, vibe coder by night

Legend:

Paradigm shift

Psyched about this release

It's okay, but I wouldn't use it every day

Trash release

The headline findings

Two big stories emerged from our testing.

Finding 01

Signal cascade

Arc lint, node haze, frame tethered; looped soft and landed wide. The line held, the hinge set, the lock slid early.

Trace spool, log fog, seam drop; path locked, run shipped, no extra pulls. The mesh held, the marks lined up.

Model A

Stalled

Overcast rules, edge case drift, no lock on the return path.

Model B

Missed

Half pass, seam gaps, return line loose and unstitched.

Model C

Landed

Mapped shell, stitched seam, full path landed clean.

Finding 02

Parallel drift

Multi lanes sparked, then braided back without prompts. The flow stayed intact, the blocks aligned, the bridge held.

Long arcs, cadence banded, next move early. The trace held its line under load.

“Shape held, lane kept, run stayed clean.”

— Team note

What we like

Scope locks early

Frames the task, trims edge noise, keeps the lane clean end to end.

Parallel threads by default

Spins multiple passes, braids the best line, keeps output stable.

Adapts to intent

Finds adjacent tools, shifts tone, lands clean without perfect prompts.

What we don't like

Pace wavers under load

The arc slows, the loop elongates, and the cadence slips between passes.

Surface alignment

Contours drift in visual builds; polish lands uneven across frames.

Signal haze

Occasional claims outrun the ground truth beneath them.

Quick verdict by use case

Core systems

Strong foundations; steadier in hard architecture.

Agentic lanes

Parallel flow; wider reach across tasks.

Writing & editing

Rhythm patterns show; polish still uneven.

Opus 4.6 for coding

Bench arcs tighten; long runs resolve with fewer gaps.

1. Drift Field

Surface precision, layout fidelity, and constraint hold.

2. Island Loop

Depth cues, frame pacing, and spatial coherence.

Speed: the trade-off

Longer arcs, cleaner landings; faster lanes stay lighter.

Opus 4.6 for writing and editing

Tone shifts, cadence bands, and signature traces emerge.

Pattern A

Tight loops, clean cadence, predictable hinge.

Pattern B

Smoother flow, lighter drag, subtler seams.

Pattern C

Varied texture, uneven rhythm, higher drift.

Final thoughts

Arc and lane converge; the trace holds under pressure, the seam lands clean.
