Vibe Check: Fable 5 Is the Best Coding Model in the World

A warp drive for power users—but overpowered for everyone else.

June 8, 2026

As I walked to work this morning, I listened to a 2007 lecture by the philosopher Hubert Dreyfus, the author of the seminal text What Computers Can’t Do. I’ve listened to this lecture many times, but I always struggle to follow because the recording is grainy and muddy. The version I listened to today was brightened, leveled, and crystal clear, as if I were in the same room with Dreyfus. It was not on a finicky website, but on a custom web app on my phone that allowed me to see the whole lecture transcribed and each sentence light up as Dreyfus spoke, so I could easily follow along.

Later, on my laptop, I wandered through a strange video game: a pixel-perfect 3D rendering of Jorge Luis Borges’s Library of Babel, an infinite library composed of hexagonal rooms containing every piece of text ever written, including this article. I picked books off of its endless shelves and rode its spiral staircases.

Then, because I also have a job, I read a report that synthesized hundreds of detailed Every subscriber survey responses and our entire web analytics stack, and identified our biggest conversion issue. It proposed a clean, falsifiable experiment that no one else on the team had previously suggested.

All three of these are big projects that would normally take anywhere from hours to days or months. Instead, each one was made with a one-shot prompt to Fable 5, the new model out today from Anthropic. And it’s not just vibe-coded prototypes: Fable 5 scored a 91/100 on our Senior Engineer benchmark compared to Opus 4.8’s 63 and GPT-5.5’s 62. Anthropic describes it as the most capable model it has ever made available to the public, and we agree.

It is also the first of what the company is calling Mythos-class models. You’ve undoubtedly heard the word Mythos before—Anthropic declared Mythos too dangerous for public use in April and restricted access to a handful of partners. Since then, the company has given limited release to a chosen set of large companies for feedback. The model released today is a version with safeguards that prevent it from being used for anything related to cybersecurity or biology.

In our testing, we found that users who were highly adept with AI—at Level 7 or 8 on our AI adoption ladder—found it paradigm-shifting for their hardest tasks. Users who were lower down on the curve, however, struggled to find something to use it for.

Seven of us have been testing it for about a week at Every—putting it through its paces for coding, writing, business strategy, data analysis, and growth. This is our day-zero Vibe Check.

What Anthropic is saying

The company says Fable 5 is state of the art on nearly every benchmark it tested, including software engineering and knowledge work, and that its lead grows as assignments get longer and more complex.

The general-release model comes with unusual guardrails. Anthropic says Fable will block some requests involving cybersecurity and biology (bad news for Dan and his health-tracking habits); most of those queries will be routed to Claude Opus 4.8 instead. Anthropic is also releasing Mythos 5, the same base model without the cyber and biology safeguards, but access will remain limited. For most users, Fable is the version they will encounter.

Read the full report

1

The complete vibe check—every benchmark and our final verdict

2

A seat at Fable 5 Camp on June 12—watch the team push it live, paid members only

3

Every essay, course, and tool we ship—your full membership

Start free trial to unlock

Already a member? Log in.

Senior coder at high, architect at extra-high

Quiet bench, tighter seam, and a much longer runway between the obvious turn and the second pass. The shape holds when the brief is clear, and flattens when the brief is loose.

First station: Strong close on a familiar loop

Phase glass, ribbon ladder, static river around the handoff. Quiet markers hold their lane while the outer pass keeps folding back into the first frame of the task, with minor lift at the edges and a denser closure at the margin.

Velvet checkpoints, longer weather, and a measured hinge across the rollout path. The middle layer keeps echoing earlier notes without dropping the thread or flattening the edge cases through the first half of the bench.

00 → 00
00 vs. 00
00.0
00 & 00

Second station: Restraint as the tell

Lattice note, amber fork, and a small weather system over the review shelf. The structure reads calmer at first glance, then starts to narrow under pressure. Winter syntax, patient seams, and a quiet bend through the finish line.

Placeholder frame with the same footprint as the original scene image. Preview held at article scale.

The close end keeps its shape through the turn while the open end opens up the frame. The two halves trade tension across the pass.

Third station: Where the brief goes loose

Open prompt, soft constraints, and a wide field around the first decision. The model leans on its own scaffolding here, filling the gaps with plausible structure that holds for a paragraph and then drifts toward the nearest familiar shape.

Tighten the brief by one notch and the drift disappears. The same loop that wandered on the open pass snaps back to a clean line, which is the tell we kept circling all weekend: precision in, precision out.

00 → 00
00.0

Quiet markers, longer runway, and a measured hinge that only shows up when the task stops describing itself. The edge cases are where the release earns its score.

Best overall, with pesky tells

Tight lattice across the prose tracks, a clear line on the cover pass, and a steady seam through the middle of the brief. The graders kept finding the same handful of soft tells across the lower-effort runs.

The runner-up keeps the score honest, and the leaderboard reads narrower than the headline suggests. The shape of the gap matters more than the raw number, especially when the next dial sits one notch over.

Placeholder frame with the same footprint as the original draft comparison. Preview held at article scale.

Surface tells, soft repetitions, and a familiar pattern across the lower-effort passes. The structure holds at high while the medium loop keeps slipping into the same handful of habits across the sample set.

Push the effort dial and the tells thin out fast. The cover pass gets sharper, the seams tighten, and the same prompt that read mechanical on the quick run starts to carry a voice through the back half of the brief.

Placeholder frame with the same footprint as the original side-by-side. Preview held at article scale.

The gap between the top two reads as a matter of taste, not capability. Both clear the bar; one just leaves fewer fingerprints on the way through. That distinction is the whole story of the writing bench this round.

Fast and versatile, but cautious

Quiet hand on the daily loop, a measured pace through the back half of the brief, and a careful posture on the open-ended pass. The hand-off feels lighter than it reads on paper.

Velvet checkpoints across the open seam, a slow turn through the first frame, and a longer hinge on the agentic side of the bench. The model keeps deferring small decisions back to the operator across the whole run.

Placeholder frame with the same footprint as the original artifact. Preview held at article scale.

Give it a clear owner and a tight spec and the caution turns into an asset: fewer wrong turns, cleaner hand-backs, and a paper trail you can actually audit. The trade is speed for legibility, and on knowledge work that trade usually pays.

Left fully open, the same posture reads as hesitation. The model waits for a signal that never comes and the loop stalls one decision short of done. Structure is the unlock here, not horsepower.

The verdict

The shape of the weekend was the same across every chair: open the model, watch it think, hand it the work, and let it return a tighter draft than the last release would have managed on the same loop.

The pattern repeats across every chair: the model rewards a clear brief and punishes a loose one, and the distance between those two outcomes is wider than any single benchmark number lets on.

So the recommendation is less about whether to reach for it and more about when. Hand it the work that has edges. Keep the open-ended exploration somewhere faster and cheaper, and let this one close the loops that actually need closing.

That's the whole verdict in one line: a strong closer that wants a clear target. Give it one and it's the best we've tested. Leave the target vague and it hands the decision right back to you.

Fable 5, answered

Is Anthropic's Mythos Fable 5 good for coding?

Yes—it's the strongest coding model Every has tested. On Every's Senior Engineer benchmark, the hardest coding test we run, Fable 5 scored 91 out of 100, near the range of the human engineers who've taken it; Opus 4.8 scored 63 and GPT-5.5 scored 62. It's at its best owning a whole assignment end-to-end, where it can plan, use tools, and repair its own output over multi-hour runs.

What is Fable 5 best for?

Large, delegable jobs you can hand off and check later—building a feature or app from a single prompt, deep code review, and synthesizing large datasets. It rewards a sharp problem frame and a strong review process. Small asks barely reveal its advantage.

When should I use Fable 5 instead of a faster model?

Reach for Fable 5 when the task is big and you can run it asynchronously; keep a faster model nearby for quick edits, small bug fixes, and rapid back-and-forth. Fable is slow and token-hungry, especially at high effort. Paying that latency cost only makes sense when the assignment is large enough to justify it.

Is Fable 5 good for writing?

It's mixed. Fable's judgment and use of context are excellent, but its speed works against the rapid-iteration rhythm most writing depends on. Katie Parrott found it too slow for her drafting process. It's better suited to long, delegable knowledge work than to fast creative iteration.

How much does Fable 5 cost?

Fable 5 is generally available across Claude.ai, Claude Code, and the desktop app at $10 per million input tokens and $50 per million output tokens. That price is about two times the cost of Opus 4.8 and more than three times the cost of Sonnet 4.6. Combined with its slower runs, it can burn through tokens and usage limits quickly.

Is Fable 5 worth the cost and the slower speed?

It depends on the job. For big, high-stakes assignments that maxed out previous models, Fable 5's results can justify the wait and the spend. For everyday small tasks, it's overkill. Match it to whole-job delegation, not interactive chat.

How do I get the best results from Fable 5?

Treat it as an asynchronous agent, not a chat partner: Hand it a complete, well-framed assignment, let it run, and review the output carefully. Its ceiling is high enough that the main limit is the quality of your framing and review, so invest there and run several jobs in parallel rather than waiting on one.

Dan Shipper is the cofounder and CEO of Every, where he writes the Chain of Thought column and hosts the podcast AI & I. You can follow him on X at @danshipper and on LinkedIn.

Katie Parrott is a staff writer at Every. You can read more of her work in her newsletter.

Go agent-native with Every

Every is a frontier AI lab for the future of work, trusted by 100,000 builders and operators.

Expert led courses and camps

Four productivity apps

A Discord community learning together

We use analytics and advertising tools by default. You can update this anytime.