Every
Vibe Check: Claude Sonnet 4 Now Has a 1-million Token Context Window


Fast, reliable long-context responses—for a price

Aug 12, 2025 (updated Jun 25, 2026)


Today, Anthropic is releasing a version of Claude Sonnet 4 that has a 1-million token context window. That’s approximately the entire extant set of Harry Potter books in each prompt.

We got early access last week, so you know we had to put it to the test. We did three main tests on Claude Sonnet 4:

  1. Long context text analysis: We hid two movie scenes in 1 million tokens of context, and asked Claude to find those scenes and do a detailed analysis of them in one shot.
  2. Long context code analysis: We loaded the entire codebase for Every’s content management system (plus some padding to get to 1 million tokens) and asked Claude to do four detailed code analysis tasks in one shot.
  3. AI Diplomacy: We played Claude Sonnet 4 in AI Diplomacy to see how it would perform at world domination.

For the text analysis tasks, we compared it against the 1-million token context models from Google—Gemini 2.5 Pro and Gemini 2.5 Flash. Claude Sonnet 4 performed well—it was generally faster and hallucinated less than Gemini models.

But its text analysis answers were less detailed, and its code analysis was less complete.

Here's your day zero hands-on vibe check.

Analyzing movie scenes in a million tokens of context

We buried two modern movie scenes deep inside 900,000 words of Sherlock Holmes novels. Scene one: Two cousins dealing with grief at JFK Airport (from Jesse Eisenberg’s "A Real Pain," 2024). Scene two: Tom Hanks taking all the caviar at a Manhattan party (from Nora Ephron’s "You've Got Mail," 1998).

We hid one at line 26,581 and the other at line 42,245. That's 43% and 68% of the way through 900,000 words of model-distracting mystery. All three models found both scenes and analyzed them correctly. Here’s how they stacked up.
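For readers who want to build a similar test, here is a minimal Python sketch of the setup described above. It is not the harness we used: the function names are hypothetical, each scene is treated as a single line for simplicity, and the total line count (61,800) is an assumption back-calculated from the 43% and 68% figures rather than a number from our test.

```python
def depth_percent(line_number: int, total_lines: int) -> float:
    """Return how far through a document a 1-indexed line sits, as a percentage."""
    if not 1 <= line_number <= total_lines:
        raise ValueError("line_number must fall inside the document")
    return 100.0 * line_number / total_lines


def insert_needles(haystack: list[str], needles: dict[int, str]) -> list[str]:
    """Return a copy of haystack with each needle at its 1-indexed target line.

    Needles are inserted in ascending order, so a later insert never shifts
    an earlier one away from its target position.
    """
    result = list(haystack)
    for line_number in sorted(needles):
        result.insert(line_number - 1, needles[line_number])
    return result


# Assumed total line count, inferred from the reported percentages.
ASSUMED_TOTAL_LINES = 61_800

for needle_line in (26_581, 42_245):
    depth = depth_percent(needle_line, ASSUMED_TOTAL_LINES)
    print(f"line {needle_line}: {depth:.0f}% of the way through")
```

Placing the needles well away from the start and end of the prompt matters, because it keeps a model from succeeding just by attending to the edges of its context.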

Speed

Claude Sonnet 4 blazed through the task, taking roughly 54% to 60% as long as Gemini 2.5 Pro and Flash to return an answer:

Claude Sonnet 4: 41.8 seconds ✅

Gemini 2.5 Flash: 69.2 seconds

Gemini 2.5 Pro: 78.0 seconds

Advantage: Claude.
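The speed gap can be checked directly from the three timings above. This short Python sketch only redoes the arithmetic; the dictionary and function names are illustrative.

```python
# Response times in seconds, as reported above.
timings_s = {
    "Claude Sonnet 4": 41.8,
    "Gemini 2.5 Flash": 69.2,
    "Gemini 2.5 Pro": 78.0,
}


def time_fraction(timings: dict[str, float], model: str, baseline: str) -> float:
    """Return model's response time as a fraction of baseline's response time."""
    return timings[model] / timings[baseline]


for baseline in ("Gemini 2.5 Flash", "Gemini 2.5 Pro"):
    fraction = time_fraction(timings_s, "Claude Sonnet 4", baseline)
    print(f"Claude took {fraction:.0%} as long as {baseline}")
```

These are single measurements from one task, so treat the ratios as a rough indication rather than a benchmark.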

Accuracy

All three models returned accurate analyses of the scenes. However, both Gemini Flash and Pro sometimes misidentified "A Real Pain" as another movie. Sonnet 4 never hallucinated a title; it simply declined to assign one.

Advantage: Claude.

Detail

The Gemini models returned incredibly detailed scene analyses. Here’s an excerpt from Flash’s analysis of the character dynamics in the scene (note that this run hallucinated the title as "Lady Bird"):

[Image: excerpt from Gemini 2.5 Flash's scene analysis]

Claude, on the other hand, returned much sparser details:

[Image: excerpt from Claude Sonnet 4's scene analysis]

Claude’s overall response was consistently around 500 words. By contrast, Flash and Pro delivered 3,372 and 1,591 words, respectively.

Advantage: Gemini.
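To put the length gap in proportion, here is a small Python sketch. The `word_count` helper is a hypothetical whitespace-based counter, not necessarily how the figures above were measured, and Claude's 500 is the approximate figure from the text rather than an exact count.

```python
def word_count(text: str) -> int:
    """Count words by splitting on whitespace."""
    return len(text.split())


# Response lengths in words, as reported above (Claude's is approximate).
reported_words = {
    "Claude Sonnet 4": 500,
    "Gemini 2.5 Flash": 3372,
    "Gemini 2.5 Pro": 1591,
}

claude_words = reported_words["Claude Sonnet 4"]
length_ratios = {
    name: round(words / claude_words, 1)
    for name, words in reported_words.items()
}

for name, ratio in length_ratios.items():
    print(f"{name}: {ratio}x the length of Claude's response")
```

Word count is only a rough proxy for detail, so it is best read alongside the responses themselves.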

The verdict: If you need speed and accuracy, Claude is the winner. If you want high-quality, detailed analysis, Gemini is a better bet.

Analyzing a million-token codebase

