Vibe Check: Claude Sonnet 4 Now Has a 1-million Token Context Window
Fast, reliable long-context responses—for a price
Today, Anthropic is releasing a version of Claude Sonnet 4 that has a 1-million token context window. That’s approximately the entire extant set of Harry Potter books in each prompt.
We got early access last week, so you know we had to put it to the test. We did three main tests on Claude Sonnet 4:
- Long context text analysis: We hid two movie scenes in 1 million tokens of context, and asked Claude to find those scenes and do a detailed analysis of them in one shot.
- Long context code analysis: We loaded the entire codebase for Every’s content management system (plus some padding to get to 1 million tokens) and asked Claude to do four detailed code analysis tasks in one shot.
- AI Diplomacy: We played Claude Sonnet 4 in AI Diplomacy to see how it would perform at world domination.
For the text analysis tasks, we compared it against the 1-million token context models from Google—Gemini 2.5 Pro and Gemini 2.5 Flash. Claude Sonnet 4 performed well—it was generally faster and hallucinated less than Gemini models.
But its text analysis answers were less detailed, and its code analysis was less complete.
Here's your day zero hands-on vibe check.
Analyzing movie scenes in a million tokens of context
We buried two modern movie scenes deep inside 900,000 words of Sherlock Holmes novels. Scene one: Two cousins dealing with grief at JFK Airport (from Jesse Eisenberg’s "A Real Pain," 2024). Scene two: Tom Hanks taking all the caviar at a Manhattan party (from Nora Ephron’s "You've Got Mail," 1998).
We hid one at line 26,581 and the other at line 42,245. That's 43% and 68% of the way through 900,000 words of model-distracting mystery. All three models found both scenes and analyzed them correctly. Here’s how they stacked up.
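For readers who want to try a similar needle-in-a-haystack setup, here is a minimal Python sketch of the idea. It is an illustration under assumptions, not the actual test harness: the function names `insert_needles` and `depth_percent` are hypothetical, and the haystack is toy filler text rather than Sherlock Holmes novels.

```python
def insert_needles(haystack_lines, needles):
    """Return a new list of lines with each needle placed at its 1-based line number.

    `needles` maps a 1-based target line number to the passage (a list of lines)
    that should start at that line in the final document.
    """
    result = list(haystack_lines)
    # Insert in ascending order so a later insertion never shifts an earlier needle.
    for line_no in sorted(needles):
        index = line_no - 1
        result[index:index] = needles[line_no]
    return result


def depth_percent(line_no, total_lines):
    """How far through the document a line sits, as a whole-number percentage."""
    return round(100 * line_no / total_lines)


# Toy data (assumption): 98 filler lines plus two one-line "scenes".
haystack = [f"filler line {i}" for i in range(1, 99)]
doc = insert_needles(haystack, {43: ["SCENE ONE"], 68: ["SCENE TWO"]})

for line_no in (43, 68):
    print(line_no, doc[line_no - 1], f"{depth_percent(line_no, len(doc))}% deep")
```

Placing the needles well away from the start and end of the prompt is the point of the exercise: it checks whether a model can retrieve material from the middle of a very long context, not just from its edges.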
Speed
Claude Sonnet 4 blazed through the task, returning an answer in roughly 54% to 60% of the time Gemini Pro and Flash needed:
Claude Sonnet 4: 41.8 seconds ✅
Gemini 2.5 Flash: 69.2 seconds
Gemini 2.5 Pro: 78.0 seconds
Advantage: Claude.
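To put the gap in relative terms, the short Python sketch below divides the reported wall-clock times. The numbers come straight from the list above; the helper name `relative_time` is hypothetical and exists only for this illustration.

```python
# Wall-clock times in seconds, as reported above.
timings = {
    "Claude Sonnet 4": 41.8,
    "Gemini 2.5 Flash": 69.2,
    "Gemini 2.5 Pro": 78.0,
}


def relative_time(baseline, other):
    """Fraction of `other`'s time that `baseline` needed (lower means faster)."""
    return baseline / other


claude = timings["Claude Sonnet 4"]
for model, seconds in timings.items():
    if model == "Claude Sonnet 4":
        continue
    share = relative_time(claude, seconds)
    speedup = seconds / claude
    print(f"Claude took {share:.0%} of {model}'s time (a {speedup:.2f}x speedup)")
```

By this arithmetic Claude needed about 60% of Flash's time and about 54% of Pro's. These are single reported timings, so treat the ratios as a rough indication and not as a benchmark.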
Accuracy
All three models analyzed the scenes accurately. However, both Gemini Flash and Pro sometimes misidentified A Real Pain as another movie. Sonnet 4 never hallucinated a title; it simply declined to assign one.
Advantage: Claude.
Detail
The Gemini models returned an incredibly detailed scene analysis. Here’s an excerpt from Flash’s analysis of the character dynamics in the scene (note this run hallucinated the title as Lady Bird):
Claude, on the other hand, returned much sparser details:
Claude’s overall response was consistently around 500 words—Flash and Pro delivered 3,372 and 1,591 words by contrast.
Advantage: Gemini.
The verdict: If you need speed and accuracy, Claude is the winner. If you want high-quality, detailed analysis, Gemini is a better bet.
Analyzing a million-token codebase