Vibe Check: GPT-5.5 Has It All

Frontier models usually come with tradeoffs. You get more depth, but less speed. More agency, but less control. Better code, but worse prose. The surprising thing about GPT-5.5, the new OpenAI model out today, is how few of those tradeoffs it asks you to make.

It’s much faster than Opus 4.7, easier to collaborate with, better at writing than any OpenAI model we’ve used since GPT-4.5 and GPT-4o, and the strongest model we’ve tested on our new Senior Engineer Benchmark, which measures how well models can rewrite a slop-coded codebase the way a senior engineer would.

On that benchmark, GPT-5.5 with extra high reasoning reached 62.5 on its best run, while Opus 4.7 at a similar reasoning level landed in the low 30s. For reference, human senior engineers score in the high 80s and low 90s. GPT-5.5 performed best, however, when it executed a plan written by Opus 4.7—curious.

For a long time, OpenAI looked like it was trying to be everywhere at once: Sora for video, Atlas for browsing, consumer ChatGPT features, creative media tools, and whatever else might turn AI into the next mass-market platform. Meanwhile, Anthropic doubled down on work, and Claude became the default for coding agents, long-running engineering tasks, and professional workflows.

GPT-5.5 gives OpenAI something it badly needed: a fast, capable workhorse model for the professional tasks where most AI use happens.

GPT-5.5 is OpenAI’s clearest bid to reclaim the code-and-work narrative. It does not win everything. Opus 4.7 seems to write better plans and have a superior eye for design and product details. But GPT-5.5 is faster, steadier, and easier to trust for everyday professional work.



What OpenAI told us

OpenAI is pitching GPT-5.5 as a higher-capability model for complex work, especially tasks where stronger reasoning, higher reliability, and fewer retries yield a finished result faster and cheaper.

1 million-token context window

The context window remains 1 million tokens, with supported tools and rate limits similar to GPT-5.4.

Prompt caching

GPT-5.5 supports extended prompt caching for reusing long context across requests, but not in-memory caching for faster same-session reuse.

Medium reasoning by default

GPT-5.5 defaults to medium reasoning effort, unlike GPT-5.4, where the default was none.

No API availability at launch

GPT-5.5 launches in ChatGPT and Codex first, with the API coming later while OpenAI finishes additional safety and security validation.

More expensive than GPT-5.4

API pricing is set at $5 per 1 million input tokens and $30 per 1 million output tokens for GPT-5.5, with GPT-5.5 Pro at $30 and $180. OpenAI’s argument is that for harder tasks, better reasoning and fewer retries can lower the cost per completed task even when the per-token price is higher.

Pricing comparison

GPT-5.5: $5/1M input tokens, $30/1M output tokens

GPT-5.5 Pro: $30/1M input tokens, $180/1M output tokens

GPT-5.4: $2.50/1M input tokens, $15/1M output tokens

Opus 4.7: $5/1M input tokens, $25/1M output tokens
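
The cost-per-completed-task argument is easier to judge with a little arithmetic. The sketch below uses the per-token prices listed above; the token counts and success rates are hypothetical numbers picked for illustration, not measurements. It also assumes a failed attempt is simply retried, independently, until one succeeds, so the expected number of attempts is 1 divided by the success rate.

```python
# USD per 1M tokens (input, output), taken from the pricing comparison above.
PRICES = {
    "GPT-5.5": (5.00, 30.00),
    "GPT-5.5 Pro": (30.00, 180.00),
    "GPT-5.4": (2.50, 15.00),
    "Opus 4.7": (5.00, 25.00),
}


def cost_per_attempt(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single attempt at a task."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def cost_per_completed_task(
    model: str, input_tokens: int, output_tokens: int, success_rate: float
) -> float:
    """Expected dollar cost until one attempt succeeds.

    Assumes independent retries with a fixed success rate, so the
    expected number of attempts is 1 / success_rate.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt(model, input_tokens, output_tokens) / success_rate


# Hypothetical task size and success rates, for illustration only.
INPUT_TOKENS = 100_000
OUTPUT_TOKENS = 10_000
ASSUMED_SUCCESS = {"GPT-5.5": 0.9, "GPT-5.4": 0.4}

for model, rate in ASSUMED_SUCCESS.items():
    attempt = cost_per_attempt(model, INPUT_TOKENS, OUTPUT_TOKENS)
    completed = cost_per_completed_task(model, INPUT_TOKENS, OUTPUT_TOKENS, rate)
    print(f"{model}: ${attempt:.2f} per attempt, ${completed:.2f} per completed task")
```

GPT-5.5 costs exactly twice as much per token as GPT-5.4 on both input and output. Under these assumptions, and with equal token usage, it is cheaper per finished task only when its success rate is more than double GPT-5.4's. Using fewer tokens per attempt would lower that bar.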

Why GPT-5.5 feels different

GPT-5.5 is built on a new pre-train—the broad, expensive training run that teaches the base model its underlying patterns before instruction tuning, tool use, and reasoning scaffolds are added in post-training. Post-training can make a model more obedient, safer, or more agentic. A new pre-train can change the model’s center of gravity.

OpenAI had already made a strong case that it was competitive again with GPT-5.4, which used the same pre-train as earlier GPT-5.x models. Releasing a new pre-train now suggests it wants to keep pressure on Anthropic—betting that the next answer to Claude starts with a different base model underneath, not just better scaffolding around the same one.

The most obvious change is speed. GPT-5.5 is much faster than Opus 4.7 in head-to-head tests, and conveys a low-friction competence. It is easier to iterate with, keep in the loop, and trust with everyday professional work. It also spends more time on planning and reviewing, asks more questions, and checks its work before moving on, especially at extra high reasoning.

GPT-5.5 is good at turning messy inputs into orderly, usable outputs: dashboards, curricula, run-of-show documents, consulting prose, and transcript-grounded writing. But the new pre-train does not solve everything. It can still be bland, struggle with Ruby, and trail Opus 4.7 on PowerPoint presentations, spatial composition, and ambitious prototypes.

The Reach Test

Dan Shipper, the multi-threaded CEO

“GPT-5.5 is my new daily driver. It’s what I reach for first on every coding task from vibe coding to serious engineering. And it’s my main model for most other agentic knowledge-work tasks from spreadsheets to research. It’s also the model I use by default in my OpenClaw setup.”

Kieran Klaassen, father of compound engineering

“GPT-5.5 feels very capable, and you can see it thinking harder. The planning and review cycles are longer, and on the best tasks it feels similar to Opus 4.7, which I had called the best model so far. But I’m mixed on it for product work. It can build deep functionality, but the design doesn’t always come together. The details are often good; the whole can feel random. It’s strong in a way I respect, but not yet in a way that consistently inspires me. To be a daily driver, I need a model that’s very good in all things, not just one or a few. It needs to be better at starting from scratch and filling in the blanks while still following instructions closely.”

Mike Taylor, PowerPoint engineer

“GPT-5.5 is the model I’d use when I need to get the job done without babysitting it. It’s less flashy than Opus, but it’s more natural, more accessible, and more client-ready. For dashboards, curricula, run-of-shows, and normal consulting docs, I trust it more. Opus still has more edge, and for high-stakes tasks I’m personally invested in that’s exactly what I want—especially for PowerPoint, sharp copy, or impressing a client. I’ll stick with Opus as my daily driver, but turn to GPT-5.5 when I need work I can use without thinking.”

Katie Parrott, AI-pilled writer by day, vibe coder by night

“I haven’t touched ChatGPT for writing in almost a year, but that changes now: I’m switching my writing workflow over to GPT-5.5 and adapting my writing plugin for Codex. This model gives me more confidence in the structure of a piece than Opus 4.7 does: the idea progression is cleaner, and the draft feels easier to revise. It still has some AI smell in the over-smoothed transitions and over-used constructions, and Opus can be better at punchy framing. But GPT-5.5 has the mix of speed and sensitivity to feedback that I need for writing every day.”

Naveen Naidu, Codex power user

“The thing that changed for me with GPT-5.5 is how many different kinds of work I started trusting Codex with. I used it across my own native iOS and Mac to-do app, Monologue backend work, MCP, the auth website, iOS and Mac client work, support drafts, and production debugging. One day it was building a native Swift app in one giant thread; another day it was implementing OAuth-only MCP across backend, frontend, and API surfaces under a deadline; another day it was reading Intercom history and drafting replies that sounded like me. Older Codex models already felt great for real engineering. Now I’m using it as my default model for almost everything.”



Coding: Better at sustained engineering

Rewrites vibe-coded codebases (almost) like a senior engineer

LFG: Reliable at building, but wobbly on design

Writing: Smooth prose with stronger bones

Its drafts are easier to revise

Knowledge work: The dependable operator

It makes better dashboards

The verdict

Reach for GPT-5.5 if…

Reach for Opus 4.7 if…

Katie Parrott is a staff writer at Every. You can read more of her work in her newsletter.
