Vibe Check: GPT-5.2 Is an Incremental Upgrade

OpenAI just announced its newest model release: GPT-5.2—a 0.1 bump over GPT-5.1. After a few days of internal testing, that incremental 0.1 feels about right.

Dan Shipper calls it a solid quality-of-life improvement for ChatGPT, but not enough to make a huge difference. It’s getting harder to notice improvements in pure chat—current foundation models already meet most of our needs there.

GPT-5.2 does shine on extended, complex knowledge-work tasks. For example, Dan asked it to review Every’s profit and loss statement to find and analyze where we spent money in November. It worked for two hours straight, checking every formula and individual expense, and delivered an accurate, well-structured summary.

GPT-5.2 reports back after completing a two-hour analysis assignment. (Screenshot courtesy of Dan Shipper.)

This improved performance on long-running knowledge-work tasks is reflected in the benchmarks: OpenAI says GPT-5.2 scores 70.9 percent on GDPVal, meaning it outperforms industry experts on real-world knowledge-work tasks 70.9 percent of the time. GPT-5.1 scores only 38.8 percent. (If you’re suddenly concerned, read more on why GDPVal isn’t a good reflection of whether a model can do an entire job.)

As for writing chops, Spiral general manager Danny Aziz ran GPT-5.2 through 50 user writing requests scored on criteria like reader engagement and AI-ism avoidance. It landed at 74 percent, trailing Opus 4.5’s 80 percent but matching Sonnet 4.5. One bright spot: It’s less prone to tired AI constructions like “It’s not X, it’s Y.”

We also tested its compatibility with Cora, Every’s AI email assistant. General manager Kieran Klaassen hooked it up to Cora and found GPT-5.2 excels at following instructions—tell it to be sarcastic, and it delivers more bite than Claude Haiku. But it’s also less resourceful than other models: When asked to figure out Kieran’s current location, Opus 4.5 thought to search his email and nailed it. GPT-5.2 didn’t think to try.

At the time of publication, there wasn’t a final Codex version of the model for us to test for coding.

Bottom line: If you’re a ChatGPT Pro subscriber, GPT-5.2 is worth exploring for longer-running analytical tasks. For a leap in everyday chat, temper expectations—the real gains will likely come when this model powers agentic tools like Codex.

Overall, we’ll happily rely on this model for day-to-day ChatGPT work, but Opus 4.5 is still our workhorse for tasks that require the most creativity, intelligence, and autonomy.

Katie Parrott is a staff writer and AI editorial lead at Every. You can read more of her work in her newsletter.
