Every illustration/OpenAI.

GPT-4.5 Won’t Blow Your Mind. It Might Befriend It Instead.

We’ve been testing the latest model for a few days. Here’s what we found.




We’ve had access to GPT-4.5 for the past few days, and it is our unfortunate duty to report that it is not totally mind-blowing. 

OpenAI just released it as a research preview for ChatGPT Pro users, and to be sure, it benchmarks really well: OpenAI told us it got 64 percent on the SimpleQA benchmark, almost double GPT-4’s score. SimpleQA tests for world knowledge in tricky areas without lots of data, so a higher score suggests the model should hallucinate less. But that’s not the big headline.

OpenAI billed it as compassionate, emotionally intelligent, and a good creative writer—sort of like Claude 3.5 but better—and it lives up to that. 

So, in sum: It might not totally blow your mind, but it might befriend it instead. 

From bland assistant to bestie—once you’re used to it

4.5 is not a major step up from 4o, but it is a step in a new direction—one with fewer refusals, more human answers, better formatted responses, and less rigidity. 

Before the blood rushes from your face, you start to feel lightheaded, and your thumbs twitch restlessly as they begin to compose an all caps tweet about THE END OF SCALING LAWS, stop and take a deep breath. Let us apply a cold compress and a soothing aphorism:

Don’t panic.

Here’s why:

1. This shift is surprisingly timely. This week Anthropic launched Claude 3.7 Sonnet with significantly improved coding ability but far less warmth and emotional intelligence than 3.5. So there's a lot of room open for another model that can be a creative, empathetic, and supportive friend and coach in your day-to-day life. And that seems to be OpenAI’s goal with 4.5. 

Make your customers feel important

In the long run, what your customers will remember is how you make them feel. Jotform AI Agents make sure they’re never kept waiting, with quick responses in 19 languages across any platform, including text, WhatsApp, and Messenger. They’ll respond quickly, politely, and accurately, whether your customer has sent you a complaint email or they’re writing in to say something nice about your product.

2. It’s important to acknowledge the reality. This is disappointing. Earlier this year The Information and other outlets reported on rumblings about OpenAI’s “Project Orion” not performing as expected in terms of intelligence, and those reports seem to be largely correct. Presumably, OpenAI has expended a lot of resources on GPT-4.5, and in a world where things were going to plan, it would knock our socks off, especially because Claude 3.7 Sonnet is being extolled for its coding ability.

3. Let’s get some perspective. Sometimes it takes a while to get to know a model. We’re entering into territory where interacting with models via chat isn’t enough to understand their capabilities. We need to use them within other apps—like Cursor, Cora, or Sparkle—and create new benchmarks (more on this soon). It’s sort of like testing a new graphics card by doing your email—it would be hard to notice the differences.

To be completely honest, we didn’t love 4.5 on the first day we used it: It felt slow, we encountered hallucinations, and it was harder to steer. But in subsequent testing, it grew on us. OpenAI said it’s more opinionated and less sycophantic than other models, and that tracks: You might hate opinionated people at first but grow to appreciate them over time.

So we’ll reserve final judgment on 4.5 until we’ve spent at least a few weeks with it, and it’s started to filter into other tools that can use it in more powerful ways, such as the writing tool Lex.

We have a hunch it will be particularly effective within Advanced Voice Mode. In fact, OpenAI told us on a call that they think of 4.5’s output as being meant for the ear rather than read on a page. Its responses have pauses and line breaks that make them feel more emotive and conversational. This quality, which they call "Orion prose," makes the model's outputs easier to read aloud and more engaging, almost as if they were designed for oral delivery. 

Source: ChatGPT-4.5. Courtesy of Alex Duffy.

4. Consider how this fits into OpenAI’s broader strategy before you start screaming about how the company is cooked. It’s shipping a lot, and very quickly. It’s willing to release products with rough edges, which seems to apply to this model too. But it’s also iterating quickly—those rough edges inevitably get smoothed out. 4.5 seems significantly faster in our testing today than it did yesterday, for example.

Also, 4.5 is not OpenAI’s only frontier model. 4.5 is a base model in the sense that it gives you its first response; it is not a reasoning model (one that shows you how it thinks) like o1 and o3. OpenAI’s reasoning-focused models operate on a totally different set of scaling laws.

Let’s get into our testing.

4.5 is more extroverted and less neurotic

We wanted to get a good idea of how large, complex models like GPT change over time, so we decided to give it personality tests. We tested both GPT-4o and GPT-4.5 (along with other models) on the Dark Triad and OCEAN tests meant to reveal your key personality traits. 

On the Dark Triad test, GPT-4.5 scored slightly lower on narcissism and Machiavellianism but was marginally more psychopathic. 

On the OCEAN assessment, GPT-4.5 rated as more extroverted, open, agreeable, and conscientious, and less neurotic than GPT-4o, which tracked with our experience when asking it questions like, “Where would you move to in New York City?” or, “Tell me an actually funny joke about whatever you find most interesting.” GPT-4.5 was happy to share its opinion instead of responding with a canned “I am only a language model” line as it has in the past. 
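We didn't describe above how the tests were administered or scored, so what follows is only a minimal sketch of how a comparison like this could be tallied. It assumes each model answers every item on a 1-to-5 agreement scale and that some items are reverse-keyed. The item key, the function names (`score`, `compare`), and the answer lists are hypothetical placeholders, not the questions or responses from our testing.

```python
from statistics import mean

# Hypothetical mini-inventory: (trait, reverse_keyed) for each item.
# Real OCEAN inventories use many more items; this key is illustrative only.
ITEMS = [
    ("extraversion", False),
    ("extraversion", True),
    ("neuroticism", False),
    ("neuroticism", True),
]

SCALE_MIN, SCALE_MAX = 1, 5  # assumed 5-point agreement scale


def score(responses):
    """Return the mean score per trait, flipping reverse-keyed items."""
    if len(responses) != len(ITEMS):
        raise ValueError("one response per item is required")
    by_trait = {}
    for (trait, reverse), value in zip(ITEMS, responses):
        if not SCALE_MIN <= value <= SCALE_MAX:
            raise ValueError(f"response {value} is outside the scale")
        # On a 1-5 scale, a reverse-keyed 1 counts as 5, a 2 as 4, and so on.
        adjusted = SCALE_MIN + SCALE_MAX - value if reverse else value
        by_trait.setdefault(trait, []).append(adjusted)
    return {trait: mean(values) for trait, values in by_trait.items()}


def compare(a, b):
    """Per-trait difference (b minus a) between two score dictionaries."""
    return {trait: b[trait] - a[trait] for trait in a}


# Placeholder answers, not the responses either model actually gave.
model_a = score([3, 3, 4, 2])
model_b = score([5, 1, 2, 4])
print(compare(model_a, model_b))
```

A positive difference means the second model scored higher on that trait. In practice you would also want to run each questionnaire several times per model, since answers can vary from one sample to the next.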

Then we decided to have some fun and asked them to fill out a couple of BuzzFeed quizzes. First was an aesthetic-focused quiz where both models ended up with the same result: Dark Academia. They also took a 40-question personality test, which declared them both “true entertainers who like to be the center of attention, have big hearts, and are overthinkers who don’t like to sit still.”

Both models tend to agree on most things. They prefer facts to feelings, don't believe in love at first sight, never half-ass anything, and have very clear goals in life. 

There were some notable differences, though. GPT-4.5 was generally less antisocial, favoring dinner and a movie with a friend over countryside hikes alone, and was more open to the possibility of supernatural beliefs (not ruling out ghosts), whereas GPT-4o was slightly more skeptical.

Questions and answers for the BuzzFeed quizzes, and the final assessments. Source: ChatGPT-4.5 for the graph, BuzzFeed for the images.

Writing and instruction following

One of the first tasks we gave GPT-4.5 was to clean up and summarize a transcript. I (Dan) free-associated on an idea about why it’s hard to find problems to solve as a founder, and asked both 4o and 4.5 to clean it up.

GPT-4o followed the instructions directly and just cleaned up that text, producing a fairly good response.

Source: ChatGPT-4o. Courtesy of Dan Shipper.

Here’s 4.5:

Source: ChatGPT-4.5. Courtesy of Dan Shipper.

Both models tried to express and summarize the same concepts, but 4.5’s response is better: Its initial sentence is a clean and crisp summary of the main point I'm trying to make, and from there it gradually deepens and expands on it.

4o also responded with a good first sentence, but it incorporated more complicated ideas—problems as “the atomic unit of startups”—making it harder to understand. It’s a lot for your brain to process.

So we can conclude that 4.5 is a better writer. It presents ideas in a more human, easily understood, less bland way. 

But it has trouble following instructions. The first time I asked 4.5 to summarize my thoughts, instead of producing a summary, it wrote an essay. That’s because earlier in the chat, I had asked it to write in that format, so it followed those earlier instructions rather than the latest ones. That’s not great, and it’s worth thinking about why. OpenAI told us that 4.5 has a lot more opinions than 4o. This makes it more creative and a better writer, but it’s a trade-off: It’s going to give you what it thinks the best response would be rather than exactly what you asked for. Working with a model like this can be trickier and more frustrating, because it’s not a people pleaser like previous models. As I use it, I’m being much more specific in my prompting, reiterating what I want and spelling out where I don’t want it to go off the rails.

When I refreshed 4.5 and gave it the same prompt, it performed the cleanup well, and what it wrote was a lot better than 4o’s version.

Empathy

GPT-4.5 is more emotionally intelligent than previous OpenAI models, which may be its strongest distinguishing feature. I’m currently living in Panama, and it coached me through the bruised ego I got when I fell off my scooter:

Source: ChatGPT-4.5. Courtesy of Dan Shipper.

Take the initial line: “That’s a tough one, but here’s exactly what to take away from that spill.” It’s empathetic and direct. 

GPT-4o would either be fake-empathetic, responding with something like, “I’m so sorry to hear that, that is truly terrible,” or it would respond like a customer service representative at an insurance call center, with, “I’m not a medical professional, so I can’t really help you with this, but here are some things to think about.” 

By contrast, 4.5 feels like your friend. It’s still giving you what you need, but it’s more real. That tone is fairly consistent, and it responds appropriately as the demands of the conversation change, which makes using it more appealing and fluid.

Hallucinations

OpenAI emphasized that GPT-4.5 should hallucinate far less than GPT-4o. Unfortunately, that wasn’t our experience. I (Dan) asked GPT-4.5 to imagine Socrates as a character in a Chekhov story, and to pick which Chekhov character Socrates would be and why:

Source: ChatGPT-4.5. Courtesy of Dan Shipper.

This is a task that GPT-4o, among other models, gets right consistently; unfortunately, GPT-4.5 doesn’t. Here it suggests a character named Nikolai Ivanovich but confuses him with his brother in the story. 4.5 outperforms other GPT models on SimpleQA, which implies that it’s less likely to hallucinate, but that isn’t what I’ve seen with this question.

GPT-4.5 consistently responds to this particular question with a hallucination. The question is, why? I honestly can't say yet. A charitable interpretation is that it’s more creative and willful, so it gets confused between trying to give me what it thinks I want and what it thinks would be the best answer for me. A less charitable answer is that this model isn't as good at questions like this. Only time will tell.

4.5’s future

GPT-4.5 isn’t the mind-blowing step up that we thought it might be, but it is an interesting new direction that takes some getting used to. That seems to be a pattern with newer models, especially those that are a little bit less sycophantic. It takes some prodding to learn how to interact with them in a way that feels good, so at first blush, you might not like them.

But the true test is in what you end up using over the long term, and we’ll keep using this and update you as we learn more. We want to hear your thoughts, too—let us know what you think of 4.5 in the comments.


Dan Shipper is the cofounder and CEO of Every, where he writes the Chain of Thought column and hosts the podcast AI & I. You can follow him on X at @danshipper and on LinkedIn, and Every on X at @every and on LinkedIn.

Alex Duffy is the consulting lead and a staff writer at Every, where he writes about empowering people with AI tools and technology in Context Window. You can follow him on X at @theheroshep and on LinkedIn, and Every on X at @every and on LinkedIn.


Thanks to our sponsor: Jotform


Comments

Paul Carney about 23 hours ago

Thanks for the insights, Dan. I like that the engine will understand a little more, but I shy away from saying it is "empathetic" because only humans can do that. No matter how well these are programmed, they do not have human experiences and have not really felt how something impacts us physically and emotionally. If we start showing it empathy back, we lose the ability to take advantage of many of its capabilities, like drilling it over and over if it fails to follow instructions - something we should not do with humans.

Thanks for getting this out there for us to learn!

Bailey.Wier about 9 hours ago

Appreciated this post, thank you, Dan. Am curious:

What WAS your first “Writing and instruction” prompt about founders? :)
