Every
Why o3 Is the Best Model Yet for Real-world Learning
ChatGPT/Every illustration.

I asked two OpenAI LLMs to help me get fit—one stood out.

May 5, 2025 (updated Jun 25, 2026)

When OpenAI’s new reasoning model o3 came out, Every’s CEO Dan Shipper and OpenAI’s Sam Altman agreed that AI is changing the future of learning: If you aren’t using it to learn every day, they said, you’re “not going to make it.”

OK, I thought, I’ve got a challenge for o3: Make me physically stronger. Ten times stronger, in fact.

It’s been a life goal of mine to improve my chin-ups. I started 2024 unable to do even one, and months of working out alone got me nowhere. It wasn’t until I started working with a calisthenics trainer, Silvia, that I finally, after half a dozen focused sessions, got my first shaky repetition.

Now I want to do 10.

What better way to test AI’s capacity for teaching people in the real world than to ask it to help me achieve a goal I’ve never even come close to? 

The more I thought about it, the more I liked this plan. I’d pit GPT-4o against o3 and see which model gave me a better chance of progressing from one to 10 unassisted chin-ups. I wanted to know which one would be a better teacher: 4o, the fast and reliable model I’ve been using as my daily driver, or o3, the more advanced reasoning model. Would either be up to the task? Would one emerge victorious? Let’s find out. 

What I’m going to judge GPT-4o and o3 on

I would use GPT-4o, OpenAI’s older standard model, and o3 separately to generate a training plan from each. I created a rubric against which to evaluate the models, based on what I think matters when you’re trying to learn something in the real world: quick feedback so you don’t make the same mistake over and over again, advice that’s tailored to your specific situation, incremental progress, and the motivation to keep going.

  • Responsiveness: How quickly do I get feedback?
  • Personalization: Is the advice tailored to me?
  • Progress: Does it help me get closer to my goal? 
  • Motivation: How excited am I to keep showing up and putting in the work?

To judge the LLMs’ training plans, I also needed to define what “good” looks like. I trust my trainer, and she’s already delivered real results—so her guidance and the techniques she uses with me will serve as my baseline, the standard against which I’ll measure everything else. 
