Why o3 Is the Best Model Yet for Real-world Learning
I asked two OpenAI LLMs to help me get fit—one stood out.
When OpenAI’s new reasoning model o3 came out, Every’s CEO Dan Shipper and OpenAI’s Sam Altman agreed that AI is changing the future of learning: If you aren’t using it to learn every day, they said, you’re “not going to make it.”
OK, I thought, I’ve got a challenge for o3: Make me physically stronger. Ten times stronger, in fact.
It’s been a life goal of mine to improve my chin-ups. I started 2024 unable to do even one, and months of working out alone got me nowhere. It wasn’t until I started working with a calisthenics trainer, Silvia, that I finally, after half a dozen focused sessions, got my first shaky repetition.
Now I want to do ten.
What better way to test AI’s capacity for teaching people in the real world than to ask it to help me achieve a goal I’ve never even come close to?
The more I thought about it, the more I liked this plan. I’d pit GPT-4o against o3 and see which model gave me a better chance of progressing from one to 10 unassisted chin-ups. I wanted to know which one would be a better teacher: 4o, the fast and reliable model I’ve been using as my daily driver, or o3, the more advanced reasoning model. Would either be up to the task? Would one emerge victorious? Let’s find out.
What I’m going to judge GPT-4o and o3 on
I would ask OpenAI’s older standard model, GPT-4o, and o3 to each generate a training plan separately. I created a rubric to evaluate the models, based on what I think matters when you’re trying to learn something in the real world: quick feedback so you don’t make the same mistake over and over again, advice that’s tailored to your specific situation, incremental progress, and the motivation to keep going.
- Responsiveness: How quickly do I get feedback?
- Personalization: Is the advice tailored to me?
- Progress: Does it help me get closer to my goal?
- Motivation: How excited am I to keep showing up and putting in the work?
To judge the LLMs’ training plans, I also needed to define what “good” looks like. I trust my trainer, and she’s already delivered real results—so her guidance and the techniques she uses with me will serve as my baseline, the standard against which I’ll measure everything else.
Design as unique as your imagination
Every artist needs a good partner. Recraft can be yours. Browse thousands of styles of images, then blend them to create exactly what you want. Stay ahead of new image trends. Create a visual universe of your own, whether you’re getting together a new product image or just having fun.
Try it out today.
Become a paid subscriber to Every to unlock the rest of this piece and learn about:
- How a simple exercise plan can reveal models' strengths and weaknesses
- o3's ability to catch tiny details and understand them correctly
- The subtleties of o3's communication style that set it apart
Thanks to our sponsor: Recraft