OpenAI’s o1 Model, Explained


OpenAI launched a new model, o1 (previously code-named Strawberry), yesterday. It’s significantly better at reasoning tasks, scoring in the 89th percentile in competitive programming and exceeding Ph.D.-level smarts on physics, biology, and chemistry questions. It’s been taught to use chain-of-thought reasoning to answer each question it’s given rather than just blurting out a response.

Chain of thought, of course, has been around for a long time. It’s the practice of asking a language model to solve problems by thinking out loud. You’re probably better at doing long division if you write out the steps one by one than you are at doing it in your head. Language models are the same way: Chain of thought creates a tunnel of reason that keeps the AI on track.

Chain of thought used to be just a prompting technique that would improve outputs in the original GPT models.
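
To make the prompting technique concrete, here is a minimal sketch of the difference between asking for an answer directly and asking for chain of thought. The function names and the prompt wording are my own illustrations, not anything OpenAI publishes, and the code only builds the prompt text; it does not call any model.

```python
def direct_prompt(question: str) -> str:
    """Ask for the answer only, with no visible reasoning."""
    return f"Question: {question}\nAnswer with only the final result."


def chain_of_thought_prompt(question: str) -> str:
    """Ask the model to write out its steps before answering.

    The wording is a hypothetical example of the technique.
    """
    return (
        f"Question: {question}\n"
        "Think through the problem step by step, writing out each step.\n"
        "Then give the final result on its own line, starting with 'Answer:'."
    )


question = "What is 7,392 divided by 24?"
print(direct_prompt(question))
print()
print(chain_of_thought_prompt(question))
```

With older models, you had to add the second kind of instruction yourself to get the benefit.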

o1 is different because it’s been trained via reinforcement learning to always use chain of thought in its responses, with no extra prompting required. Now, when you ask a question in ChatGPT with o1 enabled, an expandable thinking indicator pops up that lets you see its thought process.

It also gets the classic strawberry problem correct. Hooray! I’ve been playing around with o1 a lot for the last day and will have much more to say over the next few weeks, but I wanted to give you a quick reaction today.
