DALL-E/Every illustration.

GPT-4o and OpenAI’s Race to Win Consumers

Decoding what the AI leader’s latest release means for the future




When OpenAI makes an announcement, it often knocks our socks off. But 12 hours after the ChatGPT maker’s latest release, our socks, oddly, remain very much attached to our feet. By historical standards, yesterday’s release was modest compared to, say, last year’s GPT-4 launch.

Still, some of the new features are sneakily exciting. More importantly, they tell us a lot about OpenAI’s product strategy and roadmap: knock the socks off the average consumer, for free. 

Yesterday, the company announced three new things:

  1. GPT-4o, a large language model that can process video, audio, and text all at once
  2. A desktop app (coming first to Macs—sorry, Microsoft) 
  3. API access to GPT-4o that is twice as fast and 50 percent cheaper than the previous version, with five times higher rate limits

I (Dan) am going to explain what it all means from a strategic, technical, and consumer perspective. Then Evan will wrap it up with what it means for the enterprise.

OpenAI’s consumer product strategy 

In one sentence: OpenAI just released a model that is better, faster, and free. 

GPT-4o is not a huge leap forward in terms of intelligence—it seems to be slightly above the level of GPT-4 Turbo for capabilities like knowledge, reasoning, and comprehension—but it’s the most user-friendly model the company has ever released, and it’s available to everyone at no charge.

OpenAI is already the dominant destination chatbot, and these features are aimed at maintaining its position. It’s running the classic Silicon Valley playbook: (1) Release great technology, (2) raise lots of money to make that technology as cheap as possible, and (3) get as much distribution as you can, as quickly as you can. In the long run, OpenAI wants to make sure that ChatGPT continues to be synonymous with this generation of AI products, while it keeps pushing the limits of what the technology can do. 

With this strategy, OpenAI doesn’t necessarily need to push the technical frontier with every release. GPT-4 is already intelligent enough for most everyday consumer use cases—but it used to cost $20 a month. For most AI users, who were previously on the free GPT-3.5 version of ChatGPT, the move to GPT-4o will be a big upgrade. ChatGPT currently has more than 100 million monthly active users. If we assume only 5 percent of those are paid, more than 95 million people just upgraded to the best model they’ve ever used.

It’s a leap in intelligence comparable to the one that most AI enthusiasts experienced a year ago with GPT-4. That’s going to make a world of difference.

Why GPT-4o is technically interesting

Bundling video, audio, and text into one “omnimodel”—or what is usually called a “multimodal” model—is surprisingly powerful. To understand why, let’s first talk about how something like voice instructions used to work.

Until now, when you spoke to ChatGPT, it would record you and transcribe the audio with a speech-to-text model called Whisper. Then, it would send the transcription to GPT-4, get an answer, and read the answer out loud with a text-to-speech model. That’s at least four steps! So it was slow. It also wasn’t very smart.

For example, if you wanted to interrupt the model while it was speaking, or ask it what song was playing in the background, it wouldn’t work. Why? The speech-to-text model, Whisper, is separate from GPT-4, and it’s not intelligent enough to know what’s being asked of it. In the interruption case, it would first have to translate your voice to text, which takes a long time, so interruptions would be haphazard. In the song identification case, it might try to transcribe the lyrics but would probably do a poor job, leaving GPT-4 without much information to go on to identify the song.

Now, with GPT-4o, voice is processed natively without having to first be converted to text. It’s a one-step process. So now it can handle interruptions naturally, process songs, or do a variety of other things. If you’d like, it can talk faster or even sing to you.
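To make the difference concrete, here’s a minimal sketch of the old chained pipeline. It assumes an OpenAI-style SDK client; the method names mirror OpenAI’s published Python API, but treat the specific models and parameters as illustrative, not as how ChatGPT’s voice mode was actually wired up internally.

```python
def voice_turn(client, audio_file) -> bytes:
    """One spoken exchange via the old three-model chain."""
    # Step 1: speech-to-text (Whisper). The audio becomes plain text,
    # discarding tone, timing, and any background sound.
    transcript = client.audio.transcriptions.create(
        model="whisper-1", file=audio_file
    )

    # Step 2: the text-only language model answers. It never "hears"
    # anything—it only sees the transcription.
    reply = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[{"role": "user", "content": transcript.text}],
    )

    # Step 3: text-to-speech reads the answer back.
    speech = client.audio.speech.create(
        model="tts-1", voice="alloy",
        input=reply.choices[0].message.content,
    )
    return speech.content
```

Every handoff between steps is plain text, which is exactly where the latency and the lost information (songs, interruptions, tone of voice) come from—and exactly what GPT-4o’s native audio processing eliminates.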

There are plenty of other subtle advantages in other modalities. For example, with previous models, it’s been impossible to consistently generate the same visual character (like an image of a character with a specific look) across different AI images. It’s also been difficult to get AI to output images with legible text; for example, if you asked it to generate a building with a sign over it that says “Cafe,” it might write “Caffee.” Now, because image generation and intelligence are in the same model, both of those problems are solved.

Video also adds a new layer of interactivity. GPT-4o can talk to you about what it’s seeing through your phone’s camera. When your parent needs help getting their printer to work, GPT-4o can do it for them. Or, if you want a tour guide to explain the sights to you in a new city, GPT-4o can see what you’re seeing and tell you about it. The opportunities to explore are massive.

Multimodal—or omnimodel—capabilities like this aren’t exactly new. For example, Google’s Gemini processes video and text in the same model. And OpenAI’s previous models have had image capabilities. But this new release adds more modalities, improving the outputs. And, crucially, it’s extremely fast.

The importance of the OpenAI desktop app

Source: Screengrab from OpenAI livestream.

I spend a lot of time on my computer flipping back and forth between ChatGPT, other open tabs, and various desktop apps. But now, ChatGPT has a desktop app. You can screen share with it so it can see everything on your computer. You can also easily bring it up any time you’re using your computer by pressing a hotkey command. My workflow will be significantly streamlined.

And, while it’s a step forward in convenience, the present state of the desktop app is less interesting than its future. OpenAI has broken out of the sandboxed environment of a browser tab. Now, it has landed on your computer.

This is critical for three reasons: 

  1. It has access to significantly more data. ChatGPT can access your files in addition to anything open in your browser. Private data makes AI much smarter: The more context you can give it about the task you need it to do, the better it should perform.
  2. It gains the ability to be proactive instead of reactive. You might not have to think so much about when to use ChatGPT if it can use context to know when to pop up and be useful.
  3. It gains the ability to operate your computer for you. OpenAI hasn’t said anything yet, but, in theory, it could do common workflow tasks on your computer for you, like doing research, buying products, or organizing your email. This move is clearly designed to enable future AI agents that are much more powerful.

The desktop app is a strategic move for another reason: Google and Apple are almost guaranteed to be integrating AI into their browsers and operating systems. As I wrote in December 2022, this is a powerful threat to OpenAI. 

If Apple integrates a good LLM into MacOS, or Google does the same in the Chrome browser, you may not need to visit the ChatGPT website very often anymore. But a desktop app might help OpenAI shield against this vulnerability: Even if you’re not opening your browser that much, you can just use a hotkey command to access ChatGPT.

Now, on to Evan for enterprise analysis and some concluding thoughts.—Dan Shipper


Enterprise contracts

To me, the most intriguing part of the announcement didn’t come from OpenAI’s product marketing materials—it came from Sam Altman’s blog:

“It now looks like we’ll create AI and then other people will use it to create all sorts of amazing things that we all benefit from. We are a business and will find plenty of things to charge for, and that will help us provide free, outstanding AI service to (hopefully) billions of people.” [Emphasis added]

OpenAI is already unique among LLM startups not only because it has the best model, but because it’s doing something even rarer—making money. The company hit a $2 billion annualized revenue run rate (multiplying its latest month’s revenue by 12) in December 2023. To me, Altman’s comments indicate that OpenAI plans to make money by banking on enterprise ChatGPT subscriptions and paid API access to underlying models—and to use free GPT-4o as the world’s greatest top-of-funnel to let everyone else get hooked on the magic. 

The free tier makes good business sense. While ChatGPT is a capable general-purpose tool, it doesn’t yet have sufficient context or specificity for most workflows. It’s why we are bullish on our homegrown AI-assisted writing app Lex. It’s why the most compelling demo from OpenAI was not from OpenAI, but rather from the third-party Be My Eyes app, which used the video functionality to assist the blind. You get a taste of generative AI from ChatGPT and purchase from a specialized application when you’re ready.

Two things are true in SaaS: Everything is a derivative of Microsoft Excel, but Microsoft Excel couldn’t do all the things that the derivatives based on it can. The same is true in AI. Every AI startup is a derivative of ChatGPT, but ChatGPT can’t do all the things that the derivatives based on it can. 

Software founders will be able to take the technical benefits of GPT-4o and apply them to problems that require context beyond the scope of ChatGPT. In that regard, for many founders, the most significant news may well be that the API is twice as fast and 50 percent cheaper than the previous iteration, with five times higher rate limits.

ChatGPT as the meta-layer on top of all of your applications

If you squint, you can see how all of these pieces come together. The expanded context window and memory will allow ChatGPT to understand you and your workflows. The desktop app will become a meta-layer sitting on top of all your applications. The vision, audio, and text generation will be precise and highly intelligent, guided by all of this context.

With yesterday’s announcement, ChatGPT moved a lot closer to becoming a meta-layer capturing value on top of all existing technology. It is exciting—and scary, if it disrupts your business—but it is still a vision, not yet a product. Delivering on that vision depends largely on whether GPT-5 is able to deliver a significant enough leap forward in intelligence—and quickly enough that competitors like Apple or Google don’t get there first. 

In the meantime, the board is set. All we can do is wait and see where the dice roll.—Evan Armstrong


Dan Shipper is the cofounder and CEO of Every, where he writes the Chain of Thought column and hosts the podcast How Do You Use ChatGPT? You can follow him on X at @danshipper and on LinkedIn, and Every on X at @every and on LinkedIn.

Evan Armstrong is the lead writer for Every, where he writes the Napkin Math column. You can follow him on X at @itsurboyevan and on LinkedIn, and Every on X at @every and on LinkedIn.
