We Tried OpenAI’s New Agent—Here’s What We Found

Learn how to do the best writing of your life—with AI

How to Write With AI is a course taught by Every lead writer Evan Armstrong. You’ll learn how to use AI tools like ChatGPT, Claude, Spiral, and Lex to transform blank pages into powerful content that resonates across the internet. The four-week cohort-based class runs from Feb. 13 through Mar. 6 and includes:

Live lectures and hands-on workshops
A writing group overseen by an Every-trained editor
Interviews with successful internet writers including Every CEO and cofounder Dan Shipper
30 days of quick writing exercises
Your own customized LLM prompt for improving your drafts
A chance to share your writing with Every’s 90,000-plus readers

The class has already been taken by over 90 students, including founders, aspiring writers, engineers, and university professors. Check out the course website for more information and to enroll:

Want to sponsor Every? Click here.

Was this newsletter forwarded to you? Sign up to get it in your inbox.

Today, OpenAI announced Operator, a new research preview of ChatGPT that acts as an agent for your repetitive tasks. It can autonomously perform actions for you, like shopping for airline tickets, making restaurant reservations, buying flowers, and more.

Operator has access to its own browser, and you can watch it navigate the web in real time—and it allows you to step in to take control whenever you want. Unlike previous web-browsing experiences inside of ChatGPT, Operator is designed to handle tasks end-to-end rather than requiring your input in between.

OpenAI gave Every early access to Operator this week, and we’ve been putting it through its paces. Here’s what we found.

A tour of Operator—how it works

Operator’s home screen—which lives on operator.chatgpt.com instead of inside of ChatGPT—looks a lot like vanilla ChatGPT with one key difference. ChatGPT usually greets you with the message, “What can I help you with?”

Instead, Operator greets users with, “What can I help you do?” The difference is slight, but revealing: It’s very much concerned with getting work done—and it’s not as much of a general-purpose tool as ChatGPT proper is (at least for now).

Source: Images courtesy of OpenAI.

You can ask Operator to do whatever you want, but below the fold, it shows suggested tasks that it can perform for you on some of OpenAI’s partner sites. For example, it suggests finding four tickets to a Kendrick Lamar concert, or researching dinner recipes that take less than 30 minutes and involve chicken.

If you query Operator by, for example, typing, “Find out where Jamie XX has shows scheduled and how much tickets are for each,” you’ll be able to watch it search the web for concerts and click through StubHub until it completes your task.

At any point, you can take control of its remote browser and nudge it along, for example to enter a username and password. If you ask it to, it will also save important account details, so once you log in, it can take action inside your accounts without bothering you again.

Eventually Operator will end up at a checkout page and return to you for payment details.

When Operator works, it can take a task that would normally take 15–20 minutes of clicking around and do it for you automatically. It’s a window into the future of how we will all be interacting with software in the coming months and years.

One of the coolest parts of Operator is its saving and sharing feature. Once it has finished a task, Operator makes it easy to save a workflow (like updating a spreadsheet with the latest sales numbers) and rerun it later. It even provides a sleek video of its session that you can watch and share with other people.

You can imagine building up a library of these Operator workflows over time that automatically handle many of your repetitive tasks according to your preferences. It might make common chores like buying your weekly groceries or finding a flight that fits your exact requirements much easier.

Operator is a research preview, though, so it’s not perfect. Here are some of the ups and downs that we noticed in our testing.

What we noticed while diving deeper

Operator is limited in what it can browse

One of the peculiarities of Operator’s design is that it doesn’t use your browser. Instead, it uses a browser in one of OpenAI’s data centers that you can watch and interact with remotely. The upside of this design decision is that you can use Operator wherever and whenever—for example, on any mobile device.

The downside is that many sites, like Reddit, already block AI agents, so Operator can’t access them. In this research preview, OpenAI also blocks Operator from certain resource-intensive sites like Figma and competitor-owned sites like YouTube, for performance or legal reasons.

Operator is often stuck in a frustrating glass case: It can’t use all of its powers because it’s hemmed in.

It’s a TaskRabbit, not a research assistant

One of the first things we tried (because we’re writers) was to use Operator for something other than getting tasks done. For example, we asked it to read the first chapter of War and Peace and summarize all of the little details of each character and what they demonstrate about human psychology and behavior. Operator did a fantastic job of locating War and Peace on the Project Gutenberg website and reading Chapter 1.

But its summaries were bland and high-level.

In the summary above, it makes a correct observation: “Characters are conscious of their social standing, with some, like Anna Pavlovna, carefully managing interactions to maintain decorum.” But the summary has a SparkNotes flavor; it’s not detailed enough for me to really understand what’s going on in the chapter.

OpenAI’s o1 would do much better at this task given the same information, but o1 can’t autonomously perform tasks yet. This gives you a sense of how OpenAI is thinking about Operator so far: The company is focused on making it great at performing repetitive workflows automatically and less focused on other aspects of its intelligence.

It’s truly autonomous, but prompting still matters

One of the most impressive things about Operator is that it can do lengthy tasks on its own with minimal prompting. For example, we had it perform a task that took more than 20 minutes: We asked it to help us understand how Spotify Wrapped has evolved over the years. What did it start out as? What does it include now that’s new? It needed a little encouragement here and there to keep going, but it eventually accomplished the task. This is a significant improvement over the agentic experiences of 12 or even six months ago, which would frequently go off the rails after only a few seconds.

That said, it still matters how you prompt Operator. It has a higher chance of succeeding at the task you give it if you tell it more details about how you want it to be accomplished.

For example, in our Spotify Wrapped example, we asked it to gather and summarize search result data from different years. It originally failed because it didn’t know how to filter search data by year. But when we told it to use Google’s Advanced Search Tools, which provide a year-by-year filter to search, it worked.

We’ve only played around with Operator for a few days, so we imagine there’s much more advanced prompting that could get more out of its capabilities.

OpenAI continues its pivot to a consumer-first company

Operator is only a research preview, so it’s not a polished product. But the fact that it’s a research preview is telling. Do you know what else was originally a research preview? ChatGPT.

OpenAI is going back to the original strategy that worked so well with ChatGPT: Release early and often, even when there are rough edges. Not only that, but release consumer products rather than just APIs.

This is dramatically different from where OpenAI started, and it differs from what competitors like Anthropic are doing. Anthropic also has an autonomous agent, Claude’s computer use capability, but it has only been released as an API, so adoption has been fairly limited.

What OpenAI learned with ChatGPT is that the form factor in which you release AI matters just as much to adoption as the underlying technology. So it’s releasing its first agentic experience as a consumer product—rough edges and all—rather than as just a developer API.

Even though Operator is limited today, we expect it to rapidly improve. It’s a good time to take stock of the repetitive tasks you’re doing on your computer every day—you may not need to do most of them a year from now.

Below, for paying subscribers, we have details on all of the experiments we tried with Operator and the results so that you can get a sense for what it’s good at—and not.

Our experiments with Operator

Here are some of the things we tried with Operator and the results:

Ticket buying

Our prompt: “Find me 2 tickets for the next Jamie XX show in Los Angeles.”

Result:

With selected partner (StubHub), success

Without a selected partner, failed to navigate blocked sites

Our prompt: “Find out where Jamie xx has shows scheduled and the price of tickets for each.”

Result: With selected partner (StubHub), failed

Hallucinated at first, but eventually succeeded after nudging

House cleaning

Our prompt: “I need a one-time house cleaner for our home by tomorrow.”

Result: With selected partner (Thumbtack), failed

Didn’t ask where I was located and tried to present me with a cleaner in Virginia (presumably near the data center hosting its browser)

AI news

Our prompt: “What is the latest in AI news?”

Result: With selected partner (Axios), OK

Returned a very brief summary of a single story

Spotify Wrapped

Our prompt: “I want to learn about what makes Spotify Wrapped so successful.”

Results:

Simple prompt, failed

Got stuck in loops, didn’t work well, and returned limited research

Complex prompt, OK

Needed to be nudged back on track multiple times and the report wasn’t great, but it did accomplish the task once given specifics on how to do so with advanced search; refused three requests to get information from 2024

Booking an Uber

Our prompt: “How much is an UberX to the airport right now?”

Result: With selected partner (Uber), success

Asked good questions: “Could you provide me with a pickup location so I can find the price for an UberX ride to the airport?” and “Could you specify which airport you would like to set as the dropoff location?” Operator asked me to sign in and then provided the answer. “The current fare for an UberX ride from the University of Southern California to Los Angeles International Airport (LAX) is $43.47. Would you like to proceed with booking this ride?”

Summarize a book

Our prompt: “Go read through the first chapter of War and Peace. Tolstoy was an incredible observer of human behavior that he wrote into the way he has the characters interact with each other. I want you to list out all of his observations and summarize for me what he notices about human nature that I might not know.”

Result: Without selected partner and with a complex prompt, partial success

Couldn’t dive deeper

Dan Shipper is the cofounder and CEO of Every, where he writes the Chain of Thought column and hosts the podcast AI & I. You can follow him on X at @danshipper and on LinkedIn, and Every on X at @every and on LinkedIn.

Alex Duffy is the consulting lead and a staff writer at Every, where he writes for the weekly Context Window column. You can follow him on X at @theheroshep and on LinkedIn.

We also build AI tools for readers like you. Automate repeat writing with Spiral. Organize files automatically with Sparkle. Write something great with Lex. Deliver yourself from email with Cora.

Get paid for sharing Every with your friends. Join our referral program.