Who Wins the AI Agent Battle?

Was this newsletter forwarded to you? Sign up to get it in your inbox.

Big tech has agents on the brain.

At the World Economic Forum in Davos last week, OpenAI chief product officer Kevin Weil argued that 2025 was the year of the AI agent, the next evolution beyond the chatbot in which AI software goes out into the world and executes tasks on behalf of users. Meta CEO Mark Zuckerberg has predicted that the company will have an AI agent with the skills of a “mid-level” engineer by the end of the year.

These forecasts are quickly becoming reality. Two days after Weil’s comment, OpenAI released Operator, the company’s first publicly available agent. Operator works by accessing a remote web browser. You give it a task and virtually watch over its shoulder within the ChatGPT interface as it completes that task. It could, say, make a restaurant reservation or fix your code. OpenAI isn’t first to market: There are more than a dozen competitors offering a similar product. But OpenAI is the biggest game in town, boasting 300 million weekly active users.

If agents fulfill the promise that Silicon Valley has made, then we are in for a dramatic reinvention of both knowledge work and busy work over the next year. But first we need to answer a really important question: What is an AI agent? And from there we need to establish how these products will work—and which companies will dominate with them.

Defining agents

Here’s a boring, technical definition: An AI agent is a type of model architecture that enables a new kind of workflow.

The AI architecture that has underpinned ChatGPT takes a command, formulates a response, and returns it. Ask it something simple, like, “Does an umbrella block the rain?” and GPT-4o returns the answer, “Of course it does, dumbass.” The large language model answers the question using its own internal data—its training set and the prompt you’ve fed it. It’s a straightforward, linear workflow: Enter one prompt, receive one output.

By contrast, agentic workflows are loops—they can run many times in a row without needing a human involved for each step in the task. A language model will make a plan based on your prompt, utilize tools like a web browser to execute on that plan, ask itself if that answer is right, and close the loop by getting back to you. If you ask, “What is the weather in Boston for the next seven days, and will I need to pack an umbrella?” an agent would form a plan, use a web browsing tool to check the weather, and apply its existing corpus of knowledge to know that if it’s raining, you would need an umbrella. After that, it would check if its answers are right and finally say, “It’ll be raining (like it always does in Boston, you dumbass) so, yes, pack an umbrella.” Here, one input elicits multiple actions by the model. You’re not starting a call-and-response, you’re conducting an orchestra.
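To make the loop concrete, here is a minimal Python sketch of the umbrella example. It is an illustration under loud assumptions, not how Operator or any real agent is built: `fake_weather_tool`, `make_plan`, and `run_agent` are hypothetical names, the forecast is hard-coded sample data rather than a real lookup, and a real agent would have a language model write the plan and perform the self-check.

```python
from typing import Callable

def fake_weather_tool(city: str) -> list[str]:
    # Hypothetical stand-in for a web-browsing tool.
    # The seven-day forecast below is made-up sample data.
    return ["rain", "rain", "cloudy", "sun", "rain", "sun", "cloudy"]

def make_plan(prompt: str) -> list[str]:
    # Assumption: a real agent would ask a language model for this plan.
    return ["look_up_forecast", "decide_umbrella", "check_answer"]

def run_agent(prompt: str, city: str,
              tool: Callable[[str], list[str]],
              max_attempts: int = 3) -> str:
    # One prompt in, several actions out: plan, use a tool, check, answer.
    for _ in range(max_attempts):
        state: dict = {}
        for step in make_plan(prompt):
            if step == "look_up_forecast":
                state["forecast"] = tool(city)
            elif step == "decide_umbrella":
                state["umbrella"] = "rain" in state["forecast"]
            elif step == "check_answer":
                # Self-check: did the tool really return seven days?
                state["verified"] = len(state["forecast"]) == 7
        if state["verified"]:
            rainy = state["forecast"].count("rain")
            advice = ("pack an umbrella" if state["umbrella"]
                      else "leave the umbrella at home")
            return f"{rainy} of the next 7 days look rainy in {city}, so {advice}."
    # Close the loop honestly if the check never passes.
    return "I could not verify my answer."

print(run_agent("Will I need an umbrella in Boston this week?",
                "Boston", fake_weather_tool))
```

The human appears only at the start and the end. Everything inside `run_agent` happens without a new prompt, which is the difference from the linear call-and-response workflow.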

Agentic workflows are powerful because they break a task into multiple steps, each of which you can optimize to be more performant. Perhaps it is faster or cheaper for one model to do the planning and for smaller, specialized models to handle each sub-task contained within the plan. Or maybe you build specialized tools to incorporate into the workflow. You get the idea.
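One way to picture that division of labor is a planner that splits a request into sub-tasks and a routing table that hands each sub-task to a specialist. The Python sketch below is a toy under stated assumptions: `plan`, `run`, `SPECIALISTS`, and the three handler functions are hypothetical names, and the handlers are plain functions standing in for smaller models or purpose-built tools.

```python
from typing import Callable

def summarize(text: str) -> str:
    # Stand-in for a small summarization model: keep the first sentence.
    return text.split(".")[0].strip() + "."

def count(text: str) -> str:
    # Stand-in for a cheap purpose-built tool.
    return f"{len(text.split())} words"

def title(text: str) -> str:
    # Stand-in for a formatting specialist.
    return text.title()

# Routing table: sub-task name -> the specialist that handles it.
SPECIALISTS: dict[str, Callable[[str], str]] = {
    "summarize": summarize,
    "count": count,
    "title": title,
}

def plan(request: str) -> list[str]:
    # Assumption: a real planner would be a language model.
    # Here the "plan" is just keyword matching on the request.
    return [name for name in SPECIALISTS if name in request.lower()]

def run(request: str, text: str) -> dict[str, str]:
    # Each planned sub-task goes to its own specialist.
    return {task: SPECIALISTS[task](text) for task in plan(request)}

result = run("Please summarize and count the essay",
             "Agents are loops. They call tools.")
print(result)
```

Because each step is a separate entry in the table, you can swap in a faster or cheaper specialist for one sub-task without touching the others.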

With the release of Operator, two new dimensions of agents were thrown into sharp relief:

The context the agent has, and
The user interface that can oversee the agent.

Context is king

If you’ve ever worked with someone who doesn’t share your native language, you know that requests sometimes get lost in translation and you end up with something different from what you asked for. Whether due to cultural differences or language confusion, the end result requires tweaking.

It’s the same way with agents. Their success is contingent on them having the right context for the task. After all, their first language isn’t English—it’s math. To get an agent to accomplish a task, you need to give it examples of what success looks like.

Here’s an example: If you ask a generalist model like GPT-4o, “Turn this essay into a tweet,” it will almost certainly give you something with emojis, hashtags, and sloppy writing. If you prompt, “Give me a tweet in the style of [insert your favorite X account here],” you’ll typically get a better output. If you give 5–10 examples of tweets in the style you are looking for, you’ll get something much closer to the mark.
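The last step, handing the model a handful of examples, is usually called few-shot prompting, and it amounts to assembling a longer prompt. Here is a minimal Python sketch, assuming you already have sample tweets on hand: `build_few_shot_prompt` is a hypothetical helper, the sample tweets are placeholders written for this illustration, and the prompt wording is only one of many that could work.

```python
def build_few_shot_prompt(essay: str, example_tweets: list[str]) -> str:
    # The examples are the "context": they show the model what success looks like.
    if not example_tweets:
        raise ValueError("Provide at least one example tweet.")
    lines = ["Here are tweets in the style I want:"]
    for i, tweet in enumerate(example_tweets, start=1):
        lines.append(f"{i}. {tweet}")
    lines.append("")
    lines.append("Turn this essay into one tweet in the same style:")
    lines.append(essay)
    return "\n".join(lines)

# Placeholder examples, invented for this sketch.
samples = [
    "Distribution beats intelligence. Every time.",
    "Your intern is a genius and a disaster. So is your AI.",
]

prompt = build_few_shot_prompt("Agents need context to do good work.", samples)
print(prompt)
```

The resulting string is what you would send to the model in place of the bare "Turn this essay into a tweet" request.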

You’re feeding it “cultural context.” The model doesn’t know what it is seeing, but it is an amazing pattern matcher. Writing is an exceptionally challenging task to complete with AI because there are thousands of subtle things that go into crafting a sentence. (If you want to learn how, I’m teaching a course on How to Write With AI. Registration closes soon, so act quickly if you want to join.)

This context problem is compounded when you move from simple generation to AI agents having permission to take action. So when we consider who has a winning AI agent, the first box to check is context. Does the agent have easy access to enough examples and data so that it can complete the task?

You can visualize agents on a spectrum. On the left-hand side is “vertical task automation,” and on the right is “horizontal selling of AI agents.” A vertical work application automates a variety of tasks within one industry—think AI agents drafting legal documents, such as Harvey. In the middle are AI agents geared toward one task, such as software engineering. Cognition Labs, the maker of Devin, focuses on performing one large task—writing code—that cuts across many industries. On the far right are companies that sell AI agents as a service. You pay to access AI agents that can do a variety of horizontal tasks, like calendaring, note-taking, or making a PDF summary. Lindy, which offers a tool that has dozens of AI agents, is an example of this kind of company.

Every illustration.

By a company’s position on this spectrum, you can infer the type of context it will need to corner and what its competitive edge will need to be. Harvey is hoping to take care of highly specialized tasks that require an understanding of legal codes, norms of lawyers’ communication, and document guidelines. Lindy wants to be good enough across a huge range of activities, from scheduling and research to meeting summarization, and will be more focused on ease of use than on narrow domain success.

Operator falls squarely into the far-right column as a horizontal agent that can do a little bit of everything. While it has some advantages through ChatGPT’s memory feature—where the application retains information about you—it doesn’t have that much user-specific context. OpenAI will have to compete on the quality of its model and the specific design tradeoffs it makes.

Once the company has the data, the natural question is: How can users interact with those agents?

Robotic intern management

As I’ve been arguing for years, the correct way to think about AI is as the world’s simultaneously smartest and dumbest intern. It’ll blow your mind with what it can accomplish and shock you with its failures.

If you’ve ever had an intern—especially an MBA student from a top business school—you know that you can only put up with a certain amount of their bullshit. Eventually you just throw up your hands and do it yourself. Similarly, the interfaces that companies build to let users control their agents are bounded by how much error management you can deal with. If an agent needs only a click rather than a typed conversation, you can obviously handle more of them.

An OpenAI executive recently explained why the company went with a remote browser within ChatGPT for its user interface instead of having OpenAI operate your personal browser for you. “We debated many form factors for Operator, and landed on using a remote browser,” vice president of product Peter Welinder wrote. “Since most computer work happens there anyway, it can run in the background, and it’s infinitely scalable.”

Screenshot of the author looking for a restaurant in Paris. Note how there is a remote browser tab open within ChatGPT.

The choice of a remote browser also has the effect of limiting Operator to the context of the internet. If you ask it a general query, it’ll usually search for something on the open web, so it is beholden to the search engine paradigm by which the internet currently operates and lacks access to your personal knowledge database. Welinder’s “infinitely scalable” comment is worth noting. The company obviously envisions a day when most knowledge workers are participating in the “allocation economy,” as Dan Shipper has described it, in which the human’s job is to manage enormous numbers of agents doing labor.

However, because Operator is only accessible through ChatGPT Pro right now, that number of agents—for my easily distractible lizard brain—maxes out at around 10. With any more than that, managing the errors they encounter becomes too much for me because of the context switching. One of them is always stuck on an “are you human” CAPTCHA, needing guidance, or just randomly breaking. Someday (possibly soon) that number will increase until you can reasonably handle working with dozens of agents at a time. Most likely you’ll work with an “agent manager” that has a team of agents to manage, similar to how managers in large organizations accomplish tasks today.

I hate to say this, but the interface dynamic, once again, favors big tech. Because the previous paradigm in Silicon Valley rewarded companies like Meta and Google that aggregated attention, and those same aggregators are the ones that are heavily investing in AI agents now, it is going to be hard for startups to break through. Not impossible! But challenging.

What happens next

The AI agent race won’t be decided by who has the smartest models. It’ll come down to who can get the most context in the interface that users are willing to switch to or already use.

This dynamic leads to a tyranny of distribution. ChatGPT’s swarm of 300 million users isn’t just a number—it’s a moat. Meta could release an agent that hallucinates 5 percent of the time, but if it’s baked into WhatsApp’s 2 billion-person user base, it’ll still dominate. The real battleground is the boring stuff:

Data ownership: Does your agent understand your quirks and workplace well enough to represent you?
Permission protocols: Letting an AI answer your email is one thing—letting it argue with your CFO’s agent is another.

These problems are why Zuckerberg keeps shoving AI into every Meta property, from Instagram to WhatsApp. It’s not about intelligence—it’s about existing where your context already lives. For Operator to become the dominant agent, OpenAI will have to upgrade the model with reduced error rates and increased data retention to beat the data advantages of incumbents.

So yes, the agents are coming. But their success won't be measured in tokens processed—it'll be in how seamlessly they integrate into our existing workflows. The real question isn't whether we'll adopt them. It's whether we'll admit how much we enjoy the demotion to robot manager.

Evan Armstrong is the lead writer for Every, where he writes the Napkin Math column. You can follow him on X at @itsurboyevan and on LinkedIn, and Every on X at @every and on LinkedIn.

Comments

Georgia Patrick 6 months ago

Evan... always, I enjoy the context you bring into your articles. The main questions all of my readers have about any announcement is this: Will this lower mortgage rates? Will this lower the price of eggs? Will this make more people want to know me and buy my services? Will this lower credit card interest rates? Workflows don't matter if you have no work. Most of the 340 million Americans are not concerned about AI Agents until you put a lens on it where they can see their life, their family, their future, and their communities of support.
