Can a Startup Kill ChatGPT?

Google is dangerous—a founder cracked on Zyn and Diet Coke more so

DALL-E/Every illustration.

The destination chatbot market has become a knife fight for dominance. 

A year after OpenAI released GPT-4, Google and Anthropic have caught up in the quality of their chatbot products. Both have released models in public or private beta (Gemini 1.5 Pro and Claude 3 Opus, respectively) that sport larger context windows than GPT-4. They also match GPT-4 on benchmarks and, in certain cases, surpass it on vibes.

OpenAI is still clearly the winner on popularity, though. I’d be surprised if Google or Anthropic truly threatens it for mainstream adoption in the near future. And I’m curious to see what OpenAI does next; I’d be willing to bet that the company has a compelling response up its sleeve. But today, I want to talk about startups.

How vulnerable is ChatGPT to disruption? Let’s say OpenAI holds its dominant position against the other big players. Will it be able to defend that position against disruption from below?

Both Google’s Gemini mishap and the sometimes-backwards trajectory of ChatGPT response quality over the last few months suggest to me that large chatbot players are vulnerable to disruption—unless they modify their product strategy. Let me explain why.

A quick primer on disruption

The word “disruption” is used colloquially to mean any instance where a startup beats an incumbent, but in its original formulation, it meant something specific. 

Disruption, as theorized by Clayton Christensen in the early 1990s, is a process by which a startup offers a lower-cost product that performs worse along standard dimensions of performance for a small subset of customers outside of the mainstream. The product gets adoption, though, because it performs better on a new dimension of performance that is important to its niche customer set. Over time, the disruptor improves on standard performance metrics so that it can move up-market to higher-value customers, while maintaining its other advantages. 

The startup is able to displace a larger, well-managed incumbent because the incumbent sees the startup’s original product as lower cost, lower margin, and generally worse performing. It looks like a bad business, so the incumbent fails to react until it’s too late.

It’s a neat theory because it proposes that businesses fail not because managers are stupid, but because smart managers at incumbents, by properly following the incentives of their business and sticking close to their customers, become blind to disruptive innovations. (For more on disruption theory from me, see this previous piece.)

Disrupt, dip, dive, and disrupt

Disruption happens when a startup releases a product that’s lower cost and lower performing than a large incumbent’s product—except that it outperforms in one dimension that the incumbent won’t copy. 

There’s a very clear area where this applies in chatbot land: returning risky responses.

In December 2022, I wrote in “Artificial Unintelligence”:

“We’re already at a point in the development of AI where its limitations are not always about what the technology is capable of. Instead, limits are self-imposed as a way to mitigate business (and societal) risk.” 

In other words, while large incumbents are scrounging for every last available GPU and hoovering up every last bit of data they can find to train ever-larger models, they are ignoring the fact that inference quality isn’t limited by the amount of data they use for their training runs. Instead, it is limited by their willingness to look bad or get sued.

As chatbots get more widely distributed, the responses that they return tend to become blander. They’ll refuse to reproduce copyrighted work, take sides on thorny political issues, or dispense medical and legal advice.

Why is this? Large incumbents need to walk a fine line between giving users what they want and not causing their legal, communications, and compliance departments to lose their minds.

Potential legal and reputational exposure is a much greater problem in chatbots than it is in search. Why? A search result is a list of links to other people’s sites. Google doesn’t have to take as much responsibility for the links that it serves because it’s linking to the sites that are most relevant to the user’s query. 

Things are different in chatbot land. A chatbot gives an answer written for the user that has Google’s (or any other LLM provider’s) name on it. As the chatbot scales, its parent company will force it to act more like a corporate comms person: bland, edges sanded off, carefully honed not to piss anyone off. This is not a technical necessity; it’s a consequence of the way the many stakeholders inside a large organization constrain its communications.

This presents an opportunity for small startups. You don’t have to beat OpenAI or Google at building the largest model with the largest amount of data. You just have to be willing to take a risk by, say, allowing your chatbot to say things that OpenAI’s or Google’s models are not allowed to say. Suddenly, you’ve surpassed frontier-model performance on the one dimension your niche cares about, without state-of-the-art technology.

You’d still only be serving a specific niche, and it’s not clear that there’s a path to scaling without running into the same blandness issues that incumbents face. You’d also have to deal with the legal, moral, and reputational repercussions. But there are ways to create such a product.

How incumbents will likely respond

As I wrote in December 2022, incumbents are likely to learn to distribute legal, moral, and reputational risk to their users. Over time, the risk landscape will look more like search’s (“We’re just linking!”) than like one where their name is attached to every result their models return.

I see three ways to distribute risk:

  1. Open source: Meta bears less risk for what people do with its open-source models than Google does with Gemini or OpenAI does with ChatGPT, for the simple reason that Meta doesn’t host the models itself.
  2. APIs to enable third-party apps: All of the major players allow anyone to build an app with their models, and the model they expose via API is already less restrictive in its responses than the one available in the first-party app.
  3. Custom chatbot personalities: If the response from ChatGPT or Gemini comes not from ChatGPT or Gemini itself but from, say, the way an individual brand or consumer tuned the model, the companies can get away with riskier results. ChatGPT’s custom GPTs feature is a step in this direction.

Maybe it’s my bias as an internet writer, but the custom chatbot personalities option is the most interesting to me. I think the best business position for large incumbents is to allow the responses generated by AI to be written by another brand or “author” who appeals to a specific audience. (The same goes for tools like Arc [in which I’m an investor] and Perplexity, both of which automatically summarize search results into “articles” for users. The output is generic today, but over time I expect it to become voice- and personality-dependent.)

I can imagine, for example, a scenario whereby readers of this newsletter opt for ChatGPT or Perplexity to write answers in an Every house style that mimics the curious, technical, and entrepreneurial outlook of our readers. I could see the same thing for people who enjoy writing and perspectives that I find unappealing, like Breitbart or TMZ.
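To make the mechanics concrete, here’s a minimal sketch of what a “house style” persona layered on top of a general-purpose model could look like through an API today. It assumes the OpenAI Python client; the persona text, model choice, and helper function are hypothetical illustrations, not anything Every actually ships:

```python
# A minimal sketch of a "house style" chatbot persona layered on top of a
# general-purpose model via its API. The persona text below is hypothetical;
# swap in whatever voice a brand or author wants to offer its audience.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

HOUSE_STYLE = (
    "You are an assistant that answers in the voice of a curious, technical, "
    "entrepreneurial newsletter writer. Be direct, use concrete examples, "
    "and don't sand the edges off your opinions."
)

def ask(question: str) -> str:
    """Return an answer written in the hypothetical house style."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": HOUSE_STYLE},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(ask("Should a two-person startup try to compete with ChatGPT?"))
```

The point isn’t the prompt itself. It’s that the voice, and the responsibility for it, belongs to whoever writes that system message rather than to the model provider.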

Ultimately, I think allowing AI to represent a wide diversity of perspectives and viewpoints (within certain limits) is the best business decision for incumbents and the best experience for users. It also leaves room for a new breed of publishers and content creators to thrive in an AI-first world. If incumbents don’t enable it, I expect startups will.

Game on.


Dan Shipper is the cofounder and CEO of Every, where he writes the Chain of Thought column and hosts the podcast How Do You Use ChatGPT? You can follow him on X at @danshipper and on LinkedIn, and Every on X at @every and on LinkedIn.


