Reddit: The Internet’s Bellwether

Is AI training a sustainable business model for the future?

Every illustration/Reddit.

Sponsored By: Amplitude

Struggling to understand user behavior and boost retention in your digital product? Amplitude can help with free and discounted tools for startups. With Amplitude, it's easy to:

  • Uncover user insights: Get a clear picture of how users interact with your app.
  • Boost retention: Identify drop-off points and optimize for user engagement.
  • Scale with confidence: Make data-driven decisions to fuel your startup's growth.

Startups: Apply today to get one year free on Amplitude's Growth Plan and unlock:

  1. Unlimited product analytics
  2. Feature flag management
  3. Session replay for qualitative insights
  4. Advanced features like behavioral analysis and experiments
  5. Audience management and more

In August 2005, Kevin Kelly, co-founder of Wired magazine, had an epiphany: We had overlooked the majesty of the internet. “Kings of old would have gone to war to win such abilities,” he wrote. “Only small children would have dreamed such a magic window could be real…The success of the Web at this scale was impossible. But if we have learned anything in the past decade, it is the plausibility of the impossible.”

This was nearly 20 years ago! The “plausibility of the impossible” has been pushed far beyond what Kelly wrote about. The digital miracles of today, wonders like LLMs, weren’t on the radar back then. 

On Kelly’s list of the advances of 2005, which includes “road maps with driving directions…help wanted ads…tax forms, [and] TV guides,” were a select few services that have printed cash, like Intuit making billions with its tax prep software TurboTax. But most others have gone the way of the dodo. Look at how any company involving TV that isn’t Netflix has struggled to turn a profit.

One of the few remaining sites from that period is Reddit, the last great social media company of the early 2000s. The company, which sees 76 million users log on daily to post and read content they can’t find anywhere else, recently filed its S-1 to go public. Reddit is the closest thing we have to a global community expertise graph—anyone can share what they know best, whether that’s strange bugs or financial independence. It is a miracle. It is so much better than the regular internet that most Google queries can be improved by simply tacking on Reddit's moniker at the end of your question. 

Reddit is perhaps the embodiment of Kelly’s plausibility of the impossible. And yet, like all of its social media peers that aren’t owned by Mark Zuckerberg, it is a remarkably middling business.

According to its S-1, the company grew revenue by 21 percent to $804 million in 2023, and daily active users grew even faster, up more than 27 percent year over year. Unfortunately, average revenue per user fell 2 percent year over year to $3.42. The company lost $91 million last year, mostly driven by a staggering $439 million in research and development expenses (typically software engineer and designer salaries), which is 55 percent of its revenue. That ratio of spend would be fairly normal for a startup that’s three to five years old, but Reddit was founded in 2005. When Facebook went public in 2012, it ended that year spending 27 percent of revenue on R&D.
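As a rough napkin check, the ratios above can be recomputed from the rounded figures quoted from the S-1. This is a minimal sketch, not Reddit's own accounting: the variable names are illustrative, and because the inputs are rounded, the outputs are approximate.

```python
# Rounded figures quoted in the text, in millions of US dollars.
revenue_2023 = 804
rd_expense_2023 = 439
net_loss_2023 = 91

# Share of revenue spent on research and development.
rd_share = rd_expense_2023 / revenue_2023

# Net loss as a share of revenue.
loss_margin = net_loss_2023 / revenue_2023

# Facebook's R&D share of revenue in its IPO year, as quoted in the text.
facebook_rd_share_2012 = 0.27

print(f"R&D share of revenue: {rd_share:.0%}")
print(f"Net loss margin: {loss_margin:.0%}")
print(f"Reddit vs. Facebook R&D intensity: {rd_share / facebook_rd_share_2012:.1f}x")
```

On these numbers, Reddit devotes roughly twice as large a share of revenue to R&D as Facebook did in the year it went public.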

The past 20 years of the internet have taught us one lesson: Enormously useful does not equal enormously profitable. Take Snapchat (which had 414 million daily active users last year) or Pinterest (which had 498 million monthly active users in the same period), both of which have been mediocre successes at best. To build a winning company, you need multiple overlapping strategic advantages and the internal culture necessary to execute those advantages.

Reddit’s defense is that it “[d]id not begin meaningful monetization efforts…until 2018, and we are currently exploring new strategies for monetization.” Not trying to monetize until 13 years after being founded is, by definition, a ZIRP phenomenon: a product of the zero-interest-rate-policy era.

But I’m not quite prepared to write Reddit off as a soon-to-be failure created by the conditions of the old internet. That’s because 2024 has something different: AI.

What are Reddit and the internet in the age of AI?

Part of that $804 million in revenue is the $60 million annual license fee that Google is paying Reddit for the latter’s data. Reddit argues that its data will be key to training LLMs in the future:

“In a world increasingly saturated with AI-generated content, we expect users to increasingly seek out and value fresh ideas, and that models will need to refresh their learning from these ideas.”

In a sense, the company is right—AI model companies do need clean, fresh data sets to train on. Reddit can provide that, and because users keep coming back to the site, the dataset will be kept up to date. Data licensing is something that media companies (social or otherwise) haven’t had available as a revenue stream in the past. In a just world, it would be large enough to sustain the information ecosystems of yesteryear. Rather than rely on classified ads or paywalls, a local newspaper could instead be compensated by the AI model provider for its data. 
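For a sense of scale, the reported $60 million annual license fee can be set against the $804 million in 2023 revenue. This is a minimal sketch using only the two rounded figures quoted in this piece; the variable names are illustrative.

```python
# Rounded figures quoted in the text, in millions of US dollars.
revenue_2023 = 804
google_license_fee_per_year = 60

# Size of the annual license fee relative to 2023 revenue.
license_share = google_license_fee_per_year / revenue_2023

print(f"License fee relative to 2023 revenue: {license_share:.1%}")
```

On these figures the fee is a single-digit share of revenue, so the bulk of the money still has to come from elsewhere.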

However, I worry that the licensing might also be like the 1793 French government arguing that “due to our growing sharp blades and wooden posts industry, we expect to greatly benefit from guillotine technology.” You are selling the thing that will eventually kill you. 

Reddit will have to make sure that users aren’t just using those same AI tools to produce their posts, degrading the quality of the content. There will need to be some form of Know Your Customer (KYC), an identity verification system similar to what traditional finance uses, or maybe a crypto solution. Regardless, low-quality AI spam will be an issue.

In addition, if AI models get as powerful as I think they can, they could make such a superior written entertainment product that no newsletter or community forum can match its utility. *Looks in mirror uncomfortably.*

Kelly took for granted that things on the internet would be “available on demand and free of charge.” Free of charge has mostly been true, but now that big tech has sucked up so many advertising dollars that a lot of media companies can’t survive, it’s unclear whether this state will be sustainable going forward. If models like ChatGPT or Gemini become so good that their output replaces the majority of internet content, the monetary incentive to create new training data for the internet also decreases. In that case, training data that is paid for using parasocial capital (i.e., likes and comments on a site like Reddit) would end up being even more valuable.

Technology can sometimes shrink the total revenue in a market while dramatically increasing the utility given, so everyone makes less money while consumers are that much happier. Just look at the music industry, with streaming, or the digital camera industry, with smartphones, as recent examples. LLMs could significantly improve consumers’ quality of life and be a net boon for society while simultaneously devastating the majority of legacy content markets. Jeff Bezos is famous for saying, “Your margin is my opportunity.” In the case of AI, it might be the same story for companies that rely on advertising.

The internet from 2000–2022 worked so well because free content helped media companies grow faster. And if you were smart, you could capture enough first-party data to serve better-performing advertising. If this is no longer the case, the questions that all content hosts have to ask are, “Can providing my data to the AI companies give me more revenue than other options? Would it be better to hide it from the web crawlers? To distribute only via email?”

The answers aren’t clear, but Reddit is a bellwether, an example that the rest of the rapidly aging internet will have to learn from. Thankfully, it is going public right at the time we need to figure it out.

Evan Armstrong is the lead writer for Every, where he writes the Napkin Math column. You can follow him on X at @itsurboyevan and on LinkedIn, and Every on X at @every and on LinkedIn.

Arun Palanichami about 1 month ago

Evan - This was timely and interesting! Did you see this research paper which had a cool graphic that highlighted the impact of ChatGPT's impact on Stack Overflow vs Reddit? Bottomline - ChatGPT is undermining question-and-answer communities, like StackOverflow, but not ones that require human interaction, like Reddit… yet. (Both are sources for training data)

@oliohermes about 1 month ago

I've been looking through the data and total music industry revenue is up since the advent of streaming, are you referring to physically recorded music?
