How Microsoft Is Building for a World of Metered Intelligence

Was this newsletter forwarded to you? Sign up to get it in your inbox.

As I rode in my Uber to Microsoft’s annual Build conference on Monday, I fondly recalled a time when you could get anywhere in San Francisco for $5. Those days are long gone. Venture capitalists lost their appetite to supply unlimited funding in a viciously competitive market, and Uber needed to show a path to profitability ahead of its 2019 IPO.

There are signs that the “$5 Uber era” of LLMs is over now, too. AI labs are subsidizing subscriptions to the tune of thousands of dollars, which can’t continue forever. This year Anthropic, OpenAI, and SpaceXAI are all going public—and like Uber seven years ago, they’ll need to take a hard look at their books. On June 1, the eve of the event, Microsoft sparked outrage by switching to token-based billing on GitHub Copilot. Some users said their bills jumped from $39 to over $3,000 per month.

Rather than backtracking on billing, Microsoft used the conference stage in California to make the case for using AI more pragmatically in the face of rising costs. I came away from the event thinking that Microsoft is the first company to get real about a world where intelligence is available on tap, but constrained by how many coins you can put in the meter. Here is what the company’s vision looks like in practice, and what it might tell us about how we’ll be paying for and pricing AI in the future.

Intelligence on and off the meter: A product approach

In his opening speech, CEO Satya Nadella addressed pricing concerns head-on. He promised “unmetered intelligence to every desk and every home,” an AI-era update to Bill Gates’s vision of “a computer on every desk.”

Microsoft CEO Satya Nadella promised “unmetered intelligence to every desk and every home.” (All images courtesy of Mike Taylor.)

The most tangible way to experience that vision is with the RTX Spark, a new laptop Microsoft designed for AI workloads with Nvidia. The device is able to run a medium-sized 128-billion-parameter model locally (frontier models are in the trillions of parameters) so developers can get a lot of work done without paying a penny for tokens. Microsoft is taking advantage of the fact that the leading open-source models like Kimi-K2.6, which have a trillion parameters, are too big to fit on most laptops, and is betting that budget-conscious coders might not mind being a year or two behind the frontier and use a smaller model. The device will be released in the fall.

The RTX Spark laptop follows earlier feature announcements that show that Microsoft wants to decrease switching costs for customers by being the place where you can use any model, agent, or harness. The laptop has a rebuilt smart terminal app that allows you to run any coding agent harness and has adopted popular terminal commands from the Mac ecosystem to make the shift easier for developers.

Even the GitHub Copilot Desktop app, also released at the conference, makes it easy to switch providers between OpenAI-built, Anthropic-built, and local open-source models running on your device.

When questioned about the affordability of agentic coding, Mario Rodriguez, GitHub’s chief product officer, cited the automatic model routing feature in GitHub Copilot, which can delegate less complicated tasks to cheaper models. In my interview with Kyle Daigle, GitHub’s chief operating officer, he lamented that developers tend to choose “the model of the day, or week, or hour,” even when the task doesn’t merit that kind of power. A person probably will not manually switch to a cheaper model for that final step, “but the tools could.” I’ve also long argued that not every task needs to be done by a frontier model.

I get the feeling the team built this model router feature for themselves after facing the same problem everyone else is right now—Microsoft itself has been cancelling Claude Code licenses to reduce costs.

Features like automatic model routing show that Microsoft understands how runaway costs hurt enterprises that need tighter control over spending. The AI labs won’t let large companies buy highly subsidized individual “Max” plans, so big companies end up paying full freight on every token they burn. One that wasn’t properly monitoring usage is rumored to have spent an eye-watering half a billion dollars on Claude tokens in a single month.

That wasn’t the only news that day: Microsoft’s research lab, led by Mustafa Suleyman, released a set of new (cheaper) smaller models spanning image, voice, transcription, coding, and reasoning.

Tackling costs through model optimization

But when you don’t use the latest models to save cost, there’s a higher risk of making a costly mistake. One answer was a phrase I heard over 100 times at the one-day event:...

Become a paid subscriber to Every to unlock this piece and learn about: