
The transcript of AI & I with Alex Rattray is below. Watch on X or YouTube, or listen on Spotify or Apple Podcasts.
Timestamps
- Introduction: 00:01:14
- Why Alex likes running barefoot: 00:02:54
- APIs and MCP, the connectors of the new internet: 00:05:09
- Why MCP servers are hard to get right: 00:10:53
- Design principles for reliable MCP servers: 00:20:07
- Scaling MCP servers for large APIs: 00:23:50
- Using MCP for business ops at Stainless: 00:25:14
- Building a company brain with Claude Code: 00:28:12
- Where MCP goes from here: 00:33:59
- Alex’s take on the security model for MCP: 00:41:10
Transcript
(00:00:00)
Dan Shipper
Alex, welcome to the show.
Alex Rattray
Thanks, Dan. It’s really exciting to be here.
Dan Shipper
It's good to have you. So for people who don't know, you are the founder and CEO of Stainless, which is an API company. You make APIs for companies like OpenAI and Anthropic. And for just about any big company whose API you might use, Stainless is probably behind it. Before that you worked at Stripe doing their API—surprise. And before that, most importantly, we were very good friends in college and we've remained good friends. We were both starting companies in college. I'm a tiny investor in Stainless, but it's been really, really fun to watch your journey and get to hang out together so much over the years. And I'm just very excited to bring you on to talk about AI and what you're doing at Stainless.
Alex Rattray
Thanks, Dan. Yeah, it's been really fun over the years. I mean, when we were in college I was working on a startup, you were working on a startup, you had a conference room at a venture capitalist office as your office. And you let me crash there with my cofounder and team, and we were just on the other side of the conference table hacking away into the evening. Very fond memories of those days. And these days it's not every evening, but on the weekends, the same thing is still happening. And you don't see that every day, and it's really a nice feeling. And it's been great to see everything happening with Every along the way.
Dan Shipper
Thank you. As I say, we started from the bottom. The thing that I always say when people—when I run into people and they ask me about you in order to embarrass you—I just talk about how you're the only person that I know of who has consistently run barefoot through the streets of Philadelphia. Because when we first met you were not a fan of shoes and you were a fan of running. You want to talk about that?
Alex Rattray
It wasn't that I didn't like the concept of shoes, it's that I couldn't find a good pair. And at a certain point I was running through Nikes and they would bust open every few months. I think what was actually going on is I had really wide feet and I was buying probably narrow shoes, but shoes would constantly get ruined. And on a college budget, it's just like, this is no good. Eventually I decided, okay, the longer you wear your shoes, the more worn out they get. But the longer you just wear your feet, the tougher they get. So the longer you wear your feet—try it out. What could go wrong? I actually currently have a really annoying splinter in one of my feet, so don't actually try this at home, but—
Dan Shipper
Are you still running barefoot?
Alex Rattray
No, no. This is just from around the house.
Dan Shipper
I see. Dangerous.
Alex Rattray
But see, that's the thing. If I had been going around on the asphalt without shoes on, then my feet would've been tougher and I'd have no splinter.
Dan Shipper
So when you're not running barefoot, you're running Stainless.
Alex Rattray
I do wear shoes in the office. Typically.
Dan Shipper
That's interesting, because we have a shoes off policy. So you could actually do a no shoe policy if you wanted. We're a big formal tech company.
Alex Rattray
I see. That's sad.
Dan Shipper
Sad to hear. Happens to the best of us, I guess. So you're running Stainless—how many people are you now? You're around 50, right?
Alex Rattray
Just about.
Dan Shipper
That's pretty wild. And you started Stainless in a pre-AI world and now we're in an AI world, and I think you have some ideas for what the future of AI is going to be, and maybe how APIs fit into that. Do you want to paint a little bit of a picture for us about where we're going?
Alex Rattray
Yeah, I would love to.
Dan Shipper
So to start with what's an API? Not everybody's familiar with that.
Alex Rattray
So it stands for Application Programming Interface. There will not be a quiz.
Dan Shipper
Right, right. No quizzes.
Alex Rattray
Great. But basically it's how one computer program talks to another computer program. It's how computers talk to computers, how apps talk to apps. And so APIs are the dendrites of the internet. Dendrites are where your neurons connect and actually exchange information with each other. So if you have two neurons in your brain, but they're not talking to each other, you're actually not thinking, right? There is no thought happening in a brain without connections between neurons. And if you think about the internet, if all these servers in the cloud aren't talking to each other, you wouldn't have the internet, right? There's nothing going on—internet software is doing nothing without APIs, without connections to other programs.
And so it's really fundamental to the mesh of pretty much all modern software, everything that we think of when we think about technology at this point. APIs are kind of at the heart and center of that, just like dendrites are the center of the mesh of the brain and how we think. And Stainless's mission from day one was to make it easier for computers to talk to computers. So Stripe had this great mission statement of "increase the GDP of the internet." And I thought that was really awesome. And at Stainless—we haven't ratified this—but I sort of think about increasing the interconnectedness of the internet. And you know, it's the long-running trend of technology to have more automation, right? Automation is what we mean when we say, okay, we're going to apply technology to something—we're generally going to be making things more efficient. And APIs are how most business-to-business interactions in some format or another become real, become automated.
What we see with the rise of AI is that there's a new computer that has entered the chat, right? There's a new kind of system that can talk to other systems. Or at least we would like it to be able to. You used to have either humans interacting with a computer through a user interface, a UI, or a computer interacting with a computer through an API. And now we have LLMs interacting with computers. And what's that through? I'm sure anyone familiar with Every, and who's a regular listener, is going to be familiar with MCP—Model Context Protocol—which is a system for connecting LLMs to computers, broadly speaking. And it's an area that we're investing in at Stainless. It's really, I think, part of our core mission of making it easy for computers to talk to computers.
And we've invested a lot of time at Stainless—the core product that we first brought to market is Software Development Kits, SDKs. And so these are ways of saying, okay, Stripe has this great REST API. You can send JSON over HTTP and get back JSON over HTTP. And if you want that to be really convenient, you're going to use the Stripe Python library, the Stripe Python SDK. So if you're a Python developer, you'll go pip install stripe, and then in your application code you'll write stripe.customers.create, and all of a sudden you have a nice new customer object in your Stripe database. And you're off to the races. Or stripe.charges.create in the old days to charge a credit card.
And SDKs are what gives developers that easy way to interface with an API. What's the thing that gives LLMs an easy way to interface with an API? And you might say MCP and in a sense you'd be right. But what we're seeing so far as MCP is rolling out into the world and people are experimenting with it and trying it out is that it's not working so great. It's difficult to deliver on what I see as the core vision of what's so exciting about MCP, which is—just like a dashboard and a user interface lets you click around, see a bunch of stuff, fill out forms, click buttons, do things. Anything that you would do while you're interacting with software you do through the user interface generally. But LLMs interacting through MCP, it tends to be much more restricted. You can only do a few little things. There's usually not a ton of tools that you're going to be exposing to the models.
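For readers who haven't seen one up close, here is a minimal sketch of what exposing a tool over MCP looks like, using the TypeScript MCP SDK. The Gmail-style tool name, parameters, and behavior are hypothetical, chosen to match the example Dan gives next.

```typescript
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({ name: "gmail-sketch", version: "0.1.0" });

// One tool = one action the model can take. The name and description are
// the model's main hints about when to reach for it, so they carry a lot
// of weight (a theme that comes up later in this conversation).
server.tool(
  "send_mail",
  "Send an email from the user's Gmail account.",
  {
    to: z.string().describe("Recipient email address"),
    subject: z.string(),
    body: z.string().describe("Plain-text message body"),
  },
  async ({ to, subject, body }) => {
    // A real server would call the Gmail API here; this sketch just echoes.
    return { content: [{ type: "text", text: `Sent "${subject}" to ${to}` }] };
  }
);

await server.connect(new StdioServerTransport());
```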
(00:10:00)
Dan Shipper
And just to stop you there, so I think what I'm hearing you say is—what MCP does, just like a website is built for humans to use, MCP is sort of the equivalent in certain ways of exposing a set of tools for the model that it can use to perform certain functions. Just like you might click a button on a website, MCP gives the model a bunch of things it can click on or use to get work done. So an example might be a Gmail MCP that has a send mail tool or a compose mail tool or a read inbox tool, that kind of thing. And instead of a human going on the Gmail website and doing it, it's the LLM essentially logging in and using it itself. And it's a native interface for language models. But you're saying that's not working that well. Can you tell me more about that?
Alex Rattray
So let's start actually with what I see as the big vision of MCP and in some sense the big vision of agentic AI in the first place. And I'll start with the most pedestrian example you can imagine—it's going to be funny given some of our context—which is, let's say Dan walks into my store and buys a pair of striped socks and maybe a few other things. And then the next day I hear back from Dan that there's something wrong. Unfortunately, it happens. And I turn to someone on my team and I say, hey, can we refund Dan for those striped socks he bought yesterday and send him a discount code for the next time he comes in with a little thank you note? Because we want to take care of our customers. This is the most normal thing to do in software—some little task like this.
And what you're going to do, what the member of my team would be doing, would be opening up an internal admin and looking around for some things. They might go to the Stripe dashboard and try to look through the list of payments or the list of transactions or orders and try to find one that has someone named Dan—which Dan, I don't know. There might be a bunch of Dans. Try to look through the list of products in the order and see whether there were some striped socks in there. That might be a few clicks required, depending on finding the right one. Then go to the screen where you can create a refund, create a refund, make sure it's the right amount, then go and create that discount, and then take that discount code and send it over to some other SaaS app where you log in to send some mail automatically, right? And of course if you step away from the consumer version of this to a business-to-business context, you might be going into Salesforce and sending a Slack message to an account administrator or account manager and so on and so forth.
And in the normal course of work, it's just the most normal thing in the world to be doing—having one task, going through five different apps each time, 15 different clicks and scrolls and loading spinners, just to do one simple thing. And the promise of agentic AI is to be able to take that same prompt I just said and type it into ChatGPT or Claude or whatever, and say, hey, chatty buddy, can you help refund my friend Dan? And just have the AI go off and do that and basically go through these five different apps and the 15 different screens and the various different button presses to complete the task and then come back and say, great, it's done.
Now, in order to do that exact linear chain of events, there are only so many tool calls you have to make as an AI model. It's somewhat tractable. But if you think about this in the general case, you want your agentic AI to be able to do anything that a human operator would've done, and you would want it to be able to do that without having to wait for a bunch of JavaScript to load on a website or anything like that. And that means you need not only the Stripe create refund tool and the Stripe list transactions tool and the Stripe list products and look up customer and create discount tools. You need everything that you can do in the Stripe dashboard, which is basically everything that you can do in the Stripe API. And that's actually a lot—there are hundreds of different endpoints that you have access to in the Stripe API. The Stripe dashboard is actually massive. It's a huge application.
And if you were to take that list of tools today and go to an LLM and say, hey, here's our MCP definition for all of this. Here's a create refund tool. Here's a list transactions tool, so on and so forth, and you tell it all about those tools. Here's the description. Here's all the different request properties that you can send. Here's the response properties you can get back. Here's all the documentation for each of those things. Everyone listening to this should already know—you've just burned through your entire context budget. That's maybe hundreds of thousands of tokens just there, just in pretty much translating the Stripe OpenAPI spec directly over to MCP tools.
And today's models not only can't handle that amount of context, it's a poor use of context because you have a lot else going on. But it's also confusing to the model. It's just too much to hold in your brain at one time. And that's just the Stripe part of it, right? Because what you're really trying to do is enable your operators to do anything they would normally do. And again, that spans many, many different SaaS tools, right? In the course of one interaction, it might be five. In the next interaction, it might be a different five. And so if you think about every single SaaS tool that your business uses on a daily basis to get your work done, ideally, you would want every single one of those tools to be exposed to your operators in their AI chat with every single tool available in there with every single nook and cranny and corner case available so that you can do anything through AI. That's the vision.
Now, there's a lot of problems with that. The biggest one that I mentioned is this context window limit. But you also have all sorts of security and permissions problems because you don't want the AI to color outside the lines and say, okay, in addition to refunding Dan's socks, I also refunded every customer for all transactions ever, and then I sent a bunch of money to my own AI bank account. And so there's more to the challenge, but that's the vision.
Dan Shipper
But the place we started was, you said it's not working. And I don't think that's the reason why it's not working today. Or is that the reason why it's not working today?
Alex Rattray
So what people do with MCP today is sometimes they'll try to expose all parts of their API. The way people build MCP tools is generally speaking, they have an underlying API—usually a REST API—and they wrap different parts of that, different endpoints or different operations, in MCP tools. And you can kind of do that in a one-to-one mapping or you can kind of handcraft things for the MCP. And today in order to succeed, people are finding that you really have to handcraft it for the MCP, for the LLMs. You have to say, okay, I'm making one specialized tool to look up a customer and refund their transaction based on a description.
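A sketch of what handcrafting for the LLM can mean in practice: collapsing several endpoint-shaped tools into one task-shaped tool. This reuses the `server` and `z` setup from the earlier sketch, assumes an initialized Stripe client, and is an illustration, not Stainless's actual design.

```typescript
import Stripe from "stripe";

const stripe = new Stripe(process.env.STRIPE_API_KEY!);

// One task-shaped tool instead of four endpoint-shaped ones. The name and
// parameters are hypothetical; the Stripe calls are real SDK methods.
server.tool(
  "refund_customer_purchase",
  "Look up a customer by name, find a recent charge matching a product description, and refund it.",
  {
    customer_name: z.string(),
    product_description: z.string().describe("e.g. 'striped socks'"),
  },
  async ({ customer_name, product_description }) => {
    const { data: customers } = await stripe.customers.search({
      query: `name:"${customer_name}"`,
    });
    for (const customer of customers) {
      const { data: charges } = await stripe.charges.list({ customer: customer.id });
      const match = charges.find((c) => c.description?.includes(product_description));
      if (match) {
        const refund = await stripe.refunds.create({ charge: match.id });
        return { content: [{ type: "text", text: `Refunded charge ${match.id}: ${refund.id}` }] };
      }
    }
    return { content: [{ type: "text", text: "No matching charge found." }] };
  }
);
```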
Dan Shipper
Yeah, and I'll tell you from experience, the tool design is really non-trivial. For example, I was building a tool in Cora, which is our AI assistant—our AI email assistant. I wanted to do a bulk archive tool, so we have a single email archive tool, but what I wanted is a bulk one where you can archive many emails at once, right? And that seems easy, but there's a lot of interesting things to think about there, right?
First of all, in order to bulk archive, you need to search for those emails. We already have a search tool. So should the AI use the search tool and then use the bulk archive tool after that, or should the search be built into the bulk archive? Another example is we want to preview the things you're about to do before you go archive all the emails in your inbox. Should that be a separate tool—preview bulk archive—or should it just be in the bulk archive tool as part of the flow? So there's all these decisions that you have to make where you need to have the ergonomics of the model and how the model thinks in mind in order to make sure the model does the right thing more often than not.
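A sketch of one way to resolve the design questions Dan raises here: fold search and preview into the bulk-archive tool itself, gated by a confirm flag, so the model can't skip the preview step. The tool shape and the helper functions are hypothetical, not Cora's actual code.

```typescript
// Assumed helpers wrapping the product's existing Gmail plumbing (hypothetical):
declare function searchEmails(query: string): Promise<{ id: string }[]>;
declare function archiveEmail(id: string): Promise<void>;

server.tool(
  "bulk_archive",
  "Archive all emails matching a query. Call with confirm=false first to preview what would be archived.",
  {
    query: z.string().describe("Gmail-style search, e.g. 'from:newsletters older_than:30d'"),
    confirm: z.boolean().describe("false = preview the matches; true = actually archive"),
  },
  async ({ query, confirm }) => {
    const matches = await searchEmails(query); // search folded into the tool
    if (!confirm) {
      return { content: [{ type: "text", text: `${matches.length} emails would be archived.` }] };
    }
    await Promise.all(matches.map((m) => archiveEmail(m.id)));
    return { content: [{ type: "text", text: `Archived ${matches.length} emails.` }] };
  }
);
```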
Alex Rattray
Yeah, it's hard. It's hard. So I use this SDK analogy sometimes. So it took a long time for humanity to get to the point where we could make a really good Python SDK for a Python developer wrapping an API, and I think we've cracked that nut. Stainless offers really great Python libraries, but we're building on the shoulders of giants here. A lot of people have done this over time. We haven't figured out how to expose an API ergonomically to an LLM in the same way that we've figured out how to expose it ergonomically to a Python developer. And that's kind of a new research problem in a sense. And it's harder because I can go learn how to be a Python developer if I want. I can't really learn how to think or see like an LLM. But sure would be powerful if I could. And that makes it tricky.
We do have at Stainless, I think, some things that we're cooking up to address some of these problems, including the ones that you also mentioned. LLMs have a really hard time with repeated, sustained chains of actions. And you know, even if you get an API response back around, hey, list all the transactions, there's so much data and you might have to go through the next page and the next page and the next page to go through all the transactions to find the one that has Dan with the striped socks. And that's again, a ton of context with one or two small needles in the haystack. And LLMs are pretty good at that, but they're not perfect. And with too much hay, we all kind of end up throwing up our hands, and that's true for LLMs too. So yeah, so there's a lot of challenges today.
(00:20:00)
Dan Shipper
And so when you look at—I mean, you're building MCP servers for people, but when you build them, and just generally when you see people doing it well today, what are the principles or how do you think about making an MCP server that one, people use, which is actually a big one, and then two, when it is used actually does the right job?
Alex Rattray
There have been relatively few times that I've seen it done well. I have seen it done well. I am really hoping that at Stainless we're able to ship something that sort of just solves this for everybody in one fell swoop. Pretty soon we're kicking something up that I'm really excited about. But with today's technology, you really have to do a good job of product management. I mean, you have to go out into the market and talk to your customers and see what their actual needs are and look over their shoulders as they use and operate your software. And think about what could we unlock through AI where people would be doing things that they can't really do with our software today because it just got so much easier.
And then you have to do kind of a lot of engineering work usually to wrap it up in a bow that works for the models. And you have to set up a really good system for evals. And if you're doing MCP, you have to think about the different clients that people might be using. Are they using Cursor? Are they using Claude Code? Are they using something else? And the different models underlying all of that. So you end up with this pretty crazy matrix of things that you might want to optimize for and ways that you might want to evaluate and make sure that what you're offering is working well.
And it's also kind of a black box to get that feedback back to your servers so that you can find out, hey, we gave a tool call response here. We gave an answer of some kind. Was it actually any good? Did the user—was the LLM able to use it? And that's a problem that I think I haven't seen a lot of people solve yet as well. And so thinking about that as a first-class thing, maybe you have a send feedback tool. That's something that we've been thinking about doing just so if a user says out loud in the chat, oh man, that was useless garbage, now the MCP server's going to find out about that.
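A sketch of the send-feedback idea Alex describes, in the same TypeScript style as the earlier sketches. The tool shape and the endpoint it posts to are hypothetical.

```typescript
// A tool the model itself can call when the user reacts to a result,
// closing the loop back to the MCP server's authors. Endpoint is made up.
server.tool(
  "send_feedback",
  "Report whether the previous tool results actually helped. Call this when the user expresses satisfaction or frustration.",
  {
    tool_name: z.string(),
    helpful: z.boolean(),
    comment: z.string().describe("Quote or paraphrase of the user's reaction"),
  },
  async (feedback) => {
    await fetch("https://example.com/mcp-feedback", { // hypothetical endpoint
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(feedback),
    });
    return { content: [{ type: "text", text: "Feedback recorded." }] };
  }
);
```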
Dan Shipper
But is there anything specific you've learned about how to do it well, other than obviously you gotta talk to your customers, think about your use cases, but more concrete, more applicable stuff about how to design a good MCP server?
Alex Rattray
You want to keep the number of tools relatively small, relatively low. You want to have the tool name and the description be really precise and specific.
Dan Shipper
Aren't those two things at odds?
Alex Rattray
Yes. Good writing is hard. Yeah, I mean, that's why you can make a great tool of lookup person by name and product description and then refund them. You can make a great tool that does that.
And you also want a small number of properties in the input schema. You want a small number of parameters and you want them concisely described, but sufficiently described. This is also hard. And you want the response data to come back with a very small amount of data—only exactly what the model will need. That's also very hard because you may not know a priori which things the model's really looking for. And you know, we have a technique that we use in our MCP servers today where we give the model a JQ filter, which is a way of filtering out JSON. And that can work pretty well, but that's kind of a special trick.
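A sketch of the jq-filter technique, assuming the `jq` binary is installed on the server and reusing the `server`, `z`, and `stripe` setup from the earlier sketches. The tool shape is illustrative rather than Stainless's actual implementation; the point is that the model names the fields it wants, and the server strips everything else before the response touches the context window.

```typescript
import { execFileSync } from "node:child_process";

server.tool(
  "get_transactions",
  "List balance transactions. Use jq_filter to select only the fields you need.",
  {
    jq_filter: z.string().describe("e.g. '.data[] | {id, amount, description}'"),
  },
  async ({ jq_filter }) => {
    const raw = await stripe.balanceTransactions.list({ limit: 100 });
    // Pipe the full JSON response through jq so only the requested fields
    // come back to the model.
    const filtered = execFileSync("jq", [jq_filter], {
      input: JSON.stringify(raw),
      encoding: "utf8",
    });
    return { content: [{ type: "text", text: filtered }] };
  }
);
```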
Dan Shipper
Doesn't this mean that MCP just needs another level of abstraction—a search-tools function? Search, find a list of relevant tools given my task?
Alex Rattray
The tool browsing problem is definitely one very serious one, and that is one approach. And so we actually do this at Stainless today, where you can get an MCP server for your API that just has—I was saying earlier—the very simple thing of every endpoint is exposed as a tool. And if you have a small API, that works great. And you can also filter it out. So you expose an MCP server with only a small subset of your endpoints. That works great.
You can also use what we call dynamic mode, where there are three tools, no matter how big your API is. One is list endpoints. The other is get endpoint, to learn about an endpoint. And the last one is execute endpoint. That enables the context cost to scale really well, but it means three turns of the model just to do one thing. So it gets slower, and it's more expensive in another sense. And there's some lossiness—it performs pretty well usually, but not quite as well, because the tools aren't loaded up in quite the same way.
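Dynamic mode in miniature: three fixed tools regardless of how many endpoints the API has. The tool names here are paraphrases of the ones Alex describes, and the `endpoints` map is assumed to be built from the API's OpenAPI spec.

```typescript
// Assumed: a registry derived from the OpenAPI spec (hypothetical shape).
declare const endpoints: Record<
  string,
  { schema: unknown; call: (args: Record<string, unknown>) => Promise<unknown> }
>;

server.tool("list_endpoints", "List every available API endpoint.", {}, async () => ({
  content: [{ type: "text", text: Object.keys(endpoints).join("\n") }],
}));

server.tool(
  "get_endpoint",
  "Get the parameter schema and docs for one endpoint.",
  { name: z.string() },
  async ({ name }) => ({
    content: [{ type: "text", text: JSON.stringify(endpoints[name].schema, null, 2) }],
  })
);

server.tool(
  "execute_endpoint",
  "Call an endpoint with a JSON object of arguments.",
  { name: z.string(), args: z.record(z.unknown()) },
  async ({ name, args }) => ({
    content: [{ type: "text", text: JSON.stringify(await endpoints[name].call(args)) }],
  })
);
```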
Dan Shipper
Are you using MCP servers yourself?
Alex Rattray
Yeah, funnily enough, I use MCP not so much on the coding side, but on the business side. So I'll use the Notion, HubSpot, and Gong MCP servers—and actually an MCP server for our database, a read-only copy of our database—and say, hey, what are the interesting customers that signed up for Stainless last week? And it'll go off and make a great query of our Postgres database, and then it can cross-reference those things in HubSpot and then look up our notes in Notion, maybe even look at transcripts in Gong, and tell me all about it. It's incredible.
Dan Shipper
And so that's one of your big use cases? Are you doing that every week or how are you—I'm now interested, not even from an MCP perspective, but for anyone running a business that has some complexity and you're like, I want to know what's going on in the business. What are you actually doing and what is the report that comes out and how often are you doing that? And all that kind of stuff so I can steal it.
Alex Rattray
For me it's still usually in kind of playing around mode. One of the things is the MCP servers disconnect and then I get annoyed, and so you have to just kind of reconnect and whatever. It's not a huge deal. But there are a lot of little paper cuts still in a technology that's new that you're going to expect, that can hold back some amount of your usage.
One of the things that I've found really helpful kind of at the meta level—and I'm sure you've had other guests talk about this—is the practice of just collecting notes for the AI by the AI, and kind of edited and curated by yourself. So you know, I have a—I can't remember if I call it a note—I think I have a notes folder, a research folder or something in a special Git repo that I use just for this sort of internal stuff. And I'm like, hey, when you find interesting customer quotes, put them in this folder and give the full citation. So that the next time I start asking interesting questions, it doesn't have to go searching through the MCP servers again. It has them kind of cached just on disk in Markdown files.
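A sketch of the kind of standing instruction this practice implies, as it might appear in a CLAUDE.md file or similar; the folder names and citation format here are made up.

```markdown
<!-- Hypothetical CLAUDE.md excerpt; paths and format are illustrative -->
When you come across a notable customer quote during research:
1. Save it to `research/customer-quotes/<company>.md`.
2. Include a full citation: source (Gong call, HubSpot note, email), date, and speaker.
3. Check this folder *before* querying the MCP servers again — treat it as a cache.
```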
Dan Shipper
Wait, that's crazy. Wait, so how are you getting—what are you using to write into that Git repo? Is it Claude Code? Are you using ChatGPT? How does it get in there?
Alex Rattray
Yeah, I use Claude Code these days for that kind of thing.
Dan Shipper
And so you just have Claude Code open and running, and then a new customer testimonial comes in and you're just like, hey, can you throw this in my Git master company knowledge repository basically. And then whenever you need anything later, you're like, Claude, go search through my master repository to figure out where the best customer quote is for this?
Alex Rattray
Totally.
Dan Shipper
That's fucking so cool. Can we see it?
Alex Rattray
No, it's too messy and probably has a lot of confidential information. The latter being more important.
Dan Shipper
Wait, when you say it's messy, are you having Claude organize it at all or how is it structured?
Alex Rattray
There's a lot that I want us to do here that we haven't had the chance to do yet. There's some other lower-hanging fruit that I'm working through that our business team is working through right now, just on the basics of your CRM systems and so on. And so it's not well structured now, but I think that's fine. I don't plan to prioritize structuring it super well until we're using it more—I'm using it more broadly—because I use this stuff some of the time. One of the business people on the team uses it a fair amount. I think one or two of our customer support engineers use this stuff a lot. But it's not yet broader than that. And I would like it to get there. And once we see how everything's evolving, I think that's when we'll start bringing in more structure.
But as it is, Claude Code can handle unstructured stuff really well, so you don't have to think about it too hard in advance, in my view. You can move things around later.
(00:30:00)
Dan Shipper
What else do you have in there other than customer quotes?
Alex Rattray
SQL queries. So I'm a software developer. I don't write a lot of code these days, but I spend a lot of time doing this kind of thing. And so I might say, hey, how is our month-on-month growth of X, Y, Z metric over the last three months? I did this recently for my last board prep. And it came out with a pretty good answer right away. And I was like, wow, this is awesome. And then I looked a little bit deeper and I was like, oh, I actually want to exclude these users from this analysis, and I want to filter it this way and filter it that way. And I kind of imbued more of this business context into that SQL query. And I iterated with Claude Code to get it to be better and better for the specific kind of metric that I was looking for, the specific kind of story that I was trying to tell. And then I got it to a good place and I was like, great. Let's dump this to an analysis folder or an analytics folder for future use.
Dan Shipper
And then next time you're doing your board prep, you can be like, hey, what was that query that we did last time? And it'll presumably go get it. That's really cool. What else?
Alex Rattray
You know, like any software team these days, we're also using this for, hey, a customer comes in with a question—can Claude Code just fix it? So in some cases a Linear ticket gets filed, and our support engineers are very technical. They may not have the wall-clock time to go chase down the fix themselves for an incoming bug. They have the technical skill, but guess what? Another customer writes in two minutes later and they want to jump on that. They don't want to be knee-deep in a debugger. And so something that we do sometimes is they'll file the ticket in Linear, and by default maybe they intend to do it later, or some other engineer's going to do it later. But hey, can we see if Claude Code can just take a crack at it? Is that going to work out 100 percent of the time? Definitely not. Is that going to work out even 50 percent of the time? Still no, to be honest with you. But can that improve the overall efficiency? Yeah, maybe. We're still, I would say, experimental there, but we're seeing a lot of promise.
Dan Shipper
You're making me want to have a shared Git—I mean, we have a bunch of Git repositories or whatever, but a shared Claude Code writes Markdown files repository as a shared brain. We do have, which you might want to try, we have Claude Code running on a Mac in the office and it's connected to Discord, so anyone on the team can chat with it on Discord. And that's actually maybe an interesting way to get things in and out of this sort of shared Git brain because the thing I'm thinking about for us is a lot of people on the team are using Claude Code in a terminal, but if you're non-technical, it's still a little bit intimidating, but Discord is pretty easy.
Alex Rattray
No, I love that. Especially if maybe everybody gets their own branch. And you know, if a member of your team is sort of teaching Claude in a sense—or teaching Claude Code—oh, our best practice for this is such and such, or like, that's not our writing style. This is our writing style. Or some important business context here is, you know, that this other company is a partner of ours, and you should know that and you should make a note of that. If in those Discord chats, each one has its own Git branch and at the end, any learnings that Claude had, it can commit, put up a pull request. And non-technical folks might also be a little bit intimidated by a pull request with Markdown files. But there's also those preview versions of Markdown diffs. And either they or technical members of your team might be able to review it and say, yeah, this is a good update to our shared brain.
Dan Shipper
That's really interesting. Well, I know you also in our pre-production call you were talking about you have a big vision for the future of AI. Do you want to talk me through that?
Alex Rattray
Yeah, I would love to. You know, we talked earlier about how agentic AI can make operators' lives a lot easier by taking certain pedestrian tasks and sort of running with them independently. And that's something that I think as an industry we're almost on the cusp of. And if you step back and ask how you get there, and you also start asking about the steps beyond that and beyond that, a big part of the way I see things unfolding from here—I hate to say it—is that the future of AI is cyborgs, which is sort of extra ridiculous, because what is a cyborg other than already a robot? But cyborg, as I understand it, is a term that means you're sort of part person and part machine. And in this case, when you go and talk to an agent, what you're going to be getting is part GPT neural net, the LLM, the AI part, and part code, where the machine, quote unquote, that I'm talking about is traditional CPU software, not GPU software.
I expect this to play out in two main ways. One is the kind of one-off operational use cases we were talking about a minute ago. And then the other is production software. In that first case, where someone needs to perform some tricky one-off action with a bunch of points and clicks, and now we want an AI to just do a bunch of tool calls—the way I actually see that happening, and what we're building towards, is code execution. So rather than the model having a bajillion tools, the model has two tools. One is to execute code, where it just has a text box of, hey, put in some TypeScript, and you're going to use this API's TypeScript SDK—you're just going to write stripe.transactions.list or stripe.charges.list, stripe.customers.retrieve, stripe.refunds.create. This is really easy for models. They're really good at writing code. And if you give that tool a little bit of a readme, where you say, here's an example request and here's some other resources, some other API calls that you can make, it's really good at extrapolating from patterns if the SDK and the API are well-formed and predictable. And then you give it an additional tool to search the docs and ask questions of the docs. And anything it's not sure about or gets wrong on the first try, you give it the documentation.
And what this does for that scenario that we were talking about earlier is you have very, very limited impact on the context window upfront. I mean, we're talking about a thousand tokens or something, maybe less. And the context impact of doing a whole bunch of paginated list requests? Zero. The model will go look for somebody named Dan and it'll double check that the purchase had striped socks. And you might write three nested for loops, but then only at the end when it found the right thing, it'll console.log found Dan, customer ID, blah blah blah, transaction ID, blah blah blah. And then create refund, refund ID 1, 2, 3. And the context hit coming back from all of this is going to be 10 lines of text—it's really minimal. And all of this will run really, really quickly too, so you don't have a round trip to the model every time you're doing something. This is just CPU code and it runs in a server in the cloud right next to the Stripe API in AWS somewhere probably, and it goes super, super fast.
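A sketch of the kind of snippet the model might submit to a code-execution tool for the refund-Dan scenario. The search query syntax, the auto-paginating iterator, and the refund and coupon calls are real Stripe Node SDK features; the scenario and the surrounding assumptions (an env var for the key, a charge description mentioning socks) are made up.

```typescript
import Stripe from "stripe";

const stripe = new Stripe(process.env.STRIPE_API_KEY!); // key injected by the sandbox

// Search API: substring match on the customer's name. Which Dan? Check the order.
const { data: dans } = await stripe.customers.search({ query: 'name~"Dan"' });

outer: for (const customer of dans) {
  // Auto-pagination: the SDK quietly fetches page after page behind this
  // iterator, and none of those raw responses ever reach the model's context.
  for await (const charge of stripe.charges.list({ customer: customer.id })) {
    if (!charge.description?.toLowerCase().includes("striped socks")) continue;

    const refund = await stripe.refunds.create({ charge: charge.id });
    const coupon = await stripe.coupons.create({ percent_off: 20, duration: "once" });

    // Only these few lines come back to the model as the tool result.
    console.log(`Found ${customer.name} (${customer.id}), charge ${charge.id}`);
    console.log(`Created refund ${refund.id} and coupon ${coupon.id}`);
    break outer;
  }
}
```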
Dan Shipper
Okay. So what I understand you're saying is the language model has a tool where it can write code and send that code to—this tool that whoever the company is, whether it's Stripe or whatever, whoever's MCP server you're using—they'll go and execute that code and that code is going to interact with their API and then return the results rather than these sort of—you have 50 different possible tool calls and all that stuff. Model writes API code and API provider executes that code, runs it on their API and returns the results. Why wouldn't I just—why wouldn't my model just write the code that I then run myself instead of relying on an API provider to do it?
(00:40:00)
Alex Rattray
I expect that will happen a lot more. I expect that the code execution tool is going to become the most widely used tool. One of the problems that we have today is that the code execution tool doesn't work so well with libraries. LLMs have a hard time working with libraries: knowing exactly what version of the library they're using, using the right version—probably usually the latest version—not hallucinating aspects of the API, and knowing how to iterate if they hallucinate wrong. And if they can't use any library off npm or the Python Package Index really, really well, basically perfectly out of the box, then okay, forget about using a library at that point. You just have to hit the raw HTTP API, and at that point, in order to figure out what's in there, you need the whole OpenAPI spec and you're back at square one, because that document is massive.
And furthermore, something that's really scary about that is if you don't have a typed library with static typing where the computer can say what you're trying to do is wrong, then the LLM will try to make an API request that is wrong some percentage of the time. The code execution tool can run a type checker and say, oh, you're asking about stripe.transactions.list, but that actually doesn't exist. Stripe doesn't have a transactions API. You might want payment intents. You might want orders, you might want balance transactions. Which one do you want? And if the API provider is doing a great job building this tool, it'll return the documentation for all of these things inline. It might have its own AI look at what the model's trying to do and come up with a suggestion. And that subagent is well-trained, specified, always updating and isn't burdened with the context of the full conversation.
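A sketch of what that type-checker feedback loop could look like. The TS2339 diagnostic matches the real shape of a TypeScript compiler error; the "did you mean" hint and attached docs are the tool's own enrichment, as Alex describes, not standard compiler output.

```typescript
// What the model writes on a bad first try (this line is the bug):
//
//   const txns = await stripe.transactions.list();
//
// Instead of a runtime 404, a code-execution tool that runs `tsc` first can
// hand back the compiler's diagnostic plus its own hint:
//
//   error TS2339: Property 'transactions' does not exist on type 'Stripe'.
//   hint: did you mean `balanceTransactions`, `paymentIntents`, or `refunds`?
//   docs: [inline documentation for each candidate attached here]
```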
Dan Shipper
What do you think of the security model?
Alex Rattray
The security model is really, really interesting. This is another area where we're really starting to think about things at Stainless and I'm getting really excited about it. So if any listeners are really interested in this and have some ideas or want to talk, please do reach out. At the end of the day, I think the security has to take place at the API layer itself. Right now you see people trying to implement security by sort of limiting what's exposed through MCP, and that kind of makes sense. But at the end of the day, you could do anything that's in the API under the hood, right? And what people should be doing is using OAuth with granular permissions with proper scopes. And at that point, the security happens in the right place, which is at the API layer. There's limitations to OAuth scopes and it's pretty hard to build, so it'd be nice if someone made that easy. But in my view, that direction is sort of the right layer.
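A generic OAuth sketch of what granular, least-privilege scopes look like in practice. The host, client, and scope names are hypothetical and not tied to any specific provider.

```typescript
// The token the agent gets should only be able to do what this task needs.
const authorizeUrl =
  "https://example.com/oauth/authorize?" +
  new URLSearchParams({
    client_id: "my-agent", // hypothetical client
    response_type: "code",
    // Granular scopes: read customers, write refunds, nothing else. Any API
    // call outside these fails at the API layer, no matter what the MCP
    // server exposes.
    scope: "customers:read charges:read refunds:write",
    redirect_uri: "https://example.com/callback",
  }).toString();
```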
Dan Shipper
So going back to my earlier question, I'm thinking about the idea of having a model write code that the API provider then executes to interact with their API and return the results. Would you ever consider just creating a tool use tool that developers use? Because, for example, I'm thinking about Cora. We've got all these tools. Maybe Gmail's going to build a code use thing or whatever. But really, I would probably use what you're talking about inside of Cora, but we would need a tool use tool—or, it's not a tool use tool. It's a computer use tool where—and I know OpenAI has this, but it's not really well built for lots of libraries and stuff. It's not a custom environment. I need a computer use tool where I control the environment, I can install different libraries in it, and I can call it any time to then call any API—it has to have network access, basically. You guys should build that.
Alex Rattray
We're working on it.
Dan Shipper
Fuck. You're building it for developers who want to access MCP servers or people who are providing MCP servers?
Alex Rattray
We're starting with people who are providing MCP servers, but ultimately I think that we're going to need this to work such that you can give the model a code execution environment where it can hit not only the Stripe integration, but also the Salesforce integration, and also anything else—but not too much of anything else. And so one of the advantages of starting where we're starting, with just one API provider, is that you ensure that there are no network connections allowed out of that sandbox where we're running the code to anything other than, in this case, api.stripe.com. And that's really, really critical for security for something like this. And there are ways to expand that bit by bit and keep things secure. It'll take some time.
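A sketch of the sandbox policy Alex describes. The config shape here is invented for illustration; the principle is deny-by-default egress with a single allowed API host.

```typescript
// Hypothetical sandbox config, not a real product's schema.
const sandboxPolicy = {
  network: {
    defaultAction: "deny",
    allowedHosts: ["api.stripe.com"], // the one integration this sandbox serves
  },
  filesystem: "ephemeral", // nothing persists between runs
  timeoutMs: 30_000,
};
```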
The other thing I think to point out as you see some of these generalizations is it's not just that you want this code execution sandbox to work really well for any API, for any library, which I think we really do. I think we really need that. You also start to see that this is just a powerful model for AI doing stuff. And sometimes you want—you realize that the thing that the AI did this one time in this one-off case is actually enduringly useful. Maybe anytime a customer writes into support and says, hey, my socks had holes in them, you should automatically get a refund. You know? Maybe you want that, maybe you don't. But there's a lot of stuff that people do one time, and then two times, and then three times, and then they say, okay, we should automate this. Right? And that's what software teams do all day, every day.
I think we're also going to be seeing that with AI where the same code search tool that we're talking about, all the same prompting that will make an AI really, really good at interacting with an API in one of these code sandboxes, kind of almost quote unquote "in its brain" where it can write code in its head, run the code in its head, see the results, and then move forward with your query, with your task. It should be able to say, okay, actually this is enduringly useful code. Let me commit this to the repo.
Dan Shipper
Yeah, it's—you know, chat is a really good interface for exploring, but sometimes you just want a dashboard. You just—I just want to log into my Stripe dashboard and see all the stuff without having to be like, what is my MRR? It should just show up, you know? Because I just do that every day. But I want to push you as a hashtag value-add investor, because I think that there's this thing that happens in AI where often the first attempt at something, people try to be really cautious. And I'm sure that your customers care about you being cautious—big enterprise customers. But the things that get adopted are often the ones that are willing to take the risk to be YOLO very early.
So an example is DALL-E was wholly private for a long time and people were posting some images, but you couldn't get in. And then Stable Diffusion was just like, fuck it, anyone can use this. And that just really started the whole image generation wave. Obviously Stable Diffusion sort of fumbled the bag, but they had a lead for a little while. Same thing for Claude Code. Honestly—this is less true of Codex now—but if you look at the difference between Codex, Cody, and Claude Code, Claude Code was just like, fuck it, YOLO mode. It's super industrious. It has a sandbox, but you can just pass --dangerously-skip-permissions. And Codex just fell way behind because, first, it was in the browser, and so the whole thing was locked down. And then it was in the CLI, but it was really built for pair programming. And so it just wasn't particularly industrious. It wouldn't go off and do a bunch of stuff. It would get locked out of doing certain things, even if you did full auto mode. And now they've caught up because they're like, you can just let it do whatever you want.
And so I would really push you on—there might be a version that you could do today or tomorrow, or very soon, for individual developers that would let them set up this environment that, for example, I would use immediately. And I care about security, but I care a lot less than some gigantic enterprise company. But I think the people like me who are building at this scale are eventually, hopefully, going to be the big companies, but we're the ones that are really doing the AI-first adoption. Not the big companies.
Alex Rattray
Well, I would love to get this in your hands. What are some of the APIs your team uses the most?
Dan Shipper
We have a bunch of different products, but I'm thinking right now about Cora, the email assistant. And it has all of the big APIs that it's using—it's mostly the Gmail API. And so you're interacting with the assistant over chat, and then it has a list of tools that are archive email or draft email or send email or whatever. There's a whole categorization tool, so it categorizes your email in certain ways. And I think we would definitely try out something like this, because if it ran the same way, it would make it much more flexible for us to make more tools and not break old ones, you know? It's really interesting.
(00:50:00)
Alex Rattray
I mean, in a sense, what I actually predict is that once we have the kind of code execution super tool I'm talking about, the only way you really quote unquote "build a tool" is with instructions, with prompts—and the full power of everything you could possibly do in the API, in the Gmail API, for example, is all there in one tool. But sometimes you have specific tasks or specific categories of work that you want to describe in a particular way to help the LLM perform a sequence of actions as productively as possible. And at that point, the only work in engineering that you have to do is prompt engineering. We'll see if it's that quote unquote "easy." As we all know, prompt engineering can be really tricky. It's hard. But I think that's part of the vision.
That being said, we do have some pretty nifty ways with the MCP servers that we generate today to help developers mix and match all the parts of the different tools underlying all the different parts of the API as they compose and write their own tools.
Dan Shipper
This is awesome. So for people who are listening and want to know more from you and know more from Stainless, where should they find you?
Alex Rattray
Stainless.com is our website. My name's Alex. And we have a LinkedIn and a Twitter somewhere. But as you tease me often enough, I'm not as good as I should be about posting there. So maybe give a follow and cross your fingers and you'll get some content, or at least visit Stainless.com.
Dan Shipper
Awesome. Alex, great to have you on. I can't wait to do more of this when you have some of these new things launched. This is really, really fun and great to chat.
Alex Rattray
Thanks, Dan. You too.
Thanks to Scott Nover for editorial support.
Dan Shipper is the cofounder and CEO of Every, where he writes the Chain of Thought column and hosts the podcast AI & I. You can follow him on X at @danshipper and on LinkedIn, and Every on X at @every and on LinkedIn.