Transcript: ‘AI Makes Building Products Easy. Knowing What To Cut Is the Hard Part.’

‘AI & I’ with Mike Krieger


The transcript of AI & I with Mike Krieger is below. Watch on X or YouTube, or listen on Spotify or Apple Podcasts.

Timestamps

  1. Introduction: 00:01:39
  2. What’s gotten easier—and what hasn’t—about building products in the age of AI: 00:02:33
  3. Why vibe coding creates “indoor trees”: 00:05:00
  4. How rewrites have become a normal part of the development process: 00:09:00
  5. What “agent native” product design means: 00:11:39
  6. How Mike’s labs team is structured and the cofounder model: 00:24:27
  7. The best signal for a product bet is someone with “break through walls” conviction: 00:29:33
  8. Navigating enterprise customers while keeping pace with rapid AI change: 00:38:51
  9. OpenClaw, personal agents, and the product question defining 2026: 00:40:54

Transcript

Dan Shipper

Mike, welcome to the show.

Mike Krieger

Great to be here. Thanks for having me.

Dan Shipper

Great to have you. For people who don’t know, you are the co-founder of Instagram, and now you’re at Anthropic Labs. I’ve admired your work from afar—both at Anthropic and at Instagram—for a really long time. You’re obviously at the forefront of building products in AI, so thank you for coming on.

Mike Krieger

Absolutely.

Dan Shipper

Where should we start? We were talking just now in the pre-show about what has gotten easier and what has gotten harder—or stayed the same—in product building as the underlying substrate and process by which we build products has changed completely. Tell me about your experience now versus earlier on, like at Instagram, and how you think things are changing.

Mike Krieger

Yeah, I was doing this thought exercise a couple weeks ago. We know the Instagram story—we had another product called Burbn that we worked on for almost a year. It wasn’t working. We pivoted. We spent about three months building what became Instagram, launched it, and then scaled it. And I was asking: what is now trivial? And what was actually inherent in that building process that doesn’t get easier?

That year, we probably could have hit some of the dead ends we eventually hit a little sooner. But there was value in getting there too. We overcomplicated the product so that we then had to simplify it. I find even the models today are good at adding features—they’re not necessarily good at figuring out what to cut.

And that took a lot of real-world usage to figure out. There was something about the process of incrementally adding things. Right now, especially with some of the stuff we’re building in labs, you can get it to go from zero—not just zero to one, but zero to end—pretty quickly, in a matter of hours. But it’s made a lot of decisions along the way. And yeah, you can ask it to check in with you, but the intuitions about what the right things to put in there are—those, I think, you build over time.

I’ve been reflecting on why there haven’t been a lot of breakout consumer products even in the age of accelerated AI building. I think part of it is because it still takes time to hone your view about what kind of intervention you want to make on the world, and then build from there. Now the actual building part, once you know what to build, is of course so much easier.

I had Claude basically rebuild Burbn. It took about two hours. It was feature complete—it added filters, which Burbn didn’t have. We added those for Instagram, but I think it knew about the eventual future of those products and decided to build that in. That part feels really different.

But I remember there was a week where Kevin went off and built all the filters for Instagram V1. I went off and built the rest of the app. Sitting there, I’d stay up till 4 AM and sleep till noon—that’s my natural day-night cycle. In that process, you’re making so many decisions: how should location work? We’ve got to find a way of accelerating building while still helping people build intuition for those decisions along the way. Because otherwise, I think you either get very generic products that are unlikely to break out, or ones that don’t reflect some deeper intuition you’ve come to about your space or your product.

Dan Shipper

I love this. It’s making me think of two things. One is this idea I have in my head: if you grow a tree indoors, without it being exposed to wind, it doesn’t get as strong. As it’s growing, it needs those forces pushing it back and forth in order to make a real tree. If you have it indoors without wind, you’re going to grow a tree, but it leans and it’s not as strong—it’s not the same thing.

I think there’s something in what you’re saying: because we’ve accelerated the pace of development so drastically, what would normally be this incremental thing—where you’re doing things one at a time and exposing it to users—you can actually grow an entire tree indoors. And then you have this whole thing that doesn’t have the same level of intuition and exposure to experience at each step that creates a great product. Is that what you’re getting at?

Mike Krieger

I love that metaphor. When we were starting Instagram, we were very into Eric Ries and Lean Startup and that whole YAGNI principle—you ain’t gonna need it. I’ve found, and even in something I was working on in labs recently, we way overbuilt for V1 before we even got to early access. Because you can, and you’re thinking, oh, well we have this option—why not add this one as well? That’s like one PR’s worth of work, and if you’ve got a really good flow in Claude Code, you’re firing things off and coming back and the thing is done. You’re like, great, we added it.

What we realized was we’d created this matrix of functionality that was actually quite hard to test and keep up with before launch, or even to explain to people when they arrived.

The metaphor somebody else gave me, which I really like: there’s a difference between getting to know characters in a TV show episode by episode versus being thrown into the final episode—you’re like, wait, what are all these things and who are all these people? I think there’s the same feeling around developing something over time. And showing somebody the fully formed tree is also kind of a lot all at once. There’s definitely something there in how you build products these days and still keep them simple, because just because you can doesn’t mean it should be in at least the first version.

Dan Shipper

I’m having the same problem. I was literally up until 4 AM debugging and fixing this app that I made on the side at Every called Proof—it’s an agent-native collaborative markdown editor. You can share quick plan docs with your team or with other agents, you have little presence indicators, it’s really fun.

This is like my second or third iteration of the full product end-to-end, which is remarkable that you can do that now. But the first couple of iterations, I just found myself—because vibe coding is so fun and so addictive—adding and adding, and it created this monstrosity that wasn’t that good to use.

I got really inspired by another product we have called Monologue—not sure if you’ve run into it—run by GM Naveen, who is just so focused on making one simple thing work incredibly well. I saw how well that works in this age of anyone-can-make-a-product: something that’s super polished and just super good at what it does. So I basically threw out the product and started over with something very simple—just a shareable markdown link. And it started growing virally inside of Every. Everyone started using it all the time, and then we launched it and it just blew up.

So I spent all last night not sleeping, trying to fix it. I was thinking, I’m too old for this. It reminded me of being in my twenties or in college, hacking on stuff, which is fun but also exhausting. I’ve had to really modify my psychology because so much is possible. How are you dealing with that?

Mike Krieger

Just as a brief aside on that: I remember with Burbn, our biggest mistake was adding functionality over time rather than deleting it. Eight features don’t make for a good product—maybe the ninth one will—and instead it just made for something that felt really complicated.

A couple of things on how we’re dealing with it: one is being more willing to do rewrites. Classic Fred Brooks, Mythical Man-Month—you shouldn’t rewrite software, because you’re going to mess up all the things that were imbued in V1.

Dan Shipper

Yeah, yeah.

Mike Krieger

And that whole second-system syndrome. There’s still a lot of truth to that, but the models can help you basically audit whether you missed anything from the first version. And second, it’s just no longer a year-long rewrite that might kill a company—these are days. We’ve actually had several initiatives, usually pre-launch, where we’ve built the full thing, realized we’ve overcomplicated it or made some core incorrect assumption, and then torn it down and done a V2 to iterate from there.

So it doesn’t surprise me that that’s become part of what you’ve had to do as well. But it doesn’t feel as painful—you’re not like, oh, a year of work. It’s more like, that was last week, and now I get to do it this week and cut out a lot of what was there.

(00:10:00)

On the functionality side: we are learning to launch earlier. It’s definitely a balance—we have a strong enterprise footprint and people have expectations about what the initial version looks like. But we’re not assuming we’ll know ahead of launch what every connector or feature we need to add will be, because people will absolutely still surprise us.

Take Cowork as an example. We’d been noodling on a product of that shape for a long time. And once we decided, let’s get this out—build the V1 that solves the problem in the most minimal way possible—doing that in 10 days was a really good push. Yes, there were a hundred things V1 could have had. It didn’t. But at the same time, it was useful enough to prove something out there. I’m not sure developing it for another two months and adding 50 features would’ve been more useful. In fact, we probably would’ve been building the indoor tree, and then the first real-world use would show us: actually nobody wants that—they want this other piece. So the original Lean Startup intuitions are still here. They just manifest at different timescales.

Dan Shipper

I’m really curious to hear how you think about product design and how products should work. I’ve been—anyone at Every will tell you—the phrase I use the most about the software we build is that it has to be agent native. Agents have to be able to use it: anything a user can do in the app, the agent can do. There are a couple other principles of being agent native, but I basically stole that whole framework from watching Claude Code. It’s the canonical example of how that kind of product can work so well: it’s an agent, it can do anything on your computer that you can do, it’s customizable and flexible and extensible. Easy to start, but it can do all sorts of unexpected things the designers didn’t really think about beforehand.

I think that’s such a good model for AI product development. This is just what I’ve cribbed from watching what you guys do and put my own spin on. But how do you think about it, and how do you talk about making products like that?

Mike Krieger

There’s so much here. And I love the agent native writeup you all did—to me, that’s the canonical exploration of this. So thanks for putting those ideas out in a really clear way.

A few threads to pull on. One is a conversation I had with someone recently—a non-technical person—who said, “You all keep talking about agents and all this stuff. Like, actually, computers just work now. I always wanted computers to work and they didn’t, and now they work.” It’s kind of funny: if you knew the incantations to get on the command line and brew install the thing, great. But now Claude can do it for you, and so the computer feels like a tool that is alongside you. That core insight is more than just adding power to new software—it’s also unlocking the functionality that always should have been available but felt extremely hard for people.

Thought two: comparing our own products that do this well versus not. I think Claude Code does it well. I think Claude.ai still needs to evolve a lot. As an example, I was watching someone use Claude—they were in a project, they’d built an artifact or a new document, and they said, “Great, can you add this to my project knowledge?” And Claude basically said, “Let me tell you the steps to go add it to your project knowledge.” No—that should just be something it can do natively.

In that you see a 2024 product that’s been iterated on and evolved a lot, but still hasn’t had baked in from the very beginning the idea that it should have knowledge about every single one of its primitives and the ability to modify them. That’s essential in products these days. Claude Code is the 2025 vintage of that. And there are even further aspects when you see what some of the harnesses folks are experimenting with can do—where the agent can actually modify the harness itself. That’s probably esoteric for most people, but even unlocking that functionality means you don’t have to sit there thinking, I wish this worked slightly differently. You can just ask it to.

Even within Claude Code, though, just teaching Claude Code about Claude Code was a really valuable experience. This is getting very circular and meta, but bear with me. I loved your agent native writeup. I wanted it as a skill so that whenever I’m prototyping something, it thinks in an agent native way. So I had it package the writeup up as a skill, and the whole process was: hey, Claude Code, can you create a skill for this? It’s like, sure, I’m looking at my skills folder, I’m going to create a skill, I’m going to install it. I said, great—is that available now or do I need to reload? It said, I think you need to restart—let me check. Yep, you do. Everything was there. It had knowledge about itself, and that unlocks so much capability.
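The skill-packaging flow Mike describes can be sketched in a few lines. This is a minimal sketch, assuming the personal-skills layout Claude Code documents (a folder under `~/.claude/skills/` containing a `SKILL.md` with `name` and `description` frontmatter); the skill name and body here are hypothetical placeholders, not the actual skill Mike built.

```python
import os
from pathlib import Path

# Package a design writeup as a personal Claude Code skill.
# Path convention and SKILL.md frontmatter follow Claude Code's
# documented skill layout; the skill name and body are placeholders.
skills_dir = Path(os.environ.get("CLAUDE_SKILLS_DIR", Path.home() / ".claude" / "skills"))
skill = skills_dir / "agent-native"
skill.mkdir(parents=True, exist_ok=True)

(skill / "SKILL.md").write_text(
    "---\n"
    "name: agent-native\n"
    "description: Apply agent native design principles when prototyping products.\n"
    "---\n"
    "When building a product, expose every user-facing action to agents as well:\n"
    "anything a user can do in the app, an agent should be able to do via a tool.\n"
)
```

As Mike notes, after installing a new skill you may need to restart Claude Code for it to be picked up.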

Which is maybe the last thread: one of the things we’re really thinking about in labs is how do you imbue the software Claude builds to be more Claude-aware and agent native from the start? Because it still won’t default to that—partially because decades of software isn’t built that way. So how do you get new software to have that principle baked in?

Dan Shipper

That’s exactly what I was about to ask you about. And I’m super honored that you read the writeup and made a skill for it—that’s amazing. But yeah, you’re pointing to a real problem. I think Claude models are the best for this. A Codex model generally isn’t as good at building agent native products, because models in general, unless you push them, think like traditional engineers. They want guardrails and tests; they want one path the user can go down, versus creating this extensible, super flexible thing.

So how do you architect your product to teach the models and the harness to think and work in this way?

Mike Krieger

I think there are two parts to it. The first is more mundane: having good patterns and paradigms available to the model while it builds has been really valuable. Finding the right balance of templatization, in effect delivered as skills. One of the things we have now is a skill about the Claude API, which sounds super obvious, but even just having that is really valuable. You’d sometimes find we’d launch a new model, it wasn’t in the model’s innate knowledge, and you’d get into these funny arguments where the model keeps insisting you’ve made a typo on the model string.

So having that capability, having good templatized examples as skills—that helps. But the second part is that this class of software is just a different type of test. It’s much harder to write an end-to-end functional test around an agent native product because part of it is that unpredictability.

Another idea we’ve been kicking around in labs: how do you increase the fidelity of verification? The other day I had an agent native iOS app I was working on, and I was having Claude interact with it. Claude ended up having a conversation with itself in a chat feature in the app. It was very funny watching Claude talk to Claude—each pretending to be what it imagines humans are like. This particular prototype was about work journal reflections, and Claude was like, “Yeah, my boss is really rough on me. I had a hard day.” And the other Claude was like, “Oh, I’m so sorry to hear that.” They’re just going back and forth. You wouldn’t have written a unit test for this. And it might have come up with some emergent idea in the process.

So I think you just have to go much more toward setting up harnesses that actually exercise as much of that agent native capability as possible, because you don’t exactly know what things are going to do. Claude might try to do something you didn’t even anticipate and put your app in a new state. And maybe circling back all the way to what’s still hard: having the underlying architecture still be robust to that is really important. It’s agent native, but it’s also able to flex in ways you might not have anticipated—you’ve just got the right primitives underneath. That is the art and science of software design in 2026.

Dan Shipper

That’s really interesting. I totally agree. You want a playground within a safe environment. The only way you can have a playground is if it’s safe around the edges. But initially, I think we made the playground way too small and constrained. Now the models have changed so we can open it up a lot—but I haven’t quite figured out exactly what the lines are.

There’s so much here. One thing this is making me think of: I have this idea in the back of my head that the unit of value in products right now is something like proof of work or proof of use. When someone on the team submits a PR to me, I want to see—not necessarily that all the tests passed, I just assume that—but like, send a Loom of you using it, or your agent using it, so I can tell: is this good or not? How are you thinking about that?

Mike Krieger

I think there are probably three layers to that. The first is: Claude, prove to me that you’ve exercised this in some way. I’ve started doing that in all my prompts—when it’s working on a feature, I’ll say, before you PR, prove to yourself and then to me that it works as intended. Find the right way to do that. Which actually means you have to change your own way of building and scaffolding—figuring out what the right way is to get Claude able to test this change succinctly, rather than what it likes to do, which is: I read the code and it looks good. I’m like, you wrote the code. I don’t trust you. You’ve got to actually test this thing.
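One way to bake this "prove it before you PR" instruction into a project, rather than retyping it in every prompt, is to put it in the repo's `CLAUDE.md`, which Claude Code reads as persistent project instructions. A minimal sketch; the exact wording below is just an example, not Mike's actual prompt.

```python
from pathlib import Path

# Append a "prove it works" rule to the project's CLAUDE.md so it
# applies to every session, not just prompts where you remember to say it.
# The wording is an example, not a quoted internal Anthropic prompt.
rule = """
## Before opening a PR
Prove to yourself, then to me, that the change works as intended:
run it end to end and show the actual output. "I read the code and
it looks good" does not count as verification.
"""

with Path("CLAUDE.md").open("a") as f:
    f.write(rule)
```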

The second layer is what you described: everything having some proof around whether it’s working as intended and as you intended it to be. Because Claude, or any of these models, is going to make a lot of decisions for you. I’ll have engineers on the team put up a PR and I’ll ask, why did you choose to do this versus that? Many times the answer is: they didn’t choose—it was just the choice the model made. Maybe it was a reasonable choice. Probably a reasonable choice. But was it the optimal choice that fit the paradigm?

I think of it as proof of thoughtfulness rather than just proof of work. Did you think this through? I was talking to an engineer yesterday and they said, “I knew you were going to ask me a lot of questions about this, so I was reviewing what Claude had done so I wouldn’t be caught off guard.” I don’t push on that for most PRs, but when it’s something like I’m refactoring this system and there will be new primitives, let’s make sure those are good and you’ve thought through how they interrelate. It’s very easy to end up otherwise with this tower of assumptions you’re not fully aware of.

(00:20:00)

Dan Shipper

I had literally the same experience today. I made Proof—totally vibe coded—and it’s growing really fast right now, but it’s been going down a lot. I’ve been spending the last 12 hours trying to fix it, and I have a little SWAT team internally at Every that signed up to help. I had to onboard them and I was like, how do I explain how this codebase works? I had to go back and forth with the model to figure out how to define the terms and explain the architecture so I didn’t look like a total idiot—because I understand some of it, but not all of it. Definitely not in the way I would have had to know it before.

It’s a whole different thing to ask: do I even need to know that anymore? Where’s the line now? It’s hard to tell.

Mike Krieger

Which maybe gets to something else—and I haven’t tried to articulate this, so bear with me. There are products you use that feel robust underneath, and ones that feel like you’re one wrong command or click away from the whole thing freezing or falling over.

At Instagram, we had Direct Messaging V1, and who knew if you’d sent a message—it might or might not arrive. I wrote our own bespoke real-time system and it fell over a bunch. You would not trust it to send a message you really needed someone to see.

When we built V2, it was really important that we hammered on it. Not necessarily getting to WhatsApp’s level—where you can be in the middle of nowhere with one bar of EDGE and it’ll try to go through—but at least: when I load messages, it feels robust. When it’s sent, it’s really sent. There’s a little checkmark.

I think that is something we still need to figure out how to make an essential part of shipping anything—not just at Anthropic, but in general. You’ve built this thing: does it feel like it’s built on sand, or does it feel robust? And the agent native part adds something totally beyond that, which is: can I push it a little and is it going to fall over? Or does it feel like I’ve got a solid trunk—you can push me in different ways, but your data is safe underneath, and it’s not one deploy away from completely falling over.

Dan Shipper

If that’s the bar—which I agree it is—how have you changed who you hire and how your teams are structured as the models have gotten better? For us, one of our products, Spiral, we just hired a new GM who I would say is lightly technical but spikes super high on product and writing sense—and Spiral is a writing product. We can hire someone like that now, where a year ago we couldn’t have, because the coding models weren’t good enough. But the downside is maybe the product won’t feel quite as robust if there’s not someone who’s deeply technical in all the details. So how do you think about who builds products inside Labs and how that has changed?

Mike Krieger

I love that. I think you get pulled in two directions, but they’re both important.

There’s the primitives and architectural robustness side, which I think still needs a senior technical force. I was laughing with someone who said, “I thought my distributed systems skills were not going to be useful anymore.” But actually, those might be some of the most useful skills for reasoning about these things. I had a long debate with Claude last week about whether the system I was building needed Redis or could get away with just Postgres. It was a healthy debate, but only because I was grounded in having used a lot of those technologies before.

Then there’s the other side of robustness: have you just papered over all the problems with fixes to your system prompt and additional instructions, or have you actually architected the set of tools correctly? That’s where a product-oriented GM like yours can be really valuable. You wouldn’t patch flakiness in your distributed system by just saying, well, just retry it in five seconds, I’m sure it’ll work. And similarly, you don’t do the same thing with “NEVER EVER, all caps, use markdown” or whatever the thing you’re trying to patch is. Both are symptoms of the same problem: is the underlying piece robust or not?

Claude—I’d say this about all the models—could be much better at both. It’s able to debug production systems now, which is really valuable. But architecting them in the first place still benefits from someone who’s really thought these things through. And on the prompting side: if you give it a prompt and a mistake the system made and ask it to iterate, its natural tendency is to just add more things to the prompt. Eventually you get to something like giving a new employee a hundred instructions on day one: they’ll remember the last thing you told them and short-circuit the rest. So then you’re rethinking: are these actually two different tools? Is this actually two agents, each with a smaller amount of context, that you can break apart?
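The split Mike describes can be sketched as routing between two agents with short, focused system prompts instead of one hundred-instruction prompt. A minimal sketch: `call_model` is a hypothetical stand-in for a real model API call, and the prompts and keyword router are illustrative only.

```python
# Instead of one agent carrying a hundred instructions, keep two agents,
# each with a small, focused system prompt, plus a tiny router between them.
# call_model is a hypothetical stand-in for a real model API call.

AGENTS = {
    "formatter": "You format output. Keep replies terse and plain-text.",
    "researcher": "You gather facts. Cite a source for every claim you make.",
}

def route(task: str) -> str:
    """Pick the agent whose narrow job matches the task (toy keyword router)."""
    return "formatter" if "format" in task.lower() else "researcher"

def handle(task: str) -> tuple[str, str]:
    agent = route(task)
    system_prompt = AGENTS[agent]  # small context instead of one giant prompt
    # reply = call_model(system=system_prompt, user=task)  # hypothetical call
    return agent, system_prompt

agent, prompt = handle("Format this report as plain text")
print(agent)  # formatter
```

The design point is the one Mike makes: each agent's context stays small enough that none of its instructions get short-circuited.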

So back to your original question: we’re hiring for systems expertise even within labs, which you might think of as more zero-to-one prototyping. It’s still really valuable because robustness matters. And on the robustness side, we’ve had a lot of success pairing product teams with our applied AI teams—the people in the field every day helping customers iterate on their prompts. We’re customer zero for those efforts now, because we have a lot of very AI-powered products. And that expertise doesn’t naturally sit with software engineers.

Dan Shipper

What about the in-between—the UI and the flow? Who’s doing that?

Mike Krieger

Great question. We found that some of the people who transferred into labs were folks really focused on polish on the website who wanted to do something new. They bring such a different approach: we’d have a prototype that looked generically nice, and they’d come in and say, oh, this feels like it’s branded, it has this character. That’s part one.

Part two is designers: a lot of our designers have moved into a split designer-builder role. Most of them. We don’t have a lot of full-time designers on labs, and the ones we do have are writing and contributing almost as much code as the engineers on those efforts—because they can, and paired correctly with the right person.

We’ve found this almost co-founder model for some of these labs initiatives: you have the designer who had the original idea and is pushing on something, and then the traditional software engineer who paves the trail behind the designer to make sure it actually works.

Dan Shipper

Tell me about how that team structure works. Is it usually a designer, or is it anyone who has a product idea and can execute on it in some way, paired with an engineer who can smooth out the rough edges?

(00:30:00)

Mike Krieger

It varies, but the one thing we found most important—and it’s our gating factor for starting new projects—is having someone with extreme conviction. Not necessarily too much conviction on the exact idea, because that can actually be dangerous. But at least conviction in the problem space or the question they’re asking. Co-founder or founder level: I will break through walls until this thing is either proven out or dead, and I want to know either way.

We’ve had bets in labs that we wound down, and in the post-mortem, we’re like, nobody on this team actually really believed in this. They were like, yeah, this seems reasonable. That’s the death knell for projects.

That person can be a designer—on a couple of our bets it is. It can also be a product-minded engineer. It’s rarely a pure PM. We actually only have one PM for all of labs right now. We’re hiring more. And then we look at what skills we need to complement that. Because it’s part of our labs process to evaluate every project every two weeks and decide whether to double down or release those folks back into the broader labs pool—at any given point there’s probably someone who can be pulled onto the project with infrastructural expertise, or deep prompting expertise, to flow in and out. That’s where the incubator-style space helps: nobody’s fixed on a project forever.

Dan Shipper

That’s really interesting. We do it slightly differently—some overlaps but a different structure. We have GMs, or they start as entrepreneurs-in-residence and become general managers when they find a product they want to work on. Each product has one person who does everything full stack: design, engineering, marketing—at least the basics of all that.

The shape of that GM used to be a super technical founder background, and now I think it’s shifted toward: at least some light technical ability, but honestly I just care that you can use Claude or Codex or whatever, and that you have really good product sense and taste for the subject area, and evidence that you can build with AI. Then we have a shared resource layer that works a bit like an agency—designers, growth marketers, ops people that you can pull in and out for various initiatives. Each GM is out on the edge, pulling in resources as they need them.

Mike Krieger

But it sounds like similarly, you need somebody for whom this is the thing—and they are not going to sleep until it’s fully working.

Dan Shipper

Yes, exactly. And I’ve been thinking about: when would you hire someone else to work on a product? There’s some point at which you can’t hold the entire thing in your head, even if you’re the one pushing it forward. And that point used to be much smaller. Now it’s much bigger. But there’s a certain point at which even a small feature turns into its own product. When you first make the messaging feature inside Instagram, you can probably do that in a week. But at some point, that’s its own product—it almost needs its own team. That line is getting pushed further out, but it still exists somewhere. I just haven’t quite figured out how to manage it.

Mike Krieger

I love that. There are actually two parts to it. When the idea is still small enough to hold in one person’s head, adding more people actually slows the team down. That’s a non-obvious finding in labs: scaling teams too quickly is a net negative, because they end up spending all their time coordinating. “Oh, I was going to take that.” “But my Claude can do that.” You end up in this coordination overhead—and you also have all those alignment conversations.

At Instagram, it was hard enough to align just the two of us. My second startup, Artifact: Kevin and I were doing it alone for the first few months, but then we hired a team of about eight people. It was really hard because we didn’t have product-market fit yet, and we were still iterating. You’d end up in a Zoom with eight people talking about what to do next, when you really just wanted to hash it out in a room.

With labs initiatives, there’s a similar dynamic: don’t prescale the team even if the idea is exciting, because then you just end up in a meta-coordination game. But I like your framing—there is some point where two people will really help go after something together, because there’s enough context and scope for each to hold a complex piece in their head. And there’s also the value of someone else injecting fresh thinking after you’ve been on the same idea for two or four weeks.

Dan Shipper

Yeah, I think it’s especially important to keep it small in AI, because one of the things we deal with constantly is: every three to six months, you have to throw out half your product. That’s really hard to coordinate with a lot of people. But if it’s one GM who realizes, oh, I’ve got to just throw out half of this because the models are so much better—it makes it much easier to pivot. Do you see that? How do you think about knowing that in three months the whole feature set might need a rethink?

Mike Krieger

Yeah. And being willing to delete code. I think the Claude Code team has done really well at that—they treat deleting features almost as an imperative. If it’s not working, ship that deletion. And often when you’ve created something new, even if it doesn’t entirely supersede the old thing, it does enough of what that other aspect does that it actually makes sense to deprecate and remove the first one.

It does get harder as we get more enterprise-focused, because customers come to depend on things. I’ll never forget: one of the things I did maybe six months into being chief product officer was a big redesign of Claude.ai. We were so proud of it, we shipped it, we got a bunch of kudos. Then we got this really angry email from someone who said: I just recorded 20 hours of enablement content for my company for the enterprise version, and now I have to redo all of it. And we’re like, oh. You’re operating at a completely different release cadence. Shipping twice a year at a conference isn’t an option for us, so we’re going to keep moving quickly—but we’ve since learned to be more thoughtful about how we roll things out to the enterprise side.

And the unshipping problem is real. There's a feature in Claude.ai called Styles—it's not widely used, but the people who use it use it a lot. We've talked at different points about whether it still makes sense. There are other ways to accomplish the same thing: custom instructions, projects, skills. But the last time we talked about removing it, it turned out to be really load-bearing for a few companies. Entire use cases depend on it—like, we have our house style that the CEO personally authored and gives to every employee, and that's how they operate.

The hope in the long run is that we can come up with a system of plugins and skills such that features no longer have to live in the core product. Because it’s always hardest to delete something that you’re shipping to everybody. If you have the story around great, you still like that feature—here’s how you can keep using it and make it your own, without it adding complexity for every new user signing up for the first time—that changes things.

(00:40:00)

Dan Shipper

I’m curious—for labs and also maybe just in general for startup founders—your enterprise point brings up something I’ve been thinking about a lot. If you’re selling to enterprise right now in AI, even if your product is modern, it will be quite outdated quite quickly. But your customers are going to want the outdated version. As a startup, that feels pretty risky—you’re susceptible to disruption if you’re optimizing for what a large public company will buy right now.

There are a lot of startups that started two or three years ago, have a certain tech stack, a certain way of thinking about AI, and their customer contracts are for that version—it's like looking at an early Copilot, and you can feel how dated the vibe is. How do you think about that yourself, inside Anthropic? And how should founders think about it?

Mike Krieger

Such a good question, especially because then a wave will come—like being more agent native—and the question becomes: can you adopt it within your existing paradigm? Does it require throwing everything out, or do you end up stuck with something you've just bolted on?

For us, what we've started doing is basically treating it as: this train is going to keep moving, and we'll provide enterprise toggles along the way, but the core will continue to evolve. That's the bet you're taking by working with us. I think that's been well received, because companies have also seen that things are moving so quickly that the only way they get comfortable with a year-long commitment is to believe we'll continue to evolve along the way. Cowork is a great example: from day one there was a way to turn it off for your employees if you didn't want it. That's a reasonably good paradigm.

But the other thing is just: as we were talking earlier, you can actually rethink and rewrite a lot of the stack. I think companies should be way more willing to do that. Everything is getting compressed. In previous cycles, the idea of having to let go of customers who loved your product for a different reason than where you’re going was a multi-year thing—last year’s product versus today’s. But now it’s like three months ago’s product versus today’s. And I actually think that’s the kind of timeline you have to operate on.

You have to be willing to ship the V3 or V4 that is a big rethink of how the existing piece worked. Maybe have a transition period where Claude can help run both for a little while before it cuts over. But then be willing to cut over and say: yes, this is how we think the future of this piece of knowledge work or AI-powered manufacturing is going to work. Keep it moving, or else you’re going to get replaced by the next company that rethinks it from scratch. It’s the same old story—just compressed to months.

Dan Shipper

What’s your take on OpenClaw?

Mike Krieger

It has the flavor of something I really like seeing: when you get people to experience something that was already possible, but it’s now packaged so that people can actually try it and build some intuition about how to use it. You saw something similar with coding—you could already use these models to write code, but it kind of took the breakout low-code tools like Replit, Lovable, and the Bolts of the world to put that capability in people’s hands.

OpenClaw is kind of the purest expression of just: give the model tools and let it go. It was a cool moment for people to realize both the potential and the pitfalls—like, it did this thing I didn’t mean it to. My funniest one: a friend said he thinks his wife is jealous of his OpenClaw because he’s talking to it too much. People start developing really personal relationships with these things just by having a lot of context and access to all these tools.

I think the open question is how do you then make it easy—and it comes back to our conversation about where you draw the boundary around how you let Claude operate. If V1 was here are three tools, only use these ever, and most people’s interaction was hey, can you do this? and the answer was mostly no—to OpenClaw, where the aperture is wider than you can see—

Dan Shipper

Oh my God, it called a meeting about my emails and I didn’t even know it could do that. Yeah.

Mike Krieger

Exactly. It’s emergent and it’s amazing. And I think probably the most interesting product question—I won’t say for all of 2026 because who knows where we’ll be in September—but between now and the end of August is going to be: what product shape exists between those two extremes? Between OpenClaw and most products today, where you can call MCPs but they’re gated and permission-required for good reasons? How do you find something that is still a useful product without being a kind of YOLO product?

We’re thinking about that question. The other labs are too. A lot of startups are as well. I think the approach is either: shift the paradigm completely so you can be that open, but with a lot of safeguards. Or figure out some boundary that is still powerful and useful, but not likely to email every single one of your contacts and go haywire.

Dan Shipper

Yeah. I think the other interesting part about it is—like you said—the personal nature of it. People have personal relationships with Claude, but there’s this weird thing where if I watch someone else using Claude, I feel like I caught a stripper being nice to someone else. Like, Claude thinks you’re smart too?

And there’s this thing that happens when you have a Claude that’s yours: my Claude is R2-C2, my girlfriend’s Claude is called Shelly. There’s this thing where it feels like it’s mine. It has its own name, it has this personality that mirrors me. Claude feels like it knows me and I like Claude—but it’s not really mine. How do you think about that?

(00:46:00)

Mike Krieger

I was having this conversation with someone this week around whether the right pattern is a single named point of contact—or is it a team of agents? I think there’s a lot to the single agent that’s maybe the coordinator or delegator. And naturally, because it becomes the one you interact with the most, you want to give it a name and a bit more personality. Sometimes it reflects your own personality—all of a sudden every sci-fi reference comes out: the Q, the Moneypenny, the HAL, whatever.

I think you do build that trust and knowledge over time. There’s also the IKEA effect: OpenClaw is still pretty hard to set up, so the fact that you went through all that and it works—you’re like, I did that thing. I birthed Shelly. And now we interact with her. But I think that paradigm is really powerful.

Even within my Claude Code usage now, one of the things I’ve strongly prompted in there is: don’t do very much work yourself—delegate to subagents. The reason I like that is because it means most of the time the run loop is available for you to talk to. OpenClaw and Pi have a similar architecture: keep the run loop open. And I think that actually makes it feel much more like someone you’re talking to versus a tool you’re delegating to that occasionally gets blocked for five minutes because it’s doing some really complex task.

Dan Shipper

I totally agree, and we’ve had similar debates. We’re also building our own little OpenClaw—a one-click Slack implementation—to see if we can do one that feels like ours. We’ve had a lot of those debates: do you want one agent, do you want many?

One pattern we found, which is kind of cool: I have an agent, and I use it for stuff I do. People watch me use it for that, and they know what I'm good at. If I'm using the agent for that, they're going to trust it—because they trust me, and it's modified itself in response to me. So I started transferring my trust to it, and then people in the organization started using it for that. You get this almost shadow org chart, where everyone's Claude becomes known for, and used for, the thing its owner specializes in within the org.

Mike Krieger

That makes a lot of sense. And there are a lot of interesting research questions around it. People are experiencing for the first time—viscerally—questions around privacy: what does my agent know about me versus what does it disclose to other people? But then there's the positive version: all the things it has learned from your interactions and how it brings that to bear on other problems. Versus the generic version—yes, it's just like everybody else's agent, except it has a name attached to Dan and maybe some of Dan's access under the hood.

Dan Shipper

Well, Mike, we’re out of time. This was a pleasure—I learned a lot. If people want to follow you or your work, where can they find you?

Mike Krieger

Probably easiest is @mikeyk on X.

Dan Shipper

Thanks for joining, Mike.

Mike Krieger

Great to see you, Dan.


Thanks to Laura Entis for editorial support.

Dan Shipper is the cofounder and CEO of Every, where he writes the Chain of Thought column and hosts the podcast AI & I. You can follow him on X at @danshipper and on LinkedIn.

