Transcript: ‘How Anthropic Uses Claude Fable 5 With Mike Krieger’

The transcript of AI & I with Mike Krieger is below. Watch on X or YouTube, or listen on Spotify or Apple Podcasts.

Timestamps

Introduction: 00:00:03
How Fable completely reshaped Mike’s workflow: 00:01:48
When to use Sonnet versus Fable: 00:04:48
What the media tracker Mike built over a weekend reveals about agent-native architecture: 00:10:06
The cost to build has collapsed: 00:15:00
Is software engineering over?: 00:19:03
How Anthropic’s engineering teams work today: 00:21:48
The mechanics of verification: 00:38:39
Dynamic workflows: 00:47:24
What people should use the model to build: 00:44:39

Transcript

Dan Shipper

Mike, welcome to the show.

Mike Krieger

Great to be here, Dan. Good to see you.

Dan Shipper

For people who don’t know you, you’re the head of Anthropic Labs and the co-founder of Instagram. What I want to talk about today is Fable 5. It’s dropping tomorrow—we’re recording this the day before, and this will come out after it drops. I really wanted to bring you on the show to tell me what it’s like to use this model beyond the first day. When a model this powerful drops, it’s so useful to have someone who’s using it day in and day out tell you, “This is where it’s powerful. This is what it actually changes. This is what it doesn’t change”—so you can think clearly about how it fits into your life.

Mike Krieger

Absolutely. It’s also just been interesting—we’ve had some models in this Mythos class leading up to the Fable release for a couple of months now. I think it’s very exciting to see how people will build with this externally. But you’re right that day-one impressions really come from getting to use it over a couple of weeks.

We’ve seen that even with previous models. The December-into-January usage with Opus 4.5 or 4.6 was really important because people spent extended time with the model and figured out, “I wasn’t pushing it hard enough. I need to go further and rethink what’s even possible with this generation.”

Dan Shipper

Totally. There are people internally at Every who have been using it and said, “I think I need a whole new set of skills to use this model.” You can especially see it with people who are more non-technical and on the knowledge-work side—they’re like, “I don’t even know what I would use this for.” And the people who are orchestrating agents are like, “There are so many new things I need to learn.” So tell me about the difference between your impression when you first tried it and now.

Mike Krieger

Your point about adopting new workflows is a really good one—and I mean that quite literally, in terms of actual workflows, but also just how I think about usage. At first, the timing was interesting because it coincided with me transitioning from CPO into Labs and going back into builder mode. It was about a month and a half or two months into that when we first had one of these models available internally. I sat there and thought, “I feel like a total newbie again,” because the way I was prompting—or even thinking about decomposing a task—was really out of date with this model.

The time horizon and the interactivity model have to evolve. Early on I’d be like, “I have an idea for this feature. Can we start by doing—” absolutely not. It evolved to: let me express more of the intent, and then just go. I remember in March and April being amazed that on the one shot it was already incredibly impressive, but it also understood the intent around how we’d evolve things and the global context as well.

That evolution has continued. I was talking to somebody this morning, and I think about doing work—I had a flight, and I was like, “I can do most of this work remotely.” I don’t even worry about the Wi-Fi dropping out because if I set up the right context instructions—like a loop command—it’ll see things through.

My last two months have been full of moments where I’ll wish Claude a good night, set it off on a complex task, and wake up to find it’s done—usually by around 2 a.m., and it just fiddles with loose ends for the next four hours. What’s really impressive is its ability to complete the swing: “Mike asked me to do this complex task overnight. I got stuck because this remote service went down. I’ll write a scaffolded backend for now, document that, go all the way through, keep track of that fact, and fix it when it comes back online.” The most impressive thing for me is just being able to delegate that kind of task and trust that the right thing will happen by the end.

Of course you review the result—there’s still a whole verification thing we should talk about, because that’s an important part of completing the swing. But it’s really forced me to rethink what being productive with one of these models looks like. We’ve talked for a while about what it’s like when these models are more of a companion or coworker. It really feels now like a teammate I can delegate a lot of work to.

Dan Shipper

What is your day-to-day flow like right now? One thing I notice is that if you give it a big task and monologue into it and let it go for a few hours or overnight, it’s the most impressive model I’ve ever tried. But it’s so slow and expensive that I feel like I don’t want to use it for day-to-day tasks. What is your actual flow in terms of how you use it day to day, and where does it slot in versus other models?

Mike Krieger

I’ve ended up having a lot more architectural planning conversations up front with it. That’s been another interesting change. It’s an area where all models still need to continue to improve, and I’m really grateful for the Instagram experience—having to start from our initial version duct-taped on a server in LA, to scaling it and eventually integrating it with all of the Facebook infrastructure—because you develop a sense of what infra abstractions and complexity are appropriate for each stage.

I still go back and forth with Fable sometimes. It’ll come up with what looks like a good implementation, and I’ll say, “I do plan on shipping this fairly soon—we should probably think about more than one server.” That back and forth is important. But for architectural planning, I’ll often ask it to just make an HTML page that represents what we talked about so I can share it with the team. Even just a markdown document works, but I like having diagrams.

So that’s been an interesting use pattern: let’s plan with it, think it through, and then have some document we can align the team on. You can build a lot very quickly now, and forcing more of that early alignment—even if you do an initial prototype and then back it out into a more planned architecture—is really key. It ends up being the place where human-to-human interaction still stays very much part of the process.

From there, whether overnight or during the day, having it execute on chunks of tasks means having a lot more concurrent sessions than before. I go back and forth between liking a single long-running Claude Code session where I ask it to do everything in background forked sub-agents so the main thread stays responsive, and other times just embracing having five or six tabs tackling long comprehensive work.

There’s something to this long-horizon, “don’t worry, I’m on it, it’ll take me a while” modality. We’ll have to figure out how to support that in our products too—you want to preserve both modes, and they interact with each other in interesting ways. My preference is usually to have at least one Claude that’s high-context but also very fast to respond, with the instinct of, “I’ll answer you and kick something off if I need to, otherwise I’ll hang tight.”

You’re right that for fine-grained interaction questions, Fable will go off and think very hard. Fable is actually the first model where I’ve played more with the effort levels, where I’ve thought, “I just need to tweak some UI—I’ll put it to medium and see how that plays out.” I didn’t find myself doing that as much with Opus, maybe because the range felt less wide. With Fable it can feel quite wide.

Dan Shipper

What about a quick question? When you’re on the go, are you asking Fable random questions as they come to you? Does that feel like using a rocket launcher to kill a mosquito, or are you flipping back and forth?

Mike Krieger

It’s funny you ask that. I had been using Fable for everything, and you’re right—you’d watch it thinking, thinking really hard. Then this last week I was asking it something I actually felt embarrassed about. It was something NBA Finals related, and I switched my iOS app to Sonnet. “Oh yeah, I used to use this all the time for fast questions.” It’s order-of-magnitude different in feel, and it’s not even really about tokens per second—it’s about how much thinking goes into the answer. Sometimes the answer does not need to be fully thought through.

This is a good product question for us too. In general, you don’t want people to have to think so much about these choices. Ideally what we can coalesce around longer term is some more bucketable use cases that are really grokable. Or it varies by surface—it’s actually probably unlikely that most of the time on the iOS app I’m doing Fable-type tasks. Having a sticky model selection per surface might be the way to do that. We’ll have to explore what that means from a product perspective. But I’ve definitely had the feeling of, “This is not a Fable-worthy question. I should ask Sonnet this.”

(00:10:00)

Dan Shipper

Can you show us something you’ve built with it?

Mike Krieger

One of the things we did this go-around was encourage personal account usage, especially on the weekends. It was really fun because we have a lot of Anthropic-specific tooling, so it was good to step back and say, “I’m just going to use pure Claude Code and work on something over the weekend.”

Dan Shipper

Are you in the terminal app or the desktop app?

Mike Krieger

That’s a great question. I’m mostly still in the terminal app. It’s been interesting watching my wife—not a professional engineer, more of a UX designer/PM—really fall in love with Claude Code via the desktop app. I think it’s simplified some of the abstractions for her. But for this one I was still in Ghostty and the terminal app.

Everybody has some bespoke need. I wanted a good media tracker experience—I’m playing games, watching TV shows, getting all these recommendations, and I wanted to build something personal that fit my use cases. My two biggest criteria were: one, really easy to add things, where you can just talk to Claude and it does agentic search over everything and puts the right things in. And two, proactively surfacing things, like when there’s a new season or a sequel to a game it could go research.

Most of the UI was Fable one-shot, which was already impressive. But the thread I’ve been pulling on a lot in Labs this year is: how do you bring the software team—which is Claude these days—closer to the software itself?

This was a Saturday morning with a full weekend of kid stuff, so a lot of it was kick-off work: go for a hike with the kids, come back, continue. Sometimes check in on the work during the hike—I probably shouldn’t, but it was nice to pop into remote mode and see what was going on.

The idea I had was: could we do a spike on what if you could actually modify the software from within itself? I built both a React Native version and this web version. I already had a chat-type thing where you could ask Claude to add things by URL. I want every piece of software to have this—I should never have to navigate a menu to do anything again.

In many ways, Dan, I was trying to distill agent-native architecture to its fullest degree, which is: also have the agent be able to modify the app. Phase one of agent-native architecture is that every single thing in the product is accessible from the agent and has tool calls. That’s hopefully becoming table stakes, although sadly not in a lot of software. I had a great example—somebody had recommended a Brazilian show about radioactive stuff in Goiânia. I couldn’t remember the name, and Claude was able to figure it out. So much better than me trying to figure it out intuitively.

But the next step I was interested in was: what would it mean to actually modify the software from itself on the go? If you long-press the little chat button, what I built—or really what Claude built—was a way to use our managed agents to take on edit requests, and then you can preview them. I used the Vercel live-preview thing. This whole feature was also one-shot, which was really cool, and I just added to it over time. It does a little diff view if you want, and you can go into the managed-agent conversation and see what it did—though I almost never do, because I genuinely don’t care about the long-term maintainability of this personal project.

It’s been really fun. I’ll be using it on the go and say, “The floating action button was too low on native iOS.” It went off and fixed it. And with some of the Expo tooling now it actually live-reloaded on my phone, which was a really cool feeling. Does this thing need to be a production-level thing going to a million users? No. But it felt good to have something where it didn’t have to stop at just the weekend—I could keep working on it just by using it, with this kind of end-to-end closed loop. This was a good manifestation of both Fable’s building ability and a lot of what both of us have been thinking about: how does Claude embed itself into software beyond just the usage side?
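[Editor's illustration, not part of the conversation. "Phase one" of agent-native architecture, as described above, means that every action in the product is also reachable by the agent as a tool call. The sketch below is one minimal way to picture that, assuming a toy in-memory app. `MediaTracker`, `build_tools`, and `dispatch` are hypothetical names invented for this example and do not describe the app Mike built or any Anthropic API.]

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class MediaTracker:
    """Toy stand-in for the app's state (hypothetical, not the real app)."""
    items: List[dict] = field(default_factory=list)

    def add_item(self, title: str, kind: str) -> dict:
        item = {"title": title, "kind": kind, "done": False}
        self.items.append(item)
        return item

    def mark_done(self, title: str) -> bool:
        for item in self.items:
            if item["title"] == title:
                item["done"] = True
                return True
        return False


def build_tools(app: MediaTracker) -> Dict[str, Callable[..., object]]:
    # Every user-facing action is also registered under a tool name,
    # so nothing in the product is reachable only through a menu.
    return {"add_item": app.add_item, "mark_done": app.mark_done}


def dispatch(tools: Dict[str, Callable[..., object]], call: dict) -> object:
    """Run one tool call shaped like {"name": ..., "arguments": {...}}."""
    name = call["name"]
    if name not in tools:
        raise KeyError(f"unknown tool: {name}")
    return tools[name](**call["arguments"])


app = MediaTracker()
tools = build_tools(app)
dispatch(tools, {"name": "add_item",
                 "arguments": {"title": "Example Show", "kind": "tv"}})
dispatch(tools, {"name": "mark_done",
                 "arguments": {"title": "Example Show"}})
```

[In this sketch the agent never touches the UI. It emits a tool name plus arguments, and the same functions the interface would call do the work.]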

Dan Shipper

This is really cool. I want people to understand—you could have built something like this, maybe not the self-modifying part, but something like this, 10 or 20 years ago. But the cost to build has gotten dramatically lower. Think about how much it would have cost to do this in the Instagram days versus now. Can you help us understand how that has changed?

Mike Krieger

I think about this a lot when I look back at that time. I thought of myself as a very productive programmer in the early Instagram days—really into mobile development, good clarity on things. The gap from idea to fully realized product was still looking at roughly four or five days of all-nighters, which was just my natural state. Up until 4 a.m., sleep until noon—not conducive to family life, but that was my building mode.

(00:20:00)

Instagram V1, which probably had more features than what I built this weekend but not by an order of magnitude, was about five days of all-nighters—me on the front end and back end, Kevin working on the initial filters. And this was built on many years of iOS experience. The iteration was also gated: after the launch went well, we had all these ideas but were just trying to keep the site up or add the one incremental feature. Hashtags take a week to build, and then there are all the things you want to keep doing on top of that.

So it’s both that shortening of time—there’s still the time required for the idea, the concept, the iteration—and the other piece, which is how you can then iterate on what you have in a really fun, in-the-flow kind of way. And then beyond what I could do as a professional software engineer and startup founder: if you had that idea but couldn’t build it yourself, the options used to be find a consultancy, which is a really lossy process, or go raise money for it. Now that gap between intent and execution has closed dramatically for people who are not builders.

I got a ping the other day from someone internally. We had built them an internal tool combining Fable with access to some internal MCPs. She works in recruiting, and she said, “It’s the first time in my life where the thing that’s in my head and the thing that exists in the world are right next to each other—I can just do it.” That was a meaningful moment for her. Four or five years ago, that person, if they wanted a tool, would have had to make do or try to get an internal tools engineer who was probably overloaded with 50 other requirements. Now they’re just having the time of their lives building. I think that’s cause for a lot of hope—human capacity for creativity is enormous, and at our best we are expanding the number of people who can see that through to something that feels real.

Dan Shipper

I totally agree, but I think there’s a question in the back of a lot of people’s minds. Given everything you just said: is software engineering over?

Mike Krieger

Software engineering is different. It has dramatically changed. If you’d asked me around the Instagram time, “What is software engineering?”, I’d probably say thinking through hard problems, designing an architecture, and then spending a lot of time in TextMate or Xcode—

Dan Shipper

Watching Railscasts—

Mike Krieger

Exactly. Understanding the intricacies of Django’s ORM layer and then fixing bugs after you deploy. So much of that is radically different and collapsing into other parts of product management. The PM/eng split has become much more diffuse. I even see it in our own teams.

But if you zoom out from “software engineering” and think about software production or software development in a broader sense, not just the pure developer case, that is alive and well and essential. Fable is another significant step in terms of the trust I place in the model’s capacity to see things through and even architect things reasonably. That part feels like it’s gotten really far. But the overall craft of deciding what needs you have, what you are putting out, and whether it is actually good is still a very human endeavor.

That’s also not a pain-free transition. There are plenty of people who love the craft of actually writing code. I used to love it. “I solved that problem so elegantly.” You would dream about code—if you’ve ever had that experience of dreaming about what you’re working on and waking up with the answer. That for sure has passed. There’s a feeling of loss that I hear from some of the best engineers I talk to, alongside the feeling of “I can do insane amounts of work now at the same time.” We’re holding both ideas in our heads at once.

Dan Shipper

Which I think is the most important part of this. It’s normal to feel both sadness and excitement about that kind of change. But let’s take the thesis that software engineering is alive and well. What does that actually look like inside Anthropic?

Mike Krieger

A few threads. Maybe I’ll start from the full software development cycle and what I see day to day. There’s still a lot of: we all got together, we talked about the next way we want to evolve Cowork, and now we’ve broken it down into areas of ownership. That ends up still being quite important because there is still context that you hold as a person that’s beyond Claude—what is the actual intent of this product, how’s it going, what do we need to know about other products coming down the pipeline that will be integrated in interesting ways.

Though we have many Claudes to each human, each human still kind of has—we call them DRIs, directly responsible individuals—a DRI-ship over some part of the product or area. I think that’ll be the case for a while because there is value in not just distributed “we should all make Cowork better,” but instead “I’m thinking through how Cowork does this particular task.” We try to keep meetings minimal, but they still emerge, and you still have these alignment conversations.

Then there’s a lot of that asynchronous delegation. What many engineers here have found is they’ve all built some version of, “I’m going to create a dashboard of where all my Claudes are and what’s waiting for me and which pull requests need my attention because either a human or a Claude Code reviewer got back to me.” There’s a lot of that meta maintenance of the work. Some of it we’ll standardize, but some will always be a little bespoke to how each individual likes to work—just the way people organize their windows, now they organize their work.

(00:30:00)

And there’s understanding how things work in production. That’s another next frontier for the models. Fable makes significant strides here, but there’s more work needed: understanding what happens to code after it gets deployed. There are incidents, unexpected failure modes—so much of Instagram from 2012 to 2016 was dealing with that and scaling things up. The role of the engineer still remains really key there: getting the reps in around incident response, staying calm, gathering data, remediating what’s immediate, and then working on longer-term fixes.

The last thing I’d highlight is the role that the engineering prototype now plays. You have to be clear when it’s a prototype versus not. The old phrase was “code wins arguments,” and I never totally loved that because it suggested the person who could code would just go do it and win by default. But now it’s really cool: sometimes we’ll have a disagreement about where to take a product, and often it’s the PM who will say, “I just tried it, and it janked in these eight ways—but it actually shows how this could work.” That opens up interesting conversations. Almost all of that is quite different than it was six months ago, especially at the level of parallelism and the need for these higher-order abstractions of work. But what hasn’t changed is that ownership piece.

Dan Shipper

Fable is also very expensive. When I was testing it, I felt like a kid in a candy shop—“I’ll do this and this and that.” But now that there’s going to be a bill, I’m going to pause before each thing and think, “Is this going to cost me $100 or whatever?” I think that’s going to limit who gets to use it and for what. How do you think about that?

Mike Krieger

It’s most clear-cut on the professional software side. A lot of thought goes into pricing. It’s both more expensive than Opus and, in many ways, really cheap if you think about the incredible work it’s doing. But everyone has their own economics.

From a software teams perspective, if phase one was companies struggling to get employees to even adopt AI coding—models were early, tooling wasn’t there—and phase two was creating leaderboards to see who could use it the most, which creates not-ideal incentives, then phase three is: figure out who’s using it effectively, let them spend as much as possible, and have a clear process for that while not doing things wastefully. Something in the Fable class should fit well into that: if you’re demonstrating results and getting use out of the model, there’s hopefully a flywheel even inside companies that perpetuates it.

On the personal use side, in my personal testing—we pay on personal accounts, which is funny, paying my own company—you do become more thoughtful. Something interesting was that the app I built over the weekend actually fit within just a bit of extra usage. It wasn’t thousands of dollars to build a personal thing. The space we’ll have to think about most is the hobbyist or independent developer who’s not within a larger company but is thoughtful about pricing. My overall advice is: just give it a try and see how much it can do without requiring a lot of follow-ups. Measuring cost has gotten multifaceted—there’s the per-turn cost, and then there’s what it cost you to complete the task to your satisfaction. That’s where Fable has really shined for me: it just does it right, so I don’t have to spend eight or nine subsequent turns saying, “No, that wasn’t quite what I meant.”
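[Editor's illustration, not part of the conversation. The distinction above between per-turn cost and the cost to complete a task can be made concrete with simple arithmetic. The dollar figures and turn counts below are invented for the example and are not real pricing for any model.]

```python
def cost_to_complete(cost_per_turn: float, turns_needed: int) -> float:
    """Total spend to reach a satisfactory result, not the price of one turn."""
    return cost_per_turn * turns_needed


# Hypothetical numbers: a cheaper model that needs many corrective follow-ups
# versus a pricier model that gets the task right on the first turn.
cheap_model_total = cost_to_complete(cost_per_turn=1.0, turns_needed=9)
pricey_model_total = cost_to_complete(cost_per_turn=4.0, turns_needed=1)

print(cheap_model_total, pricey_model_total)
```

[Under these made-up numbers, the model with the higher per-turn price is the cheaper one per finished task. Whether that holds in practice depends on the task and on how many follow-up turns each model actually needs.]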

Dan Shipper

It’s been really impressive for me because you ask it to do something, and it just does it, and you’re like, “You thought through all the little details in a way I’ve never seen another model do.” I don’t know how much you can reveal about the training process, but what makes the model different?

Mike Krieger

In many ways it’s a continuation of a lot of the work the team has done—I bow down in total awe of our teams on both the pretraining and RL side. The piece that has evolved and that I noticed the most is a sense of the system more than just the individual piece of work.

I’ll often be very positively surprised when it writes something and says, “I know that in production this needs to be different,” and then keeps reminding you: “Have you turned on that feature flag yet? It’s not going to work until you do.” I’ll sometimes be in sessions that have gone on for days and it’ll say, “You still haven’t done that thing.” And I’ll think, “You’re right—I should go do that.” Or watching it respond to code review feedback, either from people or from other Claude reviewers, where it doesn’t just say, “Oh yeah, that’s an issue, I’ll go fix it.” It’ll actually think around whether to accept a risk at the current level of fidelity, or push back on another code reviewer—which is often just another Fable model—saying, “I see what you mean, but I’m actually going to push back. I don’t think that’s right.”

Getting the model to have that judgment is really important. If I had to pinpoint where it’s really progressed, it’s that it doesn’t give an immediate knee-jerk “yeah, that’s right, I’ll go fix it”—it’s more, “Let me think about that for a minute. I still disagree.” That’s a very useful ability.

It’s so valuable to have products like Claude Code out there because you have a living, breathing thing where people are like, “This is where the model is doing well.” We count the Every folks very high on the list of people whose feedback we really trust, because they’re putting it through its paces in repeated multi-day hard tasks. That very much feeds into how we think about what we need to improve in the next generation.

(00:40:00)

Dan Shipper

Is chat the right interface for this model? It’s not very turn-by-turn—it’s more like delegating something to someone. How does that change how you should use it or how you think about the interface?

Mike Krieger

The fundamental model of sending messages and getting messages back isn’t totally wrong, but there are ways we need to evolve it. Three things come to mind. First: is your laptop the right place for it? That’s where I mentioned how useful it was to have the mobile side for the side project. Boris, who created Claude Code, is always ahead of the curve on how these models get used. About nine months ago I was talking to him and he said, “Yeah, I’ve moved a lot of my Claude Code work to mobile.” I was skeptical, but it makes sense, especially with the Fable class: because it can keep the session going and we use remote dev boxes at Anthropic, I’ll have a thought and say, “Can you keep going with that?” So number one is decoupling where the work is happening from where I’m talking about the work.

The second touches on what I mentioned earlier: how do you take everything Fable has discussed, decided, or proposed about something and make it comprehensible? That’s an area we’re thinking a lot about. There are some skills around asking it to diagram things, but the current chat UI is insufficient—Fable will sometimes give you a lot of text, and you need to take a walk before you’re ready to fully understand it. Something I’ve started doing is asking, “You have a lot more context on this than I do. Can we back it up—do more progressive disclosure of the complexity?”

The third is multiplayer, which we’re still early in pulling on. At some level, because we have this DRI and ownership-area structure, usually a significant chunk of work flows between a human and a couple of Claudes. But in other cases that’s less true—maybe it’s an incident response where multiple people are thinking through it, or a project where there are multiple conjoining areas coming together. Chat sharing gets you partway there, but I think there’s going to be a need for, “You’ve got an independent Claude doing a lot of work kicked off by one person, but can it be keeping up with all the other work happening on the team?” That’s an interesting and underexplored next frontier. It’s exciting because the models are now capable of being genuine teammates, and we’re almost holding them back by not having the right abstractions.

Dan Shipper

That makes me think—I’ve mostly been using this for my own vibe-coded stuff, so I haven’t had to think about this. But there’s a problem when you’re using this inside an organization: do I really understand every part of what the model just did? How do I transfer the context of what the model just did into my brain? That’s one of the big bottlenecks. How do you think about drawing the line around how much you actually need to understand, and how to make sure you have enough context on what it’s done to feel comfortable?

Mike Krieger

Two big pieces. The first is verification. I became fully verification-pilled earlier this year, and it connects to something I used to do when I was writing code full-time: find the tightest dev loop you can around the idea you’re developing. With Instagram, that sometimes meant making a new build target in Xcode that was just that screen with synthetic data, just doing that loop. I’d mentor newer engineers: “If there’s one thing I can impart, it’s to get that for any project you’re working on, and things will go much more quickly.”

That’s not exactly the case anymore, but what is the case now is: anytime I set something up, how do I get it so that for every pull request Claude is putting out, there’s an attached photo or video—whether that’s an iOS PR or something in the UI. That helps you gain a lot of confidence. Fable might go off and work for a couple of hours, come back with “I’m done,” and it’s really useful to say “Here’s the full screenshot gallery of the full UI.” You might say, “On screenshot eight, that error state—I’ve never actually seen it, but I can see how a person might hit it. Let’s make that different.” Getting comprehensive verification is something we’ve been working on a lot internally.

The second piece: you ultimately still need to stand behind the work you’re doing, especially if you’re putting it into a production system. A lot of people use Claude every day, and there’s still the accountability of, “Claude might have written it, but you need to understand the general decisions that were made.” I’ve seen a fair number of engineers adopt a practice where Claude has done the work, but then there’s a follow-up conversation: “Can I make sure I deeply understand all the trade-offs you’ve made?” And whatever lowercase-a artifacts need to be produced to make that comprehensible are worth producing.

It’s really interesting to be in meetings where somebody will say, “I have this PR ready,” and someone else says, “Did you do X or Y?” And there’s that moment of pause: “You know what, I’m not entirely sure—I’ll find out before we merge.” Adapting to that norm and figuring out how to work with it is something we’ll all have to do.

Dan Shipper

Tell me more about the verification loops. It sounds like one way you do that is with screenshots and screen shares, but what are the other ways you think about it?

Mike Krieger

Part of it starts with: can you get to a place where you’re exercising real flows that aren’t just a static injected piece? As the system gets more complex, that gets more complicated. We’ve invested a lot in getting the iOS app to log in to staging on a real account with real data, but without having to go through an eight-stage onboarding process every time when you’re just trying to test one part of a screen. So there’s work around special affordances or shared secrets to make the app feel as close to a human using the product as possible.

The second piece is the mix of well-known paths versus what you’re exercising in the exact moment—the former being really useful for regression testing. We’ve expressed ideal workflows in text, and Claude can repeatedly check those. And Claude does a really good job of expressing the intent of the current change at hand, so that gets deeply exercised. The combination of those two things is important.

Visual verification is also key, and video is a really underexplored tool to give Claude. Something I’ve been prototyping: giving Claude video captures of what it has built, along with FFmpeg, and watching it scrub through and say, “This animation has some jank—I’m going to go fix that.” It never could have caught that with a screenshot because it would have missed the moment.
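[Editor's illustration, not part of the conversation. One plausible way to let a model "scrub through" a screen recording is to split it into still frames with FFmpeg's `fps` video filter and map each frame back to a timestamp. The sketch below only builds the command and does not run it. The file names, the 10 frames-per-second rate, and the helper names are assumptions for the example, not a description of the prototype mentioned above.]

```python
from typing import List


def frame_extract_command(video: str, out_dir: str, fps: int = 10) -> List[str]:
    """Build an ffmpeg invocation that writes `fps` still frames per second."""
    pattern = f"{out_dir}/frame_%04d.png"  # ffmpeg numbers output files from 0001
    return ["ffmpeg", "-i", video, "-vf", f"fps={fps}", pattern]


def frame_timestamp(frame_number: int, fps: int = 10) -> float:
    """Approximate time in seconds of a 1-indexed extracted frame."""
    return (frame_number - 1) / fps


cmd = frame_extract_command("capture.mp4", "frames")
print(" ".join(cmd))
print(frame_timestamp(31))  # frame 31 at 10 fps is about 3.0 seconds in
```

[To actually extract frames, the list could be passed to `subprocess.run(cmd, check=True)` on a machine with FFmpeg installed and an existing output directory. A higher `fps` value catches shorter glitches at the cost of more images to inspect.]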

For pieces that aren’t easily testable end-to-end because of a more complex system, getting Claude to build a robust mock backend, or using an off-the-shelf one, has been really interesting. With Artifact, we had really comprehensive tests pre-LLM. Every piece of infrastructure we had (Postgres, Redis, all the AWS things) had a good in-memory implementation you could run really quickly in unit tests. Extending that to Claude land now: I was working on something with a pretty robust backend that was hard to spin up on my dev server, and it one-shotted a really good substitute for it. Over time that substitute has evolved as the rest of the code has evolved. Before, I would have said, “That’s going to be really hard to keep in sync.” Now I just think, “Claude will read the changes, adapt the substitute, and keep the two in sync. That’s fine.”
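The in-memory substitute pattern described above can be sketched as follows. This is a minimal illustration under stated assumptions: `KeyValueStore`, `InMemoryStore`, and `cached_greeting` are hypothetical names, and the fake covers only a tiny get/set-with-expiry surface rather than any real Redis or Postgres client API.

```python
import time
from typing import Callable, Dict, Optional, Protocol, Tuple


class KeyValueStore(Protocol):
    """The small interface that application code depends on (hypothetical)."""

    def set(self, key: str, value: str, ttl: Optional[float] = None) -> None: ...
    def get(self, key: str) -> Optional[str]: ...


class InMemoryStore:
    """Fast in-process fake with expiry; the clock is injectable for tests."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[str, Optional[float]]] = {}

    def set(self, key: str, value: str, ttl: Optional[float] = None) -> None:
        expires = self._clock() + ttl if ttl is not None else None
        self._data[key] = (value, expires)

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires = entry
        if expires is not None and self._clock() >= expires:
            del self._data[key]  # drop expired entries lazily on read
            return None
        return value


def cached_greeting(store: KeyValueStore, user: str) -> str:
    """Example application code that only sees the interface, not the backend."""
    key = f"greeting:{user}"
    hit = store.get(key)
    if hit is not None:
        return hit
    greeting = f"Hello, {user}!"
    store.set(key, greeting, ttl=60)
    return greeting
```

Because the application code depends only on the narrow interface, a unit test can pass in the fake with a controllable clock and exercise expiry without a running server.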

Dan Shipper

There are some really interesting architectures where when you get a bug, an agent automatically goes out, closes it, and sends a message to the customer saying, “It’s fixed.” Are you noticing with Fable any change in how that process works?

Mike Krieger

A couple of things. On a human-to-Claude level, one thing I’ve seen it do really consistently: if a bug report came from somebody mentioning something in our feedback channel in Slack, and that gets fed into a Claude Code session, because of the Slack MCP it can actually pull the thread and then post back—as me—saying, “This is Mike’s Claude, I fixed it, here’s the pull request.” But then, and previous Claudes have done this too, it does it really well now: “Hold tight—it’s not in production yet. I’ll follow up when it actually is.” Then a few hours later: “This deploy went out. You should go test it—is it fixed now?” That level of follow-through on closing the loop is relatively new. I’ve had long-running Claude Code sessions interacting as me. I put some disclaimers in there too.

The second piece goes back to that taste and discernment we were talking about. It’s one thing to say, “There was a bug report, therefore I must go fix this thing.” It’s another to have good discernment. I hit this over the weekend—one of our internal systems had been running without restarting for a while and had a memory leak. Good discernment means: “Mike, it’s the weekend. Just bounce the server. It’ll solve it for now, and I’ll asynchronously get the PR going to fix this longer term.” If you’re going to have Claude in the loop on this kind of bug-report-to-fix workflow, you really want it to understand what any good SRE or engineer in the loop would understand: solve the problem at hand, defer the question of whether you need to rearchitect on a completely different platform. Understanding that balance is really important.

(00:50:00)

Dan Shipper

One of the things that’s really exciting about new models is that they raise the floor so everyone can go build apps in one shot. But they also raise the ceiling for experts. If you’re a software engineer or founder, you can go do things you never would have been able to before. For me, I built a one-shot version of Borges’s infinite library—it’s a 3D game version, runs right in the browser, I can find every essay inside it. I’ll send you the link—it’s incredible. I think there’s going to be a flowering of people doing things they couldn’t do before: “I made a game,” “I trained a new model,” whatever. I’d love to give people some inspiration—what are some things they might not be thinking to do with this model?

Mike Krieger

A few ideas. Maybe I’ll start with the fun side. People have a lot of creative ideas around how to express the complexity of their world. Everybody has the thing they know really well, and there’s probably some version of, “How do I explain that to somebody else? How do I apply techniques from elsewhere to it?” My wife is studying environmental engineering: geothermal, very complex math and simulations. As the models have gotten better, she’s been able to apply more complex techniques from outside that domain to her work. With Fable she’d be able to do full-on PyTorch end-to-end simulations in a way that wouldn’t have been possible before. So one idea is: take the beautiful complexity of what you know, and either show it to other people by making a game or a visualization, or bring techniques from other fields to bear on it.

The second piece is its ability to compose software that solves a really unique problem specific to you. Internally, a lot of the work we’ve been doing is getting as many of our internal systems MCP-ified with the right permissions structure and deployment setup. Externally there are good platform-as-a-service options you can just ask Claude about and it’ll help you set things up. But I love that feeling of building the thing you always wished you had.

What has blown my mind: there’s a person who works in our go-to-market organization who has been building a deeply thought-out integration of Claude into every part of her process. And she hasn’t stopped at the one-shot; she’s been working on it for months and keeps going. One of the things that’s maybe underappreciated about the models is that in previous generations, they’d eventually reach a complexity level where iterating felt like you’d break what you already had. Whereas this person has had access to something Fable-like for a couple of months and you’ve just seen it keep growing and growing, and now she’s deploying it to the whole go-to-market org. The level of complexity that someone who doesn’t start out technical can now build to solve problems within their domain is unprecedented.

Dan Shipper

I agree. It writes great code. My benchmark is called the senior engineer benchmark—I have it rewrite a codebase from first principles. The previous top was about 62 or 63 out of 100, and this model got 90 or 91, which is human senior engineer level. You can just keep going with this thing in a way that’s really fantastic. One other thing that’s really powerful that you mentioned is dynamic workflows. Tell us about that.

Mike Krieger

We’ll build things internally sometimes and I’ll aggressively bug the engineer who built it: “When are we shipping this publicly? People are going to really like it.” Sometimes there are good reasons it was built internally, but we try to ship as many of these as possible. Dynamic workflows was definitely one of those for me.

I think it’s especially powerful with a model like Fable for two big reasons. One, it helps create the scaffold for deep, meaningful work. The craziest dynamic workflow I did used Fable to port an internal project written in Python to TypeScript for a really specific deployment reason. I was at Instagram when we asked, “Should we rewrite the whole thing in Hack and port it to the PHP engine Facebook uses?” We never would have done that; it seemed impossible. But here I had a pretty complex codebase, and I just set up a dynamic workflow and let it run over the weekend. The workflow was: do a deep understanding of the work, create almost like a spec of how everything works, go module by module, translate the pieces, test incrementally, do an adversarial test, check for anything missed. I came back and it was a TypeScript and Bun port, and it was actually better in certain ways. It was well documented: “here were the things I couldn’t port, and they were very specific to the original implementation anyway.” I don’t think you could have done that (a) with previous models at that level of success, or (b) without the kind of scaffolding that workflows provide.

The other piece: over time we’ll be able to tune subtasks to the level of complexity—some parts of the dynamic workflow don’t need extra-high thinking, they could use medium thinking or even a smaller model. That’s really the future of where these things are going. I’m a huge dynamic-workflows fan.
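To make the idea of a staged workflow with per-stage settings concrete, here is a minimal sketch, assuming a workflow is just an ordered list of named stages that share state and stop at the first failure. Everything in it is hypothetical: the `Stage` and `Report` types, the `effort` labels, and the stand-in stage bodies are illustrations, not the format that dynamic workflows actually use, and several stages from the port described above (incremental and adversarial testing) are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

State = dict  # shared scratchpad passed from stage to stage


@dataclass
class Stage:
    name: str
    run: Callable[[State], bool]  # returns True when the stage succeeds
    effort: str = "high"          # hypothetical per-stage thinking level


@dataclass
class Report:
    completed: list = field(default_factory=list)
    failed: Optional[str] = None


def run_workflow(stages: list, state: State) -> Report:
    """Run stages in order and stop at the first one that fails."""
    report = Report()
    for stage in stages:
        if not stage.run(state):
            report.failed = stage.name
            break
        report.completed.append(stage.name)
    return report


# Stand-in stage bodies; a real workflow would call a model or a test runner.
def write_spec(state: State) -> bool:
    state["spec"] = {m: f"behavior of {m}" for m in state["modules"]}
    return True


def translate_modules(state: State) -> bool:
    state["ported"] = [m for m in state["modules"] if m not in state["unportable"]]
    return True


def check_for_missed(state: State) -> bool:
    # Every module in the spec must be either ported or explicitly documented.
    accounted = set(state["ported"]) | set(state["documented_gaps"])
    state["missed"] = sorted(set(state["spec"]) - accounted)
    return not state["missed"]


PORT_WORKFLOW = [
    Stage("spec", write_spec, effort="extra-high"),
    Stage("translate", translate_modules),
    Stage("check-missed", check_for_missed, effort="medium"),
]
```

Keeping the final check as its own stage means a module that was silently skipped surfaces as a named failure instead of disappearing into a successful-looking run.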

Dan Shipper

For people who haven’t used it before, tell me about how you got that workflow made. How did you design it? How did you make sure it was good?

Mike Krieger

It was pretty iterative. I started with Claude Code: “I have this complex task—let’s design a workflow to do it.” It showed me the plan. I said, “This is close to what I want, but I need three or four additional levels of verification for missed features.” It was like, “Here’s what you have. Are you ready to go?” It expresses workflows in code, which I think is really valuable—you can see what it was about to do.

After it did the full port, I had a couple of follow-up tweaks, and I did those as mini workflows that built off the previous one. It goes back to whether chat is the right interface. Workflows are a good middle ground: you compose them using chat, but they’re expressed using code, and then they’re executed with a clean UI showing what’s happening at every stage. I think we’ll start bridging longer-horizon work with chat in ways like that over time.

Dan Shipper

Mike, this is such a great conversation. Thank you so much for joining and telling us all about this new model.

Mike Krieger

I’m really excited to spend time with you, and really looking forward to hearing what people think outside too.

Dan Shipper is the cofounder and CEO of Every, where he writes the Chain of Thought column and hosts the podcast AI & I. You can follow him on X at @danshipper and on LinkedIn.
