
The transcript of AI & I with Dean Leitersdorf is below. Watch on X or YouTube, or listen on Spotify or Apple Podcasts.
Timestamps
- Introduction: 00:00:47
- A demo of Mirage, the first real-time video-to-video model in the world: 00:02:38
- How Mirage can take your vibe-coded game to the next level: 00:06:22
- The new architecture of modern software: 00:08:45
- How Mirage works so blazingly fast: 00:16:34
- Inside Decart’s invention of a new “live stream diffusion” model: 00:20:33
- Solving the error accumulation problem for real-time video: 00:21:17
- How Dean thinks about inventing a new creative medium: 00:29:55
- Dean’s take on the post-AGI world: 00:39:43
- Why AI brings back the age of the generalist: 00:51:15
Transcript
(00:00:00)
Dan Shipper
Dean, welcome to the show.
Dean Leitersdorf
Thanks so much for having me. It’s been a while. It’s been a few months.
Dan Shipper
Thanks for coming. It has been a while. You’re one of my favorite people to talk to in AI. You're in this really interesting intersection of doing incredible new stuff at the frontier, but you're also a big philosophy nerd. So, we're just gonna talk about a lot of good stuff.
So for people who don't know, you are the cofounder and CEO of Decart. You can describe for us what Decart is in a second. But you do awesome. real-time generative video models. You just raised $100 million at a $3 billion valuation. Congratulations. Welcome to the show.
Dean Leitersdorf
So excited to be here! I read the newsletter when it gets to my inbox, and it's always fun to have these conversations. Yes, we have so many exciting things going on right now. The past few weeks have been incredible, and we have tons of launches coming in the weeks ahead—we can get into a few of them here. But it's an exciting time. We're creating a completely new way for people to interact with AI, just to have fun with AI.
Dan Shipper
So just to give people a sense of what you do, you just launched this thing called Mirage, which is kind of crazy. Can you just show us Mirage?
Dean Leitersdorf
Yes, let me pull this up. We just launched Mirage a few weeks ago—like three weeks ago. By the time this goes up, it'll probably be four or five weeks since it was launched. Mirage is the only real-time video-to-video model in the world right now.
What that means is that you can take any video stream that you have—whether it's this conversation, your camera, or even a game—and just put it through the model and change it with a prompt. So let me bring this up right now so I can show it. Let me share my screen.
As you can see, this is the real me in the interview screen, and you have the Mirage version of me. So the current Mirage version of me is in the Versailles Palace. Or you can do some blocky kind of thing like this. Or we can do an anime version.
Dan Shipper
This is so amazing. I literally just saw this for the first time right before I started recording, and it was like an ‘oh my god’ moment for me.
For people who are listening, basically what's happening is you're on a webpage and you have your camera going. So I see an image of Dean on my screen. And then when you start Mirage, it just transforms you in real time into effectively a Pixar character—something that looks a little bit like a Pixar character, but with different themes and in different settings.
So right now you're on Jingle Bells. Now you're a wizard. You've got lights coming out of your hands. It's crazy. And now it's Lego. So how did you make this? I'm made of Lego. How does this work?
Dean Leitersdorf
Okay, let me show you one cool thing, then we can dive right into how this works.
But I love this one—it's one of my favorite prompts. If you put it in the portal world, you get physical objects, like blocky things. This is a tissue holder, and you can turn it into a gun.
And every once in a while, if you do this, it shoots something out.
And one of the things I'm dying for the next version to have is the ability to do this and have a huge portal pop up—imagine a real Rick and Morty portal gun.
Dan Shipper
Dude, this is so cool. So yeah, just for people listening again, basically you're in your portal world, you picked up a tissue box and the tissue box became a gun. And if you move the gun in the right way, or you move the tissue box in the right way, it shoots a little bit. So it sounds like right now the possibilities are real-time video games that you can just play. Is that where you're going with this?
Dean Leitersdorf
So I think what this kind of tech actually enables is a different way to interact with AI video. What has AI video done so far? It was great at taking a piece of text and giving us cat videos on TikTok. That's what we got out of AI video. Now, for the first time, computers can actually respond to us as we're moving things and changing things, and can change the video in real time.
So it can be used for anything from gaming—we're going to have a mod coming out for Minecraft. We have alpha testing right now, and it'll be coming out in the next few weeks. You're playing Minecraft and through the game, you tell the mod, hey, change Minecraft to Barbie land, and everything becomes Barbie.
So you can take it to gaming. We're also integrating with lots of the game engines so that as you're developing a game, you can actually see this, or you can vibe code a game end to end. So imagine vibe coding the game logic with an LLM and doing all the texturing with Mirage. That's one category. The real-time interactive gaming category is huge.
Dan Shipper
Wait, let me stop you there for a second. So are you saying—you said it's video to video—so when you're vibe coding a game, how does that work? Is it generating video that you're playing, or is it also able to output like the 3D mesh that you might import into Minecraft?
Dean Leitersdorf
So one of the things that people have been going crazy over is you can just vibe code a game with low poly—just a bunch of blocks moving around—and turn that into something where you just have Mirage do the texturing of that. Oh, let's see if I can find an example for this.
Dan Shipper
Okay, let me see if I understand. So basically what you're saying is it's really hard to make a AAA blockbuster shooter game, let's say, but it's not that hard to make a 3D game, especially if you're vibe coding a 3D game that's just a first-person shooter where the gun is a rectangle or like a cube. And so basically what you're saying is you make that game, and then when your game engine renders that really basic game, Mirage takes it and turns it into a video that has all the textures and all that kind of stuff.
Dean Leitersdorf
Exactly. Spot on. And people have been playing with this online, and it is just mind-blowing to see what people came up with. You can actually—some people just use outputs from Unreal Engine and pass them through Mirage. Others actually just vibe coded games and added Mirage on top. I can send a few examples when you're editing this video, and we can put them in.
But it actually just lets you build a game very quickly with an LLM in just half an hour. You can be like, oh, add this tree here, add that there. And it doesn't even have to build a tree—it just builds a blob that kind of looks like a tree, and Mirage just comes and completely changes it.
So it's going to do something—I don't know exactly what, but it's gonna do something big for gaming. Either taking existing games—we were here at the office playing GTA a few days ago, and we passed it through a winter filter, and boom, all of GTA 5 just turned into a winter version, and there's no winter in regular GTA 5. So you're gonna be able to get a bunch of varieties of existing games, but you're also gonna be able to create a game end to end, fully AI-based—just vibe code it and texture it.
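[Editor's note: conceptually, the loop Dean describes looks something like the sketch below. Every name here is invented for illustration—this is not Decart's API—but it shows the division of labor: the engine renders a crude frame with correct game logic, the model restyles it, and the result is displayed within the real-time budget.]

```cpp
#include <cstdio>
#include <string>

// Hypothetical render -> restyle -> display loop. All types and function
// names are made up for illustration; only the structure matters.
struct Frame { /* pixel data would live here */ };

// Game engine: produces a blocky, untextured frame with correct game state.
Frame render_low_poly() { return Frame{}; }

// Stand-in for the video-to-video model: redraws the frame per the prompt.
Frame restyle(const Frame& in, const std::string& prompt) { return in; }

// Sends the finished frame to the screen.
void present(const Frame&) { std::puts("frame presented"); }

int main() {
    const std::string prompt = "AAA shooter textures, winter";
    for (int i = 0; i < 3; ++i) {            // in reality: while the game runs
        Frame raw = render_low_poly();       // deterministic layer
        Frame styled = restyle(raw, prompt); // creative layer, per-frame budget
        present(styled);
    }
    return 0;
}
```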
Dan Shipper
One of the things this is making me think of—and I'm not sorry for going to the philosophy stuff—is there's this really interesting thing happening where the underlying game still works with classical code. And then what Mirage is doing is adding this emergent layer of texture, in the same way that maybe you can break down the world into atoms and molecules that work according to specific equations that we can figure out.
But then our conscious experience of the world is this emergent, textured, full-quality type of thing. And you kind of need both—or at least the way I understand the world, it has both. And it seems like previously, all games had to be able to reduce down to that deterministic code, and it was very difficult and time-consuming to set that world up.
And now you have this sort of emergent layer that you can just vibe on top of the basic—I don't know—Newtonian or classical world that you can build with programming. Is that kind of what it is?
Dean Leitersdorf
I love that you're going into this. You got me thinking about this. There's some stuff that regular coding is great for and other stuff that AI is great for. What's AI great for? AI is great for doing stuff that's differentiable, that's continuous. Drawing a picture—that's kind of continuous. It doesn't matter if I get it exactly right or just something close. It's fine. What's AI definitely not good at? Reversing a hash function or something. Or even just implementing a hash function. Anything that is very discrete—that is not differentiable—AI would suck at.
Dan Shipper
Can you define differentiable for people? I know you're a math genius, and so I think I might have an idea of what you're saying here, but can you define it for us?
(00:10:00)
Dean Leitersdorf
Definitely. So think of a differentiable or continuous function—something where it's okay if you're close. It's fine if I'm creating your image and I get the exact image, but it's also fine if I'm missing a few things. If your shirt's a bit off, if the wall behind you is slightly different, if your glasses are slightly bigger—I don't have to get it exactly right. That's the quality of a differentiable or continuous function. If you have a function that's not differentiable, not continuous, then it's very critical that you get it exactly right. Now, regular CPUs are just great at doing stuff that needs to be exact. Regular CPUs and our computers—if we tell them to add five and five, they will always give us ten, and they know exactly how to do these deterministic operations. AI can do the non-deterministic operations where we don't need stuff to be exact.
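[Editor's note: a rough way to put this distinction in symbols. For a continuous objective like image similarity, small changes in the model produce small changes in the loss, so you can follow a gradient toward "close enough":

\[ L(\theta) = \lVert f_\theta(x) - y \rVert^2, \qquad \text{small } \Delta\theta \;\Rightarrow\; \text{small } \Delta L. \]

A hash function has no such structure: flipping a single input bit changes the output unpredictably, so being "close" tells you nothing, and there is no gradient to follow.]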
Dan Shipper
I think this is something I've been playing with—a way that I've been formulating it is, and I'm probably not going to get it exactly right, but we can figure it out together. There's two ways of seeing the world. One is as this sort of countably infinite set where there's one right answer, and you can guarantee yourself that you're gonna get to the right answer if you just keep counting through the set. That's the classical way of seeing the world—the classical search for the right answer for a program. Another way of seeing the world is that there's this uncountable infinity of different solutions. You're never guaranteed to get to the right solution, but every solution is kind of meaningful. It's kind of right, or it's close—you can get further from it or closer to it. And both of those are really valuable, but—at least in my view of reality—you need to start with that kind of uncountable infinity where everything is full of meaning until you get to the right zone. And that's when you can start to solve with the countable infinities and count out the right solutions. I know I'm probably sounding very galaxy-brained to people listening, but we'll unpack this. I'm curious what that makes you think of.
Dean Leitersdorf
Oh, but I definitely agree with everything that you're saying, and I think that we need both. Some problems are just the big haystack with the single needle—you need to find that exact needle to solve it. Other haystacks have an entire area that's full of needles, and they're not all the same needle. Maybe one needle's better than the rest, but they're all an okay-ish needle. And then you just need to find the rough area where you have that bunch of needles concentrated.
Now let's take an example. So Google showed Genie 2 recently—they demoed it, but didn't release it as a product—where they claimed, okay, well, it's a world model. It can generate as you go. You just put in your keys and your mouse, you move them around, and it generates frames for you. Genie was trying to claim, well, let's replace everything. Let's just have AI create a completely new world for us, and we don't need any of the old software like game engines that existed beforehand. AI can just do everything. But what we're talking about with Mirage is a combination. You have the game engine doing the deterministic stuff: oh, I need to remember that you have exactly 71 gold coins in your pouch, and I need to remember that you took this pickaxe and put it in that chest five years ago, and if you go back to that chest, it'll still have the same pickaxe you put there five years ago. Have the game engine do that. But the more creative parts, where it doesn't matter if you hit the exact needle as long as you get something right—have the AI do that. For example, texturing it or making it look different.
Dan Shipper
I think another good analogy that might unpack this for people is you need a skeleton to have your body work, and your bones define the hard limits on what you can do. But you have joints and you have muscles and you have tendons that your bones can pivot around, and that allow you this infinite flexibility within this rigid system that guarantees certain things. If you have bones, you can stand up. And if you have muscles and sinews and tendons, you can move around in all of these different ways that just bones wouldn't let you do.
And for a long time we've only been able to build computer systems that use bones. And now we have muscles and tendons that allow us to move around in all these different new ways, which is really cool. But you need the bones because if you're just tendons and muscles without bones, you're kind of like a jellyfish. It's just very hard to pin you down. You can do certain things, but it's hard.
Dean Leitersdorf
I love the jellyfish analogy, by the way. If I understand you correctly, you're saying that when we build Terminator, it's not just gonna be metal robots—they might have skin and bone surrounding them. But yeah, maybe. No, I completely agree, and this is really what we're getting at—we're enabling this from a technical perspective. We're creating video in real time that can touch everything from gaming to this Zoom call—or this Riverside call. Instead of just you seeing me and me seeing you, we could be seeing something completely different. It's about the ability for people to take stuff that's in their imagination and add it to the existing real world.
So we'll have this core that's the same—the world's the same, physics is the same—and you can take your imagination and apply it to what you're seeing. How did you make this? That's a really fun question. So to crack real-time video, we had to crack two things. On the one hand, we had to write lots of very optimized GPU code. We sat and wrote lots of assembly for GPUs—for NVIDIA GPUs. It's called PTX, and it's the thing that's below CUDA. Lots of people know CUDA—that's the NVIDIA software stack that lets you write good stuff for GPUs. We had to write in PTX, which is the layer below CUDA. It's the actual assembly that runs on the GPUs. We had to do that to actually make it very efficient.
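[Editor's note: for readers curious what writing at the PTX level looks like, here is a minimal, hypothetical example of inline PTX embedded in a CUDA C++ kernel—a toy sketch, not Decart's actual code:]

```cuda
#include <cstdio>

// Toy kernel that adds two integers via a hand-written PTX instruction.
// The asm block is passed through to the GPU assembler, one level below
// what the CUDA C++ compiler would normally generate for you.
__global__ void add_with_ptx(const int* a, const int* b, int* out) {
    int result;
    // add.s32 = signed 32-bit integer add, written directly in PTX.
    asm("add.s32 %0, %1, %2;" : "=r"(result) : "r"(*a), "r"(*b));
    *out = result;
}

int main() {
    int *a, *b, *out;
    cudaMallocManaged(&a, sizeof(int));
    cudaMallocManaged(&b, sizeof(int));
    cudaMallocManaged(&out, sizeof(int));
    *a = 5; *b = 5;
    add_with_ptx<<<1, 1>>>(a, b, out);
    cudaDeviceSynchronize();
    printf("%d\n", *out);   // prints 10, deterministically
    cudaFree(a); cudaFree(b); cudaFree(out);
    return 0;
}
```

[In practice, hand-tuned PTX matters for things like register allocation and instruction scheduling in hot inner loops, not toy arithmetic like this.]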
Dan Shipper
And did you vibe code this with Claude? Or how did you do it?
Dean Leitersdorf
Oh, no, no, no, no. Yeah, Claude's amazing. I love Claude Code, by the way. I use Claude Code a lot. Good job, Anthropic.
But no, it has no idea what it's doing at those levels of the stack. It's the dark abyss of AI that no one wants to touch, Claude included. So yeah, on one hand, we had to write lots of assembly code for GPUs to get this to be efficient. The current version of Mirage that you saw has a 40-millisecond delay. That's 0.04 seconds between when a frame comes in and when the model spits it out. The next version of Mirage is going to have a 16-millisecond delay—one six. And to do that, you really have to write very optimized assembly code for GPUs.
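[Editor's note: for scale, a 60 fps stream delivers a new frame every 1000/60 ≈ 16.7 milliseconds. So a 16-millisecond model fits inside the frame time of a 60 fps stream, while a 40-millisecond budget per frame works out to roughly 25 fps.]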
Now, on the other hand, you have to build a completely different kind of model. All the video models we know today are what's called bidirectional, and we had to build an autoregressive one. What does that mean? With all the video models today—if you go to Google Veo or a bunch of these—you put in a prompt, hit generate, and it spits out like a five-second clip after a lot of processing. It thinks for like a minute and then you get your five-second clip. What we have to do here—whether we're generating a five-second clip or an hour-long clip, it doesn't matter—is generate not the entire clip at once, but frame by frame. So what you're seeing with Mirage is it gets a live input stream and creates a live output stream. It just generates frame by frame, not the entire video at once.
To do that, it's kind of a combination between a video model and an LLM. You know how LLMs just generate the next token? It's kind of like training a video model on next-frame prediction instead of next-token prediction. You just have to predict the next frame each time.
Dan Shipper
Okay, so what you're saying with the autoregressive stuff is that you're feeding in—let's say you're halfway through the video—the frames of video that you've already generated, so the model can produce the next one. I assume you're also feeding in a source frame of the video that you want to transform. Is that exactly how it works?
Dean Leitersdorf
So you feed in two video streams. One is the video stream that needs to be edited, and the second is what's already been generated—and it needs to predict the next frame.
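[Editor's note: in symbols, a sketch of the setup Dean describes—the notation is ours, not Decart's. If \(c_1, \dots, c_t\) are the incoming frames to be edited and \(\hat{x}_1, \dots, \hat{x}_{t-1}\) are the frames already generated, the model samples one frame at a time,

\[ \hat{x}_t \sim p_\theta\!\left(x_t \mid \hat{x}_{1:t-1},\; c_{1:t},\; \text{prompt}\right), \]

in contrast to a bidirectional model, which denoises an entire clip \(x_{1:T}\) jointly before showing you anything.]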
Dan Shipper
And what is the advantage of doing it that way vs. just putting in the last two frames—the last generated frame and the last input frame—and predicting the next frame from those? What's the advantage of a bunch of frames over just those two?
Dean Leitersdorf
So the longer the context you have, the more you know about what happened. Because if, for example, I was wearing a black shirt in this stream, and I leave the camera and come back, how do you know to put the black shirt on again and not change it? So it's kind of like the model's memory—being able to see what happened in the past. It's also very critical for the model to see motion, because it's not just images. If, for example, you use the model to add a portal gun, it needs to be able to see me doing this action in order to say, okay, now I need to create a portal. So it's very valuable for the model to see motion and not just static images.
Dan Shipper
I see. And is it like a diffusion model or what kind of model is it?
Dean Leitersdorf
Yes, it's a diffusion model. We call it an LSD model—a livestream diffusion model. I have no idea why you're laughing. It's a very technical term. So these LSD models are a combination of a diffusion model with an autoregressive transformer, which lets you predict the next frame every time you generate.
Now, the hard part about this—do you remember early LLMs, back in the GPT-2 days, or even when GPT-3.5 came out, how they got stuck in loops? Do you remember those days?
(00:20:00)
Dan Shipper
Yes, I do.
Dean Leitersdorf
So for listeners: sometimes in the early days of LLMs, you'd talk to a model and it'd be great for a few back-and-forths. And then it would just start repeating the exact same thing. It would get stuck on a word or a sentence and just keep saying that exact same thing over and over and over again.
That same problem that LLMs dealt with a few years ago comes back when we try to do autoregressive video models. The model is great for the first few seconds. Our biggest challenge with Mirage was that we could easily get it to be great for two or three seconds. But then it slowly starts to degrade, and it gets stuck in this loop until it collapses into a single color and your entire screen just becomes red or blue or green.
And solving that repetition problem was the hardest thing about Mirage. Now, if you use Mirage, it can do an infinite stream—it can keep going for hours on end.
Dan Shipper
Is that the Yann LeCun problem with LLMs—that errors start to compound, so the longer you generate, the worse it gets?
Dean Leitersdorf
Exactly, yeah. It's the exact same problem. It's the error accumulation problem: every frame you generate, you're going further and further from the distribution that you actually trained on. You get slightly more error as you keep going, until after like a hundred or 200 frames, you just lose everything and you're left with a static color.
So solving that took six months of research. We had an easy version of Mirage that worked six months ago—it just could only hold for like two seconds. And so, going back to your question of how we built this: on the one hand, you have to write lots of GPU code to actually get this to run fast. On the other hand, we had to solve lots of very tough research engineering problems on the model layer. At this point, I think we ran several thousand experiments until we really got it right.
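[Editor's note: a stylized picture of why error accumulation is so punishing. Suppose each generated frame feeds back as input carrying a small deviation \(\varepsilon\) from the training distribution, and the model amplifies input deviations by some factor \(\lambda > 1\). Then the drift after \(n\) frames grows roughly like

\[ e_n \approx \varepsilon\,\lambda^{\,n}, \]

which blows up within a couple hundred frames—only a few seconds of real-time video—unless the model is trained to tolerate and correct its own imperfect outputs.]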
Dan Shipper
And is it one thing that you unlocked where it just fixed everything? Or is it a bunch of little incremental things that you fixed here and there?
Dean Leitersdorf
What you're talking about is a great point about AI. There were roughly seven or eight things that we had to unlock. And the annoying part is that it's binary. Until you unlock it, nothing works. And then when you unlock it, suddenly it's all better. That's an annoying part about AI research.
You never know how close you are. You have no idea if you're gonna solve this problem next week or two weeks from now, or two months from now, or two years from now.
Dan Shipper
But how did you know—for example, on the second thing that you added, when it still wasn't working—how did you know that it was better and you should keep that improvement?
Dean Leitersdorf
So we broke this down into lots of smaller problems. First, be able to do this just with images, or just with certain types of conditions. For example, as you know, Oasis—the last thing we launched—went insanely viral back in November. It could generate Minecraft.
Dan Shipper
Remind people what Oasis is.
Dean Leitersdorf
Yes, it's a model that could generate Minecraft in real time. Went insanely viral—everyone played with it. It was very fun for us. We just came out of stealth, and that was a real-time video model that worked just for Minecraft.
The reason that was actually an interesting step on the way toward Mirage is that if you try to create a video model that is great for just one single distribution—it's just Minecraft; it doesn't have to know how to do everything in the world—it is much easier for the model to learn. So that's one of the problems we used as a stepping stone. We would take larger and larger distributions.
So you begin with Minecraft, which is a very small distribution—it's all roughly the same. Then you say, okay, let's do street walks—take a bunch of video of streets across the world and be able to simulate that—until you grow and add more and more complex distributions. And each time you see: okay, can I crack the next distribution? Can I actually get it to work with this additional challenge?
Dan Shipper
It's sort of like the way a child learns to run. First you sit up, then you pull yourself up, and then—there's a bunch of steps, right?
Dean Leitersdorf
They have to use more and more muscles as the challenges get harder and harder. Yeah.
But all this takes me back to why we're even doing this. I'm a huge believer that there's some entertainment, fun, creativity experience that we're gonna unlock in the next year that will completely change everyone's lives. And that's what we're here for.
And the way I like to think about it is this: what's the internet? What was the old internet, before ChatGPT? The old internet of three years ago was four things: knowledge, creativity, shopping, and messaging. Knowledge is everything Google-oriented. Creativity is Facebook and TikTok and Instagram, but also Netflix and Roblox—everything you do when you're not doing your homework. That's the internet. That's the entire internet.
Dan Shipper
There's nothing else you're forgetting.
Dean Leitersdorf
Just these four categories.
Dan Shipper
That's the four categories of the internet, but continue.
Dean Leitersdorf
That is an interesting point. Anyway, the internet used to be four things—now five, for anyone wondering. Now, the first thing: knowledge. Chatbots have just obliterated knowledge in the new internet. Anything knowledge-oriented we used to do—whether it was search or docs or maps, knowledge sharing or knowledge creating—it's all going to chatbots.
It doesn't matter if OpenAI wins or Gemini wins or xAI wins—someone's gonna win that. But chatbots are replacing knowledge. What happens to creativity? Is there gonna be a completely new experience that lets us be creative, lets us have fun, that's different from what we already know? The way I like to think about this, in the easiest way possible, is: if you go up to a random 12-year-old today—in Kansas or in Germany or in Japan—and you ask them, what's ChatGPT? What are they gonna tell you?
Dan Shipper
A chatbot.
Dean Leitersdorf
What do they do with it?
Dan Shipper
I do my homework with it.
Dean Leitersdorf
Exactly. Every 12-year-old on the planet will tell you the exact same thing: oh, ChatGPT? Yeah, that's the AI that does my homework for me. It helps me with my homework. And then you ask them a follow-up question: what AI do you use when you're not doing your homework? And they have no idea. They stare at you like, I don't use an AI when I'm not doing my homework. I'm on Fortnite or Minecraft or TikTok. What do you mean, AI when I'm not doing my homework? And my entire belief is that AI is changing everything. It really is. And it can't be that AI doesn't create completely new experiences that we get to have when we're not doing our homework.
Dan Shipper
Yeah, I mean, I definitely resonate with this. I think it's really clear that every new technology creates new content formats, creates new art forms. And we may have cracked some of that. You know, you can see Suno, for example, is gesturing in that direction. But there's probably—or definitely—a bunch of things that are on the horizon that we haven't quite figured out yet.
Because, for example, video games used to be really hard to make; now everyone can make a video game. That's a really interesting thing. What does that look like? It's probably not gonna look like traditional AAA games. It will evolve into something else that's native to the medium we've just invented.
I think my question for you is what do you think about the strategy of inventing a new medium like that? So you're obviously starting with the foundation model. An interesting question for me is, for example, OpenAI started with the foundation model, but it didn't really work until they created ChatGPT. It didn't really work until there was a form factor that was consumer-friendly. Are you all working on that?
What is your strategy for taking this foundation model and turning it into something that people actually use to make stuff?
Dean Leitersdorf
So right now there's no app out, but by the time this gets published, there will be, because it's coming out in the next few days. And I completely agree. There's this term, "AI experts"—people who know how to use AI, know how to prompt these systems, and everything. You have to turn everyone into an AI expert. A random kid on the street or a grandma living somewhere should be able to be an AI expert. What we need to do is find ways to put this in the hands of people—put it in an app, put it in something that's comfortable for them to use, and let them access this great tech that's been built over the past few years in audio and video.
I think it's a combination of what's gonna create these new experiences and really put this in the hands of people. I think we'll do a lot of things. We're gonna publish lots of different experiences through this app, and it remains a mystery to see which ones will take off and which ones won't.
And it's kind of an experiment we're gonna be running together, like we're all—as humanity—going to understand, okay, what is fun about these experiences that we can do?
(00:30:00)
Dan Shipper
Can you show us the app?
Dean Leitersdorf
I actually can't on this computer. It's a new computer. So let's do this—I can explain it here.
So the app's called Delulu.
Dan Shipper
It's very Gen Z of you.
Dean Leitersdorf
We all have to be a bit delusional, okay? That's what we're building. We're building the one place you go when you're not doing your homework, which is Delulu. Now, what you're gonna get in Delulu right now—if you go in and download it—is the ability to upload a picture of yourself, either from your camera roll or by taking a photo, and the app will just create hundreds of variations of it instantly. So you can see yourself eating pizza on the Eiffel Tower on top of Mars. Or—the gender stuff is actually really fun, when it turns a guy into a girl and vice versa—you can see yourself in a dress. You can use it to express your emotions. One of the things you can do is, if you're angry at a friend or something, you take a selfie and tell the AI, okay, make me angry. And you'll suddenly get devil horns and smoke coming out of your ears. So you can take any photo you have and express yourself in a way you just couldn't before.
Right now, if you download it, you're gonna get all these imaging tools. Over the next few days, you're also gonna get the same thing with video, added as a camera. So you open up your camera and point it at something—you point it at your friend and say, okay, turn my friend into Elsa. And suddenly your friend becomes all blonde, with a blue dress.
Or you can point at your friend and say, hey, add a little monkey on their shoulder, and you'll have a little monkey sitting right here and they can high five the monkey, and the monkey high fives back. So all that's in Delulu. And there are gonna be so many releases coming out over the next few weeks. Every week we're gonna have a release for Delulu with new capabilities that we're adding constantly on the video and the audio side.
Dan Shipper
And is it like a social network or are you posting this to Instagram or Facebook or whatever?
Dean Leitersdorf
First stage, you're gonna be posting this to Instagram and whatever. But what we really found interesting is trends. Someone here at the office was playing around with Delulu and it turned their hair into grapes—and that was actually super cool—and they wanted to share it with someone else so they could also turn their hair into grapes. Other people in the office were asking, hey, how'd you do this? Can I do this as well?
And so I think one of the key parts that we already find very interesting in this category is how do you have this place where you can discover new ideas for what the AI can do together with you, and try these out yourself instantly and see if you like them? See what they make you feel.
So I don't think it's gonna be a social network in the sense that we got used to—it's gonna be something completely new. Adding a feed there—I'm not sure that's the thing people will want. We'll want something else: the ability to find new trends, to find new cool stuff that's happening.
Another thing that we found was really interesting: you know how social media today is not really social like it used to be? It used to be all social graph and friend-based. You used to be able to do things on Facebook, Instagram, with your friends. Now it's all influencer-based. You see something cool, you see trends that are happening, but they're not your friends.
Dan Shipper
Thank God. Because that's my business model. I do miss those days. But yeah.
Dean Leitersdorf
But there used to be days when we had our actual friends on social media, and we found that this was actually really cool with Delulu. People actually cared what their friends thought about the edits they were making. I can make a hundred different variations of myself, and it's really cool to send them over to my friends or my family—and they usually like the ones that I hated. And that triggers a huge spark inside of you. Like, holy shit, they're seeing the world in a way that's different than I am. And it's fun to do that.
Dan Shipper
So I think this is very cool. It reminds me, though, of some things that have already come out. One is Animoji from Apple. There was also a whole trend maybe two years ago—I can't remember what it was called. It was DreamBooth or something like that: basically Stable Diffusion generations of people's headshots—you in a forest or whatever. So there have been waves of this before, and those have all kind of gone viral and then burned out. How do you think about that, and how do you think about this as being different?
Dean Leitersdorf
So, by the way, you bring up DreamBooth. We have one of the experts on the subject—the person who was last author on the DreamBooth paper. He just joined us a month and a half ago at our SF office. He used to be at Snap, and at Google before that. And I was having this conversation with him as well: okay, what's happening here now? The tech has come so far in the two years since DreamBooth came out. Back then, it was a great proof of concept. Now it actually works. You look at the picture like, oh, wow, that really is me, doing something insane—and it doesn't take any time to do that. The AI is also creative enough to come up with new scenes. So I think there's been a huge jump in the quality of these things that really gives you a different experience. But I agree that images are just a gateway to what's enabled with video and audio for Delulu. We've all seen these image editing apps; here we're just making it much easier for people to access, completely for free, and that creates a different dynamic. But it's still just a precursor to the things being added right now, which are video and audio.
When you do this with video and audio, when you do it in real time—I just take out my camera, point it at something, and ask the trash can next to me to turn into an elephant, and it becomes a little elephant. The trash can just walks around the room. That is something completely new that we just never had before. If I ask the AI to add a little turtle here on my shoulder, and the turtle starts crawling around my shoulder and rests down here and goes to sleep—these are things we never could really do, and I think it's very interesting to see what people will do with it.
Dan Shipper
And how are you testing this internally? It seems like the idea is probably kids are gonna be using this a lot, and so do you have alpha or beta testers who are in that age category that you're doing this with? Or how do you make something that you think kids are gonna love if you're not kids yourself?
Dean Leitersdorf
So, first of all, we have a bunch of alpha and beta testers, and that's the thing that excites me the most—talking to all these people. And by the way, if you're listening and you want to be an alpha or beta tester, shoot me an email directly—[email protected]. I'd love to show you the tech before it gets released, because that's the way we're exploring this together.
I mean, we're creating great tech, we're using it ourselves, and it's fun for us. But having people experiment with it and understand together what this is gonna enable humanity to do—we're all building this together, and we'd love to have everyone's input.
I think one of the critical things from our perspective is that in two or three years, almost all the world's pixels and audio waves are going to be AI-generated or AI-modified, and we have to put this tech into people's hands right now so that people build the tolerance and understand how to use it and what it means for their lives.
Dan Shipper
I want to take this conversation in a totally different direction. What are your AGI timelines?
(00:40:00)
Dean Leitersdorf
I think there are only two interesting problems in the world right now. Only two. Nothing else matters, really. One is the race to AGI, and two is the race for consumer dominance—for building the biggest consumer company in the world. Those are the only two races that matter. Everything else is fun. Everything else is cute and maybe worthwhile, but not actually interesting. I think with AGI, we have to split it in two. There's a Terminator stage, where we get to the singularity and it's smarter than all of us—and it's a good question what happens to humanity then. And the second is economic AGI—when machines are able to do any economic job better than all of us, or better than the vast majority of humanity, and it just makes no sense for most people to do that work anymore.
The former—the Terminator kind—is probably a few years away. The latter, though—the economic AGI—could easily start creeping up on us in 12 to 18 months. We'll slowly start seeing AI just be economically better than a lot of us at creating economic value. And that will have a dramatic impact on society. Now, I'm a huge believer that this is gonna be humanity's best golden age, the best time possible. We're going into a new world. Sure, AI is getting better, it's getting stronger, it's getting smarter. But humanity's future is very vivid. And we have to understand how we build this and, at the same time, build the best world possible for humans.
Dan Shipper
Okay, let's take that one at a time. The most important thing I want to talk about: if I understand you correctly, you think that most economically valuable work will be able to be done by AI in the next 12 to 18 months. Another way to say that is that it will be less and less appealing to hire humans to do work, and we'll hire AI to do the work instead. What do you think the implications of that are? What are you really saying when you say that?
Dean Leitersdorf
I think that it's very clear at this point that much of the work, whether it's lawyers or accountants or lots of jobs that are very virtual-oriented—a lot of that work can be done really well by AI and it's getting better and better, and there's no reason to assume that it won't continue to get better.
Now, it does mean that a lot of us will have to adapt, just like we all adapted when the internet came out or when previous generations of technology came out. It also means that humanity will get a lot more time to think, to be creative, and to explore what's in our hearts and not just what's in our minds.
What I like going back to a lot is: why did democracy first start in Greece? I was actually very intrigued by this. Democracy first started in Greece because people had the time to think. People didn't have to work the fields 24/7. Before that, agriculture was very hard. You would be in the field every day for 14 or 15 hours, and you would just go to sleep and go back to work the next day.
As technology got better and people started getting some time to be able to take off and not work all the time, the first thing that we got was that the ancient Greeks became philosophers and they started thinking, okay, why are we even on earth? What are we doing here? What's the right way to govern people? And they created all these systems that we still use today. And we were able to get that only because humanity had more free time on its hands.
With more and more economic value coming out of AI, we will have more time on our hands to be creative and to use our imagination. And that's something I'm very excited for—that's something that we have to give an avenue for people to be able to use their added time to be more creative, to actually do stuff that makes them feel something.
Dan Shipper
So there's a lot here. And I want to go into the ancient Greece stuff, because I have a whole take on that too—and I think there's a lot of overlap, actually. I would love to hear what you think. But there are so many parts here. I want to start with going back to the claim that knowledge work could pretty much be done by AI, because I'll take the opposite side of that. I don't think that in 12 to 18 months, if you come back on the podcast, we'll have seen that happen. And there's a couple reasons, but maybe I'm wrong.
The first thing is, I think you're right—let's take the lawyer example. I think you're right that if I think about the things I ask my lawyer to do, given the right prompt, the AI is gonna be better and faster and cheaper than my lawyer. However, "given the right prompt" is an incredibly difficult problem. And I don't think we're 12 to 18 months away from self-sufficient AI that knows what prompt to give itself at the right time. And I don't think that's just a point that we pass. I think it's very spiky depending on the domain and the context and all that kind of stuff. So the way I think about AGI, for example, is actually in terms of child development. When a child is first born, they are not independent at all from their mother. As they get a little older, you can leave them alone for a minute, five minutes, 10 minutes at a time, until they're 18 or 20 years old—theoretically adults who can do whatever they want. And I think you can see AI following that same trajectory. If you take coding, for example, it starts with tab completes. It's like one step, right?
Dean Leitersdorf
It was just a tab. And that was just two years ago.
Dan Shipper
Exactly. But now Claude Code is like 10 or 15 minutes—but only sometimes. You can't use Claude Code on your deep-down assembly stuff, right? I think a better definition of AGI is when it's economically profitable to leave your AI on all the time—it's always working, it's always doing something. That's a good definition because it will require a lot of things we don't currently have, and it's a scenario I think is pretty far away—several years at least. For example, continuous learning. I think the only way to get AI that knows how to prompt itself in the right situation at the right time is if it's good at continuous learning. And I don't think better context engineering is actually gonna fix that—maybe it's part of the solution, but my belief is you have to be able to update your weights. So that's why I don't think it's 12 to 18 months away. I think it's much farther, and it will be very domain-specific when it's totally AGI, totally independent. Maybe coding is first, but even that will be only some kinds of coding—it's still not gonna be for your CUDA or PTX improvements, necessarily. So that's kind of my AGI take. What do you think?
Dean Leitersdorf
So I love this, because you said, "When does it become economically profitable to just keep your AI running all the time?" And clearly, by the way, we're at that point with computers. Keeping my laptop on all the time is something where I don't even think about how much electricity it costs me, because I know the value I'm getting out of it is insanely higher. How much does the electrical bill for my Mac cost a month? Twenty dollars, maybe. So it's definitely economically viable to keep your Mac on 24/7. And we're probably gonna get there with AI as well—any price I pay to run this AI will just make itself back instantly.
And you gave the Claude Code example. I completely agree that it's not gonna replace coders. I've been using Claude Code over the past two weeks to code some stuff for our app. I've been using Flutter—I have no idea how the UI works. I literally never wrote any CSS or HTML or JavaScript or any of these things. I can write assembly for GPUs—I love that—but I have no idea how the UI works. Still, I was with a team working on the UI of the app, and I was able to contribute a lot to that project via Claude Code without knowing anything about the syntax of these languages. The contributions were that I had intuition—like, okay, you've gotta be doing something wrong, because there's no way you're storing the image here but also somewhere else, so you must have sent it through something else in between.
Having the intuition for what the actual challenging part is—how the data moves around or whatever—and letting Claude Code do the actual typing: that's something I've personally been experiencing over the past two weeks, and it's been really great and very valuable.
So maybe a different way to phrase it is: there's a chance that in 12 to 18 months, AI lets us be so productive that we're able to create companies that are just way bigger than anything we've seen so far. I'll give a concrete example. Today we have roughly ten trillion-dollar companies in the world—10 companies past a trillion dollars in market cap—give or take, depending on exactly when you check and how Tesla stock is doing. Back in 2017, we had zero. We got our first one in 2018, when Apple crossed the line.
That's seven years ago. Seven years ago we first crossed the line of having a trillion-dollar company, and now we have 10 of them. A lot of that is due to technology making things more efficient, so that we could make a lot more money and bring a lot more value to the world.
And I think there's a very good chance that in the next 12 to 18 months, AI creates so much value for humanity that the entire stock market doubles in value.
(00:50:00)
Dan Shipper
That's an interesting one. So now I want to go back to Athens, because I think this actually dovetails with the point you're making.
So I'd probably push back on the 12 to 18 months—I think that's way too quick. I think it will take many, many years—even if, let's say, we invented this AGI tomorrow, whatever our definition of AGI is, the kind where you never want to turn the AI off. In 12 to 18 months, maybe there'd be a bubble in the stock market, but it wouldn't maintain true value. I think it'll take a lot longer. But I do grant your point that it will allow us to build much bigger companies than we could before. I also think it will do something else, which is allow us to build many more smaller companies that accomplish much more than a small company could before.
And I want to talk about why. Back to your Athens point: I actually don't know why democracy arose in Athens. I remember Solon's laws or whatever from my ancient Greek history class, but I can't remember why he did it. But the thing I love about Athens is that it's a society of generalists, right? Democracy sort of requires that—direct democracy requires it—where you can be a statesman, you have to be a lawyer, you're a prosecutor, you're a juror, you're a warrior. You're everything. You need to be good at everything.
And so the question is, why did that stop working? The reason is that Athens became an empire—the equivalent of a trillion-dollar corporation. What's interesting about being an empire is that you actually need specialists, because you can't send some farmer to be the general on your Sicilian expedition. You want someone who's got a lot of experience. So specialization allows for collaboration across larger and larger organizations of people, but then you lose the generalist thing. You incentivize being a specialist, which has basically just continued in Western society over the last couple thousand years.
And a lot of good things came from it. But if you're like me and you love being a generalist—what's interesting about AI is that because you have this thing in your pocket that's like a thousand specialists, it allows you to do a lot more for longer as a generalist. You don't have to specialize as much.
And so, for example, if we take Every—we've got 15 people, and almost everybody inside of Every is a generalist doing multiple jobs. The lines between jobs start to blur. And they can do that because they have AI. So my hope is, maybe in a similar way to yours, we start moving back toward more generalists and this sort of golden-age-of-Athens vibe, because people can get more done with AI. They have all the specialists in their pocket, and they can coordinate across more people without having to specialize.
Dean Leitersdorf
I love this theory that we're going toward a world where we'll actually have space for generalists again. And it's gonna happen for exactly the reasons you're saying: AI gives us tools to jump into fields we're not experts in. The AI has seen that experience, and through the AI we can actually manifest our general abilities—exactly like what I was doing these past two weeks with Claude Code and Flutter. I have no idea how to write iOS or Android apps, but I do have common sense. I can use that common sense, combined with what Claude knows about how to write code, to create this thing.
You know, my dad—he's retired. He used to be dean of medicine at one of the Israeli universities. He's not officially at Decart, but he hops by the office like twice a week, and he likes being called the senior advisor for common sense. He's like, hey, I'm not a lawyer, I'm not a coder, I'm not a business person. But I have common sense. That's what I have; that's what I can contribute to this situation.
And so I think that's really a huge point—what you're saying is that we'll actually be able to have generalists who can act as if they were specialists in each field. That's really valuable. Something else I think about a lot in terms of AI: Every is 15 people. Decart is roughly 70 people now, and I think you've done a tremendous job at having 15 people who are almost all generalists. That is typically incredibly hard to do, and it speaks for itself, because you're actually able to create insane stuff that no one else can. Here at Decart, too, almost everyone at the company is a generalist and an independent thinker, and they just go ahead and do stuff without being told. We're still 70 people and we're still completely flat. I spent around two years of my life thinking a lot about which organizations work and which don't.
Now, lots of organizations—like the trillion-dollar companies you mentioned—just don't work. Google has 180,000 people. They don't really work; we can't get the best out of them. They're smart people, talented people, but Google has a hard time getting the best out of them. And it's not just Google—it's every big tech company. And this got me thinking: what's the biggest bottleneck to actually running a company or a group of people and telling them, hey, you all have to go build that one huge thing together? What is the bottleneck? By the way, what do you think?
Dan Shipper
So I would say there's one thing—there's a real information flow problem. So flowing information from the bottom to the top and from the top to the bottom, which AI solves. And then the other thing that I think is really important is that humans have different motivations depending on the circumstances that they're in.
And inside of a big company, they tend to be most interested in the thing that will get them promoted rather than just doing the right thing. That's for a lot of different reasons—partly information flow—but I think it's a big factor. And I have no idea how to solve that second one.
Dean Leitersdorf
I agree, that's fair. But even if we did have tons of people who only really wanted to build insane stuff, I'm not sure we could fit a thousand of them in the same company and still actually get things done, because of the first constraint—the information flow constraint.
And when you see the organizations that function incredibly well and those that don't, that's really the key difference. So with an organization that's up to a hundred people, information will flow and it'll be fine because you can get everyone in the same room and they can all talk to each other and just move information that way.
There's also really big organizations that did work really well—the ancient Roman legions. You can have a hundred thousand people with swords and shields, and you'll tell them, okay, go conquer that hill and then go conquer that hill. Now, the reason that those organizations worked was because not a lot of information really needed to flow. You could just tell them, look, we're on this hill. They're on that hill. March and stab. That's the information that you needed to get to your hundred thousand people on the field.
And what I tried to do over a few years—the conclusion I got to—was that the difference between organizations that work and ones that don't is how many generalists you have inside the org, or how many slots you have for generalists. If you have only 20 or 30 generalists in the organization, that's great. They'll think of crazy stuff, and they'll go ahead and pull it off. But if you have a thousand generalists, they will get stuck on communicating with each other and understanding: wait, can I do this? Can I not do this? Is it okay that I do this?
And that's where humanity has never been able to construct an organization that gets a thousand creative people and lets them all be creative at once. So any organization you're in can only enable maybe 20, 30 people to actually be creative. Whether it's a 20-30 person startup or a Roman Legion—you had 20 creative generals and a hundred thousand foot soldiers who didn't have to be creative at all.
(01:00:00)
Dan Shipper
Really? Interesting. And why is that?
Dean Leitersdorf
It's a good question why that is, but I think that when your job doesn't require creativity, it's rather easy to define. Okay, your job is: you're working at a Ford factory, and your job is to pick up this tire and put it on the car. And it doesn't matter much whether you do it exceptionally well or just average—maybe it's a 50 percent difference.
But if you have to be creative—okay, go build a completely new experience for people, or build some breakthrough technology—that's a role where you need to be able to take risks and do stuff that's much more complicated. It's hard to define. And people have information bottlenecks—we can't communicate that much. I can't take everything that's in my mind and just telepathically send it over to you.
And so we have to dumb things down and explain stuff in very simple terms: okay, your job is just to do this, and I can explain it in three bullet points. But when something's creative and in its early stages, it's not that easy to define.
Dan Shipper
I think I agree with this. The way I would describe it is that in any kind of work, there are at least two phases, and they're kind of fractal within each other. But for now, let's just say there are two phases: explore and exploit. In the exploration phase, you're trying to figure out, what do I even want to do? In the exploit phase, it's just an execution problem—you're solving puzzles where the frame of the puzzle is already set.
And we currently use a lot of humans for the exploit thing, and we may be able to use a higher ratio of humans for exploring now that a lot of the AI is gonna be able to exploit. Because AI is currently very good at problems you can define, but there are surprisingly few problems you can define that are valuable compared to all the problems in the world.
And actually, I think this is a really good bookend to the podcast, because it goes right back to the first thing we talked about: one needle in a haystack vs. this sort of infinite forest where every path you go down is meaningful, and you kind of have to explore that infinity, and there's no one right answer.
And so I think that in a world where AI is quite good at finding the needle in the haystack, we can spend a lot more time exploring. And AI is also a good tool for exploring. But it ultimately comes down to those two phases of creative work—maybe that's another way to say what you're saying.
Dean Leitersdorf
And AI is going to do the stuff that's well-defined and leave the creative stuff to us. And that's great. Not only that—because it overcomes lots of communication bottlenecks, it can actually see everything we're doing. In a 1,000-person organization before AI, no one—not a single person—knew everything that was going on. No one read every email and heard every phone call. Now we actually can. So it can help us communicate better, help each of us be creative, and let us explore until we find something valuable to exploit—and then it can do the exploitation on its own. I think that's a great way to sum up what I think will happen over the next few years: AI will do the stuff that's more well-defined and leave it to humans to be creative.
Dan Shipper
Dean, this is a great place to leave it. This was an amazing conversation. Thank you so much.
Dean Leitersdorf
Thanks so much, Dan. This was amazing and we should do it more often.
Dan Shipper
I would love that.
Thanks to Scott Nover for editorial support.
Dan Shipper is the cofounder and CEO of Every, where he writes the Chain of Thought column and hosts the podcast AI & I. You can follow him on X at @danshipper and on LinkedIn, and Every on X at @every and on LinkedIn.