Transcript of ‘Building AI That Builds Itself’

‘AI & I’ with Untapped Capital's Yohei Nakajima


The transcript of AI & I with Yohei Nakajima is below.

Timestamps

  1. Introduction: 00:00:59
  2. BabyAGI and its evolution into a more powerful tool: 00:02:26
  3. How better models are changing the way Yohei builds: 00:05:00
  4. Using code building agent Ditto to build a game of Snake: 00:08:10
  5. The ins and outs of how Ditto works: 00:13:24
  6. How Yohei gets a lot done in little time: 00:19:21
  7. Yohei’s personal philosophy around building AI tools: 00:21:50
  8. How Yohei experiments with AI as a tech-forward parent: 00:33:13
  9. Demo of Yohei’s latest release, BabyAGI 2o: 00:39:29
  10. Yohei’s insights on the future of AI tooling: 00:51:24

Transcript

Dan Shipper (00:01:00)

Yohei, welcome to the show. 

Yohei Nakajima (00:01:01)

Thank you. Good to see you again.

Dan Shipper (00:01:04)

Good to see you too. For people who don't know, you are the general partner of Untapped Capital. But maybe more importantly, you're one of the coolest online AI tinkerers in the whole AI tinkering space. I just feel like every day I go on X and I see you releasing something new. You famously built the first open-source autonomous agent, BabyAGI, about a year ago.

Yohei Nakajima (00:01:31)

Yeah. Last March.

Dan Shipper (00:01:32)

Last March. And I did an interview with you on Every, I think, around that time too. And you just have this incredible array of tools that you've built yourself to make your work and your life better using AI and I'm just really excited to have you on the show to talk about tinkering.

Yohei Nakajima (00:01:51)

Thank you. You're too kind. I'm just really lazy and whenever I'm working on something that I don't like doing, I'm always asking myself how can I cut this out of my work? So even before AI, I was a pretty heavy no-code Zapier user. But then LLMs just unlocked so much. It's been so fun to just tackle one task at a time—try to remove it.

Dan Shipper (00:02:14)

Yeah, it's really awesome. I want to talk about— I feel like the arc that you've been on is you started off with this BabyAGI autonomous agent thing. And that honestly kicked off this hype wave about agents. So talk to us about BabyAGI and then tell us about the kind of the arc that you've been on since then, and sort of what you're building and what you're thinking about and where you think the future of this kind of thing is going.

Yohei Nakajima (00:02:39)

Yeah. So BabyAGI, I guess, introduced essentially the idea of looping through an LLM: having an LLM generate a task list, parsing that with code, and then tackling the tasks one by one—at that point, just using an LLM. But I think the 100 lines of code, the simple pattern, inspired a lot of people. I think the reason it was so popular is because everybody who saw it could think of ways that they would make it better. And I think the simplicity is what kicked it off. Since then, from our fund, I've been able to invest in a handful of companies like E2B, AutoGrid, and a few more in the agent space. So it's been incredible working with those founders on thinking through how to build more reliable agents.

Dan Shipper (00:03:19)

Were those people explicitly inspired by BabyAGI?

Yohei Nakajima (00:03:23)

I think AutoGrid and Cognosys were. I'm pretty sure they built something right after BabyAGI launched. Some of them were.

Dan Shipper (00:03:23)

That's so cool. It's like incepting. It's like manifesting.

Yohei Nakajima (00:03:36)

It's kind of an incubator, but more like a public open-source incubator where you're just incubating ideas publicly. And simultaneously, investing in the space, meeting founders, I've been building my own BabyAGI and iterating on it. I took the original BabyAGI and did about seven iterations last year, called BabyBeeAGI, BabyCatAGI, BabyDeerAGI—with animal names all the way to BabyFoxAGI. And each time I was introducing a new design pattern. It was really a way for me to share ideas on what I think could make autonomous agents better. And then this year I started from scratch with a new idea and was kind of mulling it around, playing around for the first six months, and then something clicked last month and I built out BabyAGI 2, which is a framework for self-building autonomous agents. That has been my theme for this year: the idea of an autonomous agent that can build its own capabilities to improve itself. But yeah, that's kind of where we're at.

And then this week I released another kind of small script called Ditto, which I feel is more similar to BabyAGI because it's a super simple 500-line script that can build multi-file apps, kind of a little poor man's Devin. And then last night I figured out how to incorporate that design pattern into BabyAGI. So I'm pretty excited about it. I guess I'm going to call it BabyAGI 2o because Sully on Twitter suggested I name it that.

Dan Shipper (00:05:02)

I feel like OpenAI's naming convention is just polluting the whole AI world. Okay, this is really cool. I really want to do a demo of BabyAGI 2. But before we get into that: How has the advent of o1 and better reasoning changed things? How have you incorporated that? How has that changed what you think is possible with these kinds of tools? Tell me about that.

Yohei Nakajima (00:05:32)

So I've been coding with AI since davinci-002, which was very poor. I went to 003 and so on. Every time the model gets better, my projects get more complex because they can handle more. With o1-preview, I find that it's incredible at handling multi-file edits. Until o1, I never really worked on any multi-file projects. So when I released BabyAGI 2, which was a full framework with a front-end as well, I don't think I could have done that without o1-preview, at least not in the time I allocated to building.

Dan Shipper (00:06:07)

And what about the capabilities of these agents?

Yohei Nakajima (00:06:10)

The agents themselves are also getting much better. The coding agent I mentioned—Ditto. When I build, I do this thing where I often build with 3.5 Turbo to get the framework working. And then when the framework doesn't error, I'll upgrade the model. But yeah, when I use Sonnet 3.5 on it, I mean, it does so much better at coding than any of the prior models.

Dan Shipper (00:06:33)

And have you tried putting in o1 or are you afraid it'll take over the world?

Yohei Nakajima (00:06:38)

So with o1-preview specifically, the way I've designed the most recent projects, I'm using tool-calling, and o1-preview doesn't support that. So I haven't played with it. I've only used 4o or Sonnet 3.5. But given what I know about the strength of o1-preview, once I can integrate it, I suspect it will be that much better.

Dan Shipper (00:06:57)

Yeah, they're going to do tool calling, they said, by the end of the year. So hopefully we'll get to see that drop. Cool. Can we see a demo? I want to see BabyAGI 2.

Yohei Nakajima (00:07:07)

Cool. So BabyAGI 2 is a framework. Actually, you know what, honestly, I think it would make more sense to start with Ditto. So let's pull up Ditto—named after the Pokémon that changes into whatever Pokémon it's facing.

Dan Shipper (00:07:23)

Oh, sweet. And just to remind everyone, Ditto is the self-building coding agent, right?

Yohei Nakajima (00:07:29)

Yes. So think of it like the Replit agent or Devin. I'm going to pull up Replit, which is where I build everything. As an amateur developer building autonomous agents, I find that playing in a sandbox with a stop button feels very, very safe, as opposed to running it on my computer.

Dan Shipper (00:07:51)

Like the big stop button?

Yohei Nakajima (00:07:53)

Yeah. There's just a stop button at the top. If something goes wrong, if I see a whole bunch of red, I can just push stop and Replit will stop it for me. And that feels very safe. So I build and run everything in Replit.

Dan Shipper (00:08:03)

That's good. I love Replit. I use it all the time, especially when we do courses and stuff, and for building these little one-off things or sharing them with people—like collaborating. It's really cool.

Yohei Nakajima (00:08:15)

Yeah, it's great. So, again, you see this main.py. It's 500 lines of Python. When I run it, it'll ask me what I want. It just serves a form. So I can say—

Dan Shipper (00:08:33)

So for people who are watching, basically, you've got Replit open. You pressed run. Now we have a little website that says “Flask App Builder,” and it says, “Describe the Flask app you want to create.” And you're typing in “a game of snake.”

Yohei Nakajima (00:08:45)

And then I'll just click submit. And I think it's amazing that you can do this with a single file, right? So what it did was create the routes and the static and template folders. It's actually going through and building it out— How do we make this easier to read? So it's using some tools to create routes, create directories, and then it went in and started building out the files. So if we go into main.py, we'll see a couple of routes here. So it's trying to— Oh, okay. So this means it's finished. So what I need to do here is actually stop it, and I'm going to just run this again. I haven't done anything except ask for a game of Snake. And when I open it up, it should be a game of Snake. I'll just open it up in a new window to see if it takes keyboard controls. It does not take keyboard controls, but I mean, it's a frontend and backend, and it has worked before.

Dan Shipper (00:09:51)

That's really interesting. Okay, so basically, you said give me a game of Snake, and now we have— I assume it's some sort of HTML5. We don't really know what it is actually doing, right? But it built a JavaScript-based Snake game.

Yohei Nakajima (00:10:10)

It gave me an index, it gave me JavaScript, it gave me styles, and it gave me a main.py, which serves the route, I guess. And then it stored all the history of how it built it right here. And all of this was done by this single-file Python script.

Dan Shipper (00:10:26)

Can we make something else?

Yohei Nakajima (00:10:28)

Yeah. So now to run it again, I have to go and delete everything. It’s a one-time use script.

Dan Shipper (00:10:26)

Do you make copies of the—

Yohei Nakajima (00:10:40)

The way I usually use it— Actually, I think I'm in the Repl that I shared publicly. I should be forking this anyway. Alright, let's do that. That's the right way to do it. What should we try to do? It should be simple, though, because this is not as good as the Replit agent. I don't know. I've done a to-do list app. It's a pretty simple one. It's a classic.

Dan Shipper (00:10:57)

Well, you tell me if this is too complicated, but this is the app that I have in my head that I kind of want to build, but I don't have time to build. I often have people that want to do a meeting with me. But scheduling that meeting all the time is really hard. And what I just want is a list of people who want to talk to me with their phone number, where when I have 15 minutes during the day, I can just pick them off the list, call them, and then check it off. I don't know how you want to express that, but I just want to be able to input names with phone numbers and then check off names on a list.

Yohei Nakajima (00:11:37)

For you to call.

Dan Shipper (00:11:38)

Yeah.

Yohei Nakajima (00:11:47)

I'll just try that. So, what did you say? You said an app to track the name and phone number of friends, with a checkbox to track when I called them last, which isn't exactly it, but it's close.

Dan Shipper (00:11:55)

It's about right. Yeah.

Yohei Nakajima (00:11:58)

And so first it created the routes, the static, the templates. So this is a Python Flask app builder. It says it created the directories. Now it's going to implement the HTML. So it has the index.html that it created. It created the routes for it, I guess the backend, and then it's creating a CSS file for it. And then it's creating a script.js, which is right here.

Dan Shipper (00:12:31)

And this is all powered with Claude Sonnet?

Yohei Nakajima (00:12:34)

So I just updated to Sonnet 3.5, yeah. And it's creating a friends route. This is the backend route for tracking friends. So it's using just a dictionary as the database, and it says done. So if we close it and open it back up.

Dan Shipper (00:12:56)

Whoa. That's so cool. So basically, it's a track-your-friends app. It has you put in your name, you put in a phone number, you can add it and then you can—

Yohei Nakajima (00:13:10)

It doesn't do anything aside from that. But I mean, it's more or less what we described, and it generated it. It's a multiple-file app. And again, what's amazing is that it can do this with a single loop through an LLM with five tools.

Dan Shipper (00:13:26)

Well, actually help me understand conceptually what's happening. I noticed, for example, that it's basically doing different iterations. So it takes the prompt and then it goes through an iteration of trying to turn that prompt into code. But it does that like five times or six times or whatever. What are the iterations doing?

Yohei Nakajima (00:13:36)

So when the app starts, it sets up the Flask app. It checks to see if an index.html exists. This is what triggers the form. If it exists, it'll just serve the app; if not, it goes to the user input form. Once we have a user input, we send it to an LLM call. I'm using LiteLLM, which allows me to route between OpenAI and Anthropic.

And then it decides if it wants to use a tool. And it's actually the LLM at that point deciding if it wants to use a tool. The tools it has are: create directory, which creates a folder; create file, where the create file tool asks for the code to go in it too, so in using this tool, it actually generates the code to put into the created file. If it errors out, it can use an update file tool to update a file. Or if it wants to make sure everything looks right, it can use fetch code to fetch a code file and review it. And then if it deems that it's complete, it'll call a task completed tool, which actually just exits the loop.

And then if it uses a tool, that gets sent into a second LLM call, essentially to come up with the response based on the tool call. When it uses a tool call, there's this kind of second back-and-forth, which includes the tool result. And then I update a history, which I store as an array that I feed back into the prompt, so that it constantly knows what it's done historically. And then it just loops through until it deems the task complete.
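For readers who want to see the shape of that loop, here is a minimal sketch of the pattern Yohei describes: one LLM call routed through LiteLLM, a handful of file tools, and a history array fed back in on every iteration. The tool names, bodies, and model string are illustrative assumptions, not Ditto's actual source.

```python
import json
import os

from litellm import completion  # LiteLLM routes the same call to OpenAI or Anthropic


# The five tools Yohei names. These bodies are illustrative; the JSON-schema
# definitions the model sees are elided and passed in as `tool_schemas`.
def create_directory(path: str) -> str:
    os.makedirs(path, exist_ok=True)
    return f"Created directory {path}"


def create_file(path: str, code: str) -> str:
    # The model supplies the file's code as a tool argument.
    with open(path, "w") as f:
        f.write(code)
    return f"Wrote {path}"


def update_file(path: str, code: str) -> str:
    with open(path, "w") as f:
        f.write(code)
    return f"Updated {path}"


def fetch_code(path: str) -> str:
    # Lets the model re-read a file it wrote, to review it.
    with open(path) as f:
        return f.read()


def task_completed() -> str:
    return "DONE"


TOOLS = {fn.__name__: fn for fn in
         (create_directory, create_file, update_file, fetch_code, task_completed)}


def run(user_request: str, tool_schemas: list, max_iterations: int = 25) -> None:
    # The history is a growing message array fed back in on every loop,
    # so the model always knows what it has done so far.
    history = [
        {"role": "system", "content": "Plan the app, then build it step by step."},
        {"role": "user", "content": user_request},
    ]
    for _ in range(max_iterations):
        response = completion(model="claude-3-5-sonnet-20240620",
                              messages=history, tools=tool_schemas)
        message = response.choices[0].message
        history.append(message)  # the assistant turn, tool calls included
        if not message.tool_calls:
            continue  # plain-text reply; loop again
        for call in message.tool_calls:
            result = TOOLS[call.function.name](**json.loads(call.function.arguments))
            # The second back-and-forth: the tool result is appended to history.
            history.append({"role": "tool", "tool_call_id": call.id,
                            "content": result})
            if call.function.name == "task_completed":
                return  # the model deems the task finished; exit the loop
```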

Dan Shipper (00:15:17)

And I assume there's a planning step. The first LLM prompt that you've got up there, it creates a plan of here's what I'm going to do. And then it checks off items on the plan—or how does that work?

Yohei Nakajima (00:15:28)

Yeah. So previously with BabyAGI, I had a separate planning LLM call, but now that context windows are long enough, I just include in the same prompt that loops through to first plan out everything. We can pull up the prompt here. Again, it's a single LLM loop, so we can just look at one prompt to basically understand the whole thing. It lays out the routes, templates, and static assets, and this is really the only prompt that it loops through.

But it says: “Understand the requirements, plan the application structure, implement step by step, review and refine, ensure completeness, do not modify main.py.” That's important because otherwise it'll stop working. And then finalize. And then it describes that application files must be in templates, static, and routes: “routes should be modular; index.html will be automatically served, so make sure to create it; don't use placeholders; don't ask the user for additional input.” And then it describes the tools. So it's pretty simple. It's easy to improve. You can add a couple of things and it'll work better.

Dan Shipper (00:16:31)

That's really interesting. Okay, so you made the original one. You made the next one. For you, what is driving this experimentation? Obviously this isn't your day job. I guess it's related to your day job, and maybe you could even have made it your day job if you wanted to. So, what's driving it? And what's keeping you from doing just this?

Yohei Nakajima (00:17:00)

That's a good question. I love being a VC. Early on in my career, when I got introduced to the startup ecosystem, I realized that founders are my favorite people in the world. And my north star has always been whatever role I can find where I can engage often and most with founders—and VC, I found, is a great role for that. So that's definitely what I enjoy the most as a job. This is more like an incredibly fun hobby. The books I've read historically have been around the brain mostly, and this feels like a weird kind of mix between how I've been thinking about and understanding myself and the brain combined with the current state of technology. And it just is this great mix where it feels like a hobby, but I also know that it helps my day job. So, to some extent, some VCs will learn things and write blog posts about it. This is my version of that. But instead of blog posts, it's code.

Dan Shipper (00:18:04)

Honestly, that's differentiated. I like it. Anyone can write a blog post. And I guess with what you're building, anyone will be able to build a tool pretty soon, but I think you're a little bit ahead of the curve.

Yohei Nakajima (00:18:20)

And also, I feel like I'm good at it, so I can't not work on it. But yes, I like being a VC too much to not be a VC.

Dan Shipper (00:18:27)

I feel like the core underlying idea that seems to be fascinating to you is loops and self-reference. Tell me about that.

Yohei Nakajima (00:18:37)

I mean, that was probably an accident, right? When I built BabyAGI, I wasn't actually trying to build an autonomous agent per se, but I was challenging myself to prototype an autonomous founder. That was inspired by HustleGPT, where people were using ChatGPT as a cofounder and doing whatever ChatGPT told them to do. And I was thinking to myself, if they're doing whatever ChatGPT says, why can't that be an AI as well? And that brainstorming led to—and I have the original chat request that built BabyAGI, which I sent you—I can queue that up with the original prompt. But when I shared a demo, other people noticed that it could do more than be an autonomous founder. And that led to BabyAGI. And when BabyAGI blew up, I think the loop happened to be in there, and that was what I became known for. So that's become my brand to some extent.

Dan Shipper (00:19:33)

Should you rename your firm Loop Capital or something?

Yohei Nakajima (00:19:38)

No. Untapped Capital is a great name.

Dan Shipper (00:19:42)

It is a very good name. So I guess the other interesting thing to me here, which I imagine would be on people's minds is: This is not your day job. You have kids. How are you finding time to go do the tinkering? When are you doing it? And how are you able to do it so consistently given all the other things you have going on?

Yohei Nakajima (00:20:10)

I only build at night, mostly weekends, sometimes weekdays after I get my kids down, which means it's like 10–12 or 11–1. It's my short period. I code in bursts—I kind of do a quick burst, and I might even do it while I'm doing dishes, where I ask o1-preview a question, go do something for five minutes, come back, copy-paste the answer into Replit, and let it run. And if there's an error, I'll just copy-paste that error into Replit and go do something else for five minutes. So I code like that a lot, especially combined with mobile, where I have ChatGPT and Replit on my phone and copy-paste things back and forth between them. That expands my ability. If I'm picking up the kids and I get to school five minutes early, I can sit there for five minutes iterating on a project I've been building.

Dan Shipper (00:20:56)

I love that. I've actually found that too. The place where I found it is in Devin. I don't really use Devin for big projects because it's not ready for it yet, but if you have a little idea that you want to test out, or a little feature you want to build, you can just spin up four different Devins and have them working on all these different things. They're going to get stuck or need your input every once in a while, but it's the kind of attention where it's like when a coworker pings you on Slack: Hey, can you look at this for a second? You can just pop in and say, here's what to do, and then pop out. It turns coding into a task that only requires that level of attention, which is a totally different thing than what it used to require, which is being in a deep flow with no distractions for several hours. And it sounds like that's how you've been able—at least to some degree—to incorporate this into your life. You can build with fragmented attention now.

Yohei Nakajima (00:21:59)

You can be a manager, right? I choose when my code needs me. I can be gone for 20 minutes and come back, and you'll still be exactly where I thought you'd be. I'm going to ask you to do something else, and then you're going to go do it while I go do something else. It's exactly how a manager operates. It's fun and weird.

One thing I wanted to bring up, which I think is relevant to the whole podcast and how I view these tools: I often think about tools as an extension of oneself. I should probably look it up, but I think there have been studies on how if you take somebody who's been using a hammer their whole life and you scan their brain while they're using it, they almost treat it as an extension of themselves. And I think we do that with many tools we use regularly. And I think when we're applying for jobs, when we say we can do something, we're talking about what we can do assuming we have access to the technologies that we're used to using. And so I often think of these AI tools that I'm building as an extension of myself. And when I'm working on these tools, to some extent, I am also thinking of it as working on myself, because the tools I use are an extension of me.

Dan Shipper (00:23:08)

That's really interesting. And how have these tools changed you—changed who you are or how you think of yourself?

Yohei Nakajima (00:23:16)

Well, I can do more in parallel, right? I have an AI due diligence tool I use called Wokelo that will generate 20–40-page industry reports for me in 30 minutes. So if there's an industry I want to learn about or a company I want to dig in on, I can ask it to go do something and then go work on something else. And then 30 minutes later, I can pull up this report and scan through it. And yes, I am using an external tool, but if I think about how much I can personally get done, then having AI do things in parallel is just an extension of what I'm capable of doing.

Dan Shipper (00:23:52)

Right. That makes total sense. I think what I'm asking about is: Yohei today, because you can parallelize all these tasks and you can get work done with fragmented attention, is a more powerful person, able to get done what you need to do in the world, vs. who you were maybe three years ago, even though neurologically or biologically not that much has changed. The only thing that's changed is your ability to interface with these tools that are now around, and obviously interfacing with tools changes your brain, all that kind of stuff. But what I'm trying to get at is: If I told you four years ago, before any of this stuff happened, that you were going to have these powers, where you're parallelizing all your tasks and you can get way more done, I think you would have had an idea of what that would feel like and what it would be to be that kind of person. And I'm wondering how it feels now that you're in it and it's part of your day-to-day reality.

Yohei Nakajima (00:25:04)

On one hand, it feels extremely empowering because I'm just getting more done. On the other hand, it feels overwhelming because my brain is not used to the throughput of information and tasks that I'm getting done. So I do spend a lot of time slowing down and rethinking or organizing, just because I can't jump into too many things if I can't follow up on them correctly. So there's a lot of balancing there. Those are, I think, the two opposing ends of how it feels.

Dan Shipper (00:25:38)

You said that working on this stuff is like working on yourself. 

Yohei Nakajima (00:25:44)

Yes.

Dan Shipper (00:25:45)

What are you trying to change about yourself?

Yohei Nakajima (00:25:48)

Well, I guess in this case, specific to BabyAGI and this autonomous agent, I feel like we're close to being able to build something that can truly be helpful: remember everything we're working on, be able to handle a lot of my tasks, and truly be an assistant that can help me with everything I need to work on. If I can get that working, I feel like I can really increase my throughput significantly. And then all the patterns related to making it work, I think, are relevant to places and spaces I should invest in. So there's also that benefit on the work side right now. This isn't going to directly answer your question, but a lot of the way I think about building is looking at when it can't do something, thinking about how I would have solved it, and trying to abstract the highest-level pattern, so it's not too specific, and seeing if I can provide that in the prompt or system architecture so that it won't run into the same issue in the future. And so a lot of it is: watch it do something, and then if it doesn't work, really think through how I would do it, try to extract that pattern, and feed it back in. But through that, I'm also just understanding myself better, because I have to reflect on why I solve it this way when the AI doesn't.

Dan Shipper (00:26:55)

Do you have an example of an interesting pattern—and I guess these are sort of meta skills—an interesting meta skill that you've learned about yourself from trying to do a task that an AI was failing on?

Yohei Nakajima (00:27:14)

This seems kind of obvious, but in earlier versions of BabyAGI, I started giving it tools: web search using SerpApi, which is a Google search tool, and then web scrape, which goes to the website and grabs all the data, as separate tools. And I found them sometimes not working well together. For whatever reason, it would do a search and then do another search afterwards without scraping the sites. What I realized was that anytime I use Google search, I always click on a site and read it. The scrape and the search, when I used them, were two separate tools, but they're wrapped into a larger tool, which is: search and scrape until I find the information I need. So I realized that the actual tool to provide the agent was this wrapper tool around the two smaller tools. In my case, I had figured out how to combine the two together and always used them that way. So that was an interesting realization about how we have these core skills that we combine into bigger skills, which combine into bigger skills. And if you want to break down Google search into more skills, typing on keys and moving the mouse are also skills that are part of, in our case, a Google search skill.
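To put that realization in code, the wrapper skill might look something like this sketch, where serp_search and scrape_page are hypothetical stand-ins for the SerpApi and scraping tools Yohei mentions:

```python
def serp_search(query: str) -> list[str]:
    """Return result URLs. Hypothetical stand-in for a SerpApi call."""
    raise NotImplementedError


def scrape_page(url: str) -> str:
    """Return a page's text. Hypothetical stand-in for a scraping tool."""
    raise NotImplementedError


def search_and_scrape(query: str, is_sufficient, max_pages: int = 3) -> str:
    """The wrapper skill: search, then keep reading results until the
    gathered text passes the caller's check, mirroring how a person
    always clicks through and reads after running a Google search."""
    gathered: list[str] = []
    for url in serp_search(query)[:max_pages]:
        gathered.append(scrape_page(url))
        if is_sufficient("\n".join(gathered)):
            break  # stop once we have the information we need
    return "\n".join(gathered)
```

Exposing search_and_scrape as the agent's single tool, rather than the two pieces separately, is the fix he describes.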

Dan Shipper (00:28:33)

Yeah, that's an interesting one. And I'm trying to unpack the meta of that, which is how you got from the problem to the solution. The problem is that the AI is trying to use two separate tools that need to be connected and normally are connected. But that's different from my experience of them, where they are wrapped together, and so what I need to do is wrap them together.

Yohei Nakajima (00:28:58)

I think I know where you're going. And I think in this example, I didn't do it, but when we figure out how two tools work together, it might take a couple of iterations. But once we figure out how to get them working together, we can repeat that process easily. We learn from our earlier iterations. That wasn't happening, and I think that was really the issue, right? Ideally, I wouldn't have to wrap it myself; it would figure it out on its own: Hey, actually, when we do a search, we need to scrape afterwards. And when we do that, we complete the task. Therefore, in the future, we will do it that way.

Dan Shipper (00:29:35)

The thing I'm trying to get at is a skill like that. It's actually really hard to put into words what that is, because any way that you put it into words is too specific. Does that make sense? It's almost like some subsymbolic activity, like intuition or reason that can't be—it's hard to reduce it to logic. And therefore it's kind of hard to put it directly into a model, but it seems to maybe arise as the model scales. Do you see what I'm getting at, or am I being crazy?

Yohei Nakajima (00:30:11)

Yeah. I mean, if you're talking about the model, I do think there's going to be much more constant fine-tuning of models as we use them. And eventually I feel like that's going to get personalized, where whatever AI system you're using, your usage of it is going to fine-tune the underlying models that the AI system is using. But when I use it, it's going to be fine-tuning a separate model. So that seems like a pattern that would come out.

Dan Shipper (00:30:43)

No, I like that. I mean, this is exactly why I like AI stuff, or it's one of the reasons why I really like it. I think they're very good mirrors for ourselves, and that happens on different levels. On one level, it's just really good at saying, here's what I see you saying, and here's what your personality is, and all that stuff. It actually reflects facts back to you in this really nice way—or perspectives.

But then the other way that I think it's really interesting, and the way it relates to the self, is: We have this really long history of using technology as metaphors for who we are and how our minds work. One of my favorite examples is, Plato sort of likened the mind to a wax tablet where memories were things inscribed on the wax. And that's because wax tablets were all the rage back in 400 BC. Another really good one is a lot of early psychoanalysis—like Freud-era stuff. When you talk about repressed emotions, the model is: If you repress your emotions enough, they come out in all these other weird ways. You push them down and there's pressure, and the pressure kind of comes out. And that's based on pneumatics. The steam engine was the metaphor. And I think even today, for pre-AI computing, we have a lot of metaphors that we sort of map in our minds to those kinds of computers. So it's like, I don't know: I didn't have enough bandwidth, I crashed. You know? I had a little bug. And I think that language models give us this new metaphor by which to understand how our brains work. So I already hear people saying, oh, that's not in my training data, or—

Yohei Nakajima (00:33:00)

Sorry, I just hallucinated that. 

Dan Shipper (00:33:10)

Yeah. All that kind of stuff. And the reason I love that, the reason I think that's such a cool thing, is that previous iterations of those metaphors were really scientific—like computers or the steam engine or whatever. They're typically really rational. The reason we like computers is because they're very step-by-step.

Yohei Nakajima (00:33:19)

They do exactly what you tell them to do for better or worse.

Dan Shipper (00:33:24)

And I think language models are the first technology we've ever created that's not like that, and that by its very nature operates in this way that's almost sort of human intuition, where you can't really totally explain it or totally predict it. But for the first time, having a metaphor for that out there is a tool I think might help us understand more about how our own intuition works, and why our own intuition is a really valuable partner to our rational minds. So that's my hope. And that's how it has sort of changed my own perspective.

Yohei Nakajima (00:34:01)

I agree with that. I feel like building BabyAGI does feel like a self-reflection process, trying to teach it, trying to get it to figure out how to learn to do new things. And then simultaneously I have three kids—7, 5, and 2. So I'm trying to do that with bio-agents at the same time.

Dan Shipper (00:34:23)

That's one way to put it.

Yohei Nakajima (00:34:25)

They're better prepared. They don't need me as much, but there are interesting parallels. Are they using this stuff yet? No, not agents, but we play with AI together. Early on, we started doing DALL-E sessions where, specifically with the kids, we would all come up with a theme or something we'd want to see in the picture.

And we'd combine that into one image. So "unicorn," "ninja," and "rainbow" would be the three different words that the three kids threw out, and then we would get one image that combines them. So that was fun. And then we started doing that with Suno as well. For music, they'll request songs, and I'll have them each come up with some topics, and then we'll generate a song that combines them. What do they tend to gravitate to? Unicorns, princesses, zombies, ninjas.

Dan Shipper (00:35:17)

I love it. Your kids sound awesome. One of my questions—I don't have kids, but I have a nephew, he's 2—and if we're on the topic of changing ourselves: We're mostly baked as people. Obviously there's neuroplasticity, you change to some degree or whatever, but kids are way different. And I'm always kind of interested in what it will be like to grow up in a world that's like this, where any question can get answered immediately in the way that you want. It just seems very different. And I feel like parents are kind of at the bleeding edge of watching that happen. What have you observed?

Yohei Nakajima (00:36:03)

I don't hand it off to them. When we play with AI, it's usually me holding it and guiding the experience. So I don't know if I have too much to say to that. But I remember talking to some parents who said they took the Alexa away—and I've read this elsewhere too—because the kids using Alexa were so used to just making commands that it became a bad habit. They don't say "please"—they just ask for things and get them. And the parents felt like it was not the greatest thing.

Dan Shipper (00:36:37)

That's so interesting. They've got to make an Alexa that only responds if you say please.

Yohei Nakajima (00:36:43)

Yeah, you need a kids' AI—you should do that. The kids have to be polite. They should be.

Dan Shipper (00:36:48)

And you said there are parallels between building BabyAGI and raising children. I'm kind of curious what the parallels are. Where are the overlaps, or what have you learned that's similar or maybe different?

Yohei Nakajima (00:37:06)

That's an interesting question. I don't know if I've actually thought about it that much. I mean, it's definitely different with kids. Kids feel much more unpredictable. As unpredictable as LLMs are, kids just feel more unpredictable. We have three kids, and they're all completely different, so there's some base that's very, very different. It's less a parallel, and more me trying to learn from my kids and seeing how that would apply to BabyAGI. Seeing information repeatedly becomes something: If you see two things together often, those things are more strongly correlated in your mind, right? And we see that all the time with kids. With the current BabyAGI, that's not necessarily the case. So how do we bake that in? That kind of thinking led me to graph databases as a good data structure for storing knowledge, because then you can start adding weights to edges, or relationships between objects. So it's not a direct parallel, but it's an example of something I noticed that then turned into an idea to try with BabyAGI.
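As a rough sketch of that idea (hypothetical names, not BabyAGI's actual code), a knowledge graph can strengthen an edge each time two things appear together:

```python
from collections import defaultdict
from itertools import combinations


class KnowledgeGraph:
    """Co-occurrence graph: edges get heavier each time two items appear together."""

    def __init__(self):
        self.weights = defaultdict(int)  # (node_a, node_b) -> co-occurrence count

    def observe(self, items: list[str]) -> None:
        # Every pair seen in the same observation strengthens its edge.
        for a, b in combinations(sorted(set(items)), 2):
            self.weights[(a, b)] += 1

    def strength(self, a: str, b: str) -> int:
        return self.weights[tuple(sorted((a, b)))]


g = KnowledgeGraph()
g.observe(["web_search", "web_scrape"])
g.observe(["web_search", "web_scrape"])
print(g.strength("web_scrape", "web_search"))  # 2: repetition strengthened the link
```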

Dan Shipper (00:38:16)

That is interesting. But it also reminds me of the training process. In order to train one of these things, it has to see the same stuff over and over and over again. And that's one of the knocks on transformers: that they're only good for things they've seen over and over again. But if you look at kids, kids are repeating stuff constantly, which is kind of an interesting thing. And I think your point about the chaos of kids is an interesting one too. It makes me wonder if we're not going to get AGI until AIs are allowed to just try lots of dumb stuff and be curious in the way that kids are curious, where they'll just do something completely ridiculous. You give them an object and they'll do something you never thought could be done with that object. And that seems like such an important part of becoming intelligent that I don't think AIs are doing right now. I don't know. What do you think?

Yohei Nakajima (00:39:19)

Totally random but related idea. I would love to see an island somewhere where we can run and test an autonomous robot society.

Dan Shipper (00:39:30)

Did you see there's this thing— I ran into this random Twitter thread the other day where there's some Discord, apparently for AI researchers, where they've let loose a bunch of AI bots on the Discord and you can just watch them all interacting.

Yohei Nakajima (00:39:46)

Yeah. Is that the Truth Terminal goat nonsense?

Dan Shipper (00:39:51)

I'm not fully following it, but, yeah, they started mimicking goatse or whatever that thing from the 2000s was. And then Marc Andreessen—

Yohei Nakajima (00:40:04)

There's some meme coin with some large value where the AI holds a big wallet. I mean, I'm sure it's orchestrated to some extent on the backend, but it's pretty fascinating.

Dan Shipper (00:40:12)

But Marc Andreessen staked them like $50,000 to try to escape or something like that. I don't know. We'll put it in the show notes. But that kind of thing is wild, but also really interesting. And I do wonder: Right now, for example, BabyAGI has a max number of iterations on the loop, and it has a specific goal that it's trying to achieve. I do wonder, if you just start allowing these things to loop forever, whether on the 10,000th iteration they start doing things that you didn't think were possible.

Yohei Nakajima (00:40:45)

Yeah, actually, the first BabyAGI did not have max iterations. I mean, that was probably the biggest criticism of it. I had to press stop in Replit to get it to stop because it would just keep trying to think of things to do. But it wasn't risky because it couldn't take action. But this new one— You want to see BabyAGI 2o? Might as well do another demo.

Dan Shipper (00:41:01)

I definitely do.

Yohei Nakajima (00:41:01)

This was inspired by Ditto, which I shared two days ago. And then people commented on it, asking if the tools are dynamic. And I was like, no, but that's a great idea. And then last night I was in bed about to go to sleep and I thought, I've got to try this. So I snuck downstairs and tried it, and it worked. So this is a single LLM loop with, I think, three tools: create or update tool, install package, and task completed. And I think what I tested was "scrape Techmeme and tell me what you find." We'll start with that and then I'll let you—

Dan Shipper (00:41:48)

Okay, so let me just set the stage here. You've got this new version of BabyAGI—BabyAGI 2o. And BabyAGI 2o has three tools: create or update tool, so it has a tool that allows it to make a tool. That's the recursive part of it. And if you just scroll up for a second, then it has install package, so it can get software packages. And then task completed, so it can check off tasks that it's given itself.

Yohei Nakajima (00:42:20)

So task completed is to say when the entire task is completed.

Dan Shipper (00:42:24)

Oh, okay. Got it. That makes sense. And so you gave it a task: Describe the task you want to complete: "Scrape Techmeme and tell me what you find." And then it basically did—

Yohei Nakajima (00:42:33)

Yes. What it did was call the create or update tool. That's what it called, and it tried to create a tool called scrape Techmeme. It registered the tool and created it, and then in iteration two it tried using the tool, but the result was empty. So it said, that didn't go well, let me try to adjust the tool, so: "use the create or update tool to update scrape Techmeme." And it still didn't work the second time. Then in iteration five it did it again, but this time it seemed like it worked. Scrape Techmeme ran with errors, but there's actually a result from scrape Techmeme. In iteration six, it fixed itself. And then in iteration seven, it summarized it for me.

Dan Shipper (00:43:22)

That's really cool.

Yohei Nakajima (00:43:23)

And what's fascinating is that this scraping tool—I did not give it to it. It wrote its own scraping tool and then rewrote it twice until it worked.
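A hedged guess at the mechanism, not BabyAGI 2o's actual source: the model hands back Python source for a new tool, and the loop execs and registers it, which is also exactly why a sandbox with a stop button matters.

```python
import subprocess
import sys

registry: dict = {}  # tool name -> callable, rebuilt as the model revises tools


def create_or_update_tool(name: str, source: str) -> str:
    """Compile model-written source and (re)register it as a callable tool.
    Running exec on LLM output is why sandboxing matters; nothing here is trusted."""
    namespace: dict = {}
    exec(source, namespace)
    registry[name] = namespace[name]
    return f"Tool '{name}' registered."


def install_package(package: str) -> str:
    subprocess.check_call([sys.executable, "-m", "pip", "install", package])
    return f"Installed {package}."


# Iteration one might produce something like this (hypothetical model output):
create_or_update_tool("scrape_techmeme", '''
def scrape_techmeme():
    import requests
    return requests.get("https://www.techmeme.com").text[:2000]
''')
# If the result comes back empty, the model calls create_or_update_tool again
# with revised source: the rewrite-until-it-works loop in the demo above.
headlines = registry["scrape_techmeme"]()
```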

Dan Shipper (00:43:34)

That's really cool. That's the sort of meta thing that I think is so interesting. It seems like the first version had a set of tools that could be used, and this one, the tool is you can make a tool.

Yohei Nakajima (00:43:49)

How meta can you make it, right? And the more meta you make it, the simpler it becomes. It's fascinating, actually. It can't actually build Flask apps, because it initiates the Flask app and accidentally runs it. But: Create a Flask folder and build a to-do list with HTML, CSS, and JavaScript. So this is a similar request to the earlier Ditto one. Again, Ditto was twice as many lines of code because I had to give it specific tools to create a directory and create a file. What I believe this will do is create its own create directory and create file skills.

Dan Shipper (00:44:29)

Interesting.

Yohei Nakajima (00:44:30)

It actually didn't generate a folder, but it is creating the different content. It does similar stuff. 

Dan Shipper (00:44:36)

So this can give itself—

Yohei Nakajima (00:44:39)

It didn't quite work, but yes, it can give itself its own tools.

Dan Shipper (00:44:42)

Okay. What about a thing that can make its own AGI—make its own agent? Isn't that sort of the next level of meta or are we already there?

Yohei Nakajima (00:44:54)

No, I think so. I think how you get it to self-improve is another one. Right now, when I ask it to do something, it creates tools and uses them, but then it throws those tools away, which isn't the right way to do it. The right way is: If it creates a tool and it works—like in the case of Techmeme, where by the third time it worked—we should be storing the one that worked and then reusing it next time there's a similar request.

Dan Shipper (00:45:19)

And not only that, there should be a public AI tool library that any other version of this can scrape from and get tools that have worked for anyone ever.

Yohei Nakajima (00:45:30)

Exactly. Which actually does take me to BabyAGI 2. This is not 2o, but 2, the full framework I mentioned earlier. 2o was a little side project inspired by Ditto, but because of what you just said, it makes sense to show 2 at this point. So, 2 is a framework for storing and executing functions from a database. It comes with a dashboard, which is more for management; you don't need to actually use it. But it comes with a whole bunch of functions to interact with the database itself, such as add function, create function, display functions. And these are all functions that can be used as tools by an LLM. Each comes with a description, input parameters, and output parameters. Functions can call other functions, so we have dependencies. It also has imports and installation baked in, and you can have functions triggered by other functions, which are called triggers. So the BabyAGI framework is more about storing the functions themselves, with an admin dashboard to look at the code and the logs. But going back to what we said: If BabyAGI 2o actually stored its functions in this database, then it would automatically have logs for every single function call, it would track errors, and it could reuse code it wrote previously. So I think my next step is going to be taking BabyAGI 2o, which I think is the right loop structure, and leveraging this framework for storing and executing code functions.
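Going only on what Yohei describes on screen, a minimal sketch of what such a framework might store per function looks like the following. The field names and execute logic are assumptions, not the framework's real API.

```python
from dataclasses import dataclass, field


@dataclass
class StoredFunction:
    name: str
    description: str
    code: str                                               # source kept in the database
    dependencies: list[str] = field(default_factory=list)   # other functions it calls
    imports: list[str] = field(default_factory=list)        # packages it needs installed
    triggers: list[str] = field(default_factory=list)       # functions fired after it runs


db: dict[str, StoredFunction] = {}
logs: list[dict] = []


def add_function(fn: StoredFunction) -> None:
    db[fn.name] = fn


def execute(name: str, **kwargs):
    fn = db[name]
    namespace: dict = {}
    exec(fn.code, namespace)  # load the stored source
    try:
        result = namespace[name](**kwargs)
        logs.append({"function": name, "ok": True})  # every call is logged
    except Exception as exc:
        logs.append({"function": name, "ok": False, "error": str(exc)})
        raise
    for trigger in fn.triggers:  # naive trigger chaining; a real system
        execute(trigger)         # would guard against cycles
    return result
```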

Dan Shipper (00:47:12)

I think that's really cool. The thing that I like about this is that if you look at the success of AI on different coding tasks, I think pre-o1, on coding benchmarks, it was like 20 or 30 percent or something like that. Maybe it's getting better and better, but for a lot of coding tasks, it's definitely not a one-shot deal. And in general, I think when you run them in a loop, because it's stochastic, the further down the loop you get, the more likely it is to just go off the rails. And what I think is really interesting about this is that it has the property of allowing AIs to learn from experience, where they only ever have to get it right once. And once it's right once, it can do it again. And for something like this, if you just let the agent run long enough in parallel, it's going to figure out what to do. And once it figures it out, it'll just be able to refer back to it. That seems exciting.

Yohei Nakajima (00:48:22)

Yeah. I think so. I think that's where we can get. And just to layer on: Earlier we talked about how web search and web scrape are skills, but then there's another skill that wraps them. So a lot of our skills actually depend on a whole bunch of other skills. My Google search skill depends on my typing-on-a-keyboard skill, right? So there are a lot of skill dependencies. And so I actually realized that graphs are a fantastic way to track this. So what you're seeing here is all the functions you saw earlier, but with all the dependencies mapped out.

Dan Shipper (00:48:57)

Do you think this is the right level of the stack to solve this on, where all this stuff is explicit, vs. just making a smarter underlying foundation model?

Yohei Nakajima (00:49:08)

I don't know. I mean, ideally, I feel like the coding models are getting so good that I can see a future where I don't need this framework and 2o is enough, where if I ask it to generate skills on the fly, it always just generates the right skill. I feel like that's preferred. That being said, it does seem inefficient to run inference to generate code every single time when you have the option of storing existing code that works.

Dan Shipper (00:49:35)

Right. That's interesting.

Yohei Nakajima (00:49:37)

Right. So from a cost perspective too, I do think a framework like this is still helpful. And it'll be faster, right? Because you can skip the time it takes to write the code, and you can skip the potential error that might happen. It should make things faster and cheaper if you can give it good memory.
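The cost argument reduces to a check-before-generate pattern. In this toy sketch, an exact-match lookup stands in for whatever similarity search a real system would use:

```python
tool_store: dict[str, str] = {}  # task description -> code that already worked


def get_tool(task: str, generate_code) -> str:
    # Exact match is a placeholder; a real system might embed descriptions
    # and compare vectors to find a previously stored, similar tool.
    if task in tool_store:
        return tool_store[task]  # skip inference: faster, cheaper, pre-debugged
    code = generate_code(task)   # only pay for an LLM call when nothing is stored
    tool_store[task] = code
    return code
```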

Dan Shipper (00:49:54)

Basically, what I'm thinking about, what it reminds me of, is the '50s through the '70s and the first wave of AI, the symbolic AI time period. I assume you're familiar with it. But a lot of those approaches were about looking at the heuristics that people used to solve problems, and then equipping a reasoner, basically a program that could manipulate logical symbols, with a set of tools that mapped to the heuristics people would use to solve problems, and then seeing if those tools could end up becoming AIs, basically. And what they found was that it could work in limited domains, but when the domain wasn't limited, the number of possibilities it had to reason through would get so big that there wouldn't be enough computing power in the world to ever solve the problem. And that's why deep learning is really interesting: You sort of solve that problem because, instead of going through every single logical possibility, you just generate a probabilistic, pattern-matched guess based on previous context that you've seen. And so what I'm thinking through is: Part of what made symbolic AI brittle is having to have everything be explicit. Things are really good when they're explicit in particular circumstances, but you have to narrow the circumstances in a way that makes being explicit possible.

Yohei Nakajima (00:51:29)

I think I know what you're going for. Internally, you want the architecture to be as flexible as possible. At a very high level, the internal architecture has to be flexible enough that it's not designed for anything in particular. But where it makes sense to be more deterministic, to some extent, is when the agent interacts with the external world, because we've agreed as a society to communicate in certain ways, right? Whether it's an API call or the size of a nut that you need for a wrench. The reason we need a specific size bolt is not the wrench; it's because we, as a society, have agreed that it makes sense to have standards. So we should be deterministic about the size of bolts. I think that's where the deterministic parts come in. And most of the tools that I save tend to be external-facing, like using an API. For those, it does make sense to be more deterministic, so that we're not randomly coming up with ways to try to talk to each other.

Dan Shipper (00:52:36)

Yeah, no, I love that. I think that makes a lot of sense.

Yohei Nakajima (00:52:39)

It’s interesting.

Dan Shipper (00:52:40)

It's making my brain go in a lot of different directions. Is there anything else that you wanted to cover? The thing that I'm interested in before we go is where you see things going over the next, I don't know, six months to two years. Obviously it's very, very hard to predict further than that. But what are you interested in? I think you have a really good sense of what is interesting now and what is going to be interesting. So tell me where you're exploring right now, and where you think that'll lead in the next six months to a year.

Yohei Nakajima (00:53:21)

I think today, from a business standpoint, there's so much low-hanging fruit in things that can be automated that we just haven't gotten to. I mean, there are so many businesses out there. Most businesses could probably lower costs by figuring out how to use Zapier well, but they don't. So I think there's going to be inherent slowness in terms of full adoption. In the meantime, there are a lot of opportunities to solve problems for specific people, and those tend to be more deterministic, more focused on common workflows. I think businesses building those are going to be able to generate revenue faster. But I like founders who are thinking about that and, at the same time, thinking about how to solve every problem for that customer in the future. So you're building vertical apps that leverage AI to solve a specific problem, but building them in modular ways, constantly thinking: Hey, actually, these two parts of this workflow, if we combine them with another tool, can create another workflow that solves another big problem for this customer. That feels like the right way to build businesses right now: customer-focused, problem-focused, but built in a modular way, such that you can eventually see all these tools being dynamically put together by an AI when agentic architecture gets more fleshed out.

Dan Shipper (00:54:50)

Do you have a specific example of someone that's doing that well right now? 

Yohei Nakajima (00:54:59)

I mean, well, Wokelo AI, the one I mentioned, the AI due diligence tool: I think they do a great job of generating reports. Most of it is deterministic in the flow, but they are building it modularly and building out tools where I can ask a specific question about a company, or combine two tools. I think Augie is another portfolio company of ours that generates these kinds of videos from scratch. I can give it a prompt, and it'll generate the transcript for the video. If I pick a voice, it'll generate the voiceover. It'll chunk the transcript into sections, and then for each section it'll either go find a video clip or generate one, stitch it all together, and basically provide a video for you. I mean, that's an incredible workflow, but it is very much deterministic right now. It's another example, though. They're very aware that you might jump into the workflow partway: Maybe you already have a transcript, or maybe you already have a media library of your own. Building modularly allows them to dynamically create more and more tools and functionalities that AI can put together for them.

Dan Shipper (00:56:07)

That makes sense. When you're evaluating companies in this space, how are you thinking about investing? I assume they're either at the application layer, or somewhere between the application layer and the infrastructure layer. How do you think about who's going to win in markets where the underlying technology is moving so quickly, and whether the foundation model, for example, should be doing that job vs. the end-user application? What do you think about the competitive landscape?

Yohei Nakajima (00:56:42)

I mean, the space is moving really fast, so my thoughts around this are constantly fluctuating. One of the ways I've approached it: Just for context, we are pre-seed investors. We tend to invest very early; I tend to invest at sub-$10 million valuations. One of the things I've figured out, for us as a pre-seed investor thinking about where the right pre-seed bets are, is that the most obvious ideas don't seem too attractive to me. The reason is, if it's an obvious AI idea (a coding agent is a great example), it's hard for me to invest in that space. It's almost too obvious an idea, so I expect there are going to be a lot of great teams building it. As a matter of fact, there are going to be some experienced founders who go and raise a $5–10 million seed round to start companies, and there's going to be more than one of those. And on top of that, you have the risk of big model companies also tackling the space if it's too obvious an idea. So we tend to invest a little bit outside the obvious areas: maybe a little more forward-thinking, where not everyone's tackled it yet or isn't really thinking about it quite yet, or slightly niche ideas.

Dan Shipper (00:57:55)

Yeah, is there anything else that you wanted to talk about before we end? 

Yohei Nakajima (00:58:00)

I mean, we can do this again. There's always plenty of fun stuff to talk about, but no, I think this was great.

Dan Shipper (00:58:07)

I had a great time. I really appreciate you coming on to show us all this stuff. Where can people find you if they want to see more of your projects?

Yohei Nakajima (00:58:15)

Thank you. Probably the best place to find me is on Twitter or X, @yoheinakajima. I do have a build-in-public log, which is like my blog of the stuff I build, at yohei.me. That's M-E. And then if you're specifically interested in autonomous agents, I will soon be announcing autonomous agent-specific rolling funds. So that'll be a fun track. Keep an eye out for that one.

Dan Shipper (00:58:43)

That's awesome. Well I will be keeping an eye on it myself. But thank you so much for doing this with me. It's always a pleasure to talk to you. And I can't wait to do another one of these when o1 gets tool-calling. I would love to see where BabyAGI goes at that point.

Yohei Nakajima (00:59:02)

Awesome. Let's do it.

Dan Shipper (00:59:03)

Cool. Have a good one.

Yohei Nakajima (00:59:04)

You too.


Thanks to Scott Nover for editorial support.

Dan Shipper is the cofounder and CEO of Every, where he writes the Chain of Thought column and hosts the podcast AI & I. You can follow him on X at @danshipper and on LinkedIn, and Every on X at @every and on LinkedIn.

We also build AI tools for readers like you. Automate repeat writing with Spiral. Organize files automatically with Sparkle. Write something great with Lex.
