Transcript of ‘Building AI That Builds Itself’

‘AI & I’ with Untapped Capital's Yohei Nakajima

Like 12 Comments

The transcript of AI & I with Yohei Nakajima is below.

Timestamps

  1. Introduction: 00:00:59
  2. BabyAGI and its evolution into a more powerful tool: 00:02:26
  3. How better models are changing the way Yohei builds: 00:05:00
  4. Using code building agent Ditto to build a game of Snake: 00:08:10
  5. The ins and outs of how Ditto works: 00:13:24
  6. How Yohei gets a lot done in little time: 00:19:21
  7. Yohei’s personal philosophy around building AI tools: 00:21:50
  8. How Yohei experiments with AI as a tech-forward parent: 00:33:13
  9.  Demo of Yohei’s latest release, BabyAGI 2o: 00:39:29
  10.  Yohei’s insights on the future of AI tooling: 00:51:24

Transcript

Dan Shipper (00:01:00)

Yohei, welcome to the show. 

Yohei Nakajima (00:01:01)

Thank you. Good to see you again.

Dan Shipper (00:01:04)

Good to see you too. For people who don't know, you are the general partner of Untapped Capital. But maybe more importantly, you're one of the coolest online AI tinkerers in the whole AI tinkering space. I just feel like every day I go on X and I see you releasing something new. You famously built the first open-source, autonomous agent, BabyAGI about a year ago.

Yohei Nakajima (00:01:31)

Yeah. Last March.

Dan Shipper (00:01:32)

Last March. And I did an interview with you on Every, I think, around that time too. And you just have this incredible array of tools that you've built yourself to make your work and your life better using AI and I'm just really excited to have you on the show to talk about tinkering.

Yohei Nakajima (00:01:51)

Thank you. You're too kind. I'm just really lazy and whenever I'm working on something that I don't like doing, I'm always asking myself how can I cut this out of my work? So even before AI, I was a pretty heavy no-code Zapier user. But then LLMs just unlocked so much. It's been so fun to just tackle one task at a time—try to remove it.

Dan Shipper (00:02:14)

Yeah, it's really awesome. I want to talk about— I feel like the arc that you've been on is you started off with this BabyAGI autonomous agent thing. And that honestly kicked off this hype wave about agents. So talk to us about BabyAGI and then tell us about the kind of the arc that you've been on since then, and sort of what you're building and what you're thinking about and where you think the future of this kind of thing is going.

Yohei Nakajima (00:02:39)

Yeah. So BabyAGI was, I guess, I’ll introduce essentially the idea of looping through an LLM. Part having an LLM generate a task list and parsing that by code and then tackling the tasks one by one—at that point, just using an LLM. But I think the 100 lines of code, the simple pattern, inspired a lot of people. I think the reason it was so popular is because of everybody who saw it. They could think of ways that they would make it better. And I think the simplicity is what kicked it off. Since then, from our fund, I've been able to invest in a handful of companies like E2B, AutoGrid, and a few more in the agent space. So it's been incredible working with those founders on thinking through how to build more reliable agents.

Dan Shipper (00:03:19)

Were those people explicitly inspired by BabyAGI?

Yohei Nakajima (00:03:23)

I think Autogrid and Cognosos were. I'm pretty sure they built something right after BabyAGI launched. Some of them were.

Dan Shipper (00:03:23)

That's so cool. It's like incepting. It's like manifesting.

Yohei Nakajima (00:03:36)

It's kind of an incubator, but more like a public open-source incubator where you're just incubating ideas publicly. And simultaneously, investing in the space, meeting founders, I've been building my own BabyAGI and kind of iterating on it. I took the first original BabyAGI and did about seven iterations last year called Baby Bee AGI, Cat AGI, Baby Deer AGI—with animal names all the way to Baby Fox AGI. And each time I was introducing a new design pattern. It was really a way for me to share ideas on what I think could make autonomous agents better. And then this year I started from scratch with a new idea and was kind of mulling around, playing around for the first six months and then something clicked last month and then I built out BabyAGI-2o, which is a framework for self-building autonomous agents, which has been my theme for this year, which is the idea of an autonomous agent that can build its own capabilities to improve itself. But yeah, that's kind of where we're at.

And then this week I released another kind of small script called Ditto, which is, I feel, more similar to BabyAGI because it's a super simple 500-line script that can build multi-file apps, kind of a little poor man's Devin. And then last night I figured out how to incorporate that design pattern into BabyAGI. So I'm pretty excited about it. I guess I'm going to call it BabyAGI-2o because Sully on Twitter suggested I name it that.

Dan Shipper (00:05:02)

I feel like OpenAI's naming convention is just polluting the whole AI world. Okay, this is really cool. I really want to do a demo of BabyAGI-2. But before we get into that, how has the advent of o1 and sort of better reasoning, how have you incorporated that? How has that changed what you think is possible with these kinds of things? Yeah. Tell me about that.

Yohei Nakajima (00:05:32)

So I've been coding with AI since DaVinci-002, which was very poor. I went to 003 and so on. Every time the model gets better, my projects get more complex because they can handle more. With o1-preview, I find that it's incredible at handling multi-file edits. Until o1, I never really worked on any multi-file projects. So when I released BabyAGI-2o, which was a full framework with a front-end as well, I don't think I could have done that with that o1-preview, at least not in the time I allocated to building.

Dan Shipper (00:06:07)

And what about the capabilities of these agents?

Yohei Nakajima (00:06:10)

The agents themselves are also getting much better. The coding agent I mentioned—Ditto. When I build, I do this thing where I often build with 3.5 Turbo to get the framework working. And then when the framework doesn't error, I'll upgrade the model. But yeah, when I use Sonnet 3.5 on it, I mean, it does so much better at coding than any of the prior models.

Dan Shipper (00:06:33)

And have you tried putting in o1 or are you afraid it'll take over the world?

Yohei Nakajima (00:06:38)

So o1-preview, specifically, the way I've designed the most recent projects, I'm using tool-calling and o1-preview doesn't support that. So I haven't played with it. So I've only used 4o or Sonnet 3.5, but given what I know about the strength of o1-preview, once I can integrate it, I suspect it will be even that much better.

Dan Shipper (00:06:57)

Yeah, they're going to do tool calling, they said, by the end of the year. So hopefully we'll get to see that drop. Cool. Can we see a demo? I want to see BabyAGI-2.

Yohei Nakajima (00:07:07)

Cool. So BabyAGI-2is a framework. Actually, you know what, honestly, I actually think it would make more sense to start with Ditto. So let's start with Ditto. Let's pull up Ditto—named after the Pokemon that changes into whatever Pokemon it's facing.

Dan Shipper (00:07:23)

Oh, sweet. And just to remind everyone, Ditto is the self-building coding agent, right?

Yohei Nakajima (00:07:29)

Yes. So think of it like the Replit agent or or Devin. I'm going to pull up Replit, which is where I build everything. As an Amateur developer building autonomous agents, I find that playing in a sandbox with a stop button feels very, very safe for me as opposed to running it on my computer.

Dan Shipper (00:07:51)

Like the big stop button?

Yohei Nakajima (00:07:53)

Yeah. There's just a stop button at the top. If something goes wrong, if I see a whole bunch of red, I can just push stop and it, and Replit will stop there for me. And that feels very safe. So I build and run everything in Replit.

Dan Shipper (00:08:03)

That's good. I love Replit. I use it all the time, especially when we do courses and stuff. And just for building these little one-off things, especially, or sharing them with people—like collaborating. It's really cool.

Yohei Nakajima (00:08:15)

Yeah, it's great. So, again, you see this main.py. It’s 500 rows of Python. When I run it, it'll ask me what I want. It just builds a form. So I can say—

Dan Shipper (00:08:33)

So for people who are watching, basically, you've got Replit open. You pressed run. Now we have a little website that says “Flask App Builder,” and it says, “Describe the Flask app you want to create.” And you're typing in “a game of snake.”

Yohei Nakajima (00:08:45)

And then I'll just click submit. And I think it's amazing that you can do this with a single file, right? So what it did was just created the routes—static and template folder. It's actually going through and building out— How do we make this easier to read? So it's using some tools to create routes, create directories, and then it went in and started building out the file. So if we go into main.py, we'll see the a couple of routes here. So it's trying to— Oh, okay. So this means it's finished. So what I need to do here is actually need to stop it and I'm going to just run this again. I haven't done anything except ask for a game of snake. And when I open it up, it should be a game of Snake. I'll just open it up in a new window to see if it takes keyboard controls. It does not take keyboard controls, but I mean, it's a frontend, backend, and it has worked previously before.

Dan Shipper (00:09:51)

That's really interesting. Okay, so basically, you said give me a game of snake. And now we have— I assume it's sort of some sort of HTML5. We don't really know what it is actually doing, right? But it built a JavaScript-based snake game.

Yohei Nakajima (00:10:10)

It gave me an index, it gave me JavaScript, it gave me styles, and it gave me a main.py, which serves the route, I guess. And then it, it kind of stored all the history of how it built it right here. And all of this was done by this single-file Python script.

Dan Shipper (00:10:26)

Can we make something else?

Yohei Nakajima (00:10:28)

Yeah. So now to run it again, I have to go and delete everything. It’s a one-time use script.

Dan Shipper (00:10:26)

Do you make copies of the—

Yohei Nakajima (00:10:40)

The way I usually use it— Actually, I think I'm in the Replit one that I shared publicly. I should actually be forking this anyway. Alright, let's do that. That's the right way to do it. What should we try to do? It should be simple though, because this is not as good as Replit. I don't know. I've done a to-do list app. It's a pretty simple one. It's a classic.

Dan Shipper (00:10:57)

Well, you tell me if this is too complicated, but this is the app that I have in my head that I kind of want to build, but I don't have time to build. I often have people that want to do a meeting with me. But scheduling that meeting all the time is really hard. And what I just want is a list of people who want to talk to me with their phone number, where when I have 15 minutes during the day, I can just pick them off the list, call them, and then check it off. I don't know how you want to express that, but I just want to be able to input names with phone numbers and then check off names on a list.

Yohei Nakajima (00:11:37)

For you to call.

Dan Shipper (00:011:38)

Yeah.

Yohei Nakajima (00:11:47)

I'll just try that. So, what did you say? You said an app to track the name and phone number of friends with a checkbox to track when I called them last, which isn't exactly, but it's close.

Dan Shipper (00:011:55)

It's about right. Yeah.

Yohei Nakajima (00:11:58)

And so first it created the routes, the static, the templates. So this is a Python Flask app builder. It says it created the directories. Now it's going to implement the HTML. So it has an index of the HTML that it created. It created the routes for it, I guess the backend, and then it's creating a CSS for it. And then it's creating a script.js, which is right here.

Dan Shipper (00:12:31)

And this all powered with Claude Sonnet?

Yohei Nakajima (00:12:34)

So I just updated to Sonnet 3.5, yeah. And it's creating a friends route. This is the backend route for tracking friends. So it's using just a dictionary as the database, and it says done. So if we close it and open it back up.

Dan Shipper (00:12:56)

Whoa. That's so cool. So basically, it's a track-your-friends app. It has you put in your name, you put in a phone number, you can add it and then you can—

Yohei Nakajima (00:13:10)

It doesn't do anything aside from that. But I mean, it's more or less what we described in a generator. It's a multiple-file app. And again, what's amazing is that can do this with a single loop through an LLM with five tools.

Dan Shipper (00:13:26)

Well, actually help me understand conceptually what's happening. I noticed, for example, that it's basically doing different iterations. So it takes the prompt and then it goes through an iteration of trying to turn that prompt into code. But it does that like five times or six times or whatever. What are the iterations doing?

Yohei Nakajima (00:13:36)

So when the app starts, it sets up the Flask app. It checks to see if an index.html exists. This is what triggers the form. If it exists, it'll just serve the app. But if it's not, it goes to the user input form. Once we have a user input, we send it to an LLM call. I'm using LiteLLM, which allows me to route between OpenAI and Anthropic.

And then it decides if it wants to use a tool. And it's actually the LLM at that point deciding if it wants to use a tool. If it says, yes, the tools it has is: create directory, which is creating a folder, create file—and the create file tool asks for the code to go in it too. So in using this tool, it actually just generates the code to put into the create file update. If it errors out. It can use an update file to update a file, or if it wants to make sure everything looks right, it can use fetch code to fetch a code file to review it. And then if it deems that it's complete, it'll call a task completed tool, which actually just exits the loop.

And then if it uses a tool, that gets sent into a second LLM tool prompt, essentially to come up with the response based on the tool call. When you use a tool call, there's this kind of a second kind of back and forth, which includes a tool call. And then I update a history and then the history I store as an array that I feed back into the prompt so that it constantly knows what it's done historically. And then it kind of just loops through until it deems the task is complete.

Create a free account to continue reading

The Only Subscription
You Need to Stay at the
Edge of AI

The essential toolkit for those shaping the future

"This might be the best value you
can get from an AI subscription."

- Jay S.

Mail Every Content
AI&I Podcast AI&I Podcast
Monologue Monologue
Cora Cora
Sparkle Sparkle
Spiral Spiral

Join 100,000+ leaders, builders, and innovators

Community members

Already have an account? Sign in

What is included in a subscription?

Daily insights from AI pioneers + early access to powerful AI tools

Pencil Front-row access to the future of AI
Check In-depth reviews of new models on release day
Check Playbooks and guides for putting AI to work
Check Prompts and use cases for builders

Comments

You need to login before you can comment.
Don't have an account? Sign up!