Transcript: 'How Two Engineers Ship Like a Team of 15 With AI Agents'

The transcript of AI & I with Every's Cora engineers Kieran Klaassen and Nityesh Agarwal is below. Watch on YouTube, or listen on Spotify or Apple Podcasts.

Timestamps

Introduction: 00:01:16
Why Kieran believes agents are turning a corner: 00:03:18
Why Claude Code stands out from other agents: 00:06:36
What makes agentic coding different from using tools like Cursor: 00:11:58
The Cora team’s workflow to turn tasks into momentum: 00:15:20
How to build a prompt that turns ideas into plans: 00:23:07
The new mental models for this age of software engineering: 00:34:00
Why traditional tests and evals still matter: 00:39:13
Kieran ranks all the AI coding agents he’s used: 00:42:00

Transcript

(00:00:00)

Dan Shipper

Kieran. Nityesh. Welcome to the show.

Kieran Klaassen

Thank you.

Nityesh Agarwal

Thanks so much for having us, Dan.

Dan Shipper

I'm psyched to have you. So for people who don't know, both of you work on Cora, which is Every's AI email assistant. Kieran, you're the GM. Nityesh, you're an engineer. And beyond the fact that Cora is a really cool product and I'm really excited to bring that to everybody who listens to this show or watches this show, I wanted to do an episode with the two of you because I think that you're figuring out a new way to do engineering. Cora has two people on the team, but it really feels like there are 15, because you've got agents who are pulling down PRs and working on branches, and then you're pushing them up and other agents are reviewing, and it's just this kind of crazy thing. It's a new way to build software.

And Kieran, you said something the other day that really stuck with me, which is, you're figuring out how to do compounding engineering. So with each piece of work you do, you're making it easier to do the next piece of work. And I just think that it's really important to bring what you guys are learning to everybody that watches the show because we have new tools and so we need new principles and new workflows for using those tools. And so I'm really excited to talk to you about that.

Kieran Klaassen

Yeah, thanks. It's really fun to build Cora, but being part of Every, and being in an environment where you get access to tools, access to thinking, access to exciting new ways to work really helps us rethink how we build. So it's really an experiment. We're building a product, Cora, but at the same time we're figuring out how we should build. And that's super interesting. And right in the middle where people say, what do you think of this new model? How do we use this research tool? And we're just trying things out and Nityesh and I, we've been really feeling a shift in the last weeks, I would say, where we're like, holy shit, things are changing and we're not the only ones. We hear other people say that as well. But not a lot of people. And what we've learned is a lot and we want to share a little bit of what we learned and also what we know is we're just barely starting. We're scratching the surface of this. And it's a big shift that's happening right now by new models, by how people think, by MCP. It's great to talk about that from different perspectives.

Dan Shipper

I agree and I think it is so special to be at Every, because every day there's someone new in the Discord who's like, I built an AI agent, do you want to use it before we launch it? And so we get access to OpenAI models before they come out and sometimes Anthropic models. And so we have this early edge and then you guys are so good at figuring out how to actually incorporate them into a production process. So you said Kieran, that something changed. So I guess I want to get a sense of what you think changed. Draw the broad strokes of what the workflow is that's starting to emerge for you guys.

Kieran Klaassen

Yeah, for me, obviously it's everything coming together, but I think the biggest thing is a realization in myself that coding with AI is more than just the coding part. And it's really about utilizing it for research, for workflows, for everything. It should be used for everything. And we're now at a point where the agents are good enough that they can actually do everything. So we need to rethink again. Cursor, Windsurf, the old-school way of coding was great. More of the vibe coding. That was one step. And now it's the realization, oh, actually we can just give it a task and it will do it, but the work still needs to be done of figuring out what we do and how we do it. And just a realization that we should lean into that more and really go deep. It's Claude Code, it's good coding agents being available that actually start to work, with new models like Claude 4 that are really good at following directions and instructions. It's all of that coming together that made me realize, oh, we're here. The future is here. This thing we've been talking about that was going to be the agentic evolution, suddenly it works, and it's working in the real world, not in experimental playing. We're just building an app and it's working, building the app.

Dan Shipper

So what I'm hearing is it's not just about developing with AI. It’s all the things that go into developing that you're using AI for, and that the thing that you're using the most for this is Claude Code. Is that right? And if it's right, tell me— For people who don't know Claude Code or haven't used it. Give us a little introduction to Claude Code and then tell us about exactly how you're using it.

Kieran Klaassen

Yeah. Claude Code is basically the coding agent version from Anthropic that uses Claude under the hood and it runs in your terminal as a CLI tool, which is kind of a—

Dan Shipper

Do you want to share your screen and show us?

Kieran Klaassen

So Claude Code is a tool that you use in your terminal. And I know for non-technical people this is scary. But I've converted friends who were not technical to use Claude Code and they're like, oh, this is great. It's really simple. You just start your terminal, you type Claude, and an interface will pop up.

Dan Shipper

And basically for people who are listening instead of watching, he's in his terminal. It's the classic black screen that feels like you're using DOS or something. And he just typed Claude and then we just got a thing that says, welcome to Claude Code. And there's a little text box for him to type in any command.

Kieran Klaassen

Yeah. And why is this different, or what makes this different? This has access to the directory on the computer, so it can look through files on my computer already. It can run things on my computer. It can take screenshots of websites, it can search the web. It has way more tools than are available in a normal Claude version. That's important because for engineering work, like building stuff, you do need more tools than just the basics. You need GitHub to see what you need to build or what the status is or what the CI pipeline does. Do the tests fail? Having all these things available in one coding agent actually makes it possible for me to have a workflow, or a thing I do, actually done by an agent. And that's the important thing. Really, the word compounding comes in by doing more than just coding, because if you talk to an engineer, most of the work is maybe coding, but maybe it's actually 20 percent, and maybe 80 percent of the work is figuring out what to do next, or understanding what people's feedback is and how to interpret it. And what you can do here is, for example, a fun way to use it: let's say, what did we ship in the last week? So it knows stuff. So I'm asking it what we shipped and it will most likely look at the Git log because that's how we track what we did ship. And yeah, so it looks through the Git log. It looks at what we merged to main and that's a fun way to use it. And for example, we can use this for product marketing. And it says, oh, these are the bug fixes: brief skip functionality, chat panel, state email summary, XML text. Major features: brief health monitoring, time zone auto detection. These are all things we released. And it's written up in a nice writeup that anyone can read.
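
A minimal sketch of the kind of "what did we ship" summary described above, assuming commit subjects on main carry a short prefix such as "feat:", "fix:", or "infra:". That prefix convention, the headings, and both function names are assumptions for illustration; the transcript does not say how the Cora repository labels its commits or how Claude Code builds its summary.

```python
import subprocess
from collections import defaultdict

# Assumed convention (not from the transcript): subjects start with one of these prefixes.
PREFIXES = {"feat": "Major features", "fix": "Bug fixes", "infra": "Infrastructure"}


def group_subjects(subjects):
    """Group commit subject lines under a heading chosen by their prefix."""
    groups = defaultdict(list)
    for subject in subjects:
        prefix, sep, rest = subject.partition(":")
        heading = PREFIXES.get(prefix.strip().lower()) if sep else None
        if heading:
            groups[heading].append(rest.strip())
        else:
            # Anything without a known prefix is kept, just not categorized.
            groups["Other"].append(subject.strip())
    return dict(groups)


def subjects_since(since="1 week ago", branch="main"):
    """Return the subject lines of commits on `branch` newer than `since`."""
    result = subprocess.run(
        ["git", "log", branch, f"--since={since}", "--pretty=format:%s"],
        capture_output=True,
        text=True,
        check=True,
    )
    return [line for line in result.stdout.splitlines() if line]
```

Run inside a repository, `group_subjects(subjects_since())` returns a dictionary of headings to shipped items that could be pasted into a weekly update.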

Dan Shipper

It's more technical. And it's actually a lot for two people, there's what, six major features and five important bug fixes and three infrastructure updates? That's a lot.

Kieran Klaassen

Yes. It's a lot. And this week we've been really leaning into letting AI do the work for us, and we're just managing the AI. One other thing, for example, is if you have someone come to you like, oh, what is the status on this? Or what are you going to ship next week? Let's see what it will do. You see what is in the pipeline and what will come out soon.

Dan Shipper

So, this is awesome. Nityesh, while this is going, if you want to jump in at any point, feel free to. At some point I'll lob it at you, but also feel free to jump in.

Kieran Klaassen

I don't know if it has project access, but you get the gist, if you have the information connected to the agent, it's very easy to use it and it's very important to use a tool you're familiar with. And at this point, I think Claude Code works the best for me. It is the most flexible because it doesn't only solve coding issues, and that's important.

(00:10:00)

Kieran Klaassen

Lots of these coding agents are made to code, but I want to do more than coding. I want it to be like a support in engineering in general. And I think the Claude team really thought about that. They made it not too specific and they kept it general while actually being really good at solving things and looking at what it did, thinking about the mistakes it made and self-correcting. So that is stuff coming together that's very hard that makes it possible to use now.

Dan Shipper

What's the difference between coding in Cursor and agentic coding?

Kieran Klaassen

Maybe Nityesh, you want to take this one?

Nityesh Agarwal

Yeah, definitely. Claude Code is such a simple departure from the Cursor and Windsurf that we're used to. Both of those have agentic coding capabilities, but Claude just takes it one step further by simplifying it by a factor of 10. So what Kieran was saying earlier about how Claude Code may feel intimidating because it is a terminal, in reality it is so much simpler than Windsurf and Cursor, because there is nothing except a text box here. There's no command K, no shortcuts, no accept, delete, reject, remove, there's nothing. It's just a text box. And it works because the underlying Claude model is so much more capable now. So it's able to use its ability to work for longer and do tool calls. So it's a simpler UI, which makes it at the same time more powerful, even though the underlying model behind Cursor and Claude Code is the same.

Kieran Klaassen

And an example of this is, this morning I was pulling some metrics. I was like, why didn't we get any responses to this form?

Dan Shipper

And then, for context, basically we have a form where we ask people how disappointed they would be if they could no longer use Cora, so we can tell how well we're doing. We have a weekly meeting where we go through all the metrics, and you noticed that no one had filled out that form. So you're going into Claude Code and you're asking, hey, why is no one filling out this form?

Kieran Klaassen

Yeah. There has to be some reason this form was not sent. And I asked, hey, 14 days ago something went wrong. Can you see what went on? And what it did: it made a checklist of things to do, like fetching recent log changes to the controller and searching the code base. So it looked through what changed around that date and it found that we removed a piece of code that adds people there. Here it says, hey, actually you just need to add this. And I said, okay, do it for me, create a pull request. And it did that. And I said, oh yeah, by the way, I'm also going to create a script that will then add everyone that we missed, to migrate them, and that was it. And the fun part was, it didn't cost me any energy. It was as easy as me writing it down in GitHub to look at later. I don't need to. I just ask it and it does it immediately, which is really nice. It's like inbox zero. Does it take less than five minutes? Do it.

Dan Shipper

I think the thing that people may not fully realize is that a task could take anywhere from 30 minutes to a couple hours without AI. And it's not just that it would require you to focus on it and put aside time to sit down and do it. And now you just sort of send off a request and then you can send off another one and another one. And you have a bunch of these sort of working in parallel. So give me a snapshot of what that looks like concretely, what your actual workflow is. What are you actually doing, how many tabs do you have open? Are you actually doing any hand coding yourself? Do you have five in parallel? Are you just using Claude Code? Give me a sense of that.

Kieran Klaassen

Yeah. I'll show you my screen as well. Maybe Nityesh, you can tell what we did before, when we got early access to Claude, we were excited about what we did. I'll share my screen.

Nityesh Agarwal

Yeah. So this is one day before the Claude live stream was scheduled, we were like, okay, tomorrow coding is going to change. We'll have a much more capable model, in which we'll be able to one-shot everything that we want. We are basically going to get a coding genie for us. So the best, most productive thing for us to do today, instead of doing our regular studio programming, we should just jam for a two hour call where we make a massive list of issues that we want in the future, like tomorrow's superior model to solve. And we did that. We created like 20 issues in terms of what we want to fix, what were the things that we were planning to work on and prepared the system for the new Claude model.

Kieran Klaassen

Yeah. And it was funny because Nityesh had prompted ChatGPT to say, hey, tomorrow we have a, we reached AGI can you help us come up with everything we need to do and like prepare AGI to solve everything we did? And then we fed that into the prompt improver of Anthropic. And then we used that as a prompt and we created a—

Dan Shipper

Wait, before you move on. For people who are listening, so basically you have this sort of Trello board type thing inside of GitHub, a Kanban board. And for each thing that you've identified as what you want to do, it looks like you have a document that lays out in detail, whether it's a feature or a bug fix or whatever, what it is and how to actually do it. Can you open up one of them? So this is a feature where you want to have AI generate synthetic data. And this document has everything from a problem statement to a solution vision to all the requirements, all the technical requirements, and a bunch of stuff. But it seems like it has implementation steps with day counts and stuff like that, which is funny. So this is one day, one second.

Kieran Klaassen

Okay. So we use Claude Code and we have these custom prompts that we generated to create these, because it's a lot of work to create these, even with ChatGPT. There are a lot of steps. You need to look at all the code. You have to think about it a lot. There's a lot of thinking, so it's really hard to do well. So we created a command in Claude Code. A command is kind of a custom prompt that you use a lot, and ours is, hey, there's a feature.

Dan Shipper

Sorry, this is a command in Claude Code or a command in Cursor? Because you have a Cursor open.

Kieran Klaassen

Yeah, I have Cursor open because that's how I edit files. But it's Claude Code so you can see where you're in Claude. And I can use this command by hitting CCY, which is Claude Code. And then I say something like, I have a bug, a problem, anything. So it's very low friction. So I have this CCI command and Nityesh and I were just jamming. We're like, oh, what if we do this? Oh, that sounds cool. And then voice to text and it starts. So let's see how this works. And then while it's running, we can go over the thing. So I want an infinite scroll in Cora, where if I am at the end of a brief it should load the next brief. And it should go until every brief that's unread is read.

Dan Shipper

So yeah, I just want people to understand. Kieran almost never types anything and does all voice to text. So he was just doing voice to text into Claude Code with, I believe, an as-of-yet-unreleased internal Every incubation called Monologue, which he is the number four biggest user of, but which is still under wraps. A little preview is coming soon. And basically what it seems like it's doing is taking that and, is it turning that into the document that we were looking at earlier? Or is it actually going and executing it?

Kieran Klaassen

Yeah, so what it does is it will insert whatever I said here into the feature description, and then it will follow all these steps. And these steps are research and researching best practices. So one is grounding itself in the code base, so researching what exists. Then it's researching best practices, so it's searching the web, finding open-source patterns. So it's grounding it in best practices in general. Then it will present a plan. I like to have a human in the loop to review the plan, because sometimes it's wrong, but most of the time it's right. Then I say, sounds good, and then it creates the GitHub issue and it will put it in the right lane and all that.
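
The steps Kieran lists can be sketched as a reusable prompt template with one slot for the spoken feature description. This is only an illustration: the step wording is a paraphrase of the conversation, and `PLAN_TEMPLATE` and `build_plan_prompt` are hypothetical names, not the team's actual command.

```python
from string import Template

# Hypothetical template: each step paraphrases the workflow described in the conversation.
PLAN_TEMPLATE = Template("""\
Feature description: $feature

Follow these steps in order:
1. Research the existing code base for related code.
2. Research best practices on the web and in open-source projects.
3. Present a plan and wait for a human to approve it.
4. After approval, create a GitHub issue and put it in the right lane.
""")


def build_plan_prompt(feature: str) -> str:
    """Fill the feature description into the reusable planning prompt."""
    feature = feature.strip()
    if not feature:
        raise ValueError("feature description must not be empty")
    return PLAN_TEMPLATE.substitute(feature=feature)
```

For example, `build_plan_prompt("infinite scroll that loads the next unread brief")` yields the full planning prompt, so only the one-line idea has to be spoken each time.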

(00:20:00)

Dan Shipper

Oh, interesting. So it's that whole Kanban we were looking at in GitHub earlier. You've created a way for you to speak your feature into Claude Code and then it does all the research to create that long document and then just adds it into GitHub issues. That's really cool.

Kieran Klaassen

Yeah. It's an important step because it is different from Cursor coding because in Cursor, normally you skip this step because the tool is not really made for it. The tool's made to code. Yes, you can create Markdown files and all of that, but let's lean into an issue tracker. It exists and it works well and people use it and it already hooks into existing patterns. We can give this to a developer and they can implement it.

Dan Shipper

Yeah. And one of the things that just to point out is, you're running this and I think one of the special things that when we saw Opus for the first time, we were like, holy shit, is that it just runs forever without any intervention and then gives you a pretty good result, which we've had sort of agentic type things for a little while. But it's just a way different level of autonomy and quality than we've ever had before. And it's just checking things off of this to-do list in a way that I think other agent loops are just going to be a lot less thorough.

Nityesh Agarwal

Me and Kieran have a fun thing going on where we're trying to see who can have Claude Code running for the maximum amount of time. And Kieran is topping the list right now. I think he ran it for 25 minutes. I'm only at eight minutes right now.

Dan Shipper

Oh man. How did you get it to go so long? Kieran?

Kieran Klaassen

Yeah, it's just a very, very complicated, long plan that also includes a lot of tests, and it just makes sure that it runs all the tests and fixes all the tests. It goes pretty long.

Dan Shipper

I want to understand how you make that prompt that creates the prompt or the prompt that creates the research document? So how did you know which elements to put in? Did you just do the same thing where you use the Claude prompt improver to make that, or how did you think about putting that together?

Kieran Klaassen

This is part of the compounding effect. It's having an idea that has a lot of outcomes. So this was what Nityesh sent me. He said, we just got AGI that was delivered and we can write software. That was your initial prompt, which is kind of fun, it's very dramatic. And then ChatGPT said, I'm ready. Okay, so now do this. That's fine. But do you know the— They changed it a little bit, but this is great because basically you paste in a prompt or something like that and you can say we have thinking and you can generate and it will improve the prompt automatically. And you think, how good can it be? It's pretty good because it's also very low friction. So it's very easy to just take a minute to see if something comes out. If it works, great. If it doesn't work, delete it. Doesn't matter. We were just jamming and we were like, well, we're going to come up with 30 research tasks, so we better have a prompt. So I just copied this prompt and that became the document here, and changed the arguments. And then you can trigger those in Claude by doing slash. And we have these two custom prompts here.

Dan Shipper

And then I think that actually gives me a much better idea of what you mean by compounding engineering. Because what it says to me is what you did first is spent time building a prompt that effectively builds other prompts because those research documents are effectively prompts for Claude Code. And so now that you have a prompt that builds prompts every time you want to make a new feature, you have to specify less. You just say the little feature and then it'll go do the research to build it out into a big document vs. before every single time you have to do a feature, you have to say, at first, I want you to research it, and then I want you to think through all these different corner cases or the ways that I like things built or whatever I think that's so cool. And what's also really interesting to point out is it's working while we're talking, and that's just a different way to code. We were on the phone together like last week or the week before. And we were testing this out together and I shipped a feature that went to prod while we were talking, which I'm not in the code base at all. So it's kind of crazy that it actually happened and it's a more social way to code, we're coding right now, building stuff, which was not possible before.

Kieran Klaassen

Yeah, absolutely. And oh, while we were talking, we did the research and we created this issue, which is cool. And we had six or seven running at the same time because we were just like, new idea, let's go, new idea, let's go. And what we also did, we went through, we used feedback, we read emails, everything we could gather, and we were just brainstorming. And it's really fun because if you're in this brainstorming place, you can just kick off agents and see what they come up with, and take another time to then review. And I agree with you, it's really fun to do this together on a call because that's where magic happens. And there is still a human review step here because we found that we want to look at it, see if it makes sense, if anything is missing. This is having taste, experience, intuition. Take the bug I solved earlier with the email not going out. Nityesh did the same with his Claude Code, but it didn't give the right answer. So there is still a human touch of intuition. I hinted, look at the history. That actually made it think in the right direction. Nityesh didn't tell it to look at the history, and then it said, no, everything works fine. So there is still intuition, and it's still a skill. It's absolutely a skill. There's no magic prompt that does everything. It is about using it the right way and using it to its strengths.

Dan Shipper

Yeah. Nityesh, how have you found this all? Because I know Kieran is a long-time Rails expert who's just an incredible programmer, and I think you're a little bit earlier in your programming journey. So what has it been like to come to Every, start working on Cora, and start working on it in this way?

Nityesh Agarwal

Yeah, this has been incredibly eye-opening, I would say, because honestly my experience with programming is that two years ago when ChatGPT came out, I thought, okay, now it's perfect for me to teach myself programming and build that SaaS application that I always wanted to build. So I taught myself programming using ChatGPT from the very first day. So I have gone through all the transitions. I went from ChatGPT. Then when Cursor came out, I shifted the workflow to Cursor, and then when Windsurf got better, we shifted into Windsurf. And I was always thinking, okay, I am at the forefront. I don't know any of my friends who are doing so much with AI and I'm at the forefront. Then I joined Every and started working with Kieran, and Kieran is at a whole other level. In our meetings, he is never writing code, he's never typing, he's always speaking into the computer. So I was like, okay, I need to adopt that into the workflow. And then when Claude Code came out, Kieran actually pushed me into using it. And clearly it is now the way to program. Me and Kieran, both of us, we haven't even touched Windsurf or Cursor in the last three weeks or so. Or even if we touch it, it's usually just because we want to read something. It is basically like we're using it because we don't have VS Code on our computer. It wouldn't matter if it was the older VS Code itself, because all the AI stuff is happening with Claude Code now. And it's really fun to have been in this position where the entire coding landscape just changes completely every three months and you realize nobody's at the forefront.

Dan Shipper

I’ve got to say I'm jealous of you learning to code right when ChatGPT came out because I learned to code from books 20 years ago.

Kieran Klaassen

For Dummies.

(00:30:00)

Dan Shipper

Yeah, Sams Teach Yourself BASIC or whatever.

Yeah. And it's so funny for you to say, I thought I was sort of at the forefront of AI coding, and then I joined Every and started working with Kieran, because it just reminds me of this scene in Star Wars, the prequel, Episode One, where they're under the water and they're being attacked by a sea monster and it looks like they're going to die. And then another bigger sea monster comes out and just eats the one that's killing them, and there's always a bigger fish. And yeah, Kieran is the bigger fish.

Kieran Klaassen

But I feel the same as you say that about me, but I have no idea what I'm doing. I need to run behind. We need to do a million more things. So that's just the reality of the landscape. There is always more. But it's really about practice. You should practice using AI, you should push yourself every day. If you don't, you'll miss very cool stuff.

Dan Shipper

Yeah. Well, I guess I'm curious, personally and also for people in the audience, what are the problems with this? So basically it sounds like you're moving to a form of coding where you don't touch the code. You're one level above. And so what are the problems that come up with that, and how are you solving them? What are the new engineering practices that you need to incorporate in order to make sure that things go well?

Nityesh Agarwal

For me, the most important realization has been this thing that I always keep going back to, especially with Claude Code. I read this in a management book, High Output Management, which the Intel CEO wrote like 50 years ago, and in the first chapter he mentioned something about how, in any production process, you should fix any problem at the lowest-value stage. And I just can't stop thinking about that statement, because AI and Claude Code can now do so many things for us that it has become really important to focus on the earliest part of things. So what I mean by that is, when we are using the workflow that Kieran just showed to create a very detailed GitHub issue, then it's very tempting to start another Claude Code and ask it to just, hey, go now work on this GitHub issue and fix it. But that's actually going to be a problem, because there are chances that the plan that Claude was able to give in that issue wasn't the direction that you wanted to go. And you want to catch that before you ask Claude to go and implement the solution, and then you want to fix it over there.
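
One way to picture "fix it at the lowest-value stage" is a gate that refuses to start implementation until a human has approved the plan. The sketch below is purely illustrative: `Stage`, `Issue`, `approve_plan`, and `start_implementation` are hypothetical names, and nothing in the conversation says the team enforces the review in code.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    PLANNED = "planned"            # the agent has written a plan
    APPROVED = "approved"          # a human has reviewed the plan
    IMPLEMENTING = "implementing"  # the agent is writing code


@dataclass
class Issue:
    title: str
    stage: Stage = Stage.PLANNED


def approve_plan(issue: Issue) -> None:
    """Record that a human reviewed the plan and agreed with its direction."""
    if issue.stage is not Stage.PLANNED:
        raise ValueError(f"cannot approve an issue in stage {issue.stage.name}")
    issue.stage = Stage.APPROVED


def start_implementation(issue: Issue) -> None:
    """Refuse to start the expensive stage until the cheap one has been checked."""
    if issue.stage is not Stage.APPROVED:
        raise RuntimeError("review the plan before asking the agent to implement it")
    issue.stage = Stage.IMPLEMENTING
```

The point of the design is that a wrong direction is caught while it is still a short document, rather than after an agent has spent a long run producing code for it.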

Dan Shipper

That makes perfect sense. I really, really like that idea. The thing it reminds me of is just that all this stuff is a lever, and the further out you get on the lever, the more power you have, but also the more power you have to go in the wrong direction. Every little inch makes a big difference at the end. And so trying to catch it earlier, I think, is the thing that makes sure that you're not shooting off into space. This lever metaphor is totally breaking, but you know what I mean? If you point a rocket at the moon, one inch means thousands of miles of difference. And so I guess the same thing is true with AI stuff, and I think that's actually a good lesson for me because I tend to want to rush through the planning stuff. It's just hard for me to look at a document like that, the thing that Claude is writing, and concentrate on it. So how have you guys found that?

Kieran Klaassen

It's kind of boring to read most of the time, but you can make it more fun. You can say, just minimal, this is too much. But then the thing is, it misses things again. So it's actually important. So for code, I like it to focus on user stories, or on asking questions and answering them. So let's say, hey, what are some questions a good PM would ask about this that we should consider? And give two options. It's more fun to read that than week one, we'll do this, week two, we'll do that. PRDs are boring and you can make them a little bit more fun, or give more examples, or you can shape that research, and that's normally what we do in the human review step. It's, do we see any red flags? Do we need more stuff to be added? Because it will save so much time.

Dan Shipper

That actually reminds me of something that we're finding in another part of the business. So Danny, who's been on this show, is the GM of Spiral. And inside of Spiral we're building a writing agent. So you can think of it as sort of Claude Code, but specifically for writing tasks. And I think there's something similar about that, where sometimes you want that writing agent to shift into an interview mode where it tries to understand more about who you are and what you want, rather than just spitting out a bunch of stuff that you then have to read through. And it sounds like there's maybe something missing here in Claude Code or these sort of coding workflows, where it would be really nice if, instead of you having to read that long document, it found ways to ask you questions so that the thing it outputs is more likely to be right without you having to read through the whole thing.

Nityesh Agarwal

Yeah, absolutely. That's an interesting idea for a custom command here, and we should totally try that.

Kieran Klaassen

Yeah, for sure. This is something we should automate and make better. And at the same time, it knows a lot because it has access to your code base and your style. And that's very powerful. So you have the code base, and it's actually pretty good at doing it. In addition to making it very good at the beginning, I think just boring, traditional tests and evals are very important as well, because how do you know what you did is actually working well? You can open it up and click through it, but why not just have it tested? Have it write a test for it, just the bare minimum. Smoke tests are great, where you just see, does it work? Because otherwise it does way too much. But it's a very good way to have it iterate and fix things by itself. And we haven't tried it as much yet, but we use the Figma MCP, where we say, hey, implement this from Figma, and now you can have Puppeteer take a screenshot of a mobile version and then say, compare the two. We haven't really tried it out, but we want to try more of that. So there are these checks in place, tests in place, for things that you normally do manually. And the same for prompts: evals for prompts.

So I kind of think of an eval as like a test for code: an eval is a test for a prompt. And what I've seen last week as well, I had Claude Code run an eval, and it said, actually, it fails four out of 10 times. I said, run it 10 times. Does it always pass? No, four times it doesn't. I said, look at the output. Why didn't it call that tool? It was a tool call test. And it says, oh yeah, it wasn't specific enough. And I say, okay, just keep going and change the prompt until it's passing consistently, all the time, and it did it. I just walked downstairs, got a coffee, walked up, and that was it. So evals are also very powerful because they will tell you if a prompt works, similar to how, when writing code, a test says your code works. So leaning into those more boring, traditional ways is also very powerful. Does that make sense?
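Editor's sketch: the "run it 10 times, does it always pass?" check described above can be expressed as a small pass-rate function. This is a minimal Python illustration under stated assumptions, not the eval harness the Cora team uses. `pass_rate` and `make_flaky_agent` are hypothetical names, `search_email` is a made-up tool name, and the flaky agent is a deterministic stand-in for a real model call so the example can run offline.

```python
from typing import Callable

# Hypothetical shape of one agent run: given a prompt, return the names of the tools it called.
AgentRun = Callable[[str], list[str]]


def pass_rate(run_agent: AgentRun, prompt: str, expected_tool: str, runs: int = 10) -> float:
    """Run the same prompt `runs` times; return the fraction of runs that called `expected_tool`."""
    passes = sum(expected_tool in run_agent(prompt) for _ in range(runs))
    return passes / runs


def make_flaky_agent(fail_on: set[int]) -> AgentRun:
    """Stand-in for a real model call: skips the tool on the given call numbers (0-based)."""
    calls = 0

    def run(prompt: str) -> list[str]:
        nonlocal calls
        index = calls
        calls += 1
        return [] if index in fail_on else ["search_email"]

    return run


if __name__ == "__main__":
    # Simulate a prompt that fails to trigger the tool on 4 of 10 runs.
    agent = make_flaky_agent(fail_on={1, 4, 6, 9})
    rate = pass_rate(agent, "Find the invoice from last week", "search_email")
    print(f"pass rate: {rate:.0%}")  # pass rate: 60%
```

In a real setup the loop Kieran describes would wrap this: edit the prompt, rerun the eval, and stop only once the pass rate stays at 100% across repeated runs.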

Dan Shipper

I have a thought and because one of the things I think is really special, and I think Nityesh, you're in this boat too, so tell me if I'm wrong, but one of the things I think is really special about you, Kieran, is that you just test everything. So you've tested every single agent, Nityesh, have you used a lot of the agents as well?

Nityesh Agarwal

That's Kieran.

Dan Shipper

Okay, well, I think we could still do this. I think it'd be kind of fun. I want to spend five minutes with Kieran doing an S-tier through F-tier ranking of agents. And so what I'm going to do is I'm going to share my screen. And I'm going to call out an agent and then you tell me where it ranks. Are you game?

Kieran Klaassen

Yeah, let's do it.

Dan Shipper

Okay, cool. Let's do Cursor.

Kieran Klaassen

Yeah. So it's fun because Cursor, what Cursor? Is it Claude 4? Is it Max?

Dan Shipper

Cursor on the best possible settings.

Kieran Klaassen

Is it the background agent, or is it traditional Cursor on the best possible settings? That's the confusing part about Cursor and Windsurf. There are a million versions of it, and why don't you just have the best version? And that's what I love about certain agents. They just say, look, this is the best agent. So that's why it wouldn't be the best. I would say A. Cursor is very good with Claude 4.

(00:40:00)

Dan Shipper

Okay. All right. Windsurf?

Kieran Klaassen

C. Because they don't have Claude 4. It's ridiculous because three weeks ago they would be A and now they're not. I switched from Cursor to Windsurf a few months back, but I switched back.

Dan Shipper

Okay, so we've got Windsurf is a C. Cursor is an A. Let’s see. Devin?

Kieran Klaassen

It's a B.

Dan Shipper

Why?

Kieran Klaassen

It's not as integrated. It's a little bit hard to set up, and the code quality is not as well rounded as Cursor or Claude Code. I don't know if they use Claude 4 in the background, but it's not as usable as the others.

Dan Shipper

Charlie.

Kieran Klaassen

Charlie is for code reviews. So we use Charlie for code reviews mostly. So, I haven't really used it as an agent as much. I think Charlie, as an agent, is B, but as a code reviewer, I really like the code reviews it does. So that's interesting. It’s really good at something.

Dan Shipper

And then what about Friday?

Kieran Klaassen

I'd put Friday higher than Cursor, maybe. Between S and A. And it's funny because they don't even use Claude 4 yet. They're still working on how they really make it work well. It's 3.7. But why I like it: it's definitely different from Claude Code, but Friday has a very opinionated way of working, and I love their opinions, and it really works well. And it just does it. You give it an issue, they make a plan, you approve, and it does it. It creates a pull request, and I've seen it do stuff that I couldn't do with Claude Code. For example, implement this Figma design. It just showed a Figma design for the assistant, and there were multiple moments like that where it did things where I thought, okay, I taste the future, which is really unique. And it's a small team as well. So really cool.

Dan Shipper

That is interesting. Codex?

Kieran Klaassen

This is B for me. Codex is a B.

Dan Shipper

Copilot?

Kieran Klaassen

I haven't used Copilot.

Dan Shipper

You’ve never used GitHub Copilot?

Kieran Klaassen

No. I mean, I used it three years ago, but, no. Let's be fair. I tried it maybe a half a year ago, and after one second I stopped using it.

Dan Shipper

Where do you rank it?

Kieran Klaassen

D. It was not agentic, but I mean, I should try the new version for sure.

Dan Shipper

We have not tried the agentic Copilot, so that's not totally fair. Are we missing anything? I feel like we.

Kieran Klaassen

Claude Code.

Dan Shipper

Obviously Claude Code, but I assume it's S-tier.

Kieran Klaassen

Yes. We have Factory as well.

Dan Shipper

Oh, yeah. Where do you rank Factory?

Kieran Klaassen

It's interesting. Factory with certain things is better than any others, but it's not my style. Factory is for more enterprisey people that are very nerdy and want absolute bangers of code and it's actually good, multi-repo stuff like that. It's a little bit hard to use because it's on the web, but also local. So I rate it a B. Maybe a little bit below Codex and Devin. But there is a use for it for sure. There's something good there. There is something. It's not my thing.

Dan Shipper

Amp.

Kieran Klaassen

So I would put it S-tier, under Claude Code, between Friday and Claude Code.

Dan Shipper

Whoa. Another S-tier. Holy shit.

Kieran Klaassen

Yes. It's very good at just getting work done. The ergonomics are pretty good, good tools already. People use that tool to build it. They're dogfooding. You can feel from Claude Code and their developers that they love agents and they're just building the best thing and they're trying new things out. So yeah, that's it.

Nityesh Agarwal

This is exactly why Kieran is the big fish.

Dan Shipper

I mean you’re stringing them together. You're using Claude Code and Friday and other stuff all at the same time, which is really cool.

Kieran Klaassen

I'm thinking about it more if you're interviewing for a role and you find a developer to solve a certain problem. I think it's similar to coding agents. Friday is good at doing UI now. So if I need UI work, I'll go to Friday. If I need to do research, I go to Claude Code. And, if I want a code review, I use Charlie. It's fun and agents work together. You don't need to have one agent. We have Claude Code

Dan Shipper

And that’s because Charlie works in GitHub, so you can just CC Charlie and Charlie will do the code review on the PR.

Kieran Klaassen

Yeah, so we use GitHub and pull requests and normal developer flows. Humans can hook in. So we can hire someone who's very good at specific things to review code, and then Claude Code will just do the work. But it's very powerful because it is just an ecosystem that we refined over 20 years or whatever, and it works. So let's lean into that. And that's probably why Copilot will be fine, since it's in there already.

Dan Shipper

Wait, you actually did that recently? We had some infrastructure things where we handled tons and tons and tons of emails at Cora, so we had some infrastructure issues to work out. And I think you brought in someone who's like a real expert and then worked with them in a specific agentic way that you got what you needed from them, but it was less work for them.

Kieran Klaassen

Yeah. So there was no issue yet, but we wanted more visibility into delivery of the most important things. And I'm not very good at it, or I know stuff, but let's bring in someone. And what we did: we just had a conversation on a call and I recorded everything, and at the end I just fed that into Claude and said, okay, can you make two issues, research issues, from this? And 10 minutes later I said, okay, here are the issues. Review them. And he was like, holy, what? This guy, he's not an AI skeptic, but he's very good at what he does, and normally what he does, AI's not good at yet, because there are things AI is not as good at yet. But he was very impressed with it, and he liked it very much, and commented on it to iterate over it. And what we basically did, we just iterated more quickly through ideas because we had something to talk about. And then the next day, once he had done the human review, I just used Claude Code to implement it, and we sat down and did the code review. So it's just accelerated. What would've taken two weeks maybe now takes a few hours, which is really cool.

Dan Shipper

I love it. Well, there you have it. You've got your tier list of agents. Claude Code takes the cake. We've got Amp coming up in second. And GitHub Copilot, unfortunately, bringing up the rear, but with room for improvement once we try out their agentic capabilities. Anything else you guys want to say or talk about before we end today?

Kieran Klaassen

Everyone should use Claude Code or try it out. Even if you're not technical, subscribe to their Max or Pro plan. It's only $100 per month. You have unlimited access. If you're skeptical because you're not technical, it's very easy. And I've seen people, a friend of mine, he used Cursor and I said, just use Claude Code. It's better. He said, how much better can it be? And then he said, yes, it's better. And he rebuilt everything he did with Cursor vibe coding in Claude Code. And he's like, yeah, this is great. He felt that next step, and everyone should try it and really push the tools.

Dan Shipper

Nityesh, any other words of wisdom?

Nityesh Agarwal

Just be sure to check the AI's work at the lowest value stage. You want to catch those problems early.

(00:50:00)

Dan Shipper

Yeah, that's a great one. And also use Cora: cora.computer. Check it out. It's pretty awesome. We're shipping new things all the time. Thank you both for coming on. This was a true pleasure. I cannot wait to see what else you cook up over the next couple months, and we'll talk soon.

Kieran Klaassen

Thank you.

Nityesh Agarwal

Thank you so much.

Thanks to Scott Nover for editorial support.

Dan Shipper is the cofounder and CEO of Every, where he writes the Chain of Thought column and hosts the podcast AI & I. You can follow him on X at @danshipper and on LinkedIn, and Every on X at @every and on LinkedIn.
