Transcript: ‘He Built An AI Audience Simulator. It’s The Future of Customer Research.’

‘AI & I’ with prompt engineer Michael Taylor


The transcript of AI & I with Michael Taylor is below. Watch on X or YouTube, or listen on Spotify.

Timestamps

  1. Introduction: 00:01:32
  2. AI can simulate human personalities with remarkable precision: 00:04:30
  3. How Michael simulated a Hacker News audience: 00:08:15
  4. Push AI to be a good judge of your work: 00:15:04
  5. Best practices to run evals: 00:19:00
  6. How AI compresses years of learning into shorter feedback loops: 00:23:01
  7. Why prompt engineering is becoming increasingly important: 00:27:01
  8. Adopting a new technology is about risk appetite: 00:44:59
  9. Michael demos Rally, his market research tool: 00:47:20
  10. The AI tools Michael uses to ship new features: 00:55:03

Transcript

Dan Shipper (00:01:30)

Michael, welcome to the show.

Michael Taylor (00:01:31)

Yeah, good to be here, Dan.

Dan Shipper (00:01:33)

Good to have you. So, for people who don't know, you describe yourself as a recovering agency owner. You had a marketing agency that grew to 50 people, and then you sold it in 2020. And then you dove into AI stuff. You wrote the prompt engineering book for O'Reilly. And you're also a regular columnist for Every. You have an amazing column called “Also True for Humans.” I'm psyched to have you.

Michael Taylor (00:02:02)

Yeah, it's good to be here. And I've been watching a bunch of these episodes and just kind of thinking about how other people use AI. So it'd be interesting to see how I differ.

Dan Shipper (00:02:10)

Amazing. So, I'm excited to have you. I think one of the reasons I really love your work is that you have a tinkerer's mindset. You're always playing with new things, getting psyched about them, and exploring the limits of what the current technology can do. I think that's why you're a really good prompt engineer and why you think about it so much. I think you're really good at building little workflows for yourself to automate things. And I'm always interested in what people like you are currently playing around with, because what you're into right now, other people are going to be into in six months or a year. And the thing that you're working on that I'm most excited about is using AI to do simulations of people. Let's start there. I want you to lay the groundwork for me. Explain what that is and why you think it's interesting. And then let's talk about what you're doing with it.

Michael Taylor (00:03:05)

Yeah. So, I studied economics and then went into a career in marketing, but I was always interested in this idea of being able to predict behavior. I think that's what got me into economics, because with microeconomics you could somewhat predict behavior. And it got me into growth marketing the same way, because you could run A/B tests and predict behavior: Would they click on this ad or that ad? So I guess it's kind of natural that when I got into AI, I started messing around with that stuff too. And I use roleplay a lot, right? That's pretty obvious. The number one prompt engineering tactic that everyone tries first is, as a researcher in this field, give me an answer. And it really does change the answer. And because the training data is so wide, it can roleplay as almost any character. And the logical leap there is, well, if it can roleplay as one character, why not multiple? You can have it be a whole focus group for you. You can have it be an entire audience. And then you can test things, look at the assumptions you're making, and see, in a risk-free environment, whether your idea would work before you actually put it out there in the world.
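For readers who want to try this themselves, here is a minimal sketch of the single-character-to-focus-group leap Michael describes. It is not his actual script; the personas, question, and model name are illustrative assumptions.

```python
# Minimal sketch of roleplaying multiple personas as a focus group.
# Personas, question, and model name are illustrative, not Michael's setup.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

personas = [
    "a 34-year-old startup founder in Austin who reads Hacker News daily",
    "a 52-year-old manufacturing CEO who is skeptical of AI hype",
    "a 27-year-old growth marketer who A/B tests everything",
]

question = "Would you try a tool that simulates your target audience? Why or why not?"

for persona in personas:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            # The roleplay tactic: pin each call to one character.
            {"role": "system", "content": f"You are {persona}. Stay in character and answer in two or three sentences."},
            {"role": "user", "content": question},
        ],
    )
    print(f"--- {persona}\n{response.choices[0].message.content}\n")
```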

Dan Shipper (00:04:25)

Right. I want to stop there because I think people may not really understand the full weight of what you're saying. There are a lot of studies showing that if you tell an AI to adopt a certain personality, it will make a lot of the same decisions, with very high overlap, as a real person who has that personality. So, one thing that I did early on, maybe a year ago-ish, was back when GPT-4 was still good, which should tell you it feels like forever ago.

Michael Taylor (00:04:52)

Yeah, it's like 100 years ago now.

Dan Shipper (00:04:55)

But one thing that I did was— Because I sort of knew about this research too, and I was kind of into it. I got GPT-4 to look at my tweets and then, based on my tweets, write a personality profile for me. And then I gave that personality profile to another GPT-4 and said, be this person, adopt this personality. And then I had it take a personality test as me. So I took the personality test, and it took the personality test, to see what the overlap in the scores was. And the overlap was extremely high. And then I was like, I wonder if someone who knows me really well in my life would be able to do what GPT-4 just did. And so I had my girlfriend at the time and my mom pretend to be me, take the personality test, and try to think about what I would pick. And GPT-4 was better than both of them, just from my Twitter account, which is crazy. It's really wild.

Michael Taylor (00:05:53)

Yeah. Maybe they don't read enough of your tweets, I guess.

Dan Shipper (00:05:56)

Yeah. It's really not that GPT-4 is good. It's just an indictment of my ex-girlfriend.

Michael Taylor (00:06:00)

No, I mean, I've done the same thing. Very early on, I was doing a lot of writing and I was like, why don't I just automate my entire job? And then actually I did it and people couldn't tell and then people would ask me like, hey Mike, I really liked that thing you posted on LinkedIn. And I was like, you're going to have to remind me what it was.

Dan Shipper (00:06:25)

I'm too busy and important and popular now to even know what I tweeted or what I put on LinkedIn.

Michael Taylor (00:06:30)

Must be how famous people feel when they have someone managing their account. But yeah, I always thought it's surprisingly good at this. And then I saw, like you said, a bunch of these studies. There was one that jumped out at me. I'd have to find the link, but it basically said that if you do a two-hour interview with someone, take the transcript, and then create a personality profile, you get 80 percent accuracy on Big Five personality traits, personality tests, the decisions they make in economic games, and how they vote as well. And the crazy thing is that 80 percent is already pretty amazing, but it's actually as accurate as interviewing the same person a second time. There's always drift. If you interview you today and interview you again next month, Dan is only going to agree with 80 percent of what Dan said today. So virtual Dan is as good as that.

Dan Shipper (00:07:32)

I contain multitudes, Mike. Yeah, where you're going with that is: If you're a founder and you want to interview your customers, or know what your customers want, and it's in an industry you're not as familiar with, a good place to start is just talking to ChatGPT or Claude, which I've done. It really works. It's crazy how well it works just from basic prompting. If you're thinking, I want to build a product in X industry, let me figure out how that kind of person would think, what their day is like, and all that kind of stuff. It's really, really good. But you're taking it a step further. Tell us about what you've been building behind the scenes.

Michael Taylor (00:08:15)

Yeah. So this kind of stems from a post I did for you guys, actually—about personas of thought—my last one in the column, where I had a prompt I was using where I said, okay, let's first generate a bunch of relevant personas who could answer this question, and then fill in the blanks of what each person would say in answer to the question. I was doing that all in one prompt. But then I wanted to automate it, so I wrote a script, and then I think we were talking and you were like, help me solve a debate. Yeah, I think you liked the— You were trying to describe—

Dan Shipper (00:08:55)

How should we describe Every? A “meta media company” or a “multimodal media company”? That was the question. And Kate, our editor in chief, was like, I don't like meta. And I was like, but I like it. So we were like, well, let's test the wisdom of the crowds. Let's see what would happen. And then you put it into the script.

Michael Taylor (00:09:20)

Yeah, I ran the script. And the thing that convinced me to keep working on this was that you changed your mind. It's very rare that I see a CEO change their mind about anything. I was like, okay, there might be something here. And then I was using it for everything, and I was like, oh, I'm actually changing my mind too. And it was an unexpected result. I thought people would be really skeptical of the AI responses. But I think it's because each AI answers for itself and gives its reasons. And then when you summarize those reasons, you can look at them and go, oh, I agree with those reasons, and therefore I can change my mind without being too stuck in my ways. I don't know. How was that? Is that similar to the thought you had at the time, or—

Dan Shipper (00:10:08)

Basically. I mean, I think my basic way of being as a CEO or writer or whatever is, as I talk, I'm paying attention to how people respond, and I'll get a sense for, ooh, this little thing I just tried worked really well. That's how I write headlines, that's how I figure out ideas and all that kind of stuff: I'm just constantly trying stuff. And so feedback is very important to me. That's why I love using X, because I can just put something out and lots of people will interact with it or not, and doing sales meetings, same thing, all that kind of stuff. And so this feels like an extension of that, where I can take way more risks. And it's a little more quantitative. It's a little bit more like I can pit one thing vs. another. Whereas in real life, people remember what you just said, so you can't just wind back the clock and be like, what if I had said it another way? How would you respond? It's hard to do a really fair test.

And so I was immediately like, wow, that makes a lot of sense. And also, I do think having the thoughts of the model is really helpful, because I think they were like, meta just sounds like Facebook to me or whatever. And that's what Kate was saying. And I was like, I don't know, Kate, I'm the CEO. I don't think of Meta when I think of it or whatever. But other people were thinking of it. And I was like, okay, well, that makes a lot of sense. And then I changed my mind. And yeah, I'll do anything to get this to be part of our bundle because I love this product. I think it's so cool.

My workflow now is, when we do an announcement or I'm writing a post or whatever, I will use Spiral, which is our content repurposing platform, to figure out what I want to tweet or what the headline should be. And then I'm just pinging you, or using the little demo of the MVP that I've been using, and I can just put it in there and get some wisdom from the crowds. I love it. And what's interesting to me too is, I think where people's heads might be going is: Okay, cool. You can just automate everything, and you'll always have the right answer to every question, and you're going to make super, super, crazy, amazing content every single time or whatever. But what's interesting about this is there are so many different variables to it. So, for example, one variable is: What is the audience? And you can spin up different audiences with different demographics and different viewpoints and personalities or whatever. So we have an Every audience. And then we have a Hacker News audience that we can test, and all that kind of stuff.

And then also, the results that you get are extremely dependent on the question that you ask. So if I'm pitting one thing vs. another, it's not going and finding the best possible way to phrase this. It's just pitting one thing vs. another. And maybe there would be a way to automatically try tens of thousands of messages or whatever. But even then, you start to realize that the space of possibilities is so huge. It's so huge that you can only ever test a small portion of it, and different people are going to start in different places in the landscape. So, yes, these tools are super powerful, and it makes my process way more efficient to have them available. But it's not going to do the thing that people think, where we just turn into zombies and everyone's marketing messages are the same, because the space of possibility is really big.

Michael Taylor (00:13:50)

Yeah, exactly. There are those two things combined, right? The space that you're testing in and the personas you're using. Because a Hacker News audience is going to be very different from your Twitter audience or whatever. But that was something that we did a lot of in the agency. We were testing tons of creative ideas with Facebook ads, and you could just never have enough budget—even the biggest brands don't have enough budget to test everything. So yeah, this kind of expands the space you can play in, but it's still your duty to play. You still have to come up with good ideas, and AI can help you come up with good ideas too. I tend to use it in the brainstorming phase and then in the testing phase. But in the middle, I see quite often that it's better at judging than it is at finalizing the copy. And this is something I saw a lot as a prompt engineer working over the past few years: LLMs are actually pretty good at judging the results of tasks even if they can't do the task very well themselves, which kind of led me down this path in some respects as well.

Dan Shipper (00:15:00)

That's really interesting. I want to push you there, because I've had experiences with Claude, for example, where I'm putting in an essay or whatever and I'm like, grade this essay. And it just always gives you a B-plus or an A-minus the first time. And then if you make some revisions, it always moves up to an A or whatever. So where are the tasks where it is good at judging, and where are the tasks where it's going to give you that kind of [bleep]? Oh, it's an A-minus or whatever, and then the next turn it's an A.

Michael Taylor (00:15:32)

Yeah. It's really bad at grading. So you can't grade things on a Likert scale very easily. You can't have it give stars out of five. It tends to overuse the middle of the scale. It's almost like it's being too nice to you. Everything's a four out of five.

Dan Shipper (00:15:47)

Yeah. I'm like, be mean!

Michael Taylor (00:15:50)

I know. Yeah. And there are tricks for getting it to be more mean. But what I actually use it for a lot, in terms of grading, is things that are a one or a zero. So I'll have some criteria, like: For me, a good article is one that is as concise as possible. So I have a one or zero on the concise scale. Is this concise, right? I'll have another one: Does it have a compelling hook? And I'll build up maybe 20 of these for any task that I'm automating. And then I'll run my testing suite, and each one is an LLM judge that just checks that, 0 or 1. And when I combine those scores together, it gives me an aggregate score that is much more reliable. When you run it again, it's not changing as much.
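In code, that pattern is simple to sketch. Here is a minimal version, assuming the OpenAI client; the criteria and model name are illustrative, not Michael's exact setup.

```python
# Binary (0/1) LLM judges, one per criterion, combined into an aggregate.
# Criteria and model name are illustrative assumptions.
from openai import OpenAI

client = OpenAI()

CRITERIA = [
    "Is this article as concise as possible?",
    "Does this article have a compelling hook?",
    "Does this article avoid filler phrases?",
]

def judge(article: str, criterion: str) -> int:
    """One LLM judge per criterion, forced to answer 1 (yes) or 0 (no)."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,
        messages=[
            {"role": "system", "content": "Answer with a single character: 1 for yes, 0 for no."},
            {"role": "user", "content": f"{criterion}\n\n---\n\n{article}"},
        ],
    )
    return 1 if response.choices[0].message.content.strip().startswith("1") else 0

def aggregate_score(article: str) -> float:
    # The mean of many binary checks is more stable run to run
    # than asking the model for a single 1-5 grade.
    return sum(judge(article, c) for c in CRITERIA) / len(CRITERIA)
```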

Dan Shipper (00:16:48)

That's interesting. So, first of all, I like the idea of breaking it up into subtasks. But within a specific subtask, like the hook: Is it still only going to be able to do 0–1, good hook or bad hook? Or could you do 0–5, and would it be able to do that?

Michael Taylor (00:17:04)

Yeah, it's not as good when you go 0–5. So what I would try to do is break that up into subtasks: break that classification down into subclassifications, right? So I would think about, okay, what is a good hook? A good hook grabs your attention. A good hook maybe name-drops or references something famous or credible. A good hook does this. And the way I get these, by the way: I use an LLM judge for this too. The other thing it's really good at is comparison. So you can give it two different articles that you've written and say, what is different between these two articles? Then you look at the differences, and some of those become judge criteria.

Dan Shipper (00:17:46)

That's a whole other tool that would be super useful: something that recursively makes a template classifier, that takes a bunch of examples of good and bad and then uses that to create the most detailed possible rubric for another LLM to use.

Michael Taylor (00:18:00)

Yeah. I did think about doing that. I worried it was like way too far down the rabbit hole. I don't know how much people actually care about the quality of their writing in most places.

Dan Shipper (00:18:10)

Yeah, maybe not for writing, but in general, I feel like there have got to be use cases for that. But it may not be a today thing. It might be that in a year people will be more sensitive to this kind of thing.

Michael Taylor (00:18:25)

Yeah, I was doing a lot of prompt engineering with my clients. And once you have a list of these criteria, then you can build a metric, and then you can use it with a prompt optimization library. And that goes pretty deep into the weeds, DSPy or something like that, because then you can improve the accuracy of the classifier. And once the classifier is much better, then you can improve the accuracy of the generation task as well. So it depends on how deep you want to go on this, but this is something I spend a lot of time thinking about.
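The optimization loop Michael gestures at can also be hand-rolled before reaching for a library like DSPy. A sketch, assuming the aggregate_score judge from the block above; the candidate prompt templates and model name are purely illustrative.

```python
# Hand-rolled prompt optimization: score candidate prompts with the
# aggregate judge metric and keep the winner. Libraries like DSPy
# automate this loop; everything here is an illustrative assumption.
from openai import OpenAI

client = OpenAI()

candidate_prompts = [
    "Write a short article about {topic}.",
    "Write an article about {topic}. Open with a compelling hook and cut every filler phrase.",
    "You are a ruthless editor. Draft an article about {topic}, then trim it to the bare minimum.",
]

def generate(template: str, topic: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": template.format(topic=topic)}],
    )
    return response.choices[0].message.content

def best_prompt(topics: list[str]) -> str:
    # Average the judge metric over a few test topics per candidate.
    # aggregate_score() is the function from the earlier sketch.
    def avg_score(template: str) -> float:
        return sum(aggregate_score(generate(template, t)) for t in topics) / len(topics)
    return max(candidate_prompts, key=avg_score)

print(best_prompt(["AI audience simulation", "prompt engineering"]))
```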

Dan Shipper (00:18:55)

I want to go deep. I mean, the thing I'm thinking about is, we're talking about making evals, basically. And I'm thinking about Cora, our email tool, and trying to improve its summaries, and trying to figure out what a good summary is. We're basically doing this, where every week I go in and look at the summaries that Cora generates for me with Kieran, who's the GM of Cora. And I literally rewrite my inbox for him personally—

Michael Taylor (00:19:25)

That’s usually what I tried to get clients to do.

Dan Shipper (00:19:30)

Yeah. And just sort of hope that over time we'll have enough examples and enough rewrites and enough pulling out of all those principles that it starts to work. And it's really fun. It's really interesting. And it's also really hard. And I love mapping all of the little things that I know that are not explicit, that I just sort of know, because I'm like, this summary is wrong, or, here's how I would summarize this. But I can't explain it until I look at it, and then I'm like, well, this is the rule I'm following. And it's really fun.

Michael Taylor (00:20:05)

And sometimes you don't know the rules you're following as well.

Dan Shipper (00:20:10)

You mostly don't know the rules you're following. That's part of the fun.

Michael Taylor (00:20:15)

Yeah. Once you see it, then you're like, oh, I get it. So I was trying to make a Tinder-like interface for when I'm doing this, where I'm going one or zero, one or zero, thumbs up, thumbs down.

Dan Shipper (00:20:25)

Yeah, it’s like the eye doctor.

Michael Taylor (00:20:30)

Yeah. So I'll walk you through how to make quick evals for this, right? Take that big list of all the emails where you have the one that it generated and the one that you rewrote. Then what you want to do is run a big analysis—you can do this in a Jupyter notebook, or get the engineers to do it, or do it manually, which takes a long time. You just compare: Ask Claude or GPT-4, what is the difference between the rewritten one and the original? And then you take all of those differences, summarize them, and say, what are the main differences between a good email and a bad email? Then you ask it for bullet points, and each bullet point becomes an evaluation criterion. And then you can look at the accuracy of the evaluation criteria, because now, for every one of your rewritten emails, you have whether it scored one or zero on that criterion, whether it was present or not. So then you can look at the false positives and false negatives. This example, I said, didn't have a good hook, but my LLM judge said it had a good hook, so it was wrong, right? That's a false positive, so that would mark the score of the classifier down. So, yeah, that's what you would do.
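Put together, the workflow Michael describes might look something like this. A sketch, assuming you have saved pairs of AI drafts and human rewrites; the prompts and model name are assumptions.

```python
# Sketch of the eval-building loop: mine diffs between AI drafts and
# human rewrites, turn them into criteria, then validate each judge
# against human labels. Prompts and model name are assumptions.
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Step 1: for each pair, ask what changed between draft and rewrite.
pairs = [("ai draft one...", "human rewrite one..."), ("ai draft two...", "human rewrite two...")]
diffs = [
    ask(f"What is different between these two emails?\n\nOriginal:\n{a}\n\nRewritten:\n{b}")
    for a, b in pairs
]

# Step 2: summarize the diffs into bullet points; each bullet becomes
# a candidate 0/1 evaluation criterion.
criteria = ask(
    "Based on these differences, list the main differences between a good "
    "email and a bad email as bullet points:\n\n" + "\n\n".join(diffs)
)
print(criteria)

# Step 3: check a criterion's LLM judge against human labels by
# counting false positives and false negatives.
def confusion(judge_scores: list[int], human_labels: list[int]) -> dict:
    fp = sum(1 for j, h in zip(judge_scores, human_labels) if j == 1 and h == 0)
    fn = sum(1 for j, h in zip(judge_scores, human_labels) if j == 0 and h == 1)
    return {"false_positives": fp, "false_negatives": fn}
```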

Dan Shipper (00:21:58)

And you would do this in a Jupyter notebook or what's the format that you keep all the emails and examples in?

Michael Taylor (00:22:05)

Yeah, I would love to use a tool to do this, and I have tried a lot of them. But I find the abstractions change so often with AI that I keep going back to no framework, no tool, just a Jupyter notebook. And I mean, I'm doing it in Cursor and I'm just letting Claude YOLO: create the interface, save everything locally in CSVs and stuff. It's probably not the best way to do it, but I'm the sort of guy that doesn't really care that much about the elegance of my code and just wants to get the job done.

Dan Shipper (00:22:40)

I've always been like that. And now everyone is catching up to us, because now everyone's just like, well, I let the LLM do it and I press accept all. And I've always basically YOLOed all my code. I used to have to handwrite it and now I don't.

Michael Taylor (00:22:50)

Yeah, I was YOLOing code back when it was Stack Overflow. Just copy someone else's code in there and just see if it runs.

Dan Shipper (00:23:00)

You know what this also reminds me of? We run a media company, so all the time I'm giving feedback to different people on their writing or the videos they make or the tweets or whatever. And I just want someone to make a rule thing and then make a bot that grades this stuff and gives all that feedback, because it's so repetitive. And what's been really interesting is, we have this writer, Alex, who started writing with us over the last month or two. And he writes Context Window, which is our Sunday email, all of the here's what happened in AI, here are the new releases, here's some analysis or whatever. And then also, when new models come out, I always write a big piece. So, OpenAI launched Deep Research this week, and I wrote that, and we collaborated on that. He's a junior writer, right? He's got a ton of great ideas, but the first draft he did of the first piece we did together, it was some other new model release, I can't remember, I was like, oh, this is not good. No shade to Alex, but—

And it's not just me, because his byline is on it and stuff. It had a lot of the information that was supposed to be in there, but it also had a lot of information that shouldn't be in there. And the actual sentences themselves just weren't that good. And there were a lot of reasons and a lot of mistakes and a lot of whatever. So what I did, after he gave me the draft, because we didn't have a lot of time, was just rewrite it. And honestly, that kind of thing, a new product release, doesn't take me that long to write, like 45 minutes, right? The whole thing, once all the information is there. And then he took his draft and my draft and threw them into o1 pro and was like, what's the difference? And the next time, when we wrote this Deep Research piece, I had him write the first draft, and it was 1,000 times better the first time. And I was like, what the fuck? Because before all of these tools, to get from where he was to what I saw him deliver would have taken 1,000 articles. You would have to write 1,000 articles, or a year or two of really grinding. And it was just the next time, and it was great. I was like, this is crazy. It's absolutely crazy. And it's because, every time I edit him, he throws in the difference and makes a little rule book, and as he's writing, he's constantly going back to one of these models and having it remind him of some of these things and write little portions that he struggles with and all that kind of stuff. And it's so much better. It's crazy.

Michael Taylor (00:25:58)

Yeah, and I think that's the thing that people are missing: Yes, AI is a threat to people's jobs, and nobody really knows what's going to happen as we get Ph.D.-level AIs doing everything. But at the same time, they're also helping us learn a lot faster. So if you're actually using these tools and you're keeping rule books of what a good article is, according to the Every style guide, then you have that strong opinion based on the collective wisdom of the publication. And then that is a prompt, right? I feel like prompts are kind of converging, where you'll be using the same prompt for people and for AI. It's essentially what I did at the agency as well. We would write these SOPs, as we called them, standard operating procedures. We had a Google Drive full of hundreds of them. It was like, here's how I do this, here's how I do that.

Dan Shipper (00:27:01)

That's the thing that, I don't know, a year ago, the really popular thing to say was, oh, prompt engineering is going to be dead or whatever. And the opposite has turned out to be the case. It has become more and more important. And I think the mistake that people make when they say that is assuming the gap between what you intend and what the model does is going to get smaller and smaller. The gap between your objective and what it does is getting smaller and smaller. But the thing that people miss is that everybody has different objectives. Objectives are these big, high-dimensional things that are really hard to express. And people who are better at detailing their particular objective are better at prompting. That's what it is. And so detailed prompts are not going away, because people have very detailed things they need to get out to tell the model what to do. And I just think that's really interesting. I love the ways that we tend to misjudge new technologies, and this is, I think, a really great example.

Michael Taylor (00:28:14)

Yeah, I think it stems from the fact that a lot of the people who were early in AI come from machine learning or engineering backgrounds or data science. And while obviously that's a huge benefit, and they did really well very early on in AI getting some of these systems working, they only really worked on things that were well specified. For engineers, the task is already briefed, right? The product manager took the messy world outside and crammed it into a PRD for you. So you don't realize how messy the world is, I think, when you're an engineer. And I say that as someone who was a business person and is now a full-time engineer, so I can see both sides of it. They are completely right that for well-specified tasks, you don't need to do prompt engineering anymore; you can use DSPy. But so much of the world is completely unspecified. And people don't even know what the specification might be, like we talked about earlier. You don't know what a good article is until you see it, and then you're like, oh, that's something that I feel allergic to.

Dan Shipper (00:29:28)

Totally, and I think that's the really interesting thing: So much of the world is not specified. And it's really interesting to see companies like OpenAI going down the RL route, getting better at tasks where you can specify and verify the end result at every step, which will do some interesting things. But so much of the world is outside the realm of what can be specified. And I think people just miss that, or assume that eventually it will all be specified. And I think it won't. Specifications tend to be too low-dimensional to express all the stuff that's going on. Some things always have to be inexplicit; that's my philosophical view. And I think the beauty of language models is that it's the first time we have had tools that can deal with things that are inexplicit. We've never had that before. With any other kind of computer, you have to be able to specify; it has to be exact, mathematical, and logical, basically. And LLMs are like, here's an example, just classify based on these examples, or follow the examples or whatever, which is a good way to talk about or point to something that is inexplicit. Why is that example a good example? You can probably boil it down to rules, but there's something in there that is still kind of unexpressed. There's a lot more richness in an example than in the rule that you use to describe it. And I love that. I love all that stuff. It makes me so happy.

Michael Taylor (00:31:08)

Yeah. I mean, I do wonder how far it will go, whether we're going to keep seeing this exponential improvement or if it's going to be an S-curve. Even if it does tail off, I think we'll have enough to keep us busy for 50 years, even if the models just stopped improving today. So I'm not worried about that. But yeah, I always think about this thing people talk about with billiards or snooker or pool. You can calculate where a ball is going to hit for the first bounce. But by the third or fourth bounce, you'd need more time than has ever passed in the entire universe to calculate where it's going to hit. And there's something in that, you know?

Dan Shipper (00:31:56)

I first read that in one of the Nassim Taleb books. I can't remember if it's Fooled by Randomness or The Black Swan. And I love that, because the reason why is that you have to start to take into account the pull of gravity of every object in the universe, because everything pulls on everything else. After one bounce, it's not that much pull. But after the fifth, it's a lot. Same thing for personality prediction. If you can get 80 percent of what I would do, you might be able to predict the next thing I do. But once we get five moves out, things are wildly diverging. And I don't remember where I was going with this, but—

Michael Taylor (00:34:00)

Yeah. But essentially you're not gonna fit the gravitational pull of every object in the universe in the prompt. It doesn't matter how good the models get. It doesn't matter how good we get at pre-training or whatever. It is never going to completely specify— It's always going to be a simplification of reality, right? It's always a model.

Dan Shipper (00:34:23)

And then, to your question about where things go: Obviously, I don't really know either. But my feeling about this stuff is that we tend to collapse our view of ourselves and the world to reflect the tools that we have at our disposal. As an example, when we only had watches and telescopes and calculus, we tended to think of the human mind and the universe as a sort of mechanical clock, operating according to Newtonian mechanics. Which it sort of does, if you look at it that way. But also, if you look at it a different way, it does not work like that at all. But we were completely convinced for hundreds of years, basically, that that's how everything worked, and that we just needed to do a better job of specifying or finding the underlying rules or principles by which the clock of our minds or the clock of the universe worked. And then Einstein came along and he was like, well, it's not really like that at all from this other perspective, or maybe this larger perspective or whatever. And I think something similar happens with language models, where you look at a thing that can do a lot of tasks that you're used to being able to do, and you're immediately like, it's going to do everything I do, because you just sort of shrink your sense of self and sense of the world and sense of what you can do to that thing. At least initially, it tends to hide all the complexity of who you are and what you can do and all that kind of stuff. But I think over time we will discover that there are a lot of things about us that it is not able to do, even if it's operating at Ph.D. or post-Ph.D. levels. We just can't see what that is right now. But I think it's there. And that's another example of the ways that new technologies cause us to see things differently, and of the mistakes we tend to make. I've said this before on the show, but there's this false peak thing with AI, where every time you get to a new level, you're like, I can see the peak right there, and once I get there, that's going to be it, it's going to take over everything. And then you get to that peak and another horizon opens up. And I think that's true too. Humans are way more flexible and powerful than we give ourselves credit for in a lot of ways. Not to say that this technology is not amazing and powerful. I totally think that too, but I think holding both is important.

Michael Taylor (00:37:08)

Yeah. Do you know the Bullshit Jobs book? The hypothesis is that some 40 percent of all the jobs in the economy are basically unnecessary, like admin jobs. He uses the example of a parcel being delivered to a military base, right? Fifty years ago, the guy who was delivering the parcel, a package from Amazon, whatever it is, would just take it straight up to the soldier, and the soldier would sign for it. Today the guy has to go and put it in a holding space, and then there's an admin, an office manager of the holding space, who signs for it and fills in a form, and then someone else goes and signs it off, and then they have HR kind of training and— Basically, when you add all these extra people in, the cost is inflated by 40–50 percent, but the same task is still being done. There has been no improvement in the efficiency of parcel delivery. And I don't know if I fully agree with this, but the interesting thought in my head when I read that book was, oh, we already have universal basic income. It's admin jobs. Maybe that's controversial, but you do see what Elon is doing in some of his companies, or in the government now, ripping out a lot of these jobs that don't seem to make a measurable difference. I mean, we'll see, right? We'll see if he keeps getting away with it. But it's an interesting hypothesis. If you believe it, then essentially all of the productivity gains from the internet went to creating this extra 40 percent of float. And then AI might just make that 80 or 90 percent.

Dan Shipper (00:39:15)

That's interesting. I don't think I agree with that fully, and you're not necessarily arguing that you agree with everything in that book. But the interesting devil's advocate is: One way to think about it is that we make all those sorts of rules because we just want to waste things, basically. But usually there is at least some organizational reason, some risk management thing, where it's like, well, if the parcel gets delivered directly to the soldier, then XYZ bad thing happens, and so we need to create more processes or whatever. And I think the interesting thing about human organization is that we've needed processes and rules in order to facilitate collaboration between large groups of people. That system of rules creates a lot of bureaucracy and middle management and things that everyone complains about and doesn't really like. Even the people who are doing it don't really like it, but we need it in order to coordinate masses of people. And I think one possibility for AI is that it removes the need for so many layers and rules and processes, because AI can do a lot of the coordination stuff that would usually require lots of middle managers and that kind of stuff. Another way that I agree with one of the conclusions of this hypothesis is: When I go to get my shirt tailored in Brooklyn, they don't take credit cards. It takes a long time for people to adapt to new technologies. We are not getting as much out of this stuff as we possibly could. There's so much stuff that I'm sure is latent in the tool. Pure productivity, pure efficiency takes a really long time to filter through society. One of the things that I always say when I talk to people at big companies is that AI is a really good test of what would happen if we had magic powers. We would basically do nothing. Maybe we would create a working group, where some people at the company would spend some of their time figuring out what to do with our new magic spell-casting abilities. But it would take them a year and they wouldn't actually figure anything out. So yeah, I think we tend to believe that just because we have a new capability, it gets integrated super fast into society. And that's definitely not true.

Michael Taylor (00:41:57)

Yeah, I think where there is some kind of insight here is that a lot of the way that we organize people is based on a kind of organizational scar tissue, I guess is how I think about it. One person, one time did something really stupid and therefore now we have a form.

Dan Shipper (00:42:26)

My dad is always like, one guy tried to blow himself up with his shoes in 2001, and now we have to take our shoes off every time. And it's like, yeah, that is true. The shoe bomber ruined it for everybody, and he didn't even succeed.

Michael Taylor (00:42:33)

And we're afraid to tear down the fence in case something comes through it. I think it does serve a purpose, but it's an emotional purpose and a sociological purpose. It makes us feel good that we have these protections in place, even if they're not actually doing something measurable. So I actually think that AI will increase them, because, to take your example, the only reason we have the TSA scanning your bags at an airport but not at a train station is that it's too inconvenient to do it at a train station. You couldn't do it in the subway in New York, right? The whole thing would shut down. Maybe they should. I don't know. I've had some issues recently. But as it becomes more convenient to make you fill in forms and do administrative tasks, I feel like eventually that's just going to balloon, and you'll have my AI talking to the government AI and checking that everything's okay across 1,000 different forms.

Dan Shipper (00:43:30)

As long as I don't have to do it. I'm fine. Yeah, that's true.

Michael Taylor (00:43:35)

It's still a win, I guess, if it's automated.

Dan Shipper (00:43:40)

I think one of the things you're saying that I agree with is that the limits of technology right now are not even the limits of what's technologically possible. GPT-3.5 could have taught you how to make a nuclear bomb, but it doesn't, because OpenAI made it so it wouldn't do that. And so there are all sorts of ways in which technology is limited by our tolerance for risk. And everyone's tolerance for risk is different, so every organization has a different approach to what risks they're willing to take or whatever. To some degree, that's what makes baseball. I think that's actually good.

Michael Taylor (00:44:20)

Yeah, they can find the right answer collectively by saying, oh, that feels wrong. When they release it completely openly, they get backlash; if they're too conservative, then they fall behind. And it becomes a real race.

Dan Shipper (00:44:35)

Exactly. And it also provides opportunities for startups, because what OpenAI can do is just different from what you and I can do. Every government in the world is watching them like a hawk. And no one's really watching us, because we're small enough that no one cares. So we can take more risks and find more good things to do with it than a bigger company can. So another way in which I think people were wrong at the outset of AI stuff is to say, oh, incumbents are just going to win. And it's like, incumbents always [bleep] it up. It's about risk-taking. And it's not their fault. It's not that they're stupid. It's just massively difficult to take real risks when you have tons and tons of customers in a big organization and all this pressure. It's just not a good environment for that.

Michael Taylor (00:45:30)

Yeah, it's really rare that they manage to do something good. And it's not through lack of trying, right? They all read the disruption literature. You'll see it happening and you can see them trying their best to do it. But I would say, actually, the scorecard is pretty good. I know people talk about Google falling behind and all this stuff, but I would say big companies are innovating faster today than they ever have. There was one example that really jumped out in my mind. I went to a conference two months after ChatGPT came out. It was a B2B enterprise conference, all Fortune 500 companies. And they had NASA there, talking about AI and stuff like this. And they asked, who's working on generative AI projects? And 80 percent of the hands shot up. And I was like, this is two months after it came out. That's pretty amazing. Just contrast that with when I was working in growth marketing with my agency, which was a growth hacking agency. We started in 2014, and it took four years for our first Fortune 500 client to Google the term growth hacking. We were number one for growth hacking on Google, and it was still four years before we got our first enterprise client.

Dan Shipper (00:47:02)

Yeah, or mobile, it took years. Or even the internet, it took years and years and years. That's interesting. I do think they're getting better at it, because they've been learning their lesson in a lot of ways.

So I want to talk about Cursor, but before we do that, I want to bring this whole conversation back around to Rally because we haven't shown people Rally. And so if you feel like showing a demo, I think it'd be really fun. I just think it's the coolest piece of software.

Michael Taylor (00:47:20)

Yeah. So we talked about the script earlier, how I was A/B testing your headlines, multimodal media vs. metamedia. And what I've done in the past couple of weeks is put a basic user interface on it. I have that live now, actually, with a waitlist, at askrally.com. This is the main thing I'm working on now, just to see how far we can push it, because I'm getting surprisingly good responses when I run this. And I've got maybe 20 of my friends checking it out now. So yeah, we could try having a look on the call as well.

Dan Shipper (00:48:05)

That's great. Yeah, let's do it. So basically, for people who are listening instead of watching: The Rally screen is at the top. You see something that says your audience, and you can pick an audience. So millennials, manufacturing CEOs, there's an audience for the general population, there's a Hacker News audience, dog lovers in Dallas.

Michael Taylor (00:48:23)

That's my test audience. I always use it because it's the best thing I can think of.

Dan Shipper (00:48:30)

Let's do Hacker News. So we select Hacker News, and then it's like, what would you like to ask the audiences? There's a text box. So it's sort of Who Wants to Be a Millionaire, but for real, anyone can do it. You don't have to know Regis Philbin. So do you have a question, or should we use—? I mean, I can come up with one. Should we use the metamedia one?

Michael Taylor (00:48:45)

Yeah. Well, maybe do a headline test. Do you have like a couple of headlines for the latest post? 

Dan Shipper (00:48:58)

Yeah, totally. So we just did a launch where we said Every now includes Cora, which is our email app, as part of the Every bundle. You pay one price, you get access to Cora and all the other stuff we make. Okay. So the current headline is “The Every bundle now includes Cora,” which I don't really like that much. One thing I've been thinking about is whether we should call it a bundle, or just the Every subscription, or just Every. People don't say the Prime bundle; they just say Prime. So, membership is an interesting one. So maybe a B would be, “Every now includes Cora. Manage your emails with AI.”

Michael Taylor (00:49:40)

And then what about the capitalization on each one?

Dan Shipper (00:49:45)

Yeah. We usually do all caps, basically, but I don't think the caps make that much of a difference. Though that'd be a good A/B test: Do the caps make a difference in what people click on?

Michael Taylor (00:49:50)

We can do that. I used to test stuff like that in the agency.

Dan Shipper (00:50:05)

Yeah. And then the last one would be, “Every now includes Cora, the most human way to manage your inbox.”

Michael Taylor (00:50:20)

So, “Every now includes Cora, the most human way to manage.” My wife always complains about the loud sound of my typing because I'm a heavy typer. You're probably hearing that now.

Dan Shipper (00:50:45)

There's that George Clooney movie, Up in the Air, where he says, or she says like, I type with purpose. I always think about that.

Michael Taylor (00:50:50)

There you go. I need to use that line. Cool. So we have three different examples, and this is just sending a prompt to ChatGPT. So we're going to tell it what we want out of this. And in this case, you're going to say, “Which of these headlines would you click on?”

Dan Shipper (00:51:10)

I want to know if they would open it. Because it's going to go to their inbox.

Michael Taylor (00:51:15)

Okay: “These are email subject lines for the publication Every. Which of these headlines would you open?” Okay, so we'll keep it simple there. And then I'm going to put it in voting mode, which basically just gives you a tally at the top as well.

Dan Shipper (00:51:30)

Oh cool. I don’t think I've seen voting mode. That must be new—

Michael Taylor (00:51:35)

Yeah. Just added it. Thank you, Claude.

Dan Shipper (00:51:40)

That's the other thing I love about this era of AI tools: Everyone's just shipping so fast. You're just like, oh yeah, I added that in an hour, two days ago. I had the idea and I just did it. I love that.

Michael Taylor (00:51:45)

Yeah. And it's definitely really appealing. The problem is when you have to refactor everything.

Dan Shipper (00:52:00)

And then the UI gets messy and the whole thing is—yeah. 

So basically you typed in a prompt, and then it spun up an audience and asked the audience that question: Which of these three would you click on? And then you can see it returns the results, and “Every now includes Cora, the most human way to manage your inbox” is by far the most opened one, which is really interesting. And “the most human way to manage your inbox” is the one that we've been going with. So that's cool. But also, in the sidebar, you can see what each individual simulated person said.

So, “The phrase the most human way to manage your inbox stands out to me because it's just a more personal approach, which resonates with my values about technology and its impact on our lives.” So, that's what one of these AI people said about why they clicked on it. And then you also have this summary thing where it's like, it summarizes what everyone says. So, “The feedback indicates a strong preference for the subject, ‘Every now includes Cora, the most human way to manage your inbox.’ The choice resonates due to this emphasis on balancing technology with a human touch, which aligns with the values of the individuals.” 

So, I think that's really interesting. And I think we would get different results, for example, if we changed the audience. Or, another thing that I know you're building, which I really want: I don't want it to be testing them against each other. I want them tested in the context of someone's inbox and the kind of emails they usually get, to see which one is clicked or opened more. And I know that's coming. But yeah, this is so valuable. Before, you'd have to throw it out into the world, and that costs time and money. And if you don't have an audience, it's really hard if you don't know the people it's going to be going to. So, yeah, I love this thing.
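Under the hood, a voting mode like the one in this demo is conceptually simple to sketch. This is purely illustrative, not Rally's actual code; it assumes you've already collected each simulated persona's vote and reason, for example with a loop like the one earlier in this transcript.

```python
# Tally simulated personas' votes, then summarize their reasons.
# Illustrative sketch only; the response data here is made up.
from collections import Counter

responses = [
    {"vote": "C", "reason": "'The most human way to manage your inbox' resonates with my values."},
    {"vote": "C", "reason": "It promises a benefit, not just a feature announcement."},
    {"vote": "A", "reason": "Shortest subject line, and I skim my inbox."},
]

tally = Counter(r["vote"] for r in responses)
print(tally.most_common())  # e.g. [('C', 2), ('A', 1)]

# The individual reasons can then be concatenated and sent back to the
# model with a prompt like "Summarize the main themes in this feedback"
# to produce the summary panel Dan reads out above.
all_reasons = "\n".join(r["reason"] for r in responses)
```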

Michael Taylor (00:53:55)

Yeah, exactly. And then once you have a sense of whether you're asking the right questions, or if you really disagree with the result or really agree with it, you can go and validate with a real test, right? So it just gives you more options and more room to navigate these different creative decisions.

Dan Shipper (00:54:15)

Right. Obviously, disclaimer, this could be wrong. It's just another piece of information.

Michael Taylor (00:54:20)

Exactly. It’s another opinion. And this is still pretty early as well. So we're working on getting all the evals set up and doing the rigorous testing against real human responses. So that's coming too. But yeah, that's like a long road. And yeah, there's a lot of really cool stuff you can do. The thing I'm working on right now, I showed you earlier today, is a Hacker News simulator as well. So you've got all the Hacker News links from today and then you just insert your headline in there and just see if they click on that one. And we'll see how that goes.

Dan Shipper (00:55:00)

Yeah, that's the best. So you're using Cursor for all this stuff. Tell me about that.

Michael Taylor (00:55:05)

Yeah, so I've tried a bunch of them, right? I tried Windsurf. Heavy copy-and-paster here. I was an early adopter of GitHub Copilot. And the reason I like Cursor, and keep coming back to it, is that you can hop quite quickly between very low-level and very high-level. So my typical flow is, I'll actually talk through an idea. I'll record myself just blabbing about the thing I want to build, because I don't really know what I want to build yet.

Dan Shipper (00:55:45)

Like, in a voice note?

Michael Taylor (00:55:50)

Yeah, quite often. I'll just record it. I'll use Descript, just because it's what I have on my computer for recording video, but it doesn't really matter what you record it with. Then I'll transcribe it. You can upload it to Google AI Studio, and Gemini is pretty good at transcription, so I tend to do that. Then I'll have a transcript of me blabbing, and I'll stick that into Claude and get a PRD, a product requirements document. I'll edit that quite a lot. And that's where a lot of the prompting happens.

Dan Shipper (00:56:25)

Claude vs. o1?

Michael Taylor (00:56:30)

Yeah, I tend to use Claude still over o1. I think o1-max, I'm sorry— They changed all the names, but the $200 version, I do use that. I'm using it more and more. But I still find Claude is more creative; that's the only way I can describe it. I use o1 to settle problems. It's the big daddy that comes in and fixes the problem when Claude is getting in a loop and doesn't know what to do. So I'll take that PRD and I'll run the Cursor agent and let it completely go nuts: create any files it wants, do whatever it likes. And then I'll look back and see what I have and if it works, and I'll change a few things based on the general vibe. But when it gets stuck, which is actually pretty often, that's when I jump out to o1 in some cases and say, hey, here's the context, here's where it's getting stuck, here's what we've tried. I'll copy all that over and get the answer back.

Dan Shipper (00:57:37)

But you say you're editing the PRD that comes out of Claude before you put it into Cursor. You're not just YOLOing it. Tell me about that. Because one of my failure modes when I do this stuff is that sometimes I just YOLO it, because I'm too lazy, and it kind of gets [bleeped] up. So yeah, tell me about that.

Michael Taylor (00:57:57)

Yeah, I found that quite a lot. It's actually similar to what we talked about earlier with the evals, where what I'm looking for is: What do I strongly agree with, and what am I allergic to? So I'm scanning, mostly. I'm not fully reading in a lot of detail, but I'm just looking for things. It's similar to if an employee sent me something: I'm going to scan it and see if there's anything that really upsets me. And if not, then it's fine to go ahead. But quite often, especially when it's a new feature that I haven't really thought through, I'm like, oh, I see that it's tried to do this, and that feels really wrong to me, so I'm going to change my whole approach. Actually, that's why DeepSeek was really good. I used it a fair bit over the past week. The ability to see the thinking and go, oh, it's thinking wrong, I must have got something wrong with my prompt. It's trying to create a new file, but I already have that file; I just needed to put it into the context. So it's really about refining— It's just me figuring out what I actually want, and once I'm fairly happy with that, then it's much easier to build the stuff.

Dan Shipper (00:59:10)

What do you think of o3? Are you using it at all?

Michael Taylor (00:59:25)

I did try o3-mini a bunch today, and I would say it's an impressive model. It's surprisingly fast for a thinking model, which was my first impression.

Dan Shipper (00:59:30)

Which makes a big difference. It’s not trivial that it's fast.

Michael Taylor (00:59:35)

Yeah, I find I lose— I don't know if you find this, but I lose a lot of productivity because I just dump my thoughts into o1 and then say, oh, I'm going to have a coffee. But there are only so many coffees you can have in one day. So then I'm browsing Twitter and talking about politics, and I look up 30 minutes later, like, oh yeah.

Dan Shipper (00:59:50)

I have the same thing. I did that this morning with Deep Research. I posed this really big, deep research question. It was like a page of stuff. And then I went out and got coffee and just, I knew it was working. It was so weird.

Michael Taylor (01:00:00)

I need to find something to do in that thinking time.

Dan Shipper (01:00:05)

Maybe just enjoy yourself. How about that?

Michael Taylor (01:00:10)

I did hear someone say that the other day. They were like, oh, it's really nice to have regular breaks now with AI, whereas before I was thinking the whole day. But it reminds me a lot of my first internship. It was in the UK government, working on benefits—social security is the US equivalent. And we were writing analysis code in SAS, which is a very old language. We'd write the code and send it off to a central server, and it would take a couple of hours to run. So I basically did no work that whole internship. I was just like, sorry, my code's running. I think there's an XKCD comic that makes fun of this as well: my code's compiling. But yeah, it feels like we're back in those times again.

Dan Shipper (01:01:05)

Back to the future.

Michael Taylor (01:01:10)

Yeah. I've got to find a way to be productive. I was thinking maybe I should start double-teaming Cursor, so I'll have my blog writing in Cursor in one tab, and I'll be creating blogs and talking about those while o1 is working in the background on the other thing.

Dan Shipper (01:01:30)

I like that. I was doing that a lot with Devin when I was using it. I would have four Devins at once doing different things. And if you have four of them, then every minute there's something one of them needs help on, so you're just constantly tending them. And it reminded me a lot of when I was in college, when I used to play online poker. I was never that good, but good players can play four tables at once or whatever. And sometimes I would do that, because, whatever, I was in college and I was an idiot. But it felt like that, where you're doing all these different things at once, but each one only requires a slice of your attention, so you can get it all done. And I think that's so cool.

Michael Taylor (01:02:10)

Yeah. It's like the chess grandmasters you see in Washington Square Park, playing four games at once. It does feel like that. But I wonder how much we're really getting done when we do that.

Dan Shipper (01:02:20)

I don't know. Well, I mean, to some degree, running a company is that. It's very similar, where I'm constantly just— I'll spend like three minutes looking at something and go on to the next thing or whatever. So I just think more people are going to be working that way with all of its benefits and all of its trade-offs. So we're going to have to figure out how to manage that.

Michael Taylor (01:02:43)

Yeah. What I really want to get good at is doing it while I'm on a call—obviously not this call. But I find quite often, and maybe this is just me, I'll be on a call where the actual relevant part of it was five minutes. And I don't even need to take notes anymore, because AI is listening and taking notes for me. So that's the thing. I haven't quite gotten good at it yet. But that's the dream: managing Cursor in the background while I'm also—

Dan Shipper (01:03:20)

Also, it's just, I'll have my agent call your agent and we don't even need to get on the phone.

Michael Taylor (01:03:30)

Yeah. One of the really out-there ideas that I would love is— Maybe I'll just do it for fun on the weekend. But I'd love a negotiation-settlement kind of agent, where I give it my red lines and it's a trusted third party, and you give it your red lines and say, this is the price I'm willing to pay, this is what I want to argue for. And then it just goes back and forth in a virtual negotiation, and it comes back and says, we've settled at this price; it's the best you could get.
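A toy version of that idea fits in a page. Everything below is hypothetical: the prompts, the red lines, the model name, and the DEAL convention are all made up to illustrate the back-and-forth.

```python
# Toy negotiation agents: two models with private red lines alternate
# offers until one accepts. Entirely hypothetical sketch, not a product.
from openai import OpenAI

client = OpenAI()

def agent_turn(brief: str, transcript: list[str]) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": brief},
            {"role": "user", "content": "Negotiation so far:\n" + "\n".join(transcript)
                + "\n\nMake your next offer in one sentence, or say DEAL <price> to accept."},
        ],
    )
    return response.choices[0].message.content

buyer = "You are buying a used car. Red line: never pay more than $9,000. Argue the price down."
seller = "You are selling a used car. Red line: never accept less than $8,000. Argue the price up."

transcript: list[str] = []
done = False
for _ in range(6):  # cap the rounds, i.e., how much compute the agents get
    if done:
        break
    for name, brief in (("Buyer", buyer), ("Seller", seller)):
        offer = agent_turn(brief, transcript)
        transcript.append(f"{name}: {offer}")
        if "DEAL" in offer:
            done = True
            break

print("\n".join(transcript))
```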

Dan Shipper (01:03:58)

I think it's cool. And then it's like, you have to also decide how much you want to fund the agent. How many compute cycles does the agent get to reach a result?

Michael Taylor (01:04:10)

Yeah. This was a big thing with the early agents, like AgentGPT. I don't know if you ever played with that back in the day. I say back in the day—it was like two years ago. It's still going. But I remember seeing a talk by those guys, and it really stuck out to me because they had LLM judges in the CI pipeline. So whenever they pushed code to that repo, they had AI agents checking the code, making sure it didn't break, and running tests. That, at the time, was kind of radical. Now there are products that do that. But they said they were spending $200 every time they pushed a commit.

Dan Shipper (01:04:50)

That's crazy.

Michael Taylor (01:04:55)

It's crazy, but also, by now that same commit probably costs $2, right?

Dan Shipper (01:05:00)

Totally. Probably less.

Michael Taylor (01:05:05)

Yeah, I do think it makes sense. There is a case to be made that you should just be spending as much money as possible on AI, because we're almost definitely being too conservative.

Dan Shipper (01:05:20)

Totally agreed. Well, that is a great place to leave it. This is a great conversation. I'm so glad we finally did this. I'm so excited for Rally. I'm so excited to get to work with you. If people are looking to find you and your work online, where can they find you?

Michael Taylor (01:05:30)

Yeah, on Twitter. I'm @hammer_mt. It's probably the easiest place to find me. And then the book that I did for O'Reilly was Prompt Engineering for Generative AI. It's on Amazon. It's got an armadillo on the front, so it's easy to spot.

Dan Shipper (01:05:35)

Awesome. Thanks for joining.

Michael Taylor (01:05:37)

Cool. Thanks man.


Thanks to Scott Nover for editorial support.

Dan Shipper is the cofounder and CEO of Every, where he writes the Chain of Thought column and hosts the podcast AI & I. You can follow him on X at @danshipper and on LinkedIn, and Every on X at @every and on LinkedIn.

We also build AI tools for readers like you. Automate repeat writing with Spiral. Organize files automatically with Sparkle. Write something great with Lex. Deliver yourself from email with Cora.

We also do AI training, adoption, and innovation for companies. Work with us to bring AI into your organization.

Get paid for sharing Every with your friends. Join our referral program.
