Transcript: ‘How Packy McCormick Finds His Next Big Idea’

'AI & I' with the influential tech writer and investor

The transcript of AI & I with Packy McCormick is below.

Timestamps

  1. Introduction: 00:01:24
  2. Packy’s thesis about the future of technology: 00:02:40
  3. Packy’s quick take on your crypto portfolio: 00:07:42
  4. Use LLMs to validate your understanding of complex concepts: 00:14:31
  5. How Packy used Claude Projects to write an essay he published recently: 00:18:26 
  6. Packy’s process to make interactive visual graphics for his essays: 00:24:00
  7. How to use AI to be thorough in your research: 00:31:10
  8. How Packy uses Claude to edit his writing: 00:35:04
  9. The tools Packy uses to create his newsletter: 00:36:44
  10. Using Claude Projects to make a tool that grades Packy’s essays: 00:44:12

Transcript

Dan Shipper (00:01:24)

Packy. Welcome to the show.

Packy McCormick (00:01:25)

Great to be here, Dan. Excited for this.

Dan Shipper (00:01:29)

Really good to have you. So for people who don't know, you are the founder of Not Boring. You write some of the premier essays and analysis of tech. You have about 200,000 email subscribers. And I think you've just been one of the main voices of this era in tech. And in addition to your writing at Not Boring, you also have a fund, Not Boring Capital. And you've been a good friend in this whole journey. So it's really fun to have you on the show.

Packy McCormick (00:02:00)

I mean, thank you for the intro. It's great to be here. I don't think there would be a Not Boring if it weren't for that group chat that we had on Telegram with all of the writers. The first day that I walked in and saw you in there, and saw a few other people in there, I was like, I've hit the gold mine of internet writers. This is amazing. And I really don't know if there'd be Not Boring without that, so it’s fun to bring it full circle here.

Dan Shipper (00:02:22)

Yeah, I love it. And I want to get into some of your AI use cases. You have some interesting ones that I really want to talk about, but I think before we do that I just want the update, almost as a friend and an admirer of your work, what's going on with Not Boring? Where are you right now? What are you thinking about?

Packy McCormick (00:02:40)

To reuse the phrase full circle, one of the first things that I ever wrote before there was Not Boring, just when I started writing, was this piece on natively integrated companies, internet-slash-vertically integrated businesses. And I've really just gotten obsessed again with vertically integrated businesses, calling them kind of techno-industrials. I think we've talked about AI and how it applies to analyzing some of these, because they're just these big complex beasts, but I really like writing about complex things that, if they work, turn into really good business models and businesses that make an impact. And so more and more on both the writing and investing side, I'm trying to converge a little bit on really just diving deep into some of those companies, which means that— I'm writing about a crypto company next week and that was a complex one that just nerd-sniped me and continues to nerd-snipe me. But more and more I've written about Anduril; a company, Fuse Energy, that's doing radiation effects testing and fusion energy; and a mining company, Earth AI. And I just love this excuse to dive into these industries that I should know more about and don't when I start the process. And so I'm trying to orient as much as I can around investing and writing in those types of companies.

Dan Shipper (00:03:54)

That's interesting. What characterizes a techno-industrial company? Is it frontier tech?

Packy McCormick (00:04:00)

So that I have cheesy— I don't love the name techno-industrial, but I needed something that's different from deep tech or frontier tech. Because I think there's a lot of deep tech that is incredibly cool, kind of science-project-type companies where they're trying to make something that has never happened before happen and there's scientific risk there and it may work, it may not work. And then there's a kind of stage beyond that where some of that stuff has already been invented and then you integrate it into a company that can deliver a product that is kind of the thing that I look for. Deliver a product that there's a ton of demand for already, better, cheaper, and at higher margins, which then you can reinvest back into R&D and you can reinvest back into growth and all of those things. I think to use the cliche one, I think Anduril is by far the best example out there right now. SpaceX, I think, kind of kicked off the hard-tech revolution, both in terms of as an example and the talent diaspora from that company, but the space launch competitors kind of sucked. And so obviously zero knock against SpaceX—it's the most impressive company in the world right now. But I think that, to me, the interesting unlock about Anduril was that they said we have this different approach to defense and we think we can actually take on these huge hundred-billion-plus-dollar defense primes that kind of just operate in this cabal, these five companies that dominate everything. And we think that if we make smaller, cheaper, more attritable hardware, but we infuse it with software, then we can deliver a better product to the government at better prices and cheaper margins and take on some of that risk ourselves, as opposed to doing it cost-plus, and it's kind of working now. And I think that kind of thing is going to happen in a lot of other industries. And so that’s the thesis.

Dan Shipper (00:05:42)

Interesting. And then what is the why now of that? What's making those companies potentially important or impactful right now?

Packy McCormick (00:04:41)

Yeah, I mean, I think one is they're fundable. And so, there's still a bunch of them that we look at. I still think investors are like, I don't know, that seems like a lot, and a lot has to go right for this to work, but they are fundable now. I think AI is one of the big why nows. I think, if you look at something like nuclear, for example, there's public sentiment shift. There's this shift from this kind of construction—this belief that every nuclear project should be a gigawatt reactor that is a construction thing—to a manufacturing process. And we're starting to see enough companies do that, the fuel supply chain is strengthening a little bit, and there's just a bunch of things kind of coming together. The why now, I think— It's hard to do a general why now on these and one of the things I like about it is that there are specific why nows. I think one of the big why nows is that now you just feel, in a bunch of different categories, there is sclerosis among the incumbents and there's the opportunity for a company to come in and do something better. I mean, someone should replace Boeing, as one good example. That mining company that we looked at, the interesting reason is that it's just gotten a lot more expensive and harder to discover new rare metals and new critical metals. And so you need kind of— This has happened in the history of oil development, the history of mining, where you kind of get everything that you can at one technological level and you hit a wall and people are like, oh shit, we're going to run out of this stuff. We're going to hit peak oil. We're going to run out of critical metals or China's going to get all of them. And then someone comes in and innovates and then you can kind of get to the next level of something that used to be more expensive that now you can make economical. And so it's different per category, but I think infusing software and hardware is probably the big kind of overall trend.

Dan Shipper (00:07:40)

That's really interesting. And then, just selfishly, what should I think about my crypto portfolio right now? I'll say I'm HODL-ing. I've got all Ethereum. But obviously there was a huge bull cycle, I guess in 2022-ish. And then it seemed like it was almost coming back like six months ago or four months ago. And then now it's sort of back off the menu, at least in the spheres that I'm traveling in, which are only crypto-adjacent. What's your current crypto take?

Packy McCormick (00:08:12)

I'm also HODL-ing Bitcoin and Ethereum and SOL and a few other things in our portfolio.

Dan Shipper (00:08:20)

Is it hod-ul or hoh-dul?

Packy McCormick (00:08:22)

I don’t know. Hoh-dul is close to hold.

Dan Shipper (00:08:24)

I’ve only ever read it.

Packy McCormick (00:08:30)

Yeah, me too. But I've never said it out loud. And I understand why people don't say it out loud. Because it's kind of tough. But I mean, obviously not financial advice, and blah, blah, blah. But I have held on, I've bought more in the bear market. I was certainly excited when I got to see the prices of the things in my portfolio going up in that kind of last mini-little bull, but I think so much of that was exogenously driven and driven by the Bitcoin ETF and the anticipation of the halving and things that don't actually matter. A lot of the people, even then, that I talked to, I was like, oh, this is awesome. Is this the bull market? Is it happening? And they're like, no, nothing has actually changed on the product side. This isn't the bull market. And I think a lot of people are smarter than me and said 2025 is when— If it happens, it'll probably happen in 2025. But, I would think we're still waiting to see what those breakout things are going to be. 

This company that I'm writing about for next week, Blackbird, I think is a really, really interesting product that is kind of crypto and more just kind of a restaurant app. There's a company that I wrote about last week called Oncyber. That's one of our portfolio companies that makes it easy to build 3-D worlds in a web browser. It's another interesting one where they wrote their own kind of scripting language to build worlds on top of the game engine they had created to build this NFT gallery. But now the founder, I told him on a call what website to make, and he kind of described it and it wrote something in their scripting language in the ChatGPT bot, and then he plugged it in. And then he just had this kind of game-slash-world that got made. And so I think that's going to be really, really interesting. And they have crypto kind of payments just baked into the site, so you can turn on stablecoin payments. You can turn on NFT minting, a bunch of easy stuff. So I think hopefully we see a bunch of things where crypto is infused, but it's not the whole product in the cycle.

Dan Shipper (00:10:12)

Do you buy the argument that we're just starting to get to the place with L2s and other crypto infrastructure that we can really start to see actual breakout consumer apps in this next cycle? Where do you think the next breakout comes from?

Packy McCormick (00:10:30)

If I knew I would be richer than I am. So I don't know where the next breakout comes from, but I do genuinely believe that. In the middle of the bear market, I talked to a bunch of people who kind of re-inspired me on crypto generally and wrote this piece that essentially says, crypto gives you these things that you can't otherwise do. Up until now, you've had to make some real performance trade-offs in order to use those things. And so if you need those things, if you're launching a meme coin, or if you need to own an NFT that you can't own, there are specific things where that trade-off and performance is really, really worth it. But over time, the performance trade-offs drop and the capabilities you have either remain or get stronger. And so at some point you almost get a free option on all of those other things that crypto enables. I don't know if it's this cycle or the next one when that kind of flips, but I think just over time as performance gets better, as things get cheaper, you just get stuff for free with crypto that will be useful to include in different products.

Dan Shipper (00:11:30)

And what are your thoughts on the— I think there's this sort of galaxy brain take— I think it might be Fred Wilson, who's one of the main people behind this. But just generally, there's a thought that as AI enables the proliferation of lots and lots of content that may or may not be trustworthy, it creates more of a job to be done for crypto to verify identity, to verify who's saying what. Is that on your radar at all? Is that part of your thesis?

Packy McCormick (00:12:10)

It’s a medium part of my thesis. And I think it's one where it's a big part of a lot of people's thesis. I think Fred has a very nuanced take on it. I think there's a general kind of crypto AI thesis that is like, oh my god, crypto and AI, of course it's going to happen. And crypto AI coins have taken off, whether or not there's an actual kind of real product behind them and all of that. So it is a real narrative and a meta. But the thing that I'm still trying to figure out is, there is a tipping point where, at some point, to have to sign something— If you're like the only person who's signing something and proving that it's real, it doesn't really matter. If no one is signing their transactions cryptographically, then there's no expectation that other people will. So I don't know where that tipping point comes from. I do wonder if an X integrates it at some point, or if platforms start to integrate it. Maybe I do think we wake up in a world in 10 years and that's probably something that just happens, for people to prove it. Obviously, Worldcoin is doing the proof-of-humanity thing and others are doing proof-of-humanity or decentralized ID. I think all of that is interesting and will be valuable. I just think there's a lot of stitching work that needs to happen in the middle to get to a point where it's just kind of default. If you're sending something as yourself, you're signing that thing.

Dan Shipper (00:13:28)

That makes sense. So I want to start getting into your uses of AI. And the way I want to do that is just sort of taking a step back. You, as a person, have this sort of incredible amount of curiosity. You love thinking about really, really complex companies and spaces and things. And writing long-form essays about how they all sort of fit together. And so your job is almost to be really curious about the world and about companies, do a shit ton of research, write about it, and use that to kind of crystallize a worldview that then you go and invest on. And I really want to understand. I just feel like AI as a tool, it sort of supercharges that whole process. And I just really want to understand how you've integrated it to help you do that core thing that you do. That's sort of being curious, doing a lot of research, writing, and then using all that writing to create a thesis that you then use to invest.

Packy McCormick (00:14:29)

Yeah. So I think it's evolved a little bit. So, with GPT-3 and ChatGPT, it started out just as I was writing something, I wanted it to give me facts and all of that. And it didn't do that predictably well enough that I could use it. But if I were writing about fusion or doing this podcast that we did, Age of Miracles with Julia DeWahl, where we looked at nuclear fission and fusion, there'd be a lot of things where I'd be writing a script for that. Or if I was writing a piece where I'd be like, am I getting X, Y, and Z about fission right? Or, explain what I'm getting wrong here. Or is there a simpler way to say what I'm trying to say here? So it was a lot of going back and forth and doing kind of, do I understand this? Or, here are the different types of fusion fuels. Am I missing any or different things like that, as I was exploring? And I still do that for sure. As I kept using the chat tools, it evolved into using ChatGPT as my editor. And I'd say it's an okay editor, not a great editor. It's not going through and doing copy-edits or line by line. It's not really tightening my prose. What I've started using recently though, I've started using Claude 3.5 Sonnet and now Projects. And so Sonnet, I just think it is at least—it sounds nicer as an editor. And it makes me believe that it's giving me better feedback than ChatGPT. I ask for feedback and a grade just cause I'm vain and every time, pretty much without fail, the grade starts at an A-minus and then we'll get to an A the second time I send something and then we'll get to an A-plus the third time. I'd say I could also not change anything and it'll still probably go through that same progression.

When I'm asking for feedback, if I send it the same exact thing and I'm like, alright, give me a grade on this one, somebody else that I sent it to said that this was one of the issues with it, almost invariably it will really emphasize that issue and then give me a B or B-plus. So it's not the most reliable editor, but it will say this section is too long and it drags, and I'll go back and read it, and that section is too long and it drags. I do think it's too defensive, or wants you to really examine all the other sides and counterarguments to what you're saying, which I think is a valuable exercise for me to do as I'm writing, and if there's something that really does poke a hole in my argument then I'll go back and kind of try to incorporate that. But it really, I think, over-emphasizes the defensiveness in writing, which is okay. But I do think it's useful. And I think it's just become almost a habit now. If I'm stuck for a second, I'll just throw it into Claude as a thing to do while I'm feeling stuck.

Projects I've started using over the past couple of weeks and they've been a revelation, and that's both on the writing side and the investing side. Take the piece that I'm writing on Blackbird: I dumped all of my notes with the company in there, and dumped the materials that they sent. When I write one of these pieces, I'm trying to understand the industry that they're operating in. So to try to understand the restaurant industry, I put analyst reports in there, put blog posts that people have written in there, just dumped as much content as I could. Everything that I'd normally put into Notion as I'm doing research, I still put in Notion, but then I also just dump the text into Claude and then I'll ask it questions as I'm writing. I'll be like, cool, here's the section that I'm writing. Here's what I wrote. Is there any interesting data anywhere that I should be using that I haven't thought about using? Or are there quotes from X, Y and Z investors that I talked to that might be useful here? And that's been really cool to be able to do. Or, here's my full investment memo on this company. Am I missing something really important? Or if you’re an LP, looking at this, where are you poking holes in my thinking? So giving it that context and then asking it to analyze the work given that context, I think has been really, really valuable.

Dan Shipper (00:18:30)

That's really cool. I want to see that. Can we look at your project for Blackbird? 

Packy McCormick (00:18:33)

Ooh, I don't know if there's anything secret in there, but alright. So here on the right hand side is a bunch of different stuff that I was reading about Blackbird. And so that could be Fred Wilson's original blog posts, a couple of those, Jay Drain at a16z wrote a blog post. It could be their press release. It could be blog posts that the company's written. An interesting one that they did recently, they just dropped their white paper, the Fly Paper. And so that one's really interesting to say like, alright I'm going to read this, and theirs is actually a little bit more approachable because it is more of a consumer and restaurant-facing product, but for a typical crypto white paper to be able to have something where I can just ask a question as I'm going through and ask it different things is super, super valuable. And so one thing that I did here was ask, “Please explain Blackbird’s economics in a style that's a crossover between Not Boring and The Diff,” or— I played with doing The Diff recently. And I always have to tell it to not do it like a caricature and then it'll do it like a caricature and then I'll go back and be like, you did it like a caricature and then it'll write something kind of good in my style. And I don't take that, but I'm just looking for inspiration for different angles that I might not have thought of, but I feel like putting The Diff in there, in this one in particular, they had a couple of explanations that I was like, yeah, I don't know if I would have explained it like that, but that's actually a really good explanation. And so I just use this as a way to go back and forth, say, okay, this is still a bit too caricature and over the top and then use it to go back and forth, kind of almost describing the thing that I'm reading back to me in my own voice, in my own way that I would write about it. And it's not, again, perfect, but it's close enough that it does make it sink in more than just kind of looking at a regular white paper would.

Dan Shipper (00:20:38)

I just want to stop you there. I think it's so important and misunderstood how much summary is a part of good writing. And in particular, how much summary is important for writers to even know what they're talking about. And then for writers to give to readers to set up the point that they want to make. That's a huge, huge, huge part of writing. And for something like that, if you scroll up, like a summary of tokenomics, for example, it might take you a day or whatever to really deeply do that. Even if you generally get it, even if you've read all this stuff and you have it in your head, to actually put it into like a crisp, clean format could take a day or two at least. And one of the beauties of Claude Projects, but just generally this generation of AI, is, particularly for complex ideas that you already understand, it just saves you that summarizing time in your writing and in your thinking, so that you can get to the more interesting stuff. And yeah, you're probably not going to just copy-paste this exactly, but it's surprising how much just having it in that form, where it's basically in the context that you need it, just speeds you up, lets you just get through that part of it so that you can do the actual interesting thinking afterwards.

Packy McCormick (00:22:05)

1,000 percent and like if I do summarize, being able to say— Because I'm always debating should I do more? And probably too often, I do more than less. But should I do more here? What am I missing here? Did I get anything wrong? Or like put it another way, if you're a Hacker News commenter, what is wrong with how I just summarized this thing? And I do think there's a danger to that. And, same with the way that they give feedback when I ask for feedback and a grade, which is, I think there have been a couple of pieces probably that got a little more watered down than I would have wanted them to because I have this critic there that otherwise wouldn't be involved in the writing process. And so I think it's a balance between just making sure that I avoid really obvious errors and then kind of watering down or pulling back my argument too much because I have a Hacker News commenter kind of all of a sudden in my ear. And so that is like an interesting balance to try to play with.

Dan Shipper (00:23:09)

I do that all the time. I mean anytime I'm writing about something that's technical, I just have this voice in my ear that's like, you're not really a great programmer. Do you really understand this? You don't have a Ph.D. And I love Claude for that because it just gives me enough confidence because I can just put what I'm saying in there and just be like, okay, pretend you're like a Ph.D. computer scientist, what are the holes? And it might not get everything, but it gets enough that I just feel comfortable that I'm not saying something completely stupid. And I feel like that's so important for writers, to be able to have that resource. And previously it was only available to people who had big editing teams or a lot of resources or whatever. And it's just the coolest thing.

Packy McCormick (00:23:55)

Totally. I wrote a piece a few weeks ago called “The American Millennium.” And I had this idea that I wanted to play with that was: One of the interesting things about America is that the system works so well and the entrepreneurial drive works so well and all these things that kind of no matter who's in the White House, as long as rule of law and the constitution are maintained, we might not suffer the fate of the Romans because people can just build things and prevent the decline. And so I wanted to show that at— First of all, play with the idea, but then show that. And so I asked it to code me up a little thing where I could show what happened to the vector sum of the kind of government and entrepreneurship on progress. And it just coded that up for me. And it's like a very, very simple thing, but I put it in the essay. And then I went over to Replit and just dumped the code in there and I had a little website that you would click on in the essay and play with it yourself. And so it's fun. This is not going to hold up to peer review by any stretch of the imagination, but it's just like another interesting thing where I never in a million years would have made this. Maybe I would have tried to draw a shitty version of this, but to have this little interactive thing that I could put in there because of Claude, that was, I thought, a pretty fun thing to be able to do.

Dan Shipper (00:25:23)

That is so cool. Wait, we have to see how you made this. So press continue chat. I want to just go up to the top. I want to know where you started this because I think this is just such an underappreciated thing where you're someone with so many ideas in your head. And suddenly you can just make something that's this new form or new representation of something that's been in your head that maybe previously you could have done, but it would have taken a lot of work or you would have had to pay someone, and now you can just get something out of your head in this totally new way that's not writing but it's a visual representation or it's an interactive demo, and share it with your audience. I just think that is the coolest thing.

Packy McCormick (00:26:04)

It’s really fun. And I've done this a couple of times. This is a slightly different one, but I used ChatGPT for this one when I wrote about America's tolerance for risk, or risk appetite. I had it judge, through a bunch of different periods, what America's risk score between 1 and 100 was, and just asked it to make me a graph on that. And you have to disclaim the hell out of the fact that this is not accurate. It is just guessing all of that, but it does give, I think, a pretty good kind of rough guide. And it's a quick way to make a visual where I don't know how I would have come up with those numbers otherwise. And so just to be able to quickly be, here's a dumb idea, can you give me these numbers and then turn them into a chart? It's something that I probably wouldn't have done, but if I had, it would have taken a few hours and would have been equally kind of imprecise. And so it's a fun little thing that I've done a couple of times now. So this one actually started with, here's a thesis I'm playing with that I'd love your help thinking through. Since technology compounds and government doesn't, technology should overtake government in importance at some point. And so it gives me some ideas here.

Dan Shipper (00:27:12)

Can we read through some of the ideas? I'm sort of curious. Yeah.

Packy McCormick (00:27:19)

So, “You're right. Technology tends to advance exponentially. Each new innovation builds on previous ones. Government evolution: While governments do evolve, they generally do evolve at a slower pace. Relative importance: The idea that technology could overtake government in importance raises some questions. What do we mean by importance: political power, impact, economic influence? Is it a zero-sum game or can they both increase in importance simultaneously? What are some scenarios if that happens? What are some”—again, always—“what are some counterarguments to this point?” And then historical perspective and then—

Dan Shipper (00:27:47)

This is really cool. I think that's so cool actually. And I know I totally get your point. It's sort of annoying when you're just trying to figure out how to make this essay better or whatever and then it's, here are 15 counter arguments. Or one of my pet peeves is Claude always apologizes, even though I didn't say you did something wrong. I'm like, okay, make this better. And it's like, sorry you're totally right. And I'm like, you don't have to say sorry, it's fine, but I also think it's so cool that it's even presenting some of this stuff because, let's say it's not you and it's someone else and they're asking a question that presumes a particular worldview. There’s a lot of assumptions and it doesn't just say, here's the answer. It says, let's assume what you're saying is right, but here's a bunch of ways that it might be wrong or some things you need to think about, which is completely different, I think, from the sort of filter bubble world that we come from with Google, where it's just showing you the latest Breitbart screed when you search for something that sort of indicates that you want that, you know? And I love that. I think that's really cool.

Packy McCormick (00:29:00)

It might be for you. For me, as someone who doesn't ever get anything wrong, it's not particularly valuable. But—

Dan Shipper (00:29:04)

I love it. I love it. Yeah, I mean, I just don't know what it's like to have such god-like power. But, yeah, that's good.

Packy McCormick (00:29:18)

It's normally me correcting the LLM, but I mean— And then you can go in and—The other thing I don't want to do is spend all this time, like, oh my god, I have this really cool idea about technology and government. And then 75 people have written that exact thing before, so I'll ask it. And sometimes it gives me things that are real. Sometimes I'm like, oh my god, that sounds like the exact blog posts that I want to write and it's a completely made-up thing and it doesn't make any sense. And so I'm like, alright, cool. So there's that. 

Dan Shipper (00:29:48)

Did any of these help you? Let's read through them a bit. Like, The Sovereign Individual. Are any of these— Actually it, you kind of get it from— Or like, it helped kind of shape the argument for you?

Packy McCormick (00:29:58)

None of these did, actually. Yeah, none of these did.

Dan Shipper (00:30:04)

Yeah, that's interesting because I do think that is a sort of underrated thing for Claude, especially for writers or anyone who's trying to do something new. If you're truly writing something new, there isn't a pre-existing website that talks about this idea in terms that you're talking about it. And that makes the question effectively un-Google-able. It's like, who else has thought about this? You just can't Google that. You can try a couple of things, but it's just unclear whether or not you're going to find it. And because Claude or ChatGPT or any of these AI tools, kind of get the underlying concepts. They can move laterally through the entire space of human knowledge to zero in on people that might have talked about similar things, but in different terms than you. And that is the most valuable thing for people like us. I just love it.

Packy McCormick (00:30:35)

I agree. Yeah. And these ones— I hadn't read everything, but I'm familiar enough with the ideas from The Sovereign Individual. I've probably read most of the things that Marc Andreessen and Peter Thiel have written. And so I had a lot of it and so it jogged me, but there was nothing in there that was like, oh, that is a totally different way of thinking about this thing that I'm trying to think about. But yeah, there's been plenty of times where I'll be like, oh my god, I have this novel idea. And it’ll be like, philosophers have been talking about this for 17,000 years. Alright, cool. You saved me a lot of time. Thank you for your embarrassment. Thank you for letting me know. And it's also helpful because, I do think there are a couple of types of writers I'll put like Ben Thompson and Byrne in one category where they just have bigger, more structured brains probably. And like anything that comes out, they're like, oh, let me hang it on this piece of my framework over here next to all of this other relevant information. And my brain is like, that’s an interesting new idea. Even if I had thought about it yesterday, because I just have no memory. And so having something like this where I can just check those things and maybe in another brain just kind of exist, I think is really helpful.

Dan Shipper (00:32:02)

I agree. I feel the same way about my writing. That's something I've been trying to change. It's sort of like a sitcom. Every day is new, every other day is new. Whereas, people like Byrne or Ben Thompson, they're much better at having like a plot arc through the whole season, you know? And I do agree, having this tool as a throwback. I often will put in custom instructions, things about what I'm thinking about, or what are my key ideas right now? And trying to have it hearken back to those things or keep me on track is really helpful.

Packy McCormick (00:32:38)

Oh, that's really interesting. And then we actually get right into the government and entrepreneurship vector graph. Yeah. So out of there and I think this shows how kind of messy my process is. I'm asking about these different books and asking it to summarize. And I'm like, alright, great. Nevermind that. “How would you show a graph where government and entrepreneurship are vectors that can either work together or against each other?” And then it says, “That's an interesting visualization concept. I can help you create a graph to represent this idea using a React component. We use a simple coordinate system where the x-axis represents government influence and the y-axis represents entrepreneurial activity. The angle between the vectors will show whether they're working together or against each other. Let's create this visualization.” 

And so then it goes ahead and actually creates the visualization, and it has a direction in this one, but not a magnitude. So this is great. I also want to be able to adjust the magnitude of each and look at what the combined effect—let's call it progress—looks like. And then it says, “excellent idea,” which I appreciate. That was a pretty good idea. 
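
As a rough illustration of the idea Packy describes here (this is not the React component Claude generated, which isn't shown in the conversation), the following minimal Python sketch treats government and entrepreneurship as two vectors, each with an adjustable magnitude and direction, and takes the length of their sum as "progress." The function name, the use of degrees, and the choice to define progress as the length of the summed vector are all assumptions made for illustration.

```python
import math


def combined_effect(gov_magnitude, gov_angle_deg, ent_magnitude, ent_angle_deg):
    """Add two vectors given as (magnitude, direction in degrees).

    Returns (progress, angle_between_deg), where progress is the length of
    the summed vector. Hypothetical helper, not taken from the episode.
    """
    gov_x = gov_magnitude * math.cos(math.radians(gov_angle_deg))
    gov_y = gov_magnitude * math.sin(math.radians(gov_angle_deg))
    ent_x = ent_magnitude * math.cos(math.radians(ent_angle_deg))
    ent_y = ent_magnitude * math.sin(math.radians(ent_angle_deg))

    # "Progress" is assumed to be the length of the combined vector.
    progress = math.hypot(gov_x + ent_x, gov_y + ent_y)

    # Smallest angle between the two directions, in the range 0 to 180.
    angle_between = abs(gov_angle_deg - ent_angle_deg) % 360
    if angle_between > 180:
        angle_between = 360 - angle_between
    return progress, angle_between


# Same direction: the two forces reinforce each other.
print(combined_effect(3, 45, 4, 45))
# Opposite directions: the two forces mostly cancel out.
print(combined_effect(3, 0, 4, 180))
```

With both magnitudes and both directions adjustable, the same function covers the "working together" and "working against each other" cases that the chat goes back and forth on.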

Dan Shipper (00:33:45)

At least it's not apologizing.

Packy McCormick (00:33:48)

Yeah, I'm so sorry. And then it gives me something here to kind of play with. And then I just— This is a really long chat. I'll ask for feedback on different things. I’ll ask about this particular thing about the graph that I think it got wrong. And it’ll explain something to me and then make an update to the little visualization based on kind of almost thinking through what makes sense. I think the preview isn't loading. Then it does apologize for the inconvenience of the preview not loading. And then I asked for a simplified version because it gives me a choice between a simplified version, code explanation, alternate visualization, conceptual explanation. So I just want the simple version and then it gives me that. And then it totally, totally messes it up and makes it even worse. And then I say, I do want magnitude back and we try that again.

And so just kind of going back and forth through this and then it gives explanations on the graph of what the different things mean, but this is a really long one. So somewhere within here, and this is before I was using Projects. This is just when there were Artifacts, but not Projects yet. And so I tried to keep everything in the same chat so that it had context on our whole conversation. Now I would probably have— This is one separate thing. I have a separate kind of editor project that will go in and edit things. Cause I got so sick of writing like, “Hi, I'm Packy McCormick. I write a newsletter called Not Boring. You're the top editor from a world-class tech publication. I've hired you to edit stuff.” That is now just stored in its memory when I go to the editor and start a new chat, which saves me seconds every time. And then, actually one other thing here, so I kind of know what's going on with this graph, but I asked it to describe what was happening in the graph in the style of Not Boring. And then it does, “Picture this, you're at the helm of a rocket ship called progress.” And so I have to go back and say, “This is a caricature of my style. Try it again, actually like Not Boring.” “I apologize for the exaggeration. You're right. And I appreciate the feedback. Let me try again with a more authentic Not Boring style. Our vector graph tells a fascinating story about the interplay between government and entrepreneurship in driving progress. At first glance, it might seem counterintuitive.” So it does, even if I tell it ahead of time not to caricature me, it really does. And then it'll get it right on the second try.

Dan Shipper (00:36:20)

At least it knows who you are. I feel like that's a compliment in and of itself.

Packy McCormick (00:36:22)

The first time I asked it to do that and it knew me, I was like, woah. I mean, it knows the whole internet, but it was still pretty cool.

Dan Shipper (00:36:32)

That's great. I love this. I think this is super cool. Have you thought about any other ways that you would integrate something like this in future articles? Where does this fit in your creative toolbox?

Packy McCormick (00:36:44)

My creative toolbox is very small. It's the internet, which I access through the Arc browser. I use Notion to just kind of dump everything. I use Readwise to save articles and highlight things that then feed into my Notion. I use Google Docs to write and Substack to send and Figma to make graphics and that's pretty much the whole stack. And so this is kind of the first new addition to that stack in a while. And it's just kind of— It's when I get stuck or when I have an idea or when I have something, I'll go in here and jump into it.

Dan Shipper (00:37:28)

I guess I'm asking more about this specific interactive graphic.

Packy McCormick (00:37:30)

Oh, the interactive graphic. It really depends on the piece. So in this one I did this, and then I also asked it for kind of the same thing that I did on the risk piece. I asked for the values for all of the great empires in history. And I didn't influence it. And I wanted it to say that America was stronger in entrepreneurship and all the other empires were stronger in government. And that's what it said, which got me in a little bit of trouble. Because then I confidently said that America was the only capitalist empire. And I had a couple of people be like, what about the Dutch? What about the British? And there, I still think the government was more powerful than the capitalist system. And so, maybe it does do what I expect it to do or hope that it does. It might give me a little more confidence than I probably deserve in that thought, but there were a couple of examples here. Another interesting one: I'm writing about a nuclear company and the founder told me that one of the ways he started designing the reactor was he played around with this particular nuclear code. And so I pulled a bunch of that code and I was like, alright, walk me through this, explain what it's doing. What happens if I change X, Y, or Z values, blah, blah, blah, blah, blah. And so, I'm not a nuclear engineer by any stretch of the imagination, but even being able to understand that code and understand the different types of things you might be looking at when you're designing a nuclear reactor, that's very cool. I haven't written that piece yet, but there's a good chance that, if I can figure out how to do it, we'll have something interactive. It's like playing with a couple of the parameters of this nuclear reactor and seeing what happens. And so things like that, which I just never would have done otherwise, I could see doing here.

Dan Shipper (00:39:18)

That's really cool. I love that. I want to move into— You've said that you're using it for editing and different parts of the writing process. I'd love to see the project you're using for that.

Packy McCormick (00:39:30)

So it's like literally just, here's my instructions. “I’m Packy McCormick. I write a newsletter called Not Boring. You’re a world class editor. Think of the heyday of the New Yorker, the Atlantic, Harper's, the Economist, Stratechery, or Pirate Wires today. I recently hired you to help me take—”

Dan Shipper (00:39:44)

I can’t believe Every isn't on there. I'm extremely disappointed.

Packy McCormick (00:39:51)

I’m sorry, I’m sorry. There's some tech writing on Every that is a different approach to— Or, less optimism, I would say, on tech. I do love the writing though. “I recently hired you to help me take my writing to the next level. I want to maintain the voice and style of Not Boring, but sharpen the writing and analysis to make it the best tech blog in the world. Something that smart, successful people read to learn what's happening at the frontier of tech. You can push me on everything from writing style to structure to the logical strength of my arguments and the data support I provide for those arguments. I want the writing to be as fresh as it was in the early days of Not Boring. I want you to push me on the novelty of my ideas. Letting me know if someone has already written something similar, but better. In short, I want you to push me to be the best tech analyst I can be while remaining approachable to a wide audience. Not Boring’s mission is to make the world more optimistic and that means reaching more people with a fact-based look at the most promising companies and trends in tech. I'm very excited to work with you and grateful to have you on the Not Boring team as our first managing editor.” 

I don't know, Claude, more than the other ones, really makes me want to talk to it like a person and that's pretty much all I have there. And then you know so today— It does an annoying thing, actually, if I dump something that's too short in, or when I ask for feedback, it just starts typing the whole thing in, which it did here. Today I decided to piss off both sides of the aisle with a fairly moderate take on tech and Democrats. And so my Democrat readers were offended that I didn't just say that Donald Trump was the worst person in the history of the world. And my Republican readers were pissed that I thought that the Democrats were at all redeemable. So it was a really good idea to write the piece that I wrote today. But, well, “Here's the latest draft of my essay. Feedback, grade.” And then it'll go back and say, “Structure and flow: it's well structured. Tone and voice: the arguments, the evidence, balanced perspective, timeliness, length.” It always says it's quite long because they are quite long.

I wish I could have written this in 1,000 words. I'm sure with Claude 5.5 Sonnet, I'll be able to just say, can you just rewrite exactly this, my exact style, but half as long and hitting the key points. And it'll be able to do that. And I can't wait for that day, but it's not quite there yet. It always lists proofreading, and then I’ll ask for specific typos and either it'll make something up or, actually, I just looked and there aren't any typos, but it always does say proofreading. And then, like I said, it'll give me an A-minus grade on the first draft pretty much every single time. I do want to try to just put something absolutely garbage in one time and see if it gives me like a C. But the other thing that I'll do, just because, like I said, it's what I do when I feel stuck or when I just hit a point where I want to stop writing for a minute, I'll dump the intro in or the first half of the next section and be like, by the way, this is only up to the intro, and this is only up to the intro of the next section. There's still a lot more to come. Just give me feedback on what I have so far, and then it'll invariably tell me, like, it does end kind of abruptly and blah, blah, blah, blah. And so, some of those are not quite there, but overall, it's pretty amazing just how quickly it can pick up what I'm writing.

It'll understand kind of my normal style, you know, say that it'll fit well with the Not Boring audience, all of that. And then give me, yeah, concrete policy proposals. I didn't take any of these—joint task forces on cybersecurity. So some of that is kind of basic and bland. So for totally new idea generation, I don't find it to be as good here. I asked it to cut and I didn't love its recommendations on what to cut, so I only took one of those. And then I'll send a new draft. I'll say, “Better or worse? Feedback and grades.” And there's not been a single time that it has told me that the new version was worse. And maybe I'm just that good at improving every time, but I can't imagine that's actually true.

Dan Shipper (00:44:07)

I want to see if we can fix this for you. We've got about 30 or 40 minutes left. And one of the things I love doing in these episodes is just doing something together. And so, here's my idea. I want to— I'll pitch it to you. I feel like we could make a Claude project for you that is actually a really rigorous set of rubrics for what grades mean what, and what essays are best. And then we can basically just feed it in an essay and you can be more confident that the grades it's giving are actually good. 

Packy McCormick (00:44:40)

Hell yeah, let's do it.

Dan Shipper (00:44:42)

Okay, cool. So we're in Claude. So I'm gonna make a new project. And we'll call it Not Boring Grader: A grader for Packy to use on his Not Boring projects. Okay, cool. So basically, the way that I would approach this, and I would love any ideas or thoughts or revisions that you have, but I think a good way to start would be to define what good and bad looks like. And the best way to do that is usually to find examples. So do you have an example of an essay that you've written that you think is like the best of what you can do?

Packy McCormick (00:45:31)

The most popular is probably “Excel Never Dies.” I guess we can just do the first one. I thought that was good. I wrote it with somebody, so it'll be slightly different, but maybe that's good that it grades me on what I can do with someone else.

Dan Shipper (00:45:46)

Okay, cool. So we can take a couple. So “Excel Never Dies.” Anything else on your mind?

Packy McCormick (00:46:02)

We can do “The Great Online Game.” 

Dan Shipper (00:46:05)

Okay. I mean, that's one of the ones I think of. I think it's so good. Okay, anything else? Because one of the things I also want to do is, a lot of what you're doing, we want to find examples of the kind of pieces that you're writing right now, where it's really deep, technical, in-depth explanations.

Packy McCormick (00:46:21)

We can do “Fuse Energy.” Should be like a month or so old.

Dan Shipper (00:46:33)

Tell me when you see it. Perfect. Okay, cool. So I'm going to say, okay, sorry. I'm going to impersonate you, I apologize. “I'm the writer, Packy McCormick.” Is this how you spell your name? “... from Not Boring. I want to create a detailed rubric that explains what goes into my best writing. Here are three examples of what I think are my best pieces. Can you write a very detailed explanation of what they look like and how they're composed?” How does something like that sound? Alright, cool. I think I spelled rubric wrong, but we're just going to roll with it.

Packy McCormick (00:47:39)

I was wondering if that was an alternate spelling.

Dan Shipper (00:47:45)

I think it's the Swedish spelling. So we're getting, “Certainly Packy, I'd be happy to analyze these for you.” And then it's like creating an artifact. So, and I want you to just tell me do you think it's getting it? Do you think it's getting it? So, “Structure and flow, clear, compelling introduction that hooks the reader, logical progression of ideas with smooth transitions, effective use of subheadings to break up long form content, engaging conclusion, consistent pacing.” Do these feel good? Okay, cool. “Deep dive analysis, comprehensive exploration of the main topic, integration of multiple perspectives, original insights and connections, balanced treatment of complex issues, and clear explanations.” How does that look? Okay. We’ve got “extensive use of primary and secondary sources.” Just scan through these and see generally does this look at least reasonably good?

Packy McCormick (00:48:40)

Yes.

Dan Shipper (00:48:42)

Okay. Cool. So one thing that I like to do— I've actually picked this up from Matt Shumer recently is I just say, “make it better,” just to see what it does. And it will often honestly be better, which is kind of funny.

Packy McCormick (00:48:55)

Crazy. And it is interesting how it defined for itself what “better” was.

Dan Shipper (00:49:19)

Yeah, totally. It seems a little bit less generic, which is nice. “Connection of topic to broader cultural societal trends, injection of humor and wit.” Here's another thing that I think it's missing is: “Can you provide really specific examples from these pieces to judge?”

Packy McCormick (00:49:44)

See, it already is confusing. “‘Excel Never Dies’ opens with ‘There may not be a company worth rooting harder for than Fuse Energy.’” It'd be a weird way to start the Excel piece.

Dan Shipper (00:49:59)

Oh man. Well, you know, every once in a while, Claude shits the bed but, we’ll roll with it.

Packy McCormick (00:50:06)

The next one it nailed.

Dan Shipper (00:50:09)

Cool. Next one it nailed.

Packy McCormick (00:50:11)

That one it nails.

Dan Shipper (00:50:15)

Okay. So this seems reasonable. I honestly don't think it will correct itself. So we'll correct this in the final rubric.

Packy McCormick (00:50:27)

Yeah, that works very well though.

Dan Shipper (00:50:28)

Yeah, we'll make this better as we go along. So the last thing I want to try doing is, do you have any pieces that you've written that you think are bad? 

Packy McCormick (00:50:36)

Huh? No!

Dan Shipper (00:50:38)

Or let's just say don't represent the best of what you can do. You were tired and stressed. You hired a ghostwriter one week just to see if it would work—all that kind of stuff.

Packy McCormick (00:50:52)

I would never. Let me see. “Praise Elon” was not great.

Dan Shipper (00:50:58)

“Praise Elon.” Okay, cool.

Packy McCormick (00:51:08)

Whoa. I didn't know you could search within Substack. 

Dan Shipper (00:51:14)

Yeah, I don't know. That's interesting. Okay. So here's a piece I don't like very much based on this rubric, can you tell me why? And before I do this, what is your feeling about why this isn't a good article?

Packy McCormick (00:51:36)

It was like too in-the-moment. I'm still on Twitter and part of the angle was Elon. It was kind of almost too glib and tongue-in-cheek where it was like, Elon's going to kill Twitter, but that's a great thing because then I can actually be productive again, and it just wasn't representative of what I want to be writing.

Dan Shipper (00:52:03)

Okay, cool. And so based on this rubric— I feel like rubric still looks weird with a C, but is there some other spelling—?

Packy McCormick (00:52:12)

With a C is what I thought, but—

Dan Shipper (00:52:13)

Okay. Alright. Let's just go with the C. The only other possibility is that— That's definitely not right. Okay. “Here's a piece I don't like very much based on this rubric. Can you tell me why?” Alright, here's what we got. So, “Based on this analysis, ‘Praise Elon’ doesn't quite meet the high standards set by your best work. Here are some key reasons. The narrative feels forced and less organic than usual. The premise of Elon intentionally killing Twitter seems like a stretch. The depth of analysis is shallower than your typical pieces. The piece lacks technical depth that often characterizes your work. While the topic is timely, the approach feels more reactive and less insightful than usual work.” I feel like that's actually pretty good.

Packy McCormick (00:52:52)

Yeah, that’s what I said pretty much.

Dan Shipper (00:52:52)

It's basically what you said. So that's pretty cool. So basically, “I agree with this. Are there any modifications to the original rubric that might make it better that you'd recommend?” Just to see if there's any ways that it might sort of change the rubric or want to look out for new things.

Okay. So it's not doing the rubric with the examples. So we'll have to add that back in. I think it added this rigor and credibility section, or at least it looks new.

Packy McCormick (00:53:57)

I was wondering if it was going to actually just keep adding to the point total and it does. So it's not redistributing within 100. We just now have 200 points to go on.

Dan Shipper (00:55:09)

I mean, I think out of 200 points it's good. It's the Not Boring style. Let's see. Cool.

Packy McCormick (00:54:16)

By the way, last time when we asked for examples, it took away the points.

Dan Shipper (00:54:21)

Oh, did it? “Make sure to keep the point structure intact and make sure the examples are accurate. Be careful.” Let's see if that works. Is that the right hook for ‘Fuse Energy’?

Packy McCormick (00:54:51)

Yep.

Dan Shipper (00:54:52)

Okay, great. So it's less hallucinatory. So we're getting “a compelling hook.” Good: “There may not be a company worth rooting for.” Bad: “Whitepill: Elon Musk is intentionally killing Twitter for the good of humanity.” I think this is potentially good.

Packy McCormick (00:55:09)

I think so too. I need to feed it more bad examples, so it doesn't just take from that one, but—

Dan Shipper (00:55:16)

Yeah. We can make this a little bit more in depth later, but now the thing I want to do is I want to add this into our knowledge base. So I'm going to say “add content—grading rubric,” I'm going to paste it and then I'm going to set custom instructions and be like, “I'm Packy McCormick, the writer of Not Boring. You're the best editor in the world, in the style of the New Yorker, Stratechery, and Every—” Just kidding! “...the New Yorker and Stratechery. When I ask you to grade an essay, please use the grading rubric in your knowledge base to give my essay a grade.” Okay, cool. So I feel like we should take something that you got an A-minus on previously and then just throw it in here and see if it gives you a better score or a more accurate score.

So you just shared with me this doc, and this is a doc that you put into Claude and you got an A-minus on, right? So basically what I'm going to do is I'm going to just copy it. I'm going to go back to my project and I'm going to say, “Here's an essay draft. Please grade it.” Then let's see what it says. 27 out of 30 on narrative craftsmanship and premise. What do you think?

Packy McCormick (00:57:10)

Pretty good.

Dan Shipper (00:57:12)

Pretty good.

Packy McCormick (00:57:15)

Where do I lose points? See it says that. It gives me a perfect score, but it doesn't—

Dan Shipper (00:57:24)

Yeah, okay. We're going to have to— Interesting. “Original connections, balanced treatment—” Only five out of six, seven out of eight. So it's definitely still pretty positive, overall, probably too positive.

Packy McCormick (00:57:38)

It’s a really good piece.

Dan Shipper (00:57:39)

What'd you say?

Packy McCormick (00:57:40)

Or it was just a great piece.

Dan Shipper (00:57:41)

Or it was a great piece! That's what we're learning here. Yeah. It gave you 182 out of 200. So that I think is basically an A-minus.

Packy McCormick (00:57:58)

It’s a 91, but we do know that it had some issues adding so it could be higher than that.
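
Since the model's addition can't be fully trusted here, one option is to do the arithmetic outside the model. The sketch below is a hedged illustration only: the function names are hypothetical, the example section scores are made up, and the letter-grade cutoffs follow a common US convention that is an assumption, not something stated in the episode. It does reproduce the one calculation Packy makes, that 182 out of 200 is 91 percent.

```python
# Assumed letter-grade cutoffs (a common US convention, not from the episode).
GRADE_CUTOFFS = [
    (93, "A"), (90, "A-"), (87, "B+"), (83, "B"), (80, "B-"),
    (77, "C+"), (73, "C"), (70, "C-"), (60, "D"),
]


def total_score(sections):
    """Sum (earned, possible) pairs so the total never depends on the model's math."""
    earned = sum(e for e, _ in sections.values())
    possible = sum(p for _, p in sections.values())
    return earned, possible


def letter_grade(earned, possible):
    """Return (percentage, letter) using the assumed cutoffs above."""
    pct = 100 * earned / possible
    for cutoff, letter in GRADE_CUTOFFS:
        if pct >= cutoff:
            return pct, letter
    return pct, "F"


# Hypothetical section scores, for illustration only.
example_sections = {
    "narrative craftsmanship": (27, 30),
    "depth and originality": (31, 35),
}
print(total_score(example_sections))
print(letter_grade(182, 200))
```

Copying the per-section numbers out of the reply and summing them this way separates two questions: whether the section scores are reasonable, and whether the total was added up correctly.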

Dan Shipper (00:58:03)

“Can you review this score and make sure it's really accurate? I want you to be as objective as possible and follow the rubric exactly.” I keep spelling it with a K instead of a C. So I think we're— This is kind of interesting because even with the kind of reasoning it's still giving you an A-minus. So there's something in it that just wants to go for the A-minus no matter what.

Packy McCormick (00:58:45)

Maybe I'm just an A-minus writer.

Dan Shipper (00:58:46)

Well, the true test is whether I put one of my articles in there and it gives me an A-minus.

Packy McCormick (00:58:52)

Wait, we just asked it to be more accurate and it dinged me. 

Dan Shipper (00:57:35)

It just gave you a 155 out of 200. I wonder— If I take the kind of “be accurate” thing and I put it in my custom instructions and I'm just like, "Make sure the score you give is really accurate. I want you to be as objective as possible and follow the rubric exactly." Alright, let's try that one more time. Another thing that we could try and this might be getting a little bit too much in the weeds is one thing it's doing is it's outputting the score before it outputs the reasoning. And I don't want that. I want it to see, look, we're back to kind of— I want it to output the score only after the reasoning.

Packy McCormick (01:00:04)

Wait, are we going to get a perfect score here? Wait one sec.

Dan Shipper (01:00:10)

We got a couple points off here.

Packy McCormick (01:00:15)

198 out of 200. Not bad.

Dan Shipper (01:00:17)

So, we had the exact opposite effect of the effect that we were going for—at least I was going for. I think you like the A-plus. Let's try this one more time. Okay.

Packy McCormick (01:00:26)

It’s not about actually making my writing better, any of this. It's just about making me feel good enough to hit send.

Dan Shipper (01:00:31)

“Only after you output the reasoning. This goes for subsections as well as the full grade. Output reasoning first.” Okay. One more time, “Please grade this essay.” Alright, 20 out of 30, 32 out of 35, 22 out of 25. So we're starting to get maybe a little bit more accurate: 183 out of 200. And I think the reason is ChatGPT thinks by writing and if it's outputting the score first, it's going to score it and be pretty positive and then use that score as a way to figure out what it should write in terms of its evaluation. We want to reverse that.
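
The reasoning-before-score ordering Dan is after can be sketched in code. The following is a minimal, hypothetical Python example: the instruction wording, the `SCORE: n/200` line format, and both function names are assumptions for illustration, and no model API is called. It builds a prompt that asks for reasoning before any number, then reads the total from the end of a reply.

```python
import re

# Assumed instruction text; the wording Dan typed on screen is only partly quoted.
REASONING_FIRST_INSTRUCTIONS = (
    "Grade the essay using the rubric in your knowledge base. "
    "For every subsection, write your reasoning first and only then give the "
    "subsection score. After all subsections, end with one final line in the "
    "form 'SCORE: <earned>/<possible>'."
)


def build_grading_prompt(essay_text):
    """Put the reasoning-first instructions ahead of the essay text."""
    return f"{REASONING_FIRST_INSTRUCTIONS}\n\nEssay draft:\n{essay_text}"


def parse_final_score(reply_text):
    """Return (earned, possible) from the last 'SCORE: x/y' line, or None."""
    matches = re.findall(
        r"^SCORE:\s*(\d+)\s*/\s*(\d+)\s*$", reply_text, flags=re.MULTILINE
    )
    if not matches:
        return None
    earned, possible = matches[-1]
    return int(earned), int(possible)


# A made-up reply, used only to exercise the parser.
sample_reply = "Narrative: strong hook, uneven pacing.\nSCORE: 183/200"
print(parse_final_score(sample_reply))
```

Asking for a fixed final line makes it easy to check that the number really does come after the written evaluation rather than before it.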

Packy McCormick (01:01:35)

It's so funny too. I was talking to my son about something the other day and I forgot what it was, but it was like some complete, made-up nonsense, whatever. And he got the end idea in his head. And then I saw him, his little four-year-old brain—not yet a four-year-old brain—rationalizing his way into that thing by making up a whole story. I was like, wow, we really all, including, I guess, Claude, do this.

Dan Shipper (01:02:00)

We do. And that's what I think is so funny about people who are saying, oh man, I don't think these things are smart and they're not going to get smarter. And it's like, have you ever interacted with a child? This is how children reason a lot. And they just get better and better and better and get better at having more nuance and all that kind of stuff. But there still is sort of that basic thing that you're talking about where, yeah, you kind of use the reason to justify the intuition or whatever. And yeah, I really think it's what makes me sort of confident that these things are going to be very, very smart, even smarter than they are now. If you just watch children and they do the same thing.

Packy McCormick (01:02:46)

Where do you think that in this current path, where do you think that smartness ends? I've been thinking they're going to get really, really smart and there's agency or something missing that just doesn't make it feel like they're going to take over. What's the house view?

Dan Shipper (01:03:00)

I'm definitely not a doomer. And I tend to think that right now, yeah, we have the whole scaling law thing. But I generally think that these systems are incredibly, incredibly inefficient currently. And they're going to be orders of magnitude more efficient in terms of training and in terms of inference, the amount of compute you need for inference. And as soon as that happens a couple of things, I think, will start to become possible. One is right now, in order to get new information into them, you have to prompt tune. And I think that we will figure out architectures that allow them to learn after their initial training is done, which I think is really important. And the second thing is that what you've probably seen me do so far is, I'm pressing replay on this chat a few times because I want to see all the different ways that it could possibly grade this essay. So I can map the entire space of what it might think. And then, what I can do is I can just sort of flip through them and be like, okay, generally the score is around 180, but, depending on the specific day, it's going to be different. This is 184. The last one is 181. The last one is 182. But that's basically where it is. And I think a lot of creativity is basically doing this, but 1,000 times in half a second. And the minute we can do that and they update as you talk to them, I think it will feel very, very, very similar to talking to an actual human.
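
The rerun-and-compare habit Dan describes can be summarized numerically. This is a minimal sketch, assuming the totals from several reruns have been copied out by hand; the function name is hypothetical, and the three totals are the ones Dan reads out.

```python
import statistics


def summarize_runs(totals):
    """Summarize the totals from several reruns of the same grading prompt."""
    return {
        "runs": len(totals),
        "mean": statistics.mean(totals),
        "low": min(totals),
        "high": max(totals),
        "spread": max(totals) - min(totals),
    }


# The three totals Dan reads out while flipping through reruns.
print(summarize_runs([184, 181, 182]))
```

A small spread across reruns suggests the grade is stable for that essay, which is a different question from whether the grade is well calibrated.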

Packy McCormick (01:04:42)

Interesting. That makes sense. I wonder if— I'm trying to find another bad essay, because I want to see if I can get an A-minus. If it gives me an A-minus or not.

Dan Shipper (01:04:53)

Yeah. Give me one that you don't think is very good.

Packy McCormick (01:04:56)

I don't know if this is bad or good, but give it “The Unbearable Heaviness of Being Positioned.”

Dan Shipper (01:05:02)

Okay. What are your general thoughts on it before we grade it?

Packy McCormick (01:05:10)

I don't remember it well enough. It was like 15 months ago, but I think maybe one of the things might be it's a little too cute on words in terms of “sustaining an innovative disruption” and “counter positioning and being positionally,” it might be like trying to do too much on like small distinctions, but let's see.

Dan Shipper (01:05:41)

Okay, cool. So I'll grade this, I say, narrative craftsmanship, 30 points, it got 26, which is lower, much lower than the last one. Depth and originality, 35, which is pretty high. Technical mastery, 21 over 20 out of 25. Cultural resonance, 14 out of 15. Distinctive voice and style, 23 out of 25. Intellectual rigor, 26 out of 30. This looks actually significant. Oh, I was about to give Claude a lot of credit and Claude just proved me wrong. If its math is right. I have no idea if the math here is right, but eyeballing the scores, it looked like the subsections were a bit lower, but yeah, it just looks like Claude wants to give you an A-minus.

Packy McCormick (01:06:21)

It's so interesting.

Dan Shipper (01:06:24)

I've actually found this because we have this app Spiral that we built. And we have the same sort of grading system for it where it will allow you to generate a couple different versions of the same thing. And then it has a grader that says, this is good and this is bad. And it just gives itself good scores. I really wonder. I'm sure that there's a way to prompt tune this and kind of get around the A-minus thing. But it seems like it's an attractor. That's where it wants to go.

Packy McCormick (01:07:00)

I wonder if there's a bell curve trick you can give it or a grading on a curve kind of trick. 

Dan Shipper (01:07:06)

Yeah, I think we could also— What I would probably do if we had a little bit more time is I would probably have it explicitly lay out, this is an A, this is a B, this is a C, this is a D. And rather than having it do math, I'd be like, here's a C essay. An example. And here's why. And then put it in a bucket depending on what you think it's most similar to. I think that would probably be better than having it do math and reason over different parts of the rubric, but not enough time for now. But I actually think that might work.
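
Dan's alternative, sorting an essay into the bucket it most resembles instead of adding up points, could look something like the following sketch. Everything here is hypothetical: the function name, the prompt wording, the `BUCKET:` line format, and the placeholder exemplars. It only assembles prompt text and does not call any model.

```python
def build_bucket_prompt(exemplars, essay_text):
    """Build a prompt that asks for the closest letter bucket, not a point total.

    exemplars maps a letter grade to (example_essay, why_it_earned_that_grade).
    """
    parts = [
        "Here are reference essays, one per grade, with the reason for each grade."
    ]
    # Letters sort alphabetically, so the A example comes first.
    for grade in sorted(exemplars):
        example, reason = exemplars[grade]
        parts.append(
            f"Grade {grade} example:\n{example}\nWhy it is a {grade}: {reason}"
        )
    parts.append(
        "Compare the new essay with the examples. Explain which example it is "
        "most similar to, then finish with one line: 'BUCKET: <letter>'."
    )
    parts.append(f"New essay:\n{essay_text}")
    return "\n\n".join(parts)


# Placeholder exemplars; real ones would be full essays with honest notes.
placeholder_exemplars = {
    "A": ("<an essay you consider your best>", "deep analysis, clear narrative"),
    "C": ("<an essay you consider weak>", "reactive, shallow analysis"),
}
print(build_bucket_prompt(placeholder_exemplars, "<draft to grade>"))
```

The design choice is the one Dan names: the model compares whole essays and explains the match, so there is no arithmetic for it to get wrong.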

Packy McCormick (01:07:44)

You know what else I'm going to do? And I can grade this against myself, but then there's also the— I tweeted, it must've been two years ago now, asking for people's favorite essays that are more than two or three years old. And it created a list of essays that I go back to fairly often. And so maybe just dumping a bunch of those in and being like, this is the bar, grade me against these.

Slightly different because there are different types of stuff I write, but it'd be interesting to see how I do there.

Dan Shipper (01:08:05)

That would be interesting because right now you're grading it against your own writing style, right? And I wonder if that has anything to do with the kind of A-minus grade thing. If you put a totally different writer in there, how would it do? I think we should try that. I just want to see before we hop off this, I think we have to go the full distance and I'm going to take one of my essays and we'll see if I get an A-minus on the Packy scale.

Packy McCormick (01:08:35)

We should. Let's grade your essay not against me also, and just see if you get an A-minus.

Dan Shipper (01:08:42)

We're doing it for science, folks. We have to know. Okay. So I'm going to take it. Let me see what my... What are some recent things I've written? So I just wrote, this is Every’s master plan. I think this is good. I want to find something that's not that good. This wasn't that good. I don't think... yeah. I'm not super proud of this one. Okay, cool.

Packy McCormick (01:09:20)

I like that you're an AI user and you just copy the whole page and don't worry about it. I need to stop doing that.

Dan Shipper (01:09:38)

I think we might be headed for another A-minus based on what I can see here.

Packy McCormick (01:09:43)

It looks A-minus— Oh no, no, 14 out of 25 on cultural resonance.

Dan Shipper (01:09:56)

Ooh, 173 out of 200. This is the lowest score yet!

Packy McCormick (01:09:46)

Alright. But now grade it without the rubric. Just go ask Claude to grade the essay.

Dan Shipper (01:09:52)

Okay. Alright. What's your prompt for this? “I'm Dan Shipper. I write Every. Please grade this essay.” That's it?

Packy McCormick (01:10:13)

Yeah. I sometimes do, “You're the top editor at a world-class publication.”

Dan Shipper (01:10:21)

Let's just try this for now. We'll just go easy for now. A-minus, B-plus, A-minus. 

Packy McCormick (01:10:39)

Oh, there it is! We're incredibly mid writers. Both of us.

Dan Shipper (01:10:42)

I love it. So I think what we're learning is obviously there's this attractor state for A-minuses in Claude. And if you give it a rubric, you can kind of get it out of that state to some degree. But if the rubric is based only on your own writing, it's going to be, I think, pretty likely to stay at that state. So giving it a wide diversity of examples is probably going to be what gets you over the top.
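[Editor's sketch: one way to get the "wide diversity of examples" Dan mentions is to pick reference essays round-robin across writers, so no single author's style dominates the set. The function name `balance_by_author` and the `(author, title)` input shape are assumptions made for illustration, not something from the episode.]

```python
from collections import defaultdict
from itertools import zip_longest


def balance_by_author(essays, limit):
    """Pick essays round-robin across authors so no single writer dominates.

    `essays` is a list of (author, title) pairs; returns at most `limit` pairs.
    """
    by_author = defaultdict(list)
    for author, title in essays:
        by_author[author].append((author, title))
    picked = []
    # Each round takes the next essay from every author who still has one.
    for round_items in zip_longest(*by_author.values()):
        picked.extend(item for item in round_items if item is not None)
    return picked[:limit]
```

[With a pool made up mostly of your own essays plus a few favorites from other writers, the first round takes one essay per author before any author contributes a second.]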

Packy McCormick (01:11:12)

Yeah. I'm going to give that a try. That's my homework from this conversation. But I will say I do find it very valuable within that rubric. If you look there, I'm sure there's a lot of just little nuggets that will at least change the way that I think about the writing. Because the way that I do a lot of my editing is I'll go back and forth and make changes and whatever. And then I'll hit a certain point where I'm like, alright, I kind of know what I'm saying now. Let me just start fresh. I like to copy things in, but I'll write a lot of it fresh near the end and do it again. And I do think a lot of those little tips stick in my mind. And so, I do think it's useful, although the A grade is almost a safety blanket for me, where if it were a C, I'd be like, fuck, I can't send this.

Dan Shipper (01:11:58)

No, totally. I think just having some of those rules printed out, it's stuff that you already know intuitively, but you might not have said to yourself or totally known in a rational way. And I think that in itself is super awesome and valuable.

So Packy, this was awesome. Thank you so much for coming. If people after watching or listening to this episode want to find you, where should they go to find you?

Packy McCormick (01:12:25)

@packyM on Twitter and notboring.co. I have not sprung for the .com yet. It's notboring.co.

Dan Shipper (01:12:32)

Awesome. Well, really, really, really fun to get to chat. Thank you so much for joining us.

Packy McCormick (01:12:38)

Thanks for having me.


Thanks to Scott Nover for editorial support.

Dan Shipper is the cofounder and CEO of Every, where he writes the Chain of Thought column and hosts the podcast AI & I. You can follow him on X at @danshipper and on LinkedIn, and Every on X at @every and on LinkedIn.
