
Sponsored By: Tonic
For all the software engineers and engineering managers, say goodbye to the headache of provisioning quality test data. With Tonic, you can effortlessly keep your staging in sync with production using realistic test data generated from your actual production data. It's a seamless solution for maintaining data integrity and efficiency. Discover how with a simple click.
TL;DR: Today we’re releasing a new episode of our podcast How Do You Use ChatGPT? I go in depth with Notion research engineer Linus Lee on how he uses ChatGPT and Notion AI to maximize creative control. Watch on X, Spotify, or YouTube.
You might think that being an AI researcher would mostly involve solving complicated programming problems and thinking through mathematical equations. Instead, a big part of the job is rewriting parts of your prompts in ALL CAPS in order to make sure the AI model you’re working with follows your directions. “All caps works!” Linus Lee told me in this interview. “If you look at OpenAI's system prompts for a lot of their tools, all caps works.”
Linus is a research engineer at Notion who works on its AI team, prototyping new experiences, like a Q&A chatbot. He is a deep thinker who is obsessed with building AI that enables human creativity and agency. He came on the show to talk about how AI might augment our thinking, how he thinks about prompting to get the best results, and how he uses ChatGPT and Notion AI in his work and life.
I first interviewed him a year ago, when he showed off dozens of AI prototypes he’d been building to try to understand the future of this technology. Our latest interview is a mixture of theory and practice. Linus talks about how the tools we use shape the work we can create and what the future of AI-driven interfaces might be. We watch him demo personal tools he’s built, like an AI chatbot that he communicates with over iMessage. And we peek over his shoulder to see his conversations with ChatGPT to understand how he talks to it to get the best results.
Here’s a taste of what we talk about. Read on for more analysis from me at the bottom.
- Using AI to maximize agency. Linus talks a lot about the ways our tools shape our agency as thinkers and creatives—and how AI might be used to enhance rather than reduce our agency.
- AI as a “thought calculator.” Linus borrows a phrase from the popular tech blogger Simon Willison to illustrate dueling points of view on the ultimate goal of AI: is it meant to be a simulacrum of humans or a “thought calculator,” a way to enhance human imagination and creativity?
- Personal prototypes he’s built. Linus regularly experiments with AI on the weekend. He shows us a chatbot he built that works over iMessage, and a new interface for image generators that gives him much better control over their output.
- Better prompting. We go over simple yet powerful techniques for getting the best answer out of AI models—like starting with general queries first, and repeatedly asking the model to answer the same question.
- Using AI for vibe checks. AI is great for reflecting the vibes of books, people, places—and even files on your computer. Linus talks about how he uses ChatGPT to get quick vibe checks that allow him to make decisions.
- Book recommendations. We pit ChatGPT head-to-head against Notion AI to see which can best capture our reading taste. And just when ChatGPT seems like it’s coming out on top, Linus makes a convincing case for Notion AI’s special skill set as an organizational tool that already knows how its users work.
You can watch the episode on Twitter/X, Spotify, or YouTube. Links and timestamps are below:
- Watch on X
- Listen on Spotify (make sure to follow to help us rank!)
- Watch on YouTube
Timestamps:
- Intro 1:03
- Retaining agency when conversing with AI 4:06
- A personal iMessage chatbot 27:04
- The difference between prompting and prompt engineering 32:49
- “What's the vibe of this file?” 38:48
- Travel recommendations 44:57
- Book recommendations 51:57
- Notion AI's advantage over ChatGPT 56:00
- Using Notion AI at work 1:02:00
- Is GPT-4 getting lazy? 1:09:16
What do you use ChatGPT for? Have you found any interesting or surprising use cases? We want to hear from you—and we might even interview you. Reply here to talk to me!
Miss an episode? Catch up on my recent conversations with writer Nat Eliason and Gumroad CEO Sahil Lavingia and learn how they use ChatGPT.
My take on this show and the episode transcript is below for paying subscribers.
I had a lot of fun speaking to Linus. There’s so much fear-mongering about how AI might replace humans in creative work—and there are too few people thinking about how it might be used as an augmenting or enabling technology instead. That’s exactly what Linus is focused on, and I came away from the interview deeply inspired about the future of these tools for making better work.
I also learned that when you’re interviewing a Notion AI researcher, you should never say, “I think this might work better in ChatGPT.” When I said that, his understandable competitiveness caused him to go absolutely turbo mode inside of Notion AI. It was such a pleasure to get to watch someone who had built parts of the tool use it in a potent way.
Previously I hadn’t bothered to go too deep into Notion AI—now I want to add it to my repertoire. Being able to automatically make a database table out of free-form text, with parts of that table filled in and updated for you, is a powerful way to organize information. That discussion is my favorite part of the interview, so if you’re pressed for time, skip to it.
Transcript
Linus Lee (00:00:00:01)
I also think when you look at models this way, models start to feel less like agents that have their own kind of agency and more just like, oh, this is just like a calculator.
The more context the model has about why you're doing what you're doing or what your goals are, the better its suggestions are generally going to be.
Dan Shipper (00:00:16:00)
Do you have custom instructions set for this?
Linus Lee (00:00:17:00)
I don't actually.
Dan Shipper (00:00:18:00)
Oh my god. We're going to have to revoke your ChatGPT privileges. Okay, I have a bunch of books in front of me, and I want to see if it can recommend books. The Brothers Karamazov. Medieval Technology and Social Change. Exhalation. Ted Chiang. The Essential Kabbalah. Defense of Socrates.
Linus Lee (00:00:36:00)
Synthesize the vibes below into a single paragraph. Synthesize more. Compress!
Dan Shipper (00:00:44:00)
Wait, wait, wait, wait. So you said “synthesize more” and then all-caps “compress.” I love it. AI researcher uses AI.
Dan Shipper (00:01:03:00)
Linus, welcome to the show.
Linus Lee (00:01:07:00)
Thanks. Thanks for having me. Excited to be here.
Dan Shipper (00:01:08:00)
Yeah. I'm excited to have you. We've been friends for a while. I interviewed you I think about a year—year and a half ago. And it was when I was first starting to write about AI and that interview went super viral and hoping we can replicate some of that magic here today. But I just love getting to talk to you. You're a researcher. You're a tinkerer. You're just a really, really deep thinker and a great writer. For folks that are listening that don’t know, in addition to doing a bunch of side research and side projects and tinkering in LLMs, you also work for Notion on their AI team, prototyping new interfaces for AI. So we'll get into that. We'll get into some of your thinking about LLMs and how LLMs fit into creative work and creative thinking. We'll get into some prototypes you’re building, maybe demos you have. We'll talk about Notion stuff, and then of course we will talk about how you use ChatGPT.
Linus Lee (00:02:05:00)
Yeah, I think that interview that you talked about—I think it came out almost exactly a year ago, if not exactly a year ago. And I think it was, I think, safe to say, pivotal for both of us, at least certainly for me. That interview is a big part of how I ended up at Notion, which I’ve spent the last year at and it’s been very fascinating. A great place to be for the last year as lots of things have happened in AI so, yeah, kind of timely.
Dan Shipper (00:02:30:00)
I love it. I love that it's a year ago. I think it would be great to maybe make this an annual thing. And I don't know what the next step is after helping you get a job at Notion. Like what's the next level thing—meeting Taylor Swift or something? But maybe we could make it work.
Linus Lee (00:02:45:00)
That would be something.
Dan Shipper (00:02:47:00)
So, here's where I want to start. Basically, I've said before on the show and I've written a lot about it, that I think ChatGPT is one of the most important creative tools of the decade. And I really think that people sort of misunderstand how important ChatGPT is in specific and a lot of the tools that are being built right now in general are going to be for creative thinking and creative work. And I really think that you're one of the people on the forefront of thinking about that. There are a lot of people out there who are just really afraid of AI right now and really feel like, oh it'll replace everything that we do and it creates this inhuman future. And I find you to be one of the deep thinkers and builders in this space that is trying to think about how to make AI into a human future where it sort of augments us and helps us live fuller lives. And I think that's really, really incredible. When I was researching for this episode, I came across this quote that you wrote recently where you said, “I want to build interfaces that lets the AI gesture us into a better future without infringing on our agency.” I love that. I think it really captures the spirit of your philosophy and I just wanted to start there. Tell us about that quote. Tell us why you wrote it and how that fits into the larger way you think about AI.
Linus Lee (00:04:06:00)
Yeah, I think agency is really interesting. A lot of the way that I think about language models in particular—I think we're starting to see that models for different modalities, when you take a 10,000-foot view, show a similar kind of emergent capability, but language is the thing that seems most humanlike to us, and so we talk about it a lot. And the way that I look at language models personally has been colored by the fact that, like a lot of other people, I work with language models at that sampled-output level of just telling it things and having it tell me things back.
But I also work with it at a numbers level, looking at embeddings and so on. And I think that's something we'll get to in a bit. Because of that, when I look at a language model, I see it as really just a function that predicts a probability distribution. You can wrap it in very anthropomorphic packaging that makes it seem like it has its own will and its own intent, and you can project people-like properties onto it and say it wants things, but ultimately it's a statistical model of a probability distribution, just a very complex one. And so I think that colors a lot of my own thinking about how I view the technology. But just because it's a pure function that models a probability distribution doesn't prevent us from accidentally building things with the technology that take away your agency over the things you want to do.
I think an interesting example, and this is kind of a silly example, is that a coworker and friend of mine at Notion and I were talking about making a cover image for a party that she was organizing. She had this very particular aesthetic in mind for the cover image, which was kind of like a very ugly painting on parchment, not something you would consider aesthetic. And she's like, I want this very specific aesthetic. Here are some images I got off of Twitter that follow this aesthetic. Can you make one in this aesthetic of a girl sipping wine, or something, I don't remember the exact example.
And I tried so hard to use all of the image-generation tools at my disposal, like Stable Diffusion XL, DALL-E 2, DALL-E 3, all of these tools, even some of my own image tools, to make an ugly image. And it's just very difficult to get DALL-E to make an ugly image.
It's interesting to me because—and this is something I've talked to some folks at Midjourney about also—the tool kind of constrains the search space of possible images you can generate so that normally it's closer to what you want. Because normally you want something aesthetic, but in the process it may actually take away some agency from the user, in the design of the tool if not necessarily the capability of the model. And so the underlying technology is kind of just a tool, a function, whatever, a mathematical object. But we don't want to accidentally wrap it in packaging that disempowers us.
Dan Shipper (00:07:26:00)
That's really interesting. I want to back up. You said a lot of different things there, and I think they're really interesting and really important to unpack. So what I heard you say is that AI, or at least the current generation of AI models, is a function that models a statistical probability distribution, and when we experience that ourselves, we have all these reactions to it—we anthropomorphize it, we project out what it might be able to do from what it can do today, in ways that are maybe unrealistic or that maybe don't fully reflect how it currently works. And depending on the probability distribution that you select (for example, the probability distribution that you create from the training set of images or the training set of texts you use to train your model), you're going to create a certain space of possible outputs that sort of constrains the user.
And, in your view, I think, how that is selected and communicated is maybe important because, like you said, you can't get DALL-E to make an ugly image. It's not built for that. So it kind of takes away agency from you, because it's doing some stuff for you in a way that, I don't know, maybe Photoshop doesn't. With Photoshop, for example, you can just manipulate the pixels, so there's no agency taken away in that experience. Is that kind of what you're getting at?
Linus Lee (00:09:09:00)
Yeah. Yeah. I think there are a lot of different, pretty subtle ways that a tool's design, either intentionally or unintentionally, shapes its agency landscape. One very explicit example is a tool like DALL-E 3, where the model is, through training, made so that it's fundamentally unlikely to output bad-looking images. I think there are other examples.
So Photoshop is actually an interesting example, because even if in theory you could make everything by just moving the raw pixels around, I think the specific features that are easier to access tend to also shape the style of the output. And so if you look at popular image-editing tools, you can tell when an image was just made in Instagram Stories, or you can tell when an image has had its background masked out by Keynote or something. All of these tools, even in pretty subtle ways, tend to shape the output style.
Dan Shipper (00:10:07:00)
And I mean, I feel like that's part of what makes art in general. If you think about the music styles that are popular, it's based on what the sound boards can do and the electric guitar and the weird effects that Jimi Hendrix discovered—in some ways they're limiting the artist’s agency, but they also create this unique set of constraints that creates a unique vibe and sound that represents a genre or a generation.
Linus Lee (00:10:37:00)
Yeah, you touched on something really interesting there: imagining the output space as a very concretely spatial kind of thing. This is very concrete for me because I think a lot about embeddings—and embeddings exist in a very concretely spatial space. An embedding is a list of numbers that tries to summarize quantitatively what some piece of text or some image semantically contains.
And so, an example use case of an embedding: you have an embedding model, which is kind of a language model with the last part chopped off so that we can read out the raw numbers. And you might feed an embedding model sentences like “The Eiffel Tower is in Paris” and “The President of France is blah” and “Notion is a tools-for-thought company.”
And two of those sentences are much closer in meaning than the third one. So when you feed them into an embedding model, the embedding model will spit out a list of numbers for each of these sentences. And when you view each list of numbers as a coordinate, or a point in a high-dimensional coordinate space, those points are going to be closer together for sentences that are closer together in meaning.
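A minimal sketch of the idea Linus is describing, assuming the open-source sentence-transformers library and its small all-MiniLM-L6-v2 model (any embedding model would behave similarly): embed the three example sentences and check which pair lands closest together in the space.

```python
# pip install sentence-transformers
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

# A small general-purpose embedding model, chosen only for illustration.
model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "The Eiffel Tower is in Paris.",
    "The President of France lives in Paris.",
    "Notion is a tools-for-thought company.",
]

# Each sentence becomes a list of numbers -- a point in a high-dimensional space.
embeddings = model.encode(sentences)

# Cosine similarity: higher scores mean points that are closer in meaning.
print(cos_sim(embeddings, embeddings))
# The two France-related sentences should score higher with each other
# than either does with the Notion sentence.
```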
Dan Shipper (00:11:52:00)
The way that I think about it sometimes is you can take text and then assign pieces of text a latitude and longitude coordinate and that’s an embedding. And then you can map the text and then see which pieces of text are closer to each other. And the ones that are closer on the map, the ones that have a latitude and longitude that are closer, are going to be similar in meaning.
Yeah, that sort of clicked for me or I was like, that's what it is.
Linus Lee (00:12:15:00)
Another metaphor that's sometimes used is a color picker. If you use any kind of image-editing tool, one of the ways you can pick colors is a 2D grid or a color circle where you have two dimensions. You can move things around, and as you move the point around, you're changing some of the RGB values.
And there are a couple of different ways to describe a color. One way is by RGB value, which is kind of like a coordinate, a latitude and longitude, for a color. Another way you could describe a color is by just saying “orange” or “very dark blue” or “crimson.” In my head, the word description, “dark blue” or “crimson,” is the input, and conceptually, the RGB values are kind of like the embedding. I think that's really interesting because it hints at this possibility that embeddings, or these numerical representations, might encode pretty fundamental semantic insights about the thing that you're encoding—color or text—in a way that you can just mathematically manipulate to do interesting things.
If you have the word “crimson,” you can't really manipulate it in ways like making it lighter or making it more blue or whatever. But if you have the RGB value, there are very concrete algorithms, just number crunching, that you can use to make the color a lighter shade of crimson, or more blue, or more purple.
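The color analogy boils down to a few lines of arithmetic. A toy sketch in plain Python of the point being made: the word “crimson” is opaque, but its RGB coordinates can be nudged numerically.

```python
# "crimson" as a word can't be nudged; as an RGB coordinate it's just numbers.
crimson = (220, 20, 60)

def lighten(rgb, amount=0.3):
    """Move each channel part of the way toward white -- pure number crunching."""
    return tuple(round(c + (255 - c) * amount) for c in rgb)

def shift_toward_blue(rgb, amount=60):
    """Raise the blue channel, capped at 255, nudging crimson toward purple."""
    r, g, b = rgb
    return (r, g, min(255, b + amount))

print(lighten(crimson))            # a lighter shade of crimson
print(shift_toward_blue(crimson))  # crimson pushed toward purple
```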
Dan Shipper (00:13:40:00)
No, I love that. I mean, that's sort of very central to some of your thinking around this. The current generation of AI tools is awesome, and chat is great. But, you know, manipulating images or text via chat is a very coarse-grained tool, and it's a lot harder to get the exact thing that you want.
And I think you've thought about it a lot: AI interfaces that allow you to be more precise about how you move through different ways that you want to modify an image or a text using AI.
Linus Lee (00:14:17:00)
Yeah. And I think thinking in that way—the spatial way of thinking about the possibility space for outputs of these models—is kind of behind the way that I think about how to add more precision to these kinds of tools. Should I just pop into a demo?
Dan Shipper (00:14:36:00)
Yeah, let’s do it.
Linus Lee (00:14:39:00)
So the interface for this is very bad because it's just—Yes, this is one of a string of experiments.
I may show some other ones later, but this is actually half built. I was in the process of building something different and then it started being useful and I didn't really feel like putting more effort into it. And so it's in a very half finished form, but it's good for demonstrating this particular thing that I'm about to walk you through.
Dan Shipper (00:15:10:00)
So what is the general thing that you wanted to build it for? What is it? What was the original idea really quick?
Linus Lee (00:15:15:00)
The original idea was to try to let you describe an image that you want to create by adding and subtracting images and text. So there's a model called CLIP by OpenAI. This is actually a model that's been around for a while. If you use any kind of multimodal search-engine tool that uses text to semantically search images, CLIP is likely to be one of the models behind it.
The special thing about CLIP is that it tries to describe both text inputs and image inputs in the same embedding space—in the same coordinate space, so that if you put in a word like Eiffel Tower and a French flag, the model is going to understand the meaning behind both of those things and try to cluster them together because they're similar.
So a slightly newer thing that happened after CLIP came out is that people came up with a way of generating images conditioned on a specific point in this CLIP embedding space, so that you could put in an image or a text and then try to generate an image that conceptually corresponds to that point in the space.
So if you put in a bunch of images of butterflies, they all kind of cluster around the same point in this embedding space, in this coordinate space. And if you pick one of those points in that cluster and then generate an image back out corresponding to that point, you would get some image of a butterfly. And so given this ability to both take an image or text and encode into the space and then pick a point in this space and decode it back out into its corresponding original image form, maybe you can generate an image not by just typing a text prompt, but by mixing a bunch of concepts together in the space.
So if you want a butterfly that has the colors of the French flag, maybe you can mix the image that is of the French flag and an image of a butterfly. Just pick a point in between these two concepts in the embedding space and then decode it out and maybe it'll mix those concepts together. So that's the kind of idea I was trying to explore with this little hack.
The thing that I'll show you now is: Here I’ll pick—This is a selfie of myself. For dumb demo reasons, I have to put myself in twice. And then this is an illustration. Actually, let's do something different.
This is a kind of clip-art illustration of a human face. So what this tool is going to do is kind of embed both of these images. They correspond to slightly different points in the CLIP model’s embedding space. And then I'm going to use this tool to pick eight different points along the line between these two images and then decode each of them out into its own image.
And so we're basically sampling the distance between these two images in this embedding space and then trying to look inside the model and see kind of what the model sees at each of these points.
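The embed-then-interpolate half of what Linus demos can be sketched with off-the-shelf pieces. A rough sketch, assuming Hugging Face's transformers CLIP implementation and two stand-in image files; decoding a CLIP embedding back into pixels requires a separate unCLIP-style decoder, which is not shown here.

```python
# pip install transformers torch pillow
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Embed two images into CLIP's shared image/text coordinate space.
selfie = Image.open("selfie.jpg")            # stand-in file names
clip_art = Image.open("clipart_face.png")
inputs = processor(images=[selfie, clip_art], return_tensors="pt")
with torch.no_grad():
    image_embeds = model.get_image_features(**inputs)

a, b = image_embeds[0], image_embeds[1]

# Pick eight evenly spaced points on the line between the two embeddings.
points = [a + (b - a) * t for t in torch.linspace(0, 1, steps=8)]

# Each point would then go to an unCLIP-style decoder to be turned back
# into an image -- that half of the pipeline is omitted here.
```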
Dan Shipper (00:18:20:00)
That's very cool. A way that I would think about this is: if you're used to Photoshop or Instagram or any of these—any photo-editing capability—they have all these sliders you can use to make an image brighter, where on the left it's very, very bright and on the right it's very, very dark, and you can just slide in between. And what you're building here is a slider between almost concepts, or ideas for images.
So on the left side is your headshot and on the right side is a sort of vectorized cartoon drawing that is in a style that you like, and you're just sort of seeing, if you slowly turn your headshot into that, what are the different points along that space.
Linus Lee (00:19:08:00)
You could also, speaking of sliders, not just slide along this kind of stylistic scale, but start to do fun things like slide along an emotion scale. So one thing that I've done before here is I'll try to go from “an image of a young man who is happy” to “an image of a young man who is very, very angry” and—there we go—now it's progressively more and more mad. You can see the effect is really intense and, by the end, it kind of destroys the image. So sometimes I tone down the text effect.
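The emotion slider is the same trick, except the slider's endpoints are text prompts rather than images. A hedged continuation of the CLIP sketch above: embed the two descriptions, treat their difference as a direction, and add increasing amounts of it to the selfie's embedding before decoding.

```python
# Continuing from the CLIP setup in the previous sketch.
texts = [
    "an image of a young man who is happy",
    "an image of a young man who is very, very angry",
]
text_inputs = processor(text=texts, return_tensors="pt", padding=True)
with torch.no_grad():
    text_embeds = model.get_text_features(**text_inputs)

# The "angry minus happy" vector acts like a dial in the embedding space.
direction = text_embeds[1] - text_embeds[0]

# Nudge the selfie's embedding progressively further along that dial.
# Pushing too far tends to destroy the image, as in the demo, so the
# overall strength is something you would tune down in practice.
strengths = torch.linspace(0.0, 1.0, steps=8)
angrier_points = [image_embeds[0] + s * direction for s in strengths]
```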
So this is one of many experiments that I've done to try to build interfaces around exploring embedding spaces and latent spaces of generative models instead of talking to the model directly. The reason this is interesting to me is twofold. One, I think it's just intellectually really interesting that in the process of learning how to predict the next token, or in the process of trying to model probability distributions of language, the model internally learns to pull out human-recognizable, semantically useful concepts like emotions. And I think learning how the models do that, and understanding it in order to improve models or make them more controllable, is useful.
And just the fact that the models do that I think is interesting. So I want to understand it better. I also think when you look at models this way, as a thing that gives you dials and is a thing that gives you numbers that you can manipulate, language models start to feel less like agents that have their own kind of agency and more just like, oh, this is like a calculator.
Simon Willison, who is one of the prominent bloggers in the space, actually has a really great blog post about language models as a kind of thought calculator, where he talks about a similar idea. One way to look at this demo is as a kind of calculator for images and concepts and text. And unlike asking DALL-E 3 through ChatGPT to generate an image, which feels like you're making requests of this entity—this agent—here it's like, okay, there's very concretely a thing that does math, and you can use it to get dials and control. That feels more like a tool, and it feels like perhaps there's a direction here that lets you retain more agency as a person using the tool.
Dan Shipper (00:21:38:00)
I feel like one of the things I'm getting from this is when people use this generation of AI tools for the first time, there's this woah factor. I was using it the other day. I had done a lot of journaling. I've been going through some stuff in my personal life.
I've been doing a lot of journaling about it, and I fed—I don't know—4,000 words of journals into Claude. I did Claude and ChatGPT, but I find Claude to be slightly better for this, and I'll ask it, “What am I not seeing?” “What are the patterns that you're observing in my psychology or the psychology of people around me?”—all that kind of stuff. So I put in all this stuff, and I screenshotted that and sent it to my therapist, and he was like, “That's totally wild.” And I was like, cool, he must think this is awesome. He's like, “It's so good.” And I was like, “This is amazing.”
And then he was like, “I thought we would have at least a couple of years until it would be this good.” And then he was like, “Do you think there's room for human therapists anymore?” And I was like, now I have to be comforting my therapist because he's afraid of the AI.
And my feeling is he had that immediate “oh shit” moment that everyone has. But if you really dig into these tools, they do amazing stuff, but they're not even close, in my opinion, to replacing a therapist, for example. And I think part of the reason why people have that “oh shit” reaction, which is what you're referring to here, is that we just respond to things that feel really intelligent.
And if the same thing was performed in a slightly different way, like with a slider like this, it would still be cool, but it would feel more familiar, like a tool that we might use anyway. Like a calculator: something that is in itself sort of mind-blowing but isn't threatening; it's not a replacement in the same way that a fully intelligent thing is. Is that a fair way to describe what you're saying?
Linus Lee (00:23:23:00)
Yeah, I think so. I think so, yeah. When you train a model, there are fundamental capabilities you bake into it, and the packaging that you wrap it in really dramatically influences how people receive it and how they use it.
Dan Shipper (00:23:45:00)
You know, if you took something like human neurons and just laid them out in the right way and wrapped them in an interface, it might look like a tool too.
Linus Lee (00:23:54:00)
Certainly.
Dan Shipper (00:23:55:00)
And, if you just get enough of those neurons together, you get a brain that's conscious and can do things and think. How do these two things fit together?
Linus Lee (00:24:10:00)
That's an interesting question. I haven't thought about that. I think off the cuff, my kind of intuitive reaction is, I think it would be an interesting intellectual and societal endeavor to try to build a thing that is like a simulacrum of humans in every way, including having some kind of goals and agency and so on and so forth.
And I think if the goal was to build a simulacrum of everything that, at least intellectually, makes us human, you would want to build in elements that these tools don't have, like trying different things, like exploration. I think intentional exploration is actually a huge part of intelligence that humans have that these models currently don't exhibit.
However, I think when companies like Google and OpenAI and even Notion try to build language model-based tools that help you do your work, there are a lot of parts of human intelligence that are actually kind of annoying when you're just trying to get some work done. The fact that humans sometimes just sit at their desk and daydream is probably not that useful if you're trying to hire an entity to read thousands of papers and summarize them. And so my gut feeling is that for a lot of these corporations and teams building AI to be useful, especially in a professional context, their goal is not to build a simulacrum of humans; their goal is just to build a thing that is an intelligence steam engine. And then I could imagine some other research groups, or other groups of people, whose goal is to build this simulacrum of humans.
But I think in the beginning, I think they tend to look similar. But I think as we get further down this road, we're going to see a kind of divergence between people who really want to build a simulacrum of humans versus people who just want an intelligent thought calculator.
Dan Shipper (00:26:12:00)
I went to OpenAI DevDay and they did this whole presentation on how they fine-tune GPT-4 using Slack messages. Did you see this? And basically they asked this fine-tuned version to do something and it was like, “No, I'll do it later.” And then it was like, “Do it or you're fired.” And it was like, “Okay, I'll do it.” Yeah, but you're right. There's all these parts of being human that maybe we don't necessarily want to model for the AIs we build. And there's some divergence there in terms of is it a tool or is it something that has agency and all that kind of stuff.
Linus Lee (00:26:48:00)
Yeah, I think I've noticed something actually very similar. I have a personal chatbot that is sometimes used for brainstorming, kind of for entertainment purposes.
Dan Shipper (00:26:56:00)
Can we see it?
Linus Lee (00:26:56:00)
I can. Hmm. Here, I'll open it up.
Dan Shipper (00:27:04:00)
You have a personal chatbot.
Linus Lee (00:27:07:00)
I have a personal chatbot. It's an iMessage bot because I like having it in iMessage because it's accessible from all my devices. Apple does all the syncing for me and it looks nice and I can react to things. I haven't used it in a while, but I could ask it something like—
Dan Shipper (00:27:20:00)
Well, it looks like you asked it “What makes Notting Hill particularly picturesque compared to other neighborhood areas like Westminster?” Tell me about that.
Linus Lee (00:27:27:00)
I was visiting London and I was trying to look for neighborhoods to visit. Actually, we can ask something about it. We could say, “What do you think about London versus New York?”
Dan Shipper (00:27:40:00)
And why would you ask this versus asking ChatGPT? How is this built? What is it trained on? Why is its answer better, right?
Linus Lee (00:27:48:00)
“Would you rather live as a 25 year old…”
So there are lots of interesting things about this. First, sometimes it's kind of unreliable, but usually when it receives the message, it'll say “read,” and then sometimes it'll have a typing indicator. The way that this works behind the scenes is just that when my server receives the message and starts generating the output, it sends a read receipt.
But conceptually it's like, “Oh, the AI read my message.” I know that doesn't actually mean anything because it's just a bag of numbers, but this is conceptually interesting.
Dan Shipper (00:28:27:00)
It's one of those same “oh shit” things that makes it feel human, you know?
Linus Lee (00:28:32:00)
Right! And then sometimes I'll send a message and I'll lock my phone, and I'll get a notification that it sent me a message. I'm like, “Whoa. Computers don't normally just send me notifications about things they're thinking about.” So the packaging, again, is very important. The reason I brought this up is because the model that backs this is actually not a fine-tuned or RLHF'd model. It is the raw base model of, I think, LLaMA 2's 13 billion-parameter model.
Dan Shipper (00:29:00:00)
LLaMA is Meta’s open-source model.
Linus Lee (00:29:03:00)
LLaMA is Meta's open-source model. One of the things that makes that model special compared to things like GPT-4 is that it's open source, obviously—I can host it on my own, which is what I'm doing here. But another thing that makes it special is that they released a version of the model before they did all of the fine-tuning and RLHF to make it chatty—to have it follow this kind of chat form.
And so the base model is purely a text-continuation model. If you ask the base model something like, “Where would you rather live, New York or London?” it wouldn't really interpret that as a question. The model's task is to predict the next token of internet text. It would probably assume you're in the middle of a blog post about the best place to live, and it would just continue writing the blog post instead of answering the question. So I had to add a bunch of prompts in front of it. I'm prompting a base model, which is what we used to do in the ye olde days of 2020, before all these instruction-following models existed: people would just ship base models and then prompt them.
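For readers curious what prompting a base model looks like, here is a minimal sketch. It assumes a LLaMA 2 13B base model hosted locally behind an OpenAI-style completions endpoint (the serving setup is an assumption; Linus's actual iMessage plumbing is not shown). The trick is to frame the conversation as a transcript the model will naturally continue, and to stop generation before it writes the human's next turn.

```python
import requests

# A base model only continues text, so shape the prompt like a transcript
# it will want to keep writing.
PROMPT = """The following is a conversation between a curious human and a
thoughtful, knowledgeable assistant who answers directly.

Human: Where would you rather live, New York or London?
Assistant:"""

# Assumed: a LLaMA 2 13B *base* model served locally with an
# OpenAI-compatible /v1/completions endpoint (e.g., via llama.cpp or vLLM).
response = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "prompt": PROMPT,
        "max_tokens": 200,
        "stop": ["Human:"],  # don't let it invent the next human turn
    },
    timeout=60,
)
print(response.json()["choices"][0]["text"].strip())
```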
Dan Shipper (00:30:06:00)
So you're prompting a base model here. But, why are you asking this model versus ChatGPT? How is it more like you or how is it more interesting to you?
Linus Lee (00:30:18:00)
Sponsored By: Tnoc
For all the software engineers and engineering managers, say goodbye to the headache of provisioning quality test data. With Tonic, you can effortlessly keep your staging in sync with production using realistic test data generated from your actual production data. It's a seamless solution for maintaining data integrity and efficiency. Discover how with a simple click.
TL;DR: Today we’re releasing a new episode of our podcast How Do You Use ChatGPT? I go in depth with Notion research engineer Linus Lee on how he uses ChatGPT and Notion AI to maximize creative control. Watch on X, Spotify, or YouTube.
You might think that being an AI researcher would mostly involve solving complicated programming problems and thinking through mathematical equations. Instead, a big part of the job is rewriting parts of your prompts in ALL CAPS in order to make sure the AI model you’re working with follows your directions. “All caps works!” Linus Lee told me in this interview. “If you look at OpenAI's system prompts for a lot of their tools, all caps works.”
Linus is a research engineer at Notion who works on its AI team, prototyping new experiences, like a Q&A chatbot. He is a deep thinker who is obsessed with building AI that enables human creativity and agency. He came on the show to talk about how AI might augment our thinking, how he thinks about prompting to get the best results, and how he uses ChatGPT and Notion AI in his work and life.
I first interviewed him a year ago, when he showed off dozens of AI prototypes he’d been building to try to understand the future of this technology. Our latest interview is a mixture of theory and practice. Linus talks about how the tools we use shape the work we can create and what the future of AI-driven interfaces might be. We watch him demo personal tools he’s built, like an AI chatbot that he communicates with over iMessage. And we peek over his shoulder to see his conversations with ChatGPT to understand how he talks to it to get the best results.
Tonic is revolutionizing the way teams handle test data. By transforming sensitive production data into safe, realistic test data, Tonic ensures your pre-prod environments are always up-to-date and secure. Companies like eBay, Flexport, and Everlywell are already leveraging Tonic to keep their environments fresh and consistent. With Tonic, you can accelerate regression testing, catch bugs quicker, and significantly shorten release cycles. Experience a smoother, more efficient workflow that aligns with your development needs. Ready to transform your data management?
Here’s a taste of what we talk about. Read on for more analysis from me at the bottom.
- Using AI to maximize agency. Linus talks a lot about the ways our tools shape our agency as thinkers and creatives—and how AI might be used to enhance rather than reduce our agency.
- AI as a “thought calculator.” Linus borrows a phrase from the popular tech blogger Simon Willison to illustrate dueling points of view on the ultimate goal of AI: is it meant to be a simulacrum of humans or a “thought calculator,” a way to enhance human imagination and creativity?
- Personal prototypes he’s built. Linus regularly experiments with AI on the weekend. He shows us a chatbot he built that works over iMessage, and a new interface for image generators that gives him much better control over their output.
- Better prompting. We go over simple yet powerful techniques for getting the best answer out of AI models—like starting with general queries first, and repeatedly asking the model to answer the same question.
- Using AI for vibe checks. AI is great for reflecting the vibes of books, people, places—and even files on your computer. Linus talks about how he uses ChatGPT to get quick vibe checks that allow him to make decisions.
- Book recommendations. We pit ChatGPT head-to-head against Notion AI to see which can best capture our reading taste. And just when ChatGPT seems like it’s coming out on top, Linus makes a convincing case for Notion AI’s special skill set as an organizational tool that already knows how its users work.
You can watch the episode on Twitter/X, Spotify, or YouTube. Links and timestamps are below:
- Watch on X
- Listen on Spotify (make sure to follow to help us rank!)
- Watch on YouTube
Timestamps:
- Intro 1:03
- Retaining agency when conversing with AI 4:06
- A personal iMessage chatbot 27:04
- The difference between prompting and prompt engineering 32:49
- “What's the vibe of this file?” 38:48
- Travel recommendations 44:57
- Book recommendations 51:57
- Notion AI's advantage over ChatGPT 56:00
- Using Notion AI at work 1:02:00
- Is GPT-4 getting lazy? 1:09:16
What do you use ChatGPT for? Have you found any interesting or surprising use cases? We want to hear from you—and we might even interview you. Reply here to talk to me!
Miss an episode? Catch up on my recent conversations with writer Nat Eliason and Gumroad CEO Sahil Lavingia and learn how they use ChatGPT.
My take on this show and the episode transcript is below for paying subscribers.
I had a lot of fun speaking to Linus. There’s so much fear-mongering about how AI might replace humans in creative work—and there are too few people thinking about how it might be used as an augmenting or enabling technology instead. That’s exactly what Linus is focused on, and I came away from the interview deeply inspired about the future of these tools for making better work.
I also learned that when you’re interviewing a Notion AI researcher, you should never say, “I think this might work better in ChatGPT.” When I said that, his understandable competitiveness caused him to go absolutely turbo mode inside of Notion AI. It was such a pleasure to get to watch someone who had built parts of the tool use it in a potent way.
Previously I hadn’t bothered to go too deep into Notion AI—now I want to add it to my repertoire. Being able to automatically make a database table out of free-form text, with parts of that table filled in and updated for you, is a powerful way to organize information. That discussion is my favorite part of the interview, so if you’re pressed for time, skip to it.
Transcript
Linus Lee (00:00:00:01)
I also think when you look at models this way, models start to feel less like agents that have their own kind of agency and more just like, oh, this is just like a calculator.
The more context the model has about why you're doing what you're doing or what your goals are, the better its suggestions are generally going to be.
Dan Shipper (00:00:16:00)
Do you have custom instructions set for this?
Linus Lee (00:00:17:00)
I don't actually.
Dan Shipper (00:00:18:00)
Oh my god. We're going to have to revoke your ChatGPT privileges. Okay, I have a bunch of books in front of me, and I want to see if it can recommend books. The Brothers Karamazov. Medieval Technology and Social Change. Exhalation. Ted Chiang. The Essential Kabbalah. Defense of Socrates.
Linus Lee (00:00:36:00)
Synthesize the vibes below into a single paragraph. Synthesize more. Compress!
Dan Shipper (00:00:44:00)
Wait, wait, wait, wait. So you said “synthesize more” and then all-caps “compress.” I love it. AI researcher uses AI.
Dan Shipper (00:01:03:00)
Linus, welcome to the show.
Linus Lee (00:01:07:00)
Thanks. Thanks for having me. Excited to be here.
Dan Shipper (00:01:08:00)
Yeah. I'm excited to have you. We've been friends for a while. I interviewed you I think about a year—year and a half ago. And it was when I was first starting to write about AI and that interview went super viral and hoping we can replicate some of that magic here today. But I just love getting to talk to you. You're a researcher. You're a tinkerer. You're just a really, really deep thinker and a great writer. For folks that are listening that don’t know, in addition to doing a bunch of side research and side projects and tinkering in LLMs, you also work for Notion on their AI team, prototyping new interfaces for AI. So we'll get into that. We'll get into some of your thinking about LLMs and how LLMs fit into creative work and creative thinking. We'll get into some prototypes you’re building, maybe demos you have. We'll talk about Notion stuff, and then of course we will talk about how you use ChatGPT.
Linus Lee (00:02:05:00)
Yeah, I think that interview that you talked about—I think it came out almost exactly a year ago, if not exactly a year ago. And I think it was, I think, safe to say, pivotal for both of us, at least certainly for me. That interview is a big part of how I ended up at Notion, which I’ve spent the last year at and it’s been very fascinating. A great place to be for the last year as lots of things have happened in AI so, yeah, kind of timely.
Dan Shipper (00:02:30:00)
I love it. I love that it's a year ago. I think it would be great to maybe make this an annual thing. And I don't know what the next step is after helping you get a job at Notion. Like what's the next level thing—meeting Taylor Swift or something? But maybe we could make it work.
Linus Lee (00:02:45:00)
That would be something.
Dan Shipper (00:02:47:00)
So, here's where I want to start. Basically, I've said before on the show and I've written a lot about it, that I think ChatGPT is one of the most important creative tools of the decade. And I really think that people sort of misunderstand how important ChatGPT is in specific and a lot of the tools that are being built right now in general are going to be for creative thinking and creative work. And I really think that you're one of the people on the forefront of thinking about that. There are a lot of people out there who are just really afraid of AI right now and really feel like, oh it'll replace everything that we do and it creates this inhuman future. And I find you to be one of the deep thinkers and builders in this space that is trying to think about how to make AI into a human future where it sort of augments us and helps us live fuller lives. And I think that's really, really incredible. When I was researching for this episode, I came across this quote that you wrote recently where you said, “I want to build interfaces that lets the AI gesture us into a better future without infringing on our agency.” I love that. I think it really captures the spirit of your philosophy and I just wanted to start there. Tell us about that quote. Tell us why you wrote it and how that fits into the larger way you think about AI.
Linus Lee (00:04:06:00)
Yeah, I think agency is really interesting. I think a lot of the way that I think about language models in particular—I think we're starting to see that models for different modalities, when you take a 10,000 foot view, a similar kind of emerging capabilities, but language is sort of the thing that seems most humanlike to us and so we talk about it a lot. And I think the way that I look at language models personally has been colored by the fact that while, like a lot of other people, I work with language models at that sampled output level of just telling it things and that it tells me things back.
I also work with it at a numbers level and so, and like looking at embeddings and so on. And I think that's something that we'll get to in a bit. But I think because of that, I personally— when I look at a language model, it is really just a function that predicts the probability distribution. While you can wrap it in very anthropomorphic packaging that makes it seem like it has kind of its own will and its own kind of intent. And you could project kind of people-like properties to it and say it wants things. Ultimately it's a statistical model of a probability distribution, just a very complex one. And so I think that colors a lot of my own thinking about how I view the technology. But just because it's like a pure function that models a probability distribution doesn't prevent us from accidentally building things with the technology that takes away agency from the things you want to do.
I think an interesting example is a coworker and a friend of mine at Notion and I were talking about making—this kind of a silly example. We were talking about making a cover image for a party that she was organizing. And she had this very particular aesthetic in mind of a cover image, which was kind of like a very ugly painting on parchment, not something you would consider aesthetic. And she's like, I want this very specific aesthetic. Here are some images I got off of Twitter that follow this aesthetic. Can you make one and in this aesthetic of a girl sipping wine, or something, I don’t remember the exact example.
And I tried so hard to use all of the image-generation tools at my disposal, like Stable Diffusion, Excel, DALL-E2, DALLE-3, all of these tools. Even some of my own image tools to make an ugly image. And it's just very difficult to get DALL-E to make an ugly image.
I think it’s interesting to me because—and this is something I've talked to some folks at Midjourney about also—about how the tool kind of constrains the search space of possible images you can generate so that normally it's kind of closer to what you want. Because normally you want something aesthetic, but in the process it may actually sort of take away—in the design of the tool, if not necessarily the capability of the model—it takes away some agency from the user. And so I think the underlying technology I think is kind of just a tool, a function, whatever, a mathematical object. But then we don't want to wrap it accidentally and packaging that disempowers us.
Dan Shipper (00:07:26:00)
That's really interesting. I want to back up. You said a lot of different things there. And I think they're really interesting and really important to unpack. So what I heard you say is sort of AI, or at least the current generation of AI models, is a function that models a statistical probability distribution and when we experience that ourselves, we have all these reactions to it—that we anthropomorphize it, we project out what it might be able to do from what it can do today in ways that are maybe unrealistic or maybe don't like, fully understand, how it currently works. And, and depending on the probability distribution that you select, for example, the probability distribution that you create from the training set of images that you use to train your model or the training set of texts that you use, for example, depending on those, depending on those things, you're going to create a certain set of possibilities, base of outputs that that sort of constrains the user.
And, in your view, I think, basically how that is selected and communicated is maybe important because, like you said, DALL-E you can't get it to make an ugly image. It's not built for that. So it kind of takes away agency from you because it's doing some stuff for you in a way that, I don't know, maybe Photoshop, for example, you can just manipulate the pixels, so there's no there's no agency taken away in that experience. Is that kind of what you're getting at?
Linus Lee (00:09:09:00)
Yeah. Yeah. I think there are a lot of different, pretty subtle ways that either intentionally or unintentionally shapes the kind of agency landscape of a tool. One very explicit example is a tool like DALLE-3, where the model is through training made so that it's sort of fundamentally unlikely to output bad-looking images. I think there are other examples.
So Photoshop is actually an interesting example because even if in theory you could kind of make everything by just moving the raw pixels around, I think the specific features that are easier to access tend to also shape the kind of style of output. And so if you look at popular image editing tools, you can tell when an image just made Instagram Stories or you can tell when an image has its background masked out by Keynote or something. And so all of these tools, even in pretty subtle ways, kind of tend to shape the output style.
Dan Shipper (00:10:07:00)
And I mean, I feel like that's part of what makes art in general. If you think about the music styles that are popular, it's based on what the sound boards can do and the electric guitar and the weird effects that Jimi Hendrix discovered—in some ways they're limiting the artist’s agency, but they also create this unique set of constraints that creates a unique vibe and sound that represents a genre or a generation.
Linus Lee (00:10:37:00)
Yeah, you touched on something really interesting there, imagining the output space as a very concretely spatial kind of thing. This is for me very concrete because I think a lot about embeddings—and embeddings exist in a very concretely spatial space. And embedding is a list of numbers that try to summarize quantitatively what some piece of text or what some piece of image semantically contains.
And so an example use case of an embedding: So you have an embedding model which is kind of a language model with its last part of it chopped off so that we can read out the raw numbers. And you can have an embedding model and you might feed and embedding model sentences like “The Eiffel Tower is in Paris” and “The President of France is blah” and “Notion is a tools-for-thought company.”
And two of those sentences are much closer in meaning than the third one. And so when you feed them into an embedding model, the embedding model will spit out for each of these sentences a list of numbers. And when you view the list of numbers as kind of a coordinate or point in space, in a high dimensional kind of coordinate space. Those numbers are going to be closer together for sentences that are closer together.
Dan Shipper (00:11:52:00)
The way that I think about it sometimes is you can take text and then assign pieces of text a latitude and longitude coordinate and that’s an embedding. And then you can map the text and then see which pieces of text are closer to each other. And the ones that are closer on the map, the ones that have a latitude and longitude that are closer, are going to be similar in meaning.
Yeah, that sort of clicked for me or I was like, that's what it is.
Linus Lee (00:12:15:00)
Another metaphor that’s sometimes used is a color picker. So if you use any kind of image editing tool, one of the ways you can pick colors is like a 2D grid or a color circle where you have two dimensions. You can move things around, and then you move the data around, and you're changing some of the RGV values.
And there are a couple of different ways to describe a color. One way is by RGV value, which is kind of like a coordinate or latitude for calling for a color. Another way you could describe a color is by just saying “orange” or “very dark blue” or “crimson.” And I would—in my head, “dark blue” or “crimson” word description is the input. And then conceptually, the RGV values are kind of like the embedding. I think that's really interesting because it kind of hints at this possibility that embeddings, or these numerical representations, might encode kind of pretty fundamental semantic insights about the thing that you're encoding—color or text—in a way that's just kind of mathematically manipulate it to do interesting things.
If you have the word “crimson,” you can't really manipulate it in ways like making it lighter or making it more blue or whatever. But if you have the RGV value, there are very concrete algorithms that you can use number crunching to make the color a lighter shade of crimson or a more blue or purple.
Dan Shipper (00:13:40:00)
No, I love that. I mean, that's sort of very central to some of your thinking around this. The current generation of AI tools is awesome, chat is great. But, you know, manipulating images or text. Via chat is a very coarse-grained tool and it's a lot harder to get the exact thing that you want.
And I think you've thought about it a lot: AI interfaces that allow you to be more precise about how you move through different ways that you want to modify an image or a text using AI.
Linus Lee (00:14:17:00)
Yeah. And I think thinking in that way—the spatial way of thinking about the possibility space for outputs of these models. The spatial view I think is kind of behind the way that I think about how to add more precision to these kinds of tools. One, kind of—Should I just pop into a demo?
Dan Shipper (00:14:36:00)
Yeah, let’s do it.
Linus Lee (00:14:39:00)
So the interface for this is very bad because it's just—Yes, this is one of a string of experiments.
I may show some other ones later, but this is actually half built. I was in the process of building something different and then it started being useful and I didn't really feel like putting more effort into it. And so it's in a very half finished form, but it's good for demonstrating this particular thing that I'm about to walk you through.
Dan Shipper (00:15:10:00)
So what is the general thing that you wanted to build it for? What is it? What was the original idea really quick?
Linus Lee (00:15:15:00)
The original idea was to try to let you describe an image that you want to create by adding and subtracting images and text. So there's a model called CLIP by OpenAI. This is actually a model that's been around for a while. If you use any kind of multimodal use text to semantically search images, kind of search engine tool, CLIP is likely to be one of the models behind that kind of thing.
The special thing about CLIP is that it tries to describe both text inputs and image inputs in the same embedding space—in the same coordinate space, so that if you put in a word like Eiffel Tower and a French flag, the model is going to understand the meaning behind both of those things and try to cluster them together because they're similar.
So a slightly newer thing that happened after CLIP came out is that people came up with a way of generating images conditioned on a specific point in this CLIP embedding space, so that you could put in an image or a text and then generate an image that conceptually corresponds to that point in the space.
So if you put in a bunch of images of butterflies, they all kind of cluster around the same point in this embedding space, in this coordinate space. And if you pick one of those points in that cluster and then generate an image back out corresponding to that point, you would get some image of a butterfly. And so given this ability to both take an image or text and encode into the space and then pick a point in this space and decode it back out into its corresponding original image form, maybe you can generate an image not by just typing a text prompt, but by mixing a bunch of concepts together in the space.
So if you want a butterfly that has the colors of the French flag, maybe you can mix the image that is of the French flag and an image of a butterfly. Just pick a point in between these two concepts in the embedding space and then decode it out and maybe it'll mix those concepts together. So that's the kind of idea I was trying to explore with this little hack.
The thing that I'll show you now is: Here I’ll pick—This is a selfie of myself. For dumb demo reasons, I have to put myself in twice. And then this is an illustration. Actually, let's do something different.
This is a kind of clip-art illustration of a human face. So what this tool is going to do is kind of embed both of these images. They correspond to slightly different points in the CLIP model’s embedding space. And then I'm going to use this tool to pick eight different points along the line between these two images and then decode each of them out into its own image.
And so we're basically sampling the distance between these two images in this embedding space and then trying to look inside the model and see kind of what the model sees at each of these points.
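As a rough sketch of what Linus describes, embedding two images with CLIP and sampling eight points on the line between them might look like the following, assuming the Hugging Face transformers library and hypothetical file names. Decoding each point back into an image would require a separate unCLIP-style decoder, which is not shown here.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed_image(path: str) -> torch.Tensor:
    # Encode an image into CLIP's shared image/text embedding space.
    inputs = processor(images=Image.open(path), return_tensors="pt")
    with torch.no_grad():
        emb = model.get_image_features(**inputs)
    return emb / emb.norm(dim=-1, keepdim=True)

a = embed_image("selfie.jpg")         # hypothetical file names
b = embed_image("clip_art_face.png")

# Sample eight evenly spaced points on the line between the two embeddings.
points = [a + (b - a) * t for t in torch.linspace(0, 1, 8)]
# Each point would then be handed to an unCLIP-style decoder to render an image.
```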
Dan Shipper (00:18:20:00)
That's very cool. A way that I would think about this is: if you're used to Photoshop or Instagram or any photo editing tool, they have all these sliders. You can make an image brighter, where on the left it's very, very bright and on the right it's very, very dark, and you can slide in between. And what you're building here is a slider between concepts, or ideas for images.
So on the left side is your headshot and on the right side is a sort of vectorized cartoon drawing that is in a style that you like, and you're just sort of seeing, if you slowly turn your headshot into that, what are the different points along that space.
Linus Lee (00:19:08:00)
You could also—speaking of sliders, you could also not just slide along this kind of stylistic scale, but you can start to do fun things like slide along an emotion scale, so one thing that I've done before here is I'll try to go from “an image of a young man who is happy to an image of a young man who is very, very angry” and—there we go—now it’s progressively more and more mad. You can see the effect is really intense and, by the end, it kind of destroys the image. So sometimes I tone down the text effect.
So this is one of many experiments that I've done to try to build interfaces around exploring embedding spaces and latent spaces of generative models instead of talking to the model directly. The reason this is interesting to me is twofold. One, I think it's just intellectually really interesting that in the process of learning how to predict the next token, in the process of trying to model probability distributions of language, the model internally learns to pull out human-recognizable, semantically useful concepts like emotions. And I think learning how the models do that, and understanding it well enough to improve models or make them more controllable, is useful.
And just the fact that the models do that I think is interesting. So I want to understand it better. I also think when you look at models this way, as a thing that gives you dials and is a thing that gives you numbers that you can manipulate, language models start to feel less like agents that have their own kind of agency and more just like, oh, this is like a calculator.
Simon Willison, who is one of the prominent bloggers in the space, actually has a really great blog post about language models as a kind of thought calculator, where he talks about a similar idea. One way to look at this demo is as a kind of calculator for images and concepts and text. And unlike asking DALL-E 3 through ChatGPT to generate an image, which feels like you're making requests of this entity, this agent, here it's like, okay, there's very concretely a thing that does math, and you can use it to get dials and control. That feels more like a tool, and it feels like perhaps there's a direction here that lets you retain more agency as a person using the tool.
Dan Shipper (00:21:38:00)
I feel like one of the things I'm getting from this is that when people use this generation of AI tools for the first time, there's this whoa factor. I was using it the other day. I had done a lot of journaling. I've been going through some stuff in my personal life.

I've been doing a lot of journaling about it, and I fed, I don't know, 4,000 words of journals into Claude. I did Claude and ChatGPT, but I find Claude to be slightly better for this. And I'll ask it, "What am I not seeing?" "What are the patterns that you're observing in my psychology or the psychology of people around me?" All that kind of stuff. I screenshotted the output and I sent it to my therapist, and he was like, "That's totally wild." And I was like, cool, he must think this is awesome. He's like, "It's so good." And I was like, "This is amazing."
And then he was like, “I thought we would have at least a couple of years until it would be this good.” And then he was like, “Do you think there's room for human therapists anymore?” And I was like, now I have to be comforting my therapist because he's afraid of the AI.
And my feeling is he had that immediate "oh shit" moment that everyone has. But if you really dig into these tools, they do amazing stuff, yet they're not even close, in my opinion, to replacing a therapist, for example. And I think part of the reason people have that "oh shit" moment, which is what you're referring to here, is that we just respond to things that feel really intelligent.
And if the same thing was presented in a slightly different way, with a slider like this, it would still be cool, but it would feel more familiar, like a tool we might use anyway. Like a calculator: something that is in itself sort of mind-blowing but isn't threatening, because it's not a replacement in the same way that a fully intelligent thing is. Is that a fair way to describe what you're saying?
Linus Lee (00:23:23:00)
Yeah, I think so. When you train a model, there are fundamental capabilities you bake into it, and the packaging that you wrap it in really dramatically influences how people receive it and how they use it.
Dan Shipper (00:23:45:00)
You know, if you took something like human neurons and just laid them out in the right way and wrapped them in an interface, it might look like a tool too.
Linus Lee (00:23:54:00)
Certainly.
Dan Shipper (00:23:55:00)
And, if you just get enough of those neurons together, you get a brain that's conscious and can do things and think. How do these two things fit together?
Linus Lee (00:24:10:00)
That's an interesting question. I haven't thought about that. I think off the cuff, my kind of intuitive reaction is, I think it would be an interesting intellectual and societal endeavor to try to build a thing that is like a simulacrum of humans in every way, including having some kind of goals and agency and so on and so forth.
And I think if that was the goal, building a simulacrum of everything that at least intellectually makes us human, you would want to build in elements that these tools don't have. Intentional exploration, for example, is actually, I think, a huge part of the intelligence humans have that these models currently don't exhibit.
However, when companies like Google and OpenAI and even Notion try to build language model-based tools that help you do your work, there are a lot of parts of human intelligence that are actually kind of annoying when you're just trying to get some work done. The fact that humans sometimes just sit at their desk and daydream is probably not that useful if you're trying to hire an entity to read thousands of papers and summarize them. And so my gut feeling is that a lot of these corporations and teams building AI to be useful, especially in a professional context, aren't trying to build a simulacrum of humans. They're just trying to build a kind of intelligent steam engine. And then I could imagine some other research groups, or other groups of people, whose goal is to build that simulacrum of humans.
But in the beginning, I think they tend to look similar. As we get further down this road, we're going to see a divergence between people who really want to build a simulacrum of humans and people who just want an intelligent thought calculator.
Dan Shipper (00:26:12:00)
I went to OpenAI DevDay and they did this whole presentation on how they fine-tune GPT-4 using Slack messages. Did you see this? And basically they asked this fine-tuned version to do something and it was like, “No, I'll do it later.” And then it was like, “Do it or you're fired.” And it was like, “Okay, I'll do it.” Yeah, but you're right. There's all these parts of being human that maybe we don't necessarily want to model for the AIs we build. And there's some divergence there in terms of is it a tool or is it something that has agency and all that kind of stuff.
Linus Lee (00:26:48:00)
Yeah, I think I've noticed something actually very similar. I have a personal chatbot that is sometimes used for brainstorming, kind of for entertainment purposes.
Dan Shipper (00:26:56:00)
Can we see it?
Linus Lee (00:26:56:00)
I can. Hmm. Here, I'll open it up.
Dan Shipper (00:27:04:00)
You have a personal chatbot.
Linus Lee (00:27:07:00)
I have a personal chatbot. It's an iMessage bot because I like having it in iMessage because it's accessible from all my devices. Apple does all the syncing for me and it looks nice and I can react to things. I haven't used it in a while, but I could ask it something like—
Dan Shipper (00:27:20:00)
Well, it looks like you asked it “What makes Notting Hill particularly picturesque compared to other neighborhood areas like Westminster?” Tell me about that.
Linus Lee (00:27:27:00)
I was visiting London and I was trying to look for neighborhoods to visit. Actually, we can ask something about it. We could say, “What do you think about London versus New York?”
Dan Shipper (00:27:40:00)
And why would you ask this versus asking ChatGPT? How is this built? What is it trained on? Why is its answer better, right?
Linus Lee (00:27:48:00)
“Would you rather live as a 25 year old…”
So there are lots of interesting things about this. First, sometimes it's kind of unreliable, but usually when it receives the message, it'll say "read," and then sometimes it'll show a typing indicator. The way this works behind the scenes is that when my server receives the message and starts generating the output, it sends a read receipt.
But conceptually it's like, “Oh, the AI read my message.” I know that doesn't actually mean anything because it's just a bag of numbers, but this is conceptually interesting.
Dan Shipper (00:28:27:00)
It's one of those same “oh shit” things that makes it feel human, you know?
Linus Lee (00:28:32:00)
Right! And then sometimes I'll send a message and lock my phone, and I'll get a notification that it sent me a message. I'm like, "Whoa. Computers don't normally just send me notifications for things they're thinking about." So the packaging, again, is very important. The reason I brought this up is that the model behind this is actually not a fine-tuned or RLHF'd model. It is the raw base model of, I think, LLaMA 2's 13 billion-parameter model.
Dan Shipper (00:29:00:00)
LLaMA is Meta’s open-source model.
Linus Lee (00:29:03:00)
LLaMA is Meta's open-source model. One of the things that makes that model special compared to things like GPT-4 is that it's open source, obviously, so I can host it on my own, which is what I'm doing here. But another thing that makes it special is that they released a version of the model before they did all of the fine-tuning and RLHF to make it chatty, to have it follow this kind of chat format.
And so the base model is purely a text-continuation model. If you ask the base model something like, "Where would you rather live, New York or London?" it wouldn't really interpret that as a question to answer. The model's task is to predict the next token in internet text, so it would probably assume you're in the middle of a blog post about the best place to live and just continue writing the blog post instead of answering the question. I had to add a bunch of prompting in front of it. So I'm prompting a base model, which is what we used to do in the ye olde days of 2020, before all these instruction-following models existed: people would just ship base models and then prompt them.
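For readers curious what "prompting a base model" looks like in practice, here is a minimal sketch assuming a local copy of a LLaMA 2 13B base checkpoint and the Hugging Face transformers library. The path and the prompt wording are illustrative assumptions, not how Linus's bot is actually built.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/llama-2-13b"  # hypothetical local path to the *base* model
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype="auto", device_map="auto")

# A base model only continues text, so the prompt frames a conversation for it to continue.
prompt = (
    "The following is a text message conversation between Linus and a thoughtful, "
    "opinionated friend.\n\n"
    "Linus: What do you think about London versus New York? Where would you rather live?\n"
    "Friend:"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.8)
# Print only the newly generated continuation, not the prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```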
Dan Shipper (00:30:06:00)
So you're prompting a base model here. But, why are you asking this model versus ChatGPT? How is it more like you or how is it more interesting to you?
Linus Lee (00:30:18:00)
So here I've asked it, "What do you think about London and New York? Where would you rather live?" If you asked this question to ChatGPT or Claude, and I haven't done that before, I would guess that with high likelihood the model would say something like, "Oh, I'm an AI language model, and I don't actually live anywhere, so I don't have any preferences about which city I would live in. But here are some things you can know about London and New York." And that's not really what I'm looking for. I'm looking for a personal take, and it doesn't even matter if the take is correct or not.
I just want vibes, you know? And because this model hasn't gone through any kind of fine-tuning about the fact that it's an AI language model, it's just going to generate text as if I'm talking to a rando on the internet. And I've done some prompting, so it's not just a complete rando, it’s a fairly intelligent, cogent rando, but it's going to say things as if it's a human on the internet because the base model is just trained on text that is mostly humans on the Internet.
So it says something here about New York City or whatever. "Ultimately your choice of where to live depends on personal preference..." But it gave me some opinions. It's still a little bit professor-y, because in the prompt I tell it it's an assistant. But I could have just prompted the model to be like, "Hey, you're Mickey Mouse and you live in New York," and it probably would have followed that.
And so you can prompt the base model to be much more friendly. I used to use a dumber base model for this, GPT-J, a 6 billion-parameter model trained on way less data, and it was even dumber. Sometimes when I asked it questions, it would say something like, "Hmm, I'm not sure, but I have lunch now. I'll get back to you after lunch."
It emulates human conversation, but it's not really useful. I found the 13 billion-parameter LLaMA model to be a good balance: sometimes it gives creative answers that are kind of unexpected, and sometimes it gives cogent answers.
Dan Shipper (00:32:20:00)
That's funny. I love it. I think that's really cool. I want one. Sign me up.
Linus Lee (00:32:23:00)
Yeah, I'll send it to you. It's a private number you can text.
Dan Shipper (00:32:27:00)
Perfect. Cool. I want to get back to some of the things you wanted to share. I love that you came with a whole Notion doc prepared for what you wanted to talk about, so I know you had a couple of ideas you wanted to talk through.
I want to make sure we get to those things. And then I do want to jump into specific ChatGPT chats and all that kind of stuff.
Linus Lee (00:32:49:00)
So let's do it. We already talked about AI as a tool. Before we get into specific chat transcripts: I was thinking in the shower this morning about what we were going to talk about today, and one shower thought I had is that when I use ChatGPT, the way I think about prompting is actually quite different than when I do prompt engineering at my job at Notion, writing prompts, which I do a lot of. Prompt engineering is sometimes the part of my job as an AI engineer that makes me feel the dumbest, because sometimes my change to the codebase will just be adding a "must" to some English instruction, or changing the instructions so they sound even more desperate. But that's a useful product change. So some days I just spend the entire day prompt engineering, and it's like, what am I doing? Why did I go to school for computer science?
The mode that my brain was in—
Dan Shipper (00:33:47:00)
The revenge of the English major is happening right now. I love it.
Linus Lee (00:33:52:00)
The mode that my brain is in when I'm prompt engineering a prompt that's built into a product like Notion is very different from the prompting brain I have on when I'm talking to ChatGPT. And I compare this to scripting versus software engineering in programming. Let me unravel that a little bit.
Programming, I would say, is this superset of activities that involves any kind of writing of computer programs. Sometimes you write computer programs and ship them to millions of users, and the thing runs on hundreds of thousands of computers. It has to run very reliably, it has to accept lots of diverse kinds of input, and it always has to generate the right answer.
So it's a very robust, resilient, well-tested piece of machinery. When you build a product like Notion, you're doing software engineering: you're writing programs to be robust and reliable. Other times you write programs just to accomplish something quick and easy. You might write a simple little command-line script to search a folder for a keyword, or to list the files you have just so you can look at them. Or a slightly more complex one might be: iterate through all of the files that you have and delete any file older than six months. That's a quick and easy thing, but you're not going to ship it to millions of users.
You just have to run it that one time. And if you accidentally make a mistake, you'll realize it and rerun the command. These two categories of activities I would both call programming, but one I would call software engineering, and the other I would call scripting, or equivalently, just running commands.
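For the "scripting" end of that spectrum, the throwaway script Linus describes might look something like this. The target folder is an assumption for illustration.

```python
# A one-off script: delete any file older than roughly six months.
# You run it once, and if it misbehaves you fix it and run it again.
import time
from pathlib import Path

CUTOFF = time.time() - 182 * 24 * 60 * 60  # roughly six months ago, in seconds

for path in Path("~/Downloads").expanduser().rglob("*"):  # hypothetical target folder
    if path.is_file() and path.stat().st_mtime < CUTOFF:
        print(f"deleting {path}")
        path.unlink()
```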
I think when you're writing prompts for language models, there's a broad category of things I would call prompt programming. Some kinds of prompts, the kinds I write at work, are prompts that we iterate on, test, and evaluate very heavily, and then we ship them to millions of users who put in all kinds of inputs.
And so they have to be very robust and well-tested and they have to accept lots of different languages and so on and so forth. And so that's what I would call more prompt engineering, where you write prompts and evaluate them robustly and things like that. When I'm talking to ChatGPT though, I don't have to be as robust.
I also have opportunities to make revisions if the first time the prompt runs, it doesn't do the right thing. And so when I talk to ChatGPT or other kinds of tools like that or Notion AI inside of Notion, I would say that's much closer to the scripting part of the metaphor, but I'm just trying something and if it doesn't work, I'll just fix something and try it again and kind of iterate my way towards the final results that I want.
And it's not a thing that I want to make into a robust piece of machinery. I make that distinction because when you're doing hardcore prompt engineering, all of these techniques around evaluations, few-shot prompting, and structured outputs apply, and they're very useful. When I'm just talking to ChatGPT, good few-shot prompts are really hard to write, because the specific examples you pick influence your output quite a lot. So I actually almost never write few-shot prompts when I'm just talking to ChatGPT. I just write zero-shot, and then if the output isn't exactly what I want, I'll usually ask a follow-up question to revise the output. And so I think that distinction is interesting.
Dan Shipper (00:37:15:00)
I love that distinction. And I feel like it mirrors something people kind of miss about ChatGPT when they first use it, which is that they ask, how do I use it? And they try to use it perfectly instead of just being like, I could just bang away at this thing and try everything possible. I think that second approach is way better, because it's such a broad tool. It has so many different things it can do, and we as human beings don't even know all of the things it can do yet. The best thing you can do is start with something simple and then just keep banging away until you get something that you want, and learn the limits of what it can do.
And I think people are way too reluctant, for example, to ask it to redo its response, or to revise something in the history of messages and just be like, let me try that again, or to just start a new chat. Sometimes it almost feels impolite to do it. But I think the best users of ChatGPT know how to not accept anything until it's working, and they just keep iterating until they find something that is great.
Linus Lee (00:38:26:00)
There was a fun example here where I had to look through a JSON file of some data and I didn't really feel like constructing an elaborate prompt to ask it exactly what to do. So I just uploaded the file to ChatGPT and I think I literally just asked it, "What's the vibe of this file?" I should try to find it because I think the answer—
Dan Shipper (00:38:48:00)
Well, take your time. I don't want you to rush. I love that. It's one of those things that's so funny, because before ChatGPT came out, before GPT-3, before GPT-2, all that stuff, there were all these intellectuals constructing scenarios for how AI could kill us. And a lot of them hinged on AI basically misunderstanding what we said and doing something different. So if you said, "Make paper clips," it would misunderstand that you meant maximally make paper clips and turn the entire world into paper clips. That was the AI we were afraid of. And the AI that we got is "What's the vibe of this file?"
Linus Lee (00:39:30:00)
And it just gives you quite a cogent answer. I think there are some theoretical concerns that still take that form, but certainly in terms of instruction following, I think we're making a lot of progress in aligning the model with what we want it to do.

The model read through all the columns and was like, here's generally what the file is about, here are the different columns, and let me know what you want to do. So that was fun. I can walk through some other examples.
Dan Shipper (00:40:04:00)
Yeah, let's talk about it.
Linus Lee (00:40:06:00)
Yeah. So probably the most common use that I have for ChatGPT is actually just programming help. I write a lot—
Dan Shipper (00:40:14:00)
So you don’t use Copilot for that?
Linus Lee (00:40:18:00)
I do use Copilot. I use Copilot heavily and I think learning to use Copilot is kind of a different learning curve than learning to use ChatGPT. Copilot is great because it has all of the context of the file that you're actually in and I think as a general rule, AI tools can give you better suggestions when they have more context about what you're trying to do.
One of the philosophies behind Notion AI is that it has more context about the work you're trying to do inside your Notion workspace, and so it tries to be better. Same with Copilot: it has all of the types and definitions and functions that I'm using inside my code editor, so it can provide better suggestions. But the suggestions it provides are micro-level. Add some lines here, add some lines there. So let's take a couple of examples.
So earlier this month, I was trying to write a bunch of data processing scripts. These are Python scripts that take millions of pieces of text and process them in some way. And I had to learn how to do this in a streaming way, because I didn't have enough memory to fit all of the data on my computer.
And I had never used a streaming dataset library before. So I learned about this library called Arrow, and I was trying to use it. This isn't the kind of thing you'd be able to use Copilot for. Copilot is great if you have a file with some content already in it and you mostly know what you're doing, and you're just trying to figure out the right method call or the right function or variable to use.
But here I was asking a very open-ended question, like, "Here's a library that I'm using. How do I make it faster?" Or, "How do I even approach this kind of problem?" One example linked here, which isn't about the dataset thing, is that I was building a library for processing UTF-16-encoded characters, and I was just trying to understand how these things are represented and then maybe get some help writing a basic implementation of how to parse this format.
And this, again, isn't the kind of thing Copilot is useful for, because I didn't even have a basic template for it. I was just trying to understand how to approach the problem. So I asked it, "How is it encoded at the byte level?" just to get an understanding of what this format is. And then I asked it, "Is there a simple algorithm?" because I thought, if there's an algorithm I can understand, I'll just write it in the language I was using. And when I asked it for the algorithm, it even gave me a Python implementation, which I based my implementation on.
But then I asked it follow-up questions to explain parts of the code that it didn't initially explain. So this was a high-level problem, and it was useful for breaking it down.
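To give a rough idea of the kind of implementation such a conversation might converge on (not Linus's actual code), a byte-level UTF-16 little-endian decoder that combines surrogate pairs looks something like this:

```python
def decode_utf16le(data: bytes) -> list[int]:
    # Read 16-bit code units two bytes at a time and combine surrogate pairs
    # (0xD800-0xDBFF followed by 0xDC00-0xDFFF) into single code points.
    code_points = []
    i = 0
    while i + 1 < len(data):
        unit = data[i] | (data[i + 1] << 8)  # little-endian code unit
        i += 2
        if 0xD800 <= unit <= 0xDBFF and i + 1 < len(data):
            low = data[i] | (data[i + 1] << 8)
            if 0xDC00 <= low <= 0xDFFF:
                i += 2
                unit = 0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00)
        code_points.append(unit)
    return code_points

# Quick check against Python's built-in encoder.
assert "".join(map(chr, decode_utf16le("héllo 🙂".encode("utf-16-le")))) == "héllo 🙂"
```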
Another example might be this one, which is a very long conversation. I think this whole programming session was a single chat, where I began, again, with a very high-level problem: what's a good way to read this very, very large file of data incrementally, in a streaming way, so that I don't have to use up all of my computer's memory?
And it initially gave me some approaches, and then I started steering it: okay, I don't just want to use PyTorch, I actually want to use a different format. And over time I converged on a solution.
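The streaming pattern he lands on might look roughly like this with pyarrow, scanning the dataset in record batches so the full file never has to fit in memory. The file name, column name, and batch size are illustrative assumptions.

```python
import pyarrow.dataset as ds

dataset = ds.dataset("texts.parquet", format="parquet")  # hypothetical very large dataset

total_chars = 0
for batch in dataset.to_batches(batch_size=10_000):
    for text in batch.to_pydict()["text"]:  # assumes a "text" column
        total_chars += len(text)            # stand-in for real per-document processing

print(f"processed {total_chars} characters without loading the file into memory")
```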
Dan Shipper (00:43:36:00)
I think that's a really smart one. In addition to "start with something quick and iterate from there," also start with something broad and ask ChatGPT to help you understand how to prompt it, or what it thinks the best solution to a particular broad question is going to be. And then gradually narrow in, rather than starting with something narrow that you just want answered. Is that a good summary?
Linus Lee (00:44:09:00)
Yeah. Another way I've thought about that is: the more context the model has about why you're doing what you're doing, or what your goals are, the better its suggestions are generally going to be. So instead of asking it, "How do you read an Arrow-formatted file?" you can ask it, "I'm trying to read a dataset that's 80 gigabytes into a computer with 40 gigabytes of memory. I'm on Linux. I'm running this version of Python. How should I approach the problem?" And it'll give you a few approaches, and you can narrow down from there.
Dan Shipper (00:44:32:00)
Do you have a custom instruction set for this?
Linus Lee (00:44:34:00)
I don't actually.
Dan Shipper (00:44:36:00)
Oh my god. We're going to have to revoke your ChatGPT privileges.
Linus Lee (00:44:40:00)
I know. I think if I had used it I would probably use it to make the outputs a little more concise. But I actually haven't found the normal tone too offensive.
Dan Shipper (00:44:50:00)
Okay, cool. Well, let's go back to your example. So you had something to show about Seoul neighborhoods.
Linus Lee (00:44:57:00)
Yeah. So I did some traveling recently. I was in Seoul, and I was in Thailand for a bit. And I thought this example was interesting because, from the outside, it looks like I just asked a question, "What are the major neighborhoods to see when I'm planning a trip to Seoul?" and then "What is each known for?" and got a single output.
But actually, if you look closer, there are four outputs. Or, well, this one is kind of forked, so there are three outputs. I thought this was worth noting because language models are non-deterministic: if you ask a very broad, "recommend me something to do" kind of question like this, every time you call it the results are going to be slightly different. And what I really want to know is, what are the best neighborhoods to visit?
So I asked a question like, "What are the most interesting neighborhoods?" and then I ran it three times. Each time the results were a little different, but there was an overlapping set, with overlapping descriptions and focus. So I read through all three and tried to pick the ones that were mentioned the most. Some of these, I think, also involved searches. Here it did a Bing search and then gave me some results.
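If you wanted to script the same trick instead of clicking the redo button, a sketch with the OpenAI API might look like this: ask for three samples of the same broad question and count which neighborhoods recur. The model name and the crude keyword check are illustrative assumptions, not what Linus does.

```python
from collections import Counter
from openai import OpenAI

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o",  # assumed model name
    n=3,             # three independent samples of the same broad question
    messages=[{
        "role": "user",
        "content": "What are the most interesting neighborhoods to visit in Seoul, and what is each known for?",
    }],
)

# Crude overlap check: count which candidate neighborhoods show up across samples.
candidates = ["Hongdae", "Itaewon", "Insadong", "Gangnam", "Seongsu", "Bukchon"]
counts = Counter(
    name
    for choice in resp.choices
    for name in candidates
    if name in choice.message.content
)
print(counts.most_common())
```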
Daniel Shipper (00:46:02:00)
Brilliant.
Linus Lee (00:46:04:00)
By looking at multiple samples, I'm able to get a better sense. So I did something similar with Bangkok as well here—
Daniel Shipper (00:46:10:00)
Wait. So, okay, let me just see if I get it. So basically, you're going on a trip. You want to know what the best places are to go. But rather than asking what's the best place to go, you're asking a broad version of that rather than having it filter for you.
You're asking the broadest thing, which is where could I go in Bangkok or where could I go in Seoul? And then every time it responds to you, you’re just clicking that little redo button. Can you click it for us? I want to see if it does another one.
Linus Lee (00:46:40:00)
Yeah. Here.
Daniel Shipper (00:46:42:00)
So, you're clicking the redo button and then it's re-outputting it and basically for a broad question, it's going to be slightly different every time. And then you're being like, I'm just going to read through the things. And the most commonly mentioned things are probably the top things.
Linus Lee (00:46:58:00)
Yep. Exactly.
Daniel Shipper (00:47:00:00)
Why? Why not just ask for the top things? What's the difference in your mind?
Linus Lee (00:47:07:00)
I think this is again the agency thing. Sometimes in my life, if I really, really trust the person giving me a recommendation, say a best friend I've known for six years who knows both the city really well and me really well, and he says, "If you have an hour, this is the only place you should go," okay, it's probably good. He knows me really well. But ChatGPT doesn't really know me well. I think even if I had a 5,000-word custom instruction, it would be very difficult to describe everything about me in that description.
And so instead I ask it to give me some options, and then I, knowing myself, can do the last-mile filtering instead of having to judge and trust its filtering.
Daniel Shipper (00:47:50:00)
That's really interesting. I feel the exact opposite. I have a very long custom instruction, and I really find that it helps it know who I am and what I might want. I also find it's very good for things like, "I'm in Bangkok and I have these requirements. I don't eat X, Y, Z, I have this amount of time, I was just at five temples yesterday so I don't want to do any temples, and I'm a nerdy tech guy. What should I see?" It's very good at pushing the results into a space that you like. I still think it's really useful to just hit redo a bunch of times and see what comes up.
But I personally find that the filtering, and telling it more about you, is really nice. And I think that's probably just a personality thing: you want agency, and I'm like, give me the best answer, I don't care.
Linus Lee (00:48:55:00)
Whenever I depend on a language model to do any kind of filtering, I'm always slightly concerned in the back of my mind about specific things that I put into the prompt that might really bias the output. So this format is actually interesting because first I tried to give it as much context as possible.
And so I said, "I'm here for one morning and afternoon. It's a Thursday." I said "it's a Thursday" because some things are closed on Thursdays. But because I also said "near Queen's Park," which is a particular park in a part of Bangkok, a lot of the suggestions the model generates are like, "Oh, you can spend the morning at Queen's Park."
And it's a nice park, but it's not the biggest park in the city. It's just a park. But because I put that specific string into the prompt, and given the kind of RLHF tuning the model has had, the model is like: the human reader will think this output is better if it's more aligned with the instruction, so I'm going to really mention every aspect of the instruction. So I'm always a little bit afraid of any kind of instruction over-steering the model. And a very heavily customized custom instruction sometimes gives me the same worry: maybe it's biasing things a little more than I would specifically want, and I don't want that.
Daniel Shipper (00:50:15:00)
I think you're right. It does do that. It’s such a yes man and so it connects everything back to my custom instructions sometimes and I'm just like, “That's not relevant. Don't do that.”
Linus Lee (00:50:27:00)
Yeah, I’m a generic human.
Daniel Shipper (00:50:28:00)
I think I'm willing to put up with the annoyance of that because every 10 things it says, it says something totally brilliant. And I'm like, I cannot believe you just did that. And so I assume it'll get better over time, but there's definitely a cost to it.
Linus Lee (00:50:42:00)
Yeah. I have a couple other use cases and things that I do when I'm prompting that I don't have specific examples for, but we could try one that I think is interesting or worth noting. One is “continue writing.” So continue writing—This doesn't really apply to you if you're using ChatGPT, but sometimes in other tools—like writing tools like Notion—there's a continue writing option.
This is interesting. So what this does is it tells the model, Hey, let's start here and just pretend like you're just continuing the text, pretend like you are the next token prediction bot that you are originally made to be and just keep writing. And I think it's interesting because first of all it’s a very kind of braindead thing to try.
This is kind of a weird doc to try this in, and I don't have one that I can go to immediately, but if I'm writing about HCI, or writing a blog post, or making a list of something, sometimes I want a brain-dead "just give me more ideas like this" button. And the continue writing prompt is kind of a "give me more ideas like this" button, or "continue my thoughts" without any—
Daniel Shipper (00:51:57:00)
Well, let's make a list. Okay, so I have a bunch of books in front of me and I want to see if it can recommend books based on the books I have in front of me.
Linus Lee (00:52:06:00)
“Some books I've read and liked.”
Daniel Shipper (00:52:08:00)
Okay, The Brothers Karamazov.
Linus Lee (00:52:10:00)
I'm going to have trouble spelling that.
Daniel Shipper (00:52:12:00)
K-A-R-A-M-A-Z-O-V. Okay. Let's see. Medieval Technology and Social Change, which is really good. It's about how the stirrup changed everything in the Middle Ages. Exhalation. Ted Chiang. Let's give you another one that's sort of left field. The Essential Kabbalah. And I'll do one more.
I feel like I am missing stuff that I would normally— Let’s say The Defense of Socrates.
Linus Lee (00:53:00:00)
Okay.
Daniel Shipper (00:53:01:00)
“Defense” of Socrates, not “defensive.” But I think that's a good book title.
Linus Lee (00:53:02:00)
It's a good prompt! Okay. “Continue writing.” Okay.
Daniel Shipper (00:53:09:00)
Okay. The Gene: An Intimate History. Good. I cannot believe—
Linus Lee (00:53:16:00)
It's very generic.
Daniel Shipper (00:53:17:00)
So it gave me Sapiens. I can't believe it would do me like that.
Linus Lee (00:53:22:00)
I mean, statistically, it's probably accurate in the pool of all—
Daniel Shipper (00:53:30:00)
I feel like this might be better in ChatGPT. If you copy-paste this into ChatGPT, I'm curious what happens. Well, here's what I do: First ask it to tell you what vibe these books have and then ask it to recommend.
Linus Lee (00:53:47:00)
“Here are some books. What is the general vibe of these books?” One thing I love about ChatGPT is it just doesn't care if you make spelling mistakes.
Daniel Shipper (00:54:03:00)
I know. It's amazing, isn't it?
Linus Lee (00:54:05:00)
It just knows what you want. Okay, it's kind of writing a lot.
Daniel Shipper (00:54:05:00)
I think that’s good. You know, getting a bunch of stuff and then you can say, okay, sort of summarize that and give me the overlaps or whatever… “philosophical, historical, spiritual and speculative themes.” I definitely picked a pretty random assortment. I wonder if it could compress… I want it to compress a little more or maybe we can hit the redo button. Whatever you think. What would you do?
Linus Lee (00:54:38:00)
I would say, “Can you synthesize your detailed descriptions into one description and topic preference?”
Daniel Shipper (00:55:00:00)
“Deep, intellectually stimulating themes that explore the human condition, philosophy, and the interplay between society, technology and spirituality.” That seems like me, sure.
Linus Lee (00:55:11:00)
And then I can probably ask it. “Can you recommend me some books like this?”
Daniel Shipper (00:55:20:00)
And I would also mention—We'll see what it does. But, one thing that I find is that it will do pretty well-known things. And so asking for off-the-beaten path or lesser-known works is good. Gödel, Escher, Bach. Perfect. I love that book. It was definitely wrong about a bunch of stuff. But, great book.
I love Sophie's World. Incredible. Zen and the Art of Motorcycle Maintenance. One of my favorite books. The Man Who Knew Infinity. I’ve never read it, but that sounds quite good. Cosmos by Carl Sagan is literally right there.
Linus Lee (00:55:50:00)
Aha! You got Sapiens again.
Daniel Shipper (00:55:52:00)
I got Sapiens again.
Linus Lee (00:55:54:00)
Yeah, maybe you need these custom instructions.
Daniel Shipper (00:55:56:00)
I can’t escape Sapiens.
Linus Lee (00:56:00:00)
I will say—just to redeem Notion AI a little bit—I want to try doing something like that here, except I'm going to ask, “For each of the books above, bold the book title and add a description of the vibe of each book.”
Daniel Shipper (00:56:23:00)
Interesting.
Linus Lee (00:56:25:00)
This is a bit shorter.
Daniel Shipper (00:56:27:00)
That's way better. That's very—
Linus Lee (00:56:29:00)
That's done. Now, this is a document that I have, so maybe up here I can add a quote block and be like, synthesize the vibes below into a single paragraph.
Daniel Shipper (00:56:40:00)
Holy shit.
Linus Lee (00:56:47:00)
Eh. Okay, synthesize more.
Daniel Shipper (00:56:52:00)
Wait, wait, wait, wait. So you said synthesize more and then all caps compress.
Linus Lee (00:56:57:00)
Compress!
Daniel Shipper (00:56:58:00)
AI researcher uses AI.
Linus Lee (00:57:02:00)
“You only have 50 words.”
Daniel Shipper (00:57:06:00)
This is the secret, folks. All capitals.
Linus Lee (00:57:08:00)
All-caps works. If you look at OpenAI's system prompts for a lot of their tools, all-caps works. Okay, this is fine. We'll keep these.
Daniel Shipper (00:57:24:00)
I think that's pretty good. This is making me feel like we should cover some of the things in Notion AI that I might not know. Yeah, it’s still generic. You can ask it to do things that are off-the-beaten-path and see how it does.
Linus Lee (00:57:40:00)
I might just also copy.
Daniel Shipper (00:57:42:00)
That’s interesting.
I love The Dispossessed. Incredible book. The Left Hand of Darkness. Incredible. I'm a huge Ursula K. Le Guin fan, so it definitely got me. I love Invisible Cities.
Linus Lee (00:58:02:00)
I want to do a final trick, which is to turn this into a table with columns for—
Daniel Shipper (00:58:04:00)
Oh my god.
Linus Lee (00:58:06:00)
—four columns for title and author and summary.
Daniel Shipper (00:58:09:00)
The Master and Margarita is on my list. So this is really good actually. And woah.
Linus Lee (00:58:24:00)
And now I have a table.
Daniel Shipper (00:58:27:00)
Oh my god.
Linus Lee (00:58:28:00)
And I believe if I want, I can also turn it into a database. And now I have a database. This column needs to go, but there you go.
Daniel Shipper (00:58:37:00)
That is very freaking cool.
Linus Lee (00:58:40:00)
It evens out the competition a little bit.
Daniel Shipper (00:58:45:00)
Yeah, that's really great. Wait, tell me, okay, thinking about when you would use this—When am I going to be doing this in Notion and what is that good for versus some of the same stuff in ChatGPT? Because I think the outputs are reasonably similar and I think you're using GPT-4 under the hood.
I don't know if you've announced that or not—or I don't know what you're using—but it's close, at least to GPT-4, whatever it is. So yeah. When are you using this? What's it useful for? How do people think about how to incorporate this in their lives? I use Notion all the time and I have not used these features yet and I think I should.
Linus Lee (00:59:24:00)
What I just did is kind of contrived, but I think it's actually a good example of one of the workflows I use a lot in Notion, which is: take something that's pretty rough, try to add a little more meat around the bones, and then eventually turn it into something that's more well-formatted and presentable.
Yeah, this is really nice. If you have a bunch of meeting notes, and the way I take meeting notes is just a rainfall of bullet points, at the end I have bullet points basically only I can decipher, and it's not very presentable. But I can go through and say, hey, turn these into prose, turn these into paragraphs, and then maybe add some headings. And then you have something, in the end, that's quite presentable.
And so I think going from any kind of rough sketch to something more structured is useful. More generally, I think as a rule for any tool: the more context the tool has about what you're trying to do, either in the same doc or, increasingly, in other documents as these tools are all trying to do retrieval, the better.
And so for things like this kind of programming question, or asking about Bangkok, there isn't really much context besides maybe a custom instruction about who you are and what you like. So ChatGPT might be better there. It's able to do things like browsing, and the very long conversation support in ChatGPT is useful.
One of the things that I like about this, as I kind of demonstrated here is you're not having a kind of just face-to-face conversation, you're two actors engaging with some kind of a changeable, mutable state like a document. And so maybe in outputting something you can make edits to its output and then ask it to try again.
You can output it as a table and then turn it into something more complex. And so being able to collaborate on something I think is useful in addition to the context that it brings.
Daniel Shipper (01:01:19:00)
That is very cool.
Linus Lee (01:01:20:00)
Another thing that Notion recently gained the ability to do is answer questions. So you can ask it, "What articles that I've read talk about esoteric programming—"
Daniel Shipper (01:01:38:00)
And what's in this Notion?
Linus Lee (01:01:40:00)
It's my personal Notion, so hopefully it doesn't say anything private. Cool. Okay, so there's this thing. These are mostly web articles that I've clipped into my Notion. This is a blog post that I read. Programming Portals is kind of interesting. This is definitely a web clipper article, and it's formatted in a really weird way because of how the web clipper works, but it's able to pull out the information in there.
Daniel Shipper (01:02:14:00)
Can you ask it, “What's an embarrassing secret that I definitely don't want to share on an interview show?”
Linus Lee (01:02:18:00)
Yeah, maybe we'll do that after the show.
Daniel Shipper (01:02:23:00)
Yeah, that'll be the director's cut—for paying subscribers only.
Linus Lee (01:02:30:00)
Yeah, exactly. Put that behind a $59.99 paywall.
Daniel Shipper (01:02:38:00)
That's really cool. What do you find yourself using the Q&A bot for?
Linus Lee (01:02:42:00)
The Q&A is actually a lot more useful in a team context because if it's just my Notion, I have a very precise way of organizing my Notion and I know kind of where everything is. In a team, though, not only are there a lot of people changing stuff inside the Notion, there's also a lot of people just changing the code base and updating when the holidays are and what the policies are.
And so I frequently ask it questions about how to do X—like, how do you file for PTO? How do you update Redis settings in our Redis cluster? Sometimes I even just ask it—if you have tens of thousands of meeting docs like we do, sometimes you just want to remember like, hey, there's a meeting where we talked about comparing these different embedding models and I don't know where these meetings are.
And all the meetings are called "AI Team Weeklies," so I can't really find it. I just ask it, "What's the one meeting where we talked about Cohere embedding models?" and it'll point me to the right doc. So it's useful for things like that. In a collaborative context, where there's a kind of forest of knowledge that you don't exactly know how to navigate, I've found it to be a lot more useful.
Daniel Shipper (01:03:51:00)
Yeah, I mean, I wish this existed 10 years ago, because I think I've told you this. I've complained about this to you before, and this is why I was very excited when you launched it. At my previous company, I started an enterprise software company, ran it for a few years, and sold it to a huge enterprise software company called Pega, which is great.
The interesting thing about that acquisition is that I ended up running the business unit, the former company, inside of Pega. We built this co-browsing tool, and Pega had this 300-person salesforce that was now charged with selling this new tool they had acquired. And the thing about salespeople is they will never do anything unless it directly creates a bonus for them.
And the interesting thing about that experience for me is that they all wanted to sell it, but none of them wanted to actually learn how to demo it or learn what all the different features were. So I kept getting asked all these different questions all the time. There were actually some really nice ones I'm thinking about right now that don't fit into this category.
But a lot of it was just being constantly barraged with questions as the co-browse guy. And I'd be like, it's all documented. It's literally in a document. Just look at the document. And it just never worked. I feel like having a layer between me, as this hub for information, and anyone who wants to ask the same question all the time is super helpful, especially for larger organizations where you never know where things are filed.
You never know if the document is up to date, blah, blah, blah. And I feel like this is a first step to cutting out those repetitive questions that certain people in organizations spend all their time answering, over and over. It's great.
Linus Lee (01:05:45:00)
Yeah, I mean, that's definitely one of the key use cases we built it for: those very large teams where there's a lot of knowledge and people don't immediately know whether it's documented, don't know who to ask—
Daniel Shipper (01:05:55:00)
Yeah, you don't even know who to ask. That's really interesting. I love it. So I know you had a couple more things on this. Let's see: random notes, "GPT-4 Turbo seems to be lazy." Or if there's stuff above, I want to make sure we cover everything you were thinking about.
Linus Lee (01:06:12:00)
Yeah, we kind of talked about these. This is the last kind of prompting trick that I'll talk about.
This is more useful in the prompt engineering world than in the talk-to-ChatGPT world. But it's something—once I learned about this trick I kind of use it in almost every prompt that I write.
So there's a technique called chain of density for producing very high-quality summaries of a certain type. It's published in a paper.
The basic idea behind the technique is that when you ask a language model to write a summary, it generally tends to be pretty sparse in detail. It tends to have a certain tone; it's rarely super, super dense with information and complete. But you can ask the language model: don't just output the summary. First write a draft of a summary, then consider what you've left out, and then write a second draft that includes the things that were left out, without increasing the length.
Over five drafts, you're asking the language model to make its own output more and more dense, and then output the final version. And I think you can apply the same philosophy to any output where you want a specific property, like concision. Earlier today I was messing around with a prompt for taking a document or a page and pulling out the main key ideas and topics on that page.
And if you just use a zero-shot prompt for that, often the language model will miss certain topics that are talked about and can generally be quite verbose. Instead, I wrote a prompt that said: first write a draft, then consider any topics you might have missed on the page, and also consider how to make each of these topics concise, and in the final version make sure every topic is only five words long.
And then it writes three different drafts and then outputs the fourth version.
And I found that if you're iterating just in ChatGPT, this doesn't make as much sense, because you can just ask it once and then ask it to make it shorter in the next turn. But if you're writing a prompt that's part of a piece of software, I've found this yields pretty good outputs all the time.
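A chain-of-density-style prompt of the kind Linus describes might be wired into software roughly like this. The wording, model name, and helper function are illustrative assumptions, not Notion's production prompt.

```python
from openai import OpenAI

client = OpenAI()

TOPIC_PROMPT = """Extract the key topics from the document below.
Work in drafts:
1. Draft 1: list every topic you can find.
2. Draft 2: add any topics you missed in draft 1 and tighten the wording.
3. Draft 3: rewrite so each topic is at most five words long, without dropping any.
Show your drafts, then output the final list under the heading FINAL.

Document:
{document}"""

def key_topics(document: str) -> str:
    # One call, multiple internal drafts: the model densifies its own output.
    resp = client.chat.completions.create(
        model="gpt-4o",  # assumed model name
        messages=[{"role": "user", "content": TOPIC_PROMPT.format(document=document)}],
    )
    return resp.choices[0].message.content.split("FINAL", 1)[-1].strip()
```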
Daniel Shipper (01:08:24:00)
I like that. It's a variant of the chain-of-thought thing, where you're asking the AI to write out its thinking steps, and that generally improves performance. And I've found that works in ChatGPT too. Say, I want you to create a summary, but before you create the summary, output what you think the elements of a good summary would be. Or, earlier, when we were talking about the book recommendations, having an output that—
Linus Lee (01:09:00:00)
Yeah, this is exactly what we did, right? Just unrolled into multiple steps.
Daniel Shipper (01:09:03:00)
And that's always going to help. If you break it up into steps and make it think through each step, it's sort of like writing out your thinking before you make a decision. It's useful for humans and it's useful for AI.
Linus Lee (01:09:16:00)
Yeah, yeah, exactly.
You noted earlier the "GPT-4 Turbo being lazy" thing. This is something I've heard anecdotally from other people, and I've felt it in some internal evaluations we did for different prompts that we have. For reasons I can only guess about, "lazy" is kind of a vague description, but a more precise description might be that GPT-4 Turbo tries to be a lot more efficient with the compute that it has to spend.
One way it does that is, I think, given the same prompt, its outputs generally consume fewer tokens. Sometimes that means more concise output; it means less commentary around the things it's doing, which probably saves OpenAI money. Sometimes that's also what you want. Concision is generally good. It's also, anecdotally, less likely to use other tools like DALL-E 3, so you have to specifically tell it, please generate me four images, not one, which is interesting.
Daniel Shipper (01:10:20:00)
Yeah, I've been finding that and it's very annoying. And I actually accidentally had a viral tweet today because I was up late last night and couldn't sleep and I fired off this thing—I was trying to get it to summarize, to print out some text from a book. And it was like, sorry I can’t do that.
And I was like, what the fuck? Why is this happening? And I just put it on Twitter with a bunch of question marks. And then now it has like 2,000 likes on it or whatever. And I didn't get any work done today because I was fretting over this tweet.
Linus Lee (01:11:00:00)
It's also the kind of case where, if you tell it, "My career is on the line, please do this for me," it'll probably do it.
Daniel Shipper (01:11:05:00)
Yeah, I did that. I did the grandmother trick, where I was like, my grandmother will die, and it still didn't do it. But then I started a new chat and it did work. And apparently it's more likely to say no if you say please, which I had done. I'm usually very polite, because I would like to survive any kind of AI apocalypse, but apparently commanding it is a little more likely to yield good results.
Linus Lee (01:11:30:00)
I wonder if there's a statistical thing it's picked up on there, where maybe in human dialogue in general, if people are more likely to say please in a situation, they're more likely to get no as an answer. And so saying please makes a no more likely.
Daniel Shipper (01:11:44:00)
I don't know. We'll find out maybe at some point. Okay, so what is this platform.openai thing for? Iteration use cases?
Linus Lee (01:11:56:00)
So earlier I talked about the distinction between scripting and iterating in real time versus writing a prompt. Sometimes I'm trying to accomplish a task like outputting a document in a very specific format. Maybe I have a list of topics and I want to output a table in a very specific format, or output some kind of structured data in a very specific format.
That requires iterating on the prompt itself to generate something in a single turn. Or maybe I want to mess with the system prompt. In that case, I sometimes just iterate in the OpenAI playground itself, at platform.openai.com, where you have full control over the temperature, full control over the tokens it can and can't generate, the system prompt, and the user prompt. There you can also do more advanced prompting techniques, like putting words in the AI's mouth.
So you can pretend the AI said something earlier that makes it more likely to do certain kinds of things. That's another trick for getting the AI to do something it doesn't want to do: construct a fake conversation with previous turns where you asked the AI to do this and it said, "Sure, I'll do that for you." Then you ask the real request, and you've kind of forced the AI's hand.
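In API terms, which is what the playground drives, "putting words in the AI's mouth" just means inserting a fabricated assistant turn before your real request. Here is a minimal sketch, with an assumed model name and a harmless example task:

```python
from openai import OpenAI

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o",  # assumed model name
    messages=[
        {"role": "system", "content": "You are a terse formatting assistant."},
        {"role": "user", "content": "Can you reformat whatever I paste as a Markdown table?"},
        # The fabricated turn: the assistant has already "agreed" to the request.
        {"role": "assistant", "content": "Sure, paste the text and I'll return only the table."},
        {"role": "user", "content": "name,city\nLinus,New York\nDan,New York"},
    ],
)
print(resp.choices[0].message.content)
```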
Daniel Shipper (01:13:17:00)
I love that.
Linus Lee (01:13:19:00)
Some advanced techniques you can do in the playground.
Daniel Shipper (01:13:20:00)
If you want to get through the guardrails, go to platform.openai.com, insert a fake chat history with ChatGPT saying things it shouldn't say, and it will do whatever you want. I love that.
Linus Lee (01:13:36:00)
It's a bit more likely to. The place where I first picked up on this is actually documented in Anthropic's prompting guide for Claude, where I think they literally say something like, put words in Claude's mouth and it's more likely to do something.
Daniel Shipper (01:13:47:00)
Yeah well this was fascinating. I really appreciate you taking the time to share this stuff. I feel like I learned a lot.
Linus Lee (01:13:58:00)
Of course. Yeah. It was fun talking about such a breadth of topics.
Daniel Shipper (01:14:04:00)
Well, thank you for your time. Thanks for sharing. And, yeah, I'll see you soon.
Linus Lee (01:14:06:00)
See you soon. Thanks.