
In Michael Taylor’s work as a prompt engineer, he’s found that many of the issues he encounters in managing AI tools—such as their inconsistency, tendency to make things up, and lack of creativity—are ones he used to struggle with when he ran a marketing agency. It’s all about giving these tools the right context to do the job, whether they’re AI or human. This piece is the latest in his series Also True for Humans, about managing AIs like you'd manage people. Michael explores role prompting, a technique where you ask an LLM to role-play as a celebrity or expert in a specific field. Role prompting communicates to your AI coworkers what style of response you want—helping them better meet your subjective expectations.—Kate Lee
During my first job out of college, I played on an office soccer team. One of my teammates was a former soccer pro. He would run circles around everyone else on the field, and whenever he deigned to pass the ball to one of us mortals, we’d inevitably mess up the opportunity. In his frustration, he would always say, “Just be better.” It became a running joke, because, of course, you can’t make your teammates better just by telling them to.
Except now you can—that is, if your teammate is an AI. Telling ChatGPT, “You are an expert at [relevant field],” regularly leads to notable performance gains.
If you’ve spent any time working with AI tools, you’ve likely encountered examples of people asking the AI to role-play as an expert or celebrity in their prompts. After all, who wouldn’t want an AI version of Steve Jobs to help them brainstorm product ideas, or an AI Albert Einstein to help them do their homework?
The fact that simply telling an AI to get in character gives it new functionality associated with that persona feels like magic. It’s reminiscent of a scene in The Matrix where Neo (played by Keanu Reeves) instantly learns to fight by downloading a program into his brain.
Source: The Matrix.
Just like a human actor delivering their lines after getting in character, AI assistants can play their part better when they know what role you want them to play. Most of the scientific papers exploring role prompting focus on improving LLMs’ math scores, but in my experience, role prompting works best when two conditions are met:
- What makes a good answer is subjective.
- There’s a specific style you’re hoping to emulate.
Let’s review the science behind role prompting to understand why it works and look at a few examples of how to apply it. It’s one of the quickest and easiest ways to get the results you want out of AI—so long as you know what role you want it to play.
Helping AI get in character
I use role prompting in almost all of my prompts, particularly when the response will be scored based on subjective preferences.
I recently worked on a piece in which I experimented with automating product manager tasks for Lenny’s Newsletter, to which 700,000 product managers subscribe. I started each prompt with role-playing: “As a product manager for a major tech company similar to Google, Amazon, Microsoft, or Facebook…” This helped guide the AI model toward responses that use the right cultural references and business acronyms to impersonate a Silicon Valley product manager. In the experiments I ran for the piece, I was able to fool about 30 percent of people as to which answer was written by AI.
Below is an example of an AI-generated response to a common interview question for product manager jobs: “What are the most important metrics for DoorDash?” It was generated using role prompting and other prompt engineering techniques:
Source: Lenny’s Newsletter/Substack.
Role prompting doesn’t make the AI any smarter, of course, because the AI was already capable of expert-level responses. What it does is communicate the level and style at which you’d like it to operate for a given task. Ask ChatGPT to “explain it like I’m five,” and it’ll dumb things down. Ask ChatGPT to role-play as a professor, and you’ll get an undergraduate-level response. Tell ChatGPT to speak like a pirate, and thar she blows! No one answer is more correct than another—it just depends on what you’re looking for.
Source: Author’s screenshot.
Role prompting is a common technique because it’s easy to implement and yields immediate results with little effort. I covered it in my prompt engineering book, and both OpenAI and Anthropic advocate for role prompting in their prompt engineering guides. Both recommend that software developers building AI applications add a role or persona to the system prompt (the instructions that tell the model how to behave, analogous to custom instructions in ChatGPT).
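Here’s a minimal sketch of what that looks like in practice, using OpenAI’s Python SDK. The model name, persona wording, and question are placeholders (the question is the DoorDash interview prompt from earlier), not a recommendation from either provider’s guide:

```python
# Minimal sketch: role prompting via the system prompt with OpenAI's Python SDK.
# The persona text and model name are illustrative placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        # The role or persona lives in the system prompt...
        {
            "role": "system",
            "content": "You are a product manager for a major tech company "
                       "similar to Google, Amazon, Microsoft, or Facebook.",
        },
        # ...and the actual task goes in the user message.
        {"role": "user", "content": "What are the most important metrics for DoorDash?"},
    ],
)

print(response.choices[0].message.content)
```

The same pattern works with Anthropic’s SDK, where the persona goes in the top-level `system` parameter rather than a system message.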
Source: Anthropic.
The proof that role prompting works
Role prompting has been featured in at least 37 papers, and popular LLM prompt engineering frameworks take it for granted as a core technique. There is significant evidence that telling the model it is an expert improves both its ability to reason and its accuracy on math questions. The most effective application was a two-stage approach:
- Construct a task-specific role-play prompt: From now on, you are an excellent math teacher…
- Let the LLM respond before giving it the task: That’s great to hear! As your math teacher, I’ll do my best to explain mathematical concepts correctly…
The authors found that this prompt structure led to greater immersion in the role and increased accuracy by 12 percent across 100,000 algebra questions. The technique was also shown to improve performance on a wider range of problem sets, across the small, medium, and large versions (7, 13, and 70 billion parameters, respectively) of the Llama 2 open-source LLM.
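Here’s a rough sketch of that two-stage setup using OpenAI’s Python SDK. The prompt wording is paraphrased from the paper’s style of example, and the model name and algebra question are placeholders:

```python
# Rough sketch of the two-stage role-play approach: establish the role, let the
# model reply in character, then ask the actual question with that reply in context.
# Model name and sample question are illustrative placeholders.
from openai import OpenAI

client = OpenAI()

# Stage 1: construct the task-specific role-play prompt and let the model respond.
messages = [
    {
        "role": "user",
        "content": "From now on, you are an excellent math teacher and always "
                   "teach your students math problems correctly.",
    }
]
acknowledgement = client.chat.completions.create(model="gpt-4o", messages=messages)
messages.append(
    {"role": "assistant", "content": acknowledgement.choices[0].message.content}
)

# Stage 2: only now pose the task, with the in-character reply already in context.
messages.append(
    {
        "role": "user",
        "content": "A robe takes 2 bolts of blue fiber and half that much white "
                   "fiber. How many bolts in total does it take?",
    }
)
answer = client.chat.completions.create(model="gpt-4o", messages=messages)
print(answer.choices[0].message.content)
```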
Source: Arxiv.
The strongest evidence for role prompting comes from an exhaustive review by University of Michigan researchers. They prompted Llama 2 with different social roles (e.g., “You are a lawyer”) across thousands of tasks taken from the Massive Multitask Language Understanding (MMLU) dataset, a set of multiple-choice questions spanning 57 academic subjects, including mathematics, philosophy, law, and medicine. The researchers found that the right role could boost performance on a task by as much as 20 percent, though the gains were not consistent across models—you have to do the work and test many combinations to find the right role for your task and model. The chart below shows there isn’t any correlation between which roles work on one model as compared to another: the roles are all over the map when ranked by performance.
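Because the best role varies by task and model, the practical move is to sweep candidate roles against a small labeled sample of your own task. A sketch of that loop, where the roles, questions, and scoring rule are all placeholders you’d swap for your own:

```python
# Sketch: sweep candidate roles over a small labeled question set to find which
# role helps for a given task and model. Roles, questions, and the exact-match
# scoring rule are placeholders for your own evaluation set.
from openai import OpenAI

client = OpenAI()

roles = [
    "You are a lawyer.",
    "You are a math teacher.",
    "You are a helpful assistant.",
]
questions = [
    {
        "prompt": "Which amendment to the US Constitution abolished slavery? "
                  "Answer with the number only.",
        "answer": "13",
    },
    # ...add more labeled questions from your task here
]

def accuracy(role: str) -> float:
    correct = 0
    for q in questions:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": role},
                {"role": "user", "content": q["prompt"]},
            ],
        )
        correct += q["answer"] in response.choices[0].message.content
    return correct / len(questions)

for role in roles:
    print(f"{role} -> {accuracy(role):.0%}")
```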
Source: Arxiv.
The results of other papers have been more mixed, with role prompting sometimes harming performance. The creators of the popular Learn Prompting course recently declared that role prompting doesn’t work. They hypothesized that newer models are likely already trained to act as experts, so the technique is less effective than it used to be. To test their theory, they created one genius prompt (“You are a Harvard-educated scientist...”) and one idiot prompt (“You are a dumb person...”), and found no significant difference in performance. In one study, prepending “You are a financial advisor” to prompts was even shown to decrease accuracy on financial literacy tasks.
Source: Arxiv.
But these studies miss the point of role prompting: It’s not to turn the AI into a superhuman expert, but to get the model into the right mood to answer in the subjective style you want.
Most tasks can be approached in many different styles and still be done correctly. The appropriate way to answer depends on the situation and the preferences of the audience. If you’re building an AI recipe app, you’ll get better engagement if it sounds more like Gordon Ramsay than Wikipedia. However, if the recipe app is for children, you’ll need to adopt a more age-appropriate persona. A surprising amount of what we define as good performance is subjective and unique to the use case and organization. Role prompting helps zero in on what is likely to pass initial vibe checks.
We also like to role-play
Generative AI models are loosely modeled on how the human brain works, and humans act differently based on the role they’re expected to play.
The ethically controversial Stanford prison experiment, conducted by Philip Zimbardo in 1971, showed that people quickly adopt and internalize roles they’re assigned, even when those roles are assigned arbitrarily. Participants who were randomly assigned the role of guard assumed superiority and started mistreating those given the role of prisoner—so much so that the experiment had to be stopped.
While there are differences between humans and AI, these models have learned to role-play from the countless examples of humans doing so on the internet, such as movie scripts, fan fiction, and social media. Psychologist Walter Mischel’s work on situational behavior in the 1960s demonstrated how human behavior can vary significantly based on the context and expectations of a situation. It’s no wonder LLMs have learned to change their responses based on the role dictated by the prompt.
As LLMs gain the ability to recall past conversations, we should expect this effect to deepen. For example, Robert Rosenthal’s study “Pygmalion in the Classroom” showed how teacher expectations could influence student performance—becoming a self-fulfilling prophecy over long periods. You can imagine that ChatGPT with a memory function would start to behave more intelligently for users who steer it toward smarter answers, and exhibit learned helplessness (giving up on solvable tasks due to a history of repeated failure) when repeatedly faced with situations in which it makes mistakes.
To this end, when ChatGPT makes a mistake, I regularly go back and edit my previous prompt rather than replying to tell it that it made a mistake. By correcting my prompt and removing any mention of mistakes from the message history, I find that ChatGPT is less likely to get stuck in a loop, making the same mistake over and over again without making progress.
Can a chatbot name products in the style of Elon Musk?
While academics and researchers often test role prompting on math questions, it’s more useful on subjective tasks for which it’s harder to determine the right answer, if there even is one. Take, for example, brainstorming names for a new product. Ask 10 people to come up with a name for a new product, and you’ll get 10 different answers. Our preferences for product names stem from our subjective experiences, and role prompting can help you express that in a prompt.
Let’s say I love the way Elon Musk names his companies and products—SpaceX, the Boring Company, Cybertruck, Not-a-Flamethrower—and want to emulate his style. I can start my prompt with, “You are Elon Musk, and you are brainstorming names for new products,” and give several examples of Musk-sounding names. I’ll get back a response that is recognizably Muskian, rather than a more standard answer without the Musk role-play.
Source: Author’s screenshot (ChatGPT-4o, called through the API in a Jupyter Notebook, running through the Cursor IDE).
One caveat in this case is that role prompting doesn’t work that well on its own: I find that I have to add multiple examples of what I mean for it to get in character. I asked ChatGPT for a list of fake product ideas, and in a new chat session I fed it the imaginary product names based on how I imagined Musk would name them. Without this, the LLM is far less reliable in following its assigned role.
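A sketch of what that role-plus-examples prompt looks like. The product/name pairs below are illustrative placeholders drawn from real Musk company products (Grasshopper, Prufrock, Powerwall) rather than the imaginary names used in the experiment:

```python
# Sketch: role prompt plus few-shot examples to pull the model into a specific
# naming style. The example product/name pairs and model name are illustrative.
from openai import OpenAI

client = OpenAI()

prompt = """You are Elon Musk, and you are brainstorming names for new products.

Here are some names you have come up with before:
- Reusable rocket test vehicle -> "Grasshopper"
- Tunnel-boring machine -> "Prufrock"
- Home battery pack -> "Powerwall"

Now brainstorm five names for: a solar-powered backyard pizza oven."""

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```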
Source: Author’s screenshot.
With harder-to-define tasks like this, evaluation becomes tiresome. I don’t want to have to run this prompt hundreds of times and manually review the responses to check how reliable it is. In cases like these, I reach for synthetic evaluation metrics and ask an LLM to judge the results. With a simple follow-up prompt—“As an AI language model with knowledge of Elon Musk's style and naming conventions, evaluate the following product names and determine if they sound like they were created by Elon Musk”—I’m able to get an estimate of “Elon Musk likelihood,” along with an explanation of why the model gave the rating it did. If you really were building an Elon Musk product name generator, you could add up these scores across hundreds of tests to see which role prompting techniques and provided examples best increased the Muskiness of the names generated.
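A sketch of that judging step. The judge prompt is the one quoted above; the request for a 0-to-100 score per name is an added assumption so the output is easy to aggregate, and the names are made-up examples:

```python
# Sketch: LLM-as-judge synthetic evaluation of generated names. The 0-100 scoring
# instruction is an added assumption to make results easy to average across runs,
# and the candidate names are made-up examples.
from openai import OpenAI

client = OpenAI()

judge_prompt = (
    "As an AI language model with knowledge of Elon Musk's style and naming "
    "conventions, evaluate the following product names and determine if they "
    "sound like they were created by Elon Musk. For each name, give a likelihood "
    "score from 0 to 100 and briefly explain your rating."
)

names = ["Not-a-Pizza-Oven", "SolarSlice", "Backyard Baking Solution 3000"]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": judge_prompt + "\n\n" + "\n".join(names)}],
)
print(response.choices[0].message.content)
```

Averaging these scores over many runs is what produces the kind of likelihood estimates quoted below.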
Using role prompting plus providing four examples of Musk-like names gives me an average Elon Musk likelihood of 88 percent, as compared to only 25 percent without the role-play and examples.
Source: Author’s screenshot.
There are no rules, only principles
As LLMs get smarter, we probably won’t have to try as hard to get an expert response. OpenAI and its competitors are trying to build the most helpful assistants possible, so it stands to reason that the default response will trend toward superhuman over time, even if it can be a little dumb right now.
The evidence on role prompting is mixed, and most of the analysis has been done on previous models, such as OpenAI’s GPT-3.5. If we run those experiments with GPT-4, they may no longer replicate. In my book, role prompting falls under the “give direction” principle, and giving direction will still be useful no matter how smart the models get.
I still use role prompting in all of my prompts, but not because I expect a 20 percent performance gain. Defining the style of response I want from the AI makes the results subjectively better for the tasks I’m asking it to do. While the technique of role prompting may eventually not be needed, AIs can’t read your mind (at least, not until Musk releases Neuralink). It’s always going to be important to communicate what you want to an LLM, and role prompting is a simple way to express your preferences. I still teach it as a core prompting technique because it’s one of the easiest ways for novices to improve their prompts.
For expert prompt engineers, I recommend rigorously testing whether role prompting helps, and if it does, which role works best. It’s clear that it helps in some cases, especially where the judgment criteria are subjective or there’s a house style to adhere to. In my testing, role prompting works far better when you also provide examples of the style you want, revealing your preferences both explicitly and implicitly. The AI won’t judge you for your preferences, even if you force it to explain quantum mechanics to you like a pirate.
Michael Taylor is a freelance prompt engineer, the creator of the top prompt engineering course on Udemy, and the coauthor of Prompt Engineering for Generative AI. He previously built Ladder, a 50-person marketing agency based out of New York and London.