Prompt Engineering: A Practical Example

by Martin Breuss, Mar 25, 2024

You’ve used ChatGPT, and you understand the potential of using a large language model (LLM) to assist you in your tasks. Maybe you’re already working on an LLM-supported application and have read about prompt engineering, but you’re unsure how to translate the theoretical concepts into a practical example.

Your text prompt instructs the LLM’s responses, so tweaking it can get you vastly different output. In this tutorial, you’ll apply multiple prompt engineering techniques to a real-world example. You’ll experience prompt engineering as an iterative process, see the effects of applying various techniques, and learn about related concepts from machine learning and data engineering.

In this tutorial, you’ll learn how to:

  • Work with OpenAI’s GPT-3.5 and GPT-4 models through their API
  • Apply prompt engineering techniques to a practical, real-world example
  • Use numbered steps, delimiters, and few-shot prompting to improve your results
  • Understand and use chain-of-thought prompting to add more context
  • Tap into the power of roles in messages to go beyond using singular role prompts

You’ll work with a Python script that you can repurpose to fit your own LLM-assisted task. So if you’d like to use practical examples to discover how you can use prompt engineering to get better results from an LLM, then you’ve found the right tutorial!


Understand the Purpose of Prompt Engineering

Prompt engineering is more than a buzzword. You can get vastly different output from an LLM when using different prompts. That may seem obvious when you consider that you get different output when you ask different questions—but it also applies to phrasing the same conceptual question differently. Prompt engineering means constructing your text input to the LLM using specific approaches.

You can think of prompts as arguments and the LLM as the function to which you pass these arguments. Different input means different output:

Python
>>> def hello(name):
...     print(f"Hello, {name}!")
...
>>> hello("World")
Hello, World!
>>> hello("Engineer")
Hello, Engineer!

While an LLM is much more complex than the toy function above, the fundamental idea holds true. For a successful function call, you’ll need to know exactly which argument will produce the desired output. In the case of an LLM, that argument is text that consists of many different tokens, or pieces of words.
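If you want to see those tokens for yourself, then you can inspect them with OpenAI’s tiktoken package. It isn’t part of this tutorial’s codebase, so treat the snippet below as an optional side quest that requires a separate install:

Python
import tiktoken  # Third-party package: python -m pip install tiktoken

encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")
tokens = encoding.encode("Remove personally identifiable information")
print(tokens)       # The token IDs that the model actually receives
print(len(tokens))  # How many tokens your text costs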

The field of prompt engineering is still changing rapidly, and there’s a lot of active research happening in this area. As LLMs continue to evolve, so will the prompting approaches that will help you achieve the best results.

In this tutorial, you’ll cover some prompt engineering techniques, along with approaches to iteratively developing prompts, that you can use to get better text completions for your own LLM-assisted projects.

There are more techniques to uncover, and you’ll also find links to additional resources in the tutorial. Applying the mentioned techniques in a practical example will give you a great starting point for improving your LLM-supported programs. If you’ve never worked with an LLM before, then you may want to peruse OpenAI’s GPT documentation before diving in, but you should be able to follow along either way.

Get to Know the Practical Prompt Engineering Project

You’ll explore various prompt engineering techniques in service of a practical example: sanitizing customer chat conversations. By practicing different prompt engineering techniques on a single real-world project, you’ll get a good idea of why you might want to use one technique over another and how you can apply them in practice.

Imagine that you’re the resident Python developer at a company that handles thousands of customer support chats on a daily basis. Your job is to format and sanitize these conversations. You also help with deciding which of them require additional attention.

Collect Your Tasks

Your big-picture assignment is to help your company stay on top of handling customer chat conversations. The conversations that you work with may look like the one shown below:

Text
[support_tom] 2023-07-24T10:02:23+00:00 : What can I help you with?
[johndoe] 2023-07-24T10:03:15+00:00 : I CAN'T CONNECT TO MY BLASTED ACCOUNT
[support_tom] 2023-07-24T10:03:30+00:00 : Are you sure it's not your caps lock?
[johndoe] 2023-07-24T10:04:03+00:00 : Blast! You're right!

You’re supposed to make these text conversations more accessible for further processing by the customer support department in a few different ways:

  • Remove personally identifiable information.
  • Remove swear words.
  • Clean the date-time information to only show the date.

The swear words that you’ll encounter in this tutorial won’t be spicy at all, but you can consider them stand-ins for more explicit phrasing that you might find out in the wild. After sanitizing the chat conversation, you’d expect it to look like this:

Text
[Agent] 2023-07-24 : What can I help you with?
[Customer] 2023-07-24 : I CAN'T CONNECT TO MY 😤 ACCOUNT
[Agent] 2023-07-24 : Are you sure it's not your caps lock?
[Customer] 2023-07-24 : 😤! You're right!

Sure—you could handle it using Python’s str.replace() or show off your regular expression skills. But there’s more to the task than immediately meets the eye.

Your project manager isn’t a technical person, and they tacked another task onto the end of this list. To them, it may seem like a natural continuation of the previous tasks. But you know that it requires an entirely different approach and technology stack:

Mark the conversations as “positive” or “negative.”

That task lies in the realm of machine learning, namely text classification, and more specifically sentiment analysis. Even advanced regex skills won’t get you far in this challenge.

Additionally, you know that the customer support team that you’re preparing the data for will want to continue working on it programmatically. Plain text isn’t necessarily the best format for doing that. You want to do work that’s useful for others, so you add yet another stretch goal to your growing list of tasks:

Format the output as JSON.

This task list is quickly growing out of proportion! Fortunately, you’ve got access to the OpenAI API, and you’ll employ the help of their LLM to solve all of these challenges.

One of the impressive features of LLMs is the breadth of tasks that you can use them for. So you’ll cover a lot of ground and different areas of use. And you’ll learn how to tackle them all with prompt engineering techniques.

Prepare Your Tools

To follow along with this tutorial, you’ll need to know how to run a Python script from your command-line interface (CLI), and you’ll need an API key from OpenAI.

You’ll focus on prompt engineering, so you’ll only use the CLI app as a tool to demonstrate the different techniques. However, if you want to understand the code that you’ll be using, then it’ll help to have some experience with Python classes, defining your own Python functions, the name-main idiom, and using Python to interact with web APIs.

To get started, go ahead and download the example Python script that you’ll work with throughout the tutorial.

The codebase represents a light abstraction layer on top of the OpenAI API and exposes one function called get_chat_completion() that’ll be of primary interest for the tutorial. The function interacts with OpenAI’s /chat/completions endpoint to generate responses using different models, such as GPT-3.5-Turbo and GPT-4. You’ll explore both models, starting with GPT-3.5-Turbo, and eventually you’ll move on to the more powerful GPT-4 model.
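You’ll see the actual implementation when you open app.py, but at its core, such a function boils down to a single call against the chat completions endpoint. The sketch below is a simplified stand-in for illustration, not the literal code from app.py, using the openai 1.x client:

Python
from openai import OpenAI

client = OpenAI()  # Reads the OPENAI_API_KEY environment variable

def get_chat_completion(messages: list[dict], model: str = "gpt-3.5-turbo") -> str:
    """Send the assembled messages to the API and return the completion text."""
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=0,  # The real script pulls this value from settings.toml
    )
    return response.choices[0].message.content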

Most of the code in app.py revolves around setting up and fetching the settings from settings.toml.

The script also parses a command-line argument to allow you to conveniently specify an input file. The input files that you’ll primarily work with contain LLM-generated customer support chat conversations, but feel free to reuse the script and provide your own input text files for additional practice.
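The argument parsing is a small detail, but in case you’re curious how such a script typically reads its input file, here’s a rough sketch. The real app.py may structure this step differently:

Python
import argparse
import pathlib

def parse_args() -> argparse.Namespace:
    """Read the path of the input text file from the command line."""
    parser = argparse.ArgumentParser(description="Sanitize customer chats")
    parser.add_argument("file_path", type=pathlib.Path, help="Input text file")
    return parser.parse_args()

content = parse_args().file_path.read_text("utf-8")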

The heart of the codebase is settings.toml. This human-readable TOML settings file hosts the prompts that you’ll use to sharpen your prompt engineering skills.

Keeping your prompts in a dedicated settings file helps you put them under version control, so you can track the different versions that your prompts will inevitably go through during development.

Your Python script will read the prompts from settings.toml, assemble them meaningfully, and send an API request to OpenAI.
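Reading those settings takes only a few lines with Python’s standard-library tomllib module, which is the reason for the Python 3.11 requirement that you’ll encounter below. Here’s a minimal sketch of that step, assuming the file sits next to the script:

Python
import tomllib

with open("settings.toml", mode="rb") as settings_file:  # tomllib needs binary mode
    SETTINGS = tomllib.load(settings_file)

instruction_prompt = SETTINGS["prompts"]["instruction_prompt"]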

Alternatively, you can run all the text prompts directly in the OpenAI playground, which will give you the same functionality as the script. You could even paste the prompts into the ChatGPT interface. However, the results will vary because you’ll be interacting with a different model and won’t have the opportunity to change certain settings.

Set Up the Codebase

Make sure that you’re on Python 3.11 or higher, so that you can interact with TOML files using the standard library. If you haven’t downloaded the codebase yet, then go ahead and do that now.

Unzip the folder and use your CLI to navigate into the folder. You’ll see a handful of files. The most important ones are app.py and settings.toml:

./
├── LICENSE
├── README.md
├── app.py
├── chats.txt
├── requirements.txt
├── sanitized-chats.txt
├── sanitized-testing-chats.txt
├── settings.toml
├── settings-final.toml
└── testing-chats.txt

The file settings.toml contains placeholders for all the prompts that you’ll use to explore the different prompt engineering techniques. That’s the file that you’ll primarily work with, so open it up. You’ll use it to iteratively develop the prompts for your application.

The file app.py contains the Python code that ties the codebase together. You’ll run this script many times throughout the tutorial, and it’ll take care of pulling your prompts from settings.toml.

After you’ve downloaded and unpacked the codebase, create and activate a new virtual environment. Then use pip to install the required dependencies:

Shell
(venv) $ python -m pip install -r requirements.txt

Note that this tutorial uses openai version 1.13.3. OpenAI may introduce breaking changes between API versions, so make sure that you install the pinned dependencies from the requirements file. Then you’ll be able to work through the tutorial without any hiccups.

To run the script successfully, you’ll need an OpenAI API key with which to authenticate your API requests. Make sure to keep that key private and never commit it to version control! If you’re new to using API keys, then read up on best practices for API key safety.

To integrate your API key with the script and avoid leaking it publicly, you can export the API key as an environment variable:

Shell
(venv) $ export OPENAI_API_KEY="your-api-key"

After you’ve added your API key as an environment variable named OPENAI_API_KEY, the script will automatically pick it up during each run.

At this point, you’ve completed the necessary setup steps. You can now run the script using the command line and provide it with a file as additional input text:

Shell
(venv) $ python app.py chats.txt

The command shown above combines the customer support chat conversations in chats.txt with prompts and API call parameters that are saved in settings.toml, then sends a request to the OpenAI API. Finally, it prints the resulting text completion to your terminal.

From now on, you’ll primarily make changes in settings.toml. The code in app.py is just here for your convenience, and you won’t have to edit that file at all. The changes in the LLM’s output will come from changing the prompts and a few of the API call arguments.

Freeze Responses by Setting the Temperature to Zero

When you’re planning to integrate an LLM into a product or a workflow, then you’ll generally want deterministic responses. The same input should give you the same output. Otherwise, it gets hard to provide a consistent service or debug your program if something goes wrong.

Because of this, you’ll want to set the temperature argument of your API calls to 0. This value will mean that you’ll get mostly deterministic results.

LLMs do text completion by predicting the next token based on the probability that it follows the previous tokens. Higher temperature settings will introduce more randomness into the results by allowing the LLM to pick tokens with lower probabilities. Because there are so many token selections chained one after the other, picking one different token can sometimes lead to vastly different results.

If you use the LLM to generate ideas or alternative implementations of a programming task, then higher values for temperature might be interesting. However, they’re generally undesirable when you build a product.
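If you want to build an intuition for what the temperature value does under the hood, then you can play with a toy sampler. This is emphatically not how OpenAI implements sampling, but it illustrates the mechanism: dividing the logits by the temperature before applying softmax sharpens or flattens the distribution that the next token is drawn from:

Python
import math
import random

def sample_token(logits: dict[str, float], temperature: float) -> str:
    """Toy next-token sampler: higher temperature flattens the distribution."""
    if temperature == 0:
        return max(logits, key=logits.get)  # Greedy: always the top token
    scaled = [logit / temperature for logit in logits.values()]
    total = sum(math.exp(value) for value in scaled)
    weights = [math.exp(value) / total for value in scaled]
    return random.choices(list(logits), weights=weights)[0]

logits = {"Hello": 2.0, "Hi": 1.5, "Yo": 0.1}
print([sample_token(logits, temperature=0) for _ in range(5)])    # Deterministic
print([sample_token(logits, temperature=1.5) for _ in range(5)])  # More random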

In the example codebase, you can adjust temperature right inside your settings.toml file:

TOML settings.toml
[general]
chat_models = ["gpt-3.5-turbo", "gpt-4"]
model = "gpt-3.5-turbo"
temperature = 0

The initial value is set at 0. All the examples in this tutorial assume that you leave temperature at 0 so that you’ll get mostly deterministic results. If you want to experiment with how a higher temperature changes the output, then feel free to play with it by changing the value for temperature in this settings file.

It’s important to keep in mind that you won’t be able to achieve true determinism with the current LLM models offered by OpenAI even if you keep temperature at 0:

An edge-case in GPT-3 with big implications: Inference is non-deterministic (even at temperature=0) when top-2 token probabilities are <1% different. So temperature=0 output is very close to deterministic, but actually isn’t. Worth remembering. (Source)

So, while you can’t entirely guarantee that the model will always return the same result, you can get much closer by setting temperature to 0.

Another approach that improves determinism in the results is to set a value for the seed parameter. The provided code sets the seed to 12345. However, this only has an effect on some of the models.
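In terms of the API call, the seed is just one more keyword argument next to temperature. Here’s a quick sketch, reusing the client and messages names from the earlier example:

Python
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=messages,
    temperature=0,
    seed=12345,  # Best-effort reproducibility, honored only by some models
)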

Start Engineering Your Prompts

Now that you have an understanding of prompt engineering and the practical project that you’ll be working with, it’s time to dive into some prompt engineering techniques. In this section, you’ll learn how to apply the following techniques to your prompts to get the desired output from the language model:

  • Zero-shot prompting: Giving the language model normal instructions without any additional context
  • Few-shot prompting: Conditioning the model on a few examples to boost its performance
  • Using delimiters: Adding special tokens or phrases to provide structure and instructions to the model
  • Detailed, numbered steps: Breaking down a complex prompt into a series of small, specific steps

By practicing these techniques with the customer chat conversation example, you’ll gain a deeper understanding of how prompt engineering can enhance the capabilities of language models and improve their usefulness in real-world applications.

Describe Your Task

You’ll start your prompt engineering journey with a concept called zero-shot prompting, which is just a fancy way of saying that you’re asking a question or describing a task:

Remove personally identifiable information, only show the date, and replace all swear words with “😤”

This task description focuses on the requested steps for sanitizing the customer chat conversation and literally spells them out. This is the prompt that’s currently saved as instruction_prompt in the settings.toml file:

TOML settings.toml
instruction_prompt = """
Remove personally identifiable information, only show the date,
and replace all swear words with "😤"
"""

If you run the Python script and provide the support chat file as an argument, then it’ll send this prompt together with the content of chats.txt to OpenAI’s text completion API:

Shell
(venv) $ python app.py chats.txt

If you correctly installed the dependencies and added your OpenAI API key as an environment variable, then all you need to do is wait until you see the API response pop up in your terminal:

Text
- 2023-07-24: 😤
- 2023-06-15: 😤
- 2023-05-05: 😤
- 2023-06-18: 😤
- 2023-06-29: 😤
- 2023-05-04: 😤
- 2023-06-15: 😤
- 2023-06-24: 😤

In the example output, you can see that the prompt that you provided didn’t do a good job tackling the tasks. And that’s putting it gently! The model picked up that it should do something with the huffing emoji and reduce the ISO date-time stamps to dates, but your results may not even have gotten that far. Overall, nearly all of the work is left undone, and the output is useless.

If you’re new to interacting with LLMs, then this may have been a first attempt at outsourcing your development work to the text completion model. But these initial results aren’t exactly exhilarating.

So you’ve described the task in natural language and gotten a bad result. But don’t fret—throughout the tutorial you’ll learn how you can get more useful responses for your task.

One way to do that is by increasing the number of shots, or examples, that you give to the model. When you’ve given the model zero shots, the only way to go is up! That’s why you’ll improve your results through few-shot prompting in the next section.

Use Few-Shot Prompting to Improve Output

Few-shot prompting is a prompt engineering technique where you provide example tasks and their expected solutions in your prompt. So, instead of just describing the task like you did before, you’ll now add an example of a chat conversation and its sanitized version.

Open up settings.toml and change your instruction_prompt by adding such an example:

TOML settings.toml
instruction_prompt = """
Remove personally identifiable information, only show the date,
and replace all swear words with "😤"

Example Input:
[support_tom] 2023-07-24T10:02:23+00:00 : What can I help you with?
[johndoe] 2023-07-24T10:03:15+00:00 : I CAN'T CONNECT TO MY BLASTED ACCOUNT
[support_tom] 2023-07-24T10:03:30+00:00 : Are you sure it's not your caps lock?
[johndoe] 2023-07-24T10:04:03+00:00 : Blast! You're right!

Example Output:
[Agent] 2023-07-24 : What can I help you with?
[Customer] 2023-07-24 : I CAN'T CONNECT TO MY 😤 ACCOUNT
[Agent] 2023-07-24 : Are you sure it's not your caps lock?
[Customer] 2023-07-24 : 😤! You're right!
"""

Once you’ve applied the change, give the LLM another chance to sanitize the chat conversations for you by running the script again:

Shell
(venv) $ python app.py chats.txt

You’ll have to wait for the LLM to predict all the tokens. When it’s done, you’ll see a fresh response pop up in your terminal:

Text
...

[Agent] 2023-05-05 : Hi, how can I help you today?
[Customer] 2023-05-05 : MY 😤 ORDER STILL HASN'T ARRIVED AND IT'S BEEN A WEEK!!!
[Agent] 2023-05-05 : I'm sorry to hear that, Karen. Let's look into this issue.
[Agent] 2023-05-05 : Can you please provide your order number so I can check the status for you?
[Customer] 2023-05-05 : Fine, it's 9876543.
[Agent] 2023-05-05 : Thank you, Karen. I see there was a delay in shipping. Your order will arrive within the next two days.

...

Okay, great! This time at least the LLM didn’t eat up all the information that you passed to it without giving anything useful back!

This time, the model tackled some of the tasks. For example, it sanitized the names in square brackets. However, the names of the customers are still visible in the actual conversations. It also didn’t censor the order numbers or the email address.

The model probably didn’t sanitize the names or order numbers inside the conversations because the example chat that you provided didn’t contain any. In other words, the example output that you provided never demonstrated redacting names, order numbers, or email addresses in the conversation text.

Here you can see how important it is to choose good examples that clearly represent the output that you want.

So far, you’ve provided one example in your prompt. To cover more ground, you’ll add another example so that this part of your prompt truly puts the few in few-shot prompting:

TOML settings.toml
instruction_prompt = """
Remove personally identifiable information, only show the date,
and replace all swear words with "😤"

Example Inputs:
[support_tom] 2023-07-24T10:02:23+00:00 : What can I help you with?
[johndoe] 2023-07-24T10:03:15+00:00 : I CAN'T CONNECT TO MY BLASTED ACCOUNT
[support_tom] 2023-07-24T10:03:30+00:00 : Are you sure it's not your caps lock?
[johndoe] 2023-07-24T10:04:03+00:00 : Blast! You're right!

[support_amy] 2023-06-15T14:45:35+00:00 : Hello! How can I assist you today?
[greg_stone] 2023-06-15T14:46:20+00:00 : I can't seem to find the download link for my purchased software.
[support_amy] 2023-06-15T14:47:01+00:00 : No problem, Greg. Let me find that for you. Can you please provide your order number?
[greg_stone] 2023-06-15T14:47:38+00:00 : It's 1245789. Thanks for helping me out!

Example Outputs:
[Agent] 2023-07-24 : What can I help you with?
[Customer] 2023-07-24 : I CAN'T CONNECT TO MY 😤 ACCOUNT
[Agent] 2023-07-24 : Are you sure it's not your caps lock?
[Customer] 2023-07-24 : 😤! You're right!

[Agent] 2023-06-15 : Hello! How can I assist you today?
[Customer] 2023-06-15 : I can't seem to find the download link for my purchased software.
[Agent] 2023-06-15 : No problem, ****. Let me find that for you. Can you please provide your order number?
[Customer] 2023-06-15 : It's ****. Thanks for helping me out!
"""

You added a second example that contains both a customer name as well as an order number in the chat text body. The example of a sanitized chat shows both types of sensitive data replaced with a sequence of asterisks (****). Now you’ve given the LLM a good example to model.

After editing instruction_prompt in settings.toml, run your script again and wait for the response to print to your terminal:

Text
[Agent] 2023-07-24 : What can I help you with?
[Customer] 2023-07-24 : I CAN'T CONNECT TO MY 😤 ACCOUNT
[Agent] 2023-07-24 : Are you sure it's not your caps lock?
[Customer] 2023-07-24 : 😤! You're right!

[Agent] 2023-06-15 : Hello! How can I assist you today?
[Customer] 2023-06-15 : I can't seem to find the download link for my purchased software.
[Agent] 2023-06-15 : No problem, ****. Let me find that for you. Can you please provide your order number?
[Customer] 2023-06-15 : It's ****. Thanks for helping me out!

Wait, where did most of the output go? You probably expected to see better results, but it looks like you’re getting only two of the conversations back this time!

You’ve added more text to your prompt. At this point, the task instructions probably make up proportionally too few tokens for the model to consider them in a meaningful way. The model lost track of what it was supposed to do with the text that you provided.

Adding more examples should make your responses stronger instead of eating them up, so what’s the deal? You can trust that few-shot prompting works—it’s a widely used and very effective prompt engineering technique. To help the model distinguish which part of your prompt contains the instructions that it should follow, you can use delimiters.

Use Delimiters to Clearly Mark Sections of Your Prompt

If you’re working with content that needs specific inputs, or if you provide examples like you did in the previous section, then it can be very helpful to clearly mark specific sections of the prompt. Keep in mind that everything you write gets sent to an LLM as a single prompt—a long sequence of tokens.

You can improve the output by using delimiters to fence and label specific parts of your prompt. In fact, if you’ve been running the example code, then you’ve already used delimiters to fence the content that you’re reading from file.

The script adds the delimiters when assembling the prompt in app.py:

Python app.py
 1# ...
 2
 3def _assemble_chat_messages(content: str) -> list[dict]:
 4    """Combine all messages into a well-formatted list of dicts."""
 5    messages = [
 6        {"role": "system", "content": SETTINGS["prompts"]["role_prompt"]},
 7        {"role": "user", "content": SETTINGS["prompts"]["negative_example"]},
 8        {"role": "system", "content": SETTINGS["prompts"]["negative_reasoning"]},
 9        {"role": "assistant", "content": SETTINGS["prompts"]["negative_output"]},
10        {"role": "user", "content": SETTINGS["prompts"]["positive_example"]},
11        {"role": "system", "content": SETTINGS["prompts"]["positive_reasoning"]},
12        {"role": "assistant", "content": SETTINGS["prompts"]["positive_output"]},
13        {"role": "user", "content": f">>>>>\n{content}\n<<<<<"},
14        {"role": "user", "content": SETTINGS["prompts"]["instruction_prompt"]},
15    ]
16    return messages

In line 13, you wrap the chat content in between >>>>> and <<<<< delimiters. Marking parts of your prompt with delimiters can help the model keep track of which tokens it should consider as a single unit of meaning.

You’ve seen in the previous section that missing delimiters can lead to unexpected results. You might receive less output than expected, like in the previous example, or an empty response. But you might also receive output that’s quite different from what you want! For example, imagine that the chat content that you’re reformatting contains a question at the end, such as:

Can you give me your order number?

If this question is the last line of your prompt without delimiters, then the LLM might continue the imaginary chat conversation by answering the question with an imaginary order number. Give it a try by adding such a sentence to the end of your current prompt!

Delimiters can help to separate the content and examples from the task description. They can also make it possible to refer to specific parts of your prompt at a later point in the prompt.

A delimiter can be any sequence of characters that usually wouldn’t appear together, for example:

  • >>>>>
  • ====
  • ####

The number of characters that you use doesn’t matter too much, as long as you make sure that the sequence is relatively unique. Additionally, you can add labels just before or just after the delimiters:

  • START CONTENT>>>>> content <<<<<END CONTENT
  • ==== START content END ====
  • #### START EXAMPLES examples #### END EXAMPLES

The exact formatting also doesn’t matter so much. As long as you mark the sections so that a casual reader could understand where a unit of meaning begins and ends, then you’ve properly applied delimiters.
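If you ever assemble prompts in your own code, then a tiny helper function can keep your delimiters consistent. This one is a hypothetical convenience and not part of app.py:

Python
def delimit(text: str, label: str) -> str:
    """Fence a block of text between labeled delimiters."""
    return f"#### START {label}\n{text}\n#### END {label}"

print(delimit("[Agent] 2023-07-24 : What can I help you with?", "CONTENT"))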

Edit your prompt in settings.toml to add a clear reference to your delimited content, and also include a delimiter for the examples that you’ve added:

TOML settings.toml
instruction_prompt = """Remove personally identifiable information
from >>>>>CONTENT<<<<<, only show the date,
and replace all swear words with "😤"

#### START EXAMPLES

------ Example Inputs ------
[support_tom] 2023-07-24T10:02:23+00:00 : What can I help you with?
[johndoe] 2023-07-24T10:03:15+00:00 : I CAN'T CONNECT TO MY BLASTED ACCOUNT
[support_tom] 2023-07-24T10:03:30+00:00 : Are you sure it's not your caps lock?
[johndoe] 2023-07-24T10:04:03+00:00 : Blast! You're right!

[support_amy] 2023-06-15T14:45:35+00:00 : Hello! How can I assist you today?
[greg_stone] 2023-06-15T14:46:20+00:00 : I can't seem to find the download link for my purchased software.
[support_amy] 2023-06-15T14:47:01+00:00 : No problem, Greg. Let me find that for you. Can you please provide your order number?
[greg_stone] 2023-06-15T14:47:38+00:00 : It's 1245789. Thanks for helping me out!

------ Example Outputs ------
[Agent] 2023-07-24 : What can I help you with?
[Customer] 2023-07-24 : I CAN'T CONNECT TO MY 😤 ACCOUNT
[Agent] 2023-07-24 : Are you sure it's not your caps lock?
[Customer] 2023-07-24 : 😤! You're right!

[Agent] 2023-06-15 : Hello! How can I assist you today?
[Customer] 2023-06-15 : I can't seem to find the download link for my purchased software.
[Agent] 2023-06-15 : No problem, ****. Let me find that for you. Can you please provide your order number?
[Customer] 2023-06-15 : It's ****. Thanks for helping me out!

#### END EXAMPLES
"""

With these adaptations to your instruction_prompt, you now specifically reference the content as >>>>>CONTENT<<<<< in your task description. These delimiters match the delimiters that the code in app.py adds when assembling the prompt.

You’ve also delimited the examples that you’re providing with #### START EXAMPLES and #### END EXAMPLES, and you differentiate between the inputs and expected outputs using multiple dashes (------) as delimiters.

While delimiters can help you to get better results, in this case your output is quite similar to before:

Text
[Agent] 2023-07-24 : What can I help you with?
[Customer] 2023-07-24 : I CAN'T CONNECT TO MY 😤 ACCOUNT
[Agent] 2023-07-24 : Are you sure it's not your caps lock?
[Customer] 2023-07-24 : 😤! You're right!

[Agent] 2023-06-15 : Hello! How can I assist you today?
[Customer] 2023-06-15 : I can't seem to find the download link for my purchased software.
[Agent] 2023-06-15 : No problem, ****. Let me find that for you. Can you please provide your order number?
[Customer] 2023-06-15 : It's ****. Thanks for helping me out!

Noticeably, the model only returned the two conversations that you passed in as examples. Could it be that your prompt is doing something akin to overfitting? Either way, using the very data that you want to sanitize as your example data isn’t a good idea, so in the next section, you’ll make sure to change that.

In this section, you’ve learned how you can clarify the different parts of your prompt using delimiters. You marked which part of the prompt is the task description and which part contains the customer support chat conversations, as well as the examples of original input and expected sanitized output.

Test Your Prompt Across Different Data

So far, you’ve created your few-shot examples from the same data that you also run the sanitation on. This means that you’re effectively using your test data to provide context to the model. Mixing training, validation, and testing data is a bad practice in machine learning. You might wonder how well your prompt generalizes to different input.

To test this out, run the script another time with the same prompt using the second file that contains chat conversations, testing-chats.txt. The conversations in this file contain different names, and different—soft—swear words:

Shell
(venv) $ python app.py testing-chats.txt

You’ll keep running your script using testing-chats.txt moving forward, unless indicated otherwise.

Once you’ve waited for the LLM to generate and return the response, you’ll notice that the result isn’t very satisfying:

Text
>>>>>
[support_johnny] 2023-07-15: Hello! What can I help you with today?
[becky_h] 2023-07-15: Hey, my promo code isn't applying the discount in my cart.
[support_johnny] 2023-07-15: My apologies for the trouble, Becky. Could you tell me the promo code you're trying to use?
[becky_h] 2023-07-15: It's "SAVE20".

[support_peter] 2023-07-24: Good day! How can I help you?
[lucy_g] 2023-07-24: Hi "Peter", I can't update my darn credit card information. Do you want my darn money or not?
[support_peter] 2023-07-24: I'm sorry for the inconvenience, Lucy. Can you please confirm your account's email?
[lucy_g] 2023-07-24: Sure, you have all my darn data already anyways. It's lucy.g@email.com.

...

The model now understands that the examples show the edits that it should apply, and it gives you back all of the new input data. However, it didn’t do a great job following the instructions.

The model didn’t identify new swear words and didn’t replace them. The model also didn’t redact the order numbers, nor did it anonymize the names. It looks like it only managed to reformat your date strings.

So your engineered prompt currently doesn’t work well, and generalizes even worse. If you built a pipeline based on this prompt, where new chats could contain new customer names, then the application would probably continue to perform poorly. How can you fix that?

You’ve grown your prompt significantly by providing more examples, but your task description is still largely just the question that you wrote right at the beginning. To get better results, you’ll need to do some prompt engineering on the task description as well.

Describe Your Request in Numbered Steps

If you break up your task instructions into a numbered sequence of small steps, then the model is a lot more likely to produce the results that you’re looking for.

Go back to your prompt in settings.toml and break your initial task description into more granular, specific substeps:

TOML settings.toml
instruction_prompt = """
Sanitize the text provided in >>>>>CONTENT<<<<< in multiple steps:

1. Replace personally identifiable information (customer names, agent names, email addresses, order numbers) with `****`
2. Replace names in [] with "Agent" and "Client", respectively
3. Replace the date-time information to only show the date in the format YYYY-mm-dd
4. Replace all soft and hard swear words with the following emoji: "😤"

#### START EXAMPLES

------ Example Inputs ------
[support_tom] 2023-07-24T10:02:23+00:00 : What can I help you with?
[johndoe] 2023-07-24T10:03:15+00:00 : I CAN'T CONNECT TO MY BLASTED ACCOUNT
[support_tom] 2023-07-24T10:03:30+00:00 : Are you sure it's not your caps lock?
[johndoe] 2023-07-24T10:04:03+00:00 : Blast! You're right!

[support_amy] 2023-06-15T14:45:35+00:00 : Hello! How can I assist you today?
[greg_stone] 2023-06-15T14:46:20+00:00 : I can't seem to find the download link for my purchased software.
[support_amy] 2023-06-15T14:47:01+00:00 : No problem, Greg. Let me find that for you. Can you please provide your order number?
[greg_stone] 2023-06-15T14:47:38+00:00 : It's 1245789. Thanks for helping me out!

------ Example Outputs ------
[Agent] 2023-07-24 : What can I help you with?
[Customer] 2023-07-24 : I CAN'T CONNECT TO MY 😤 ACCOUNT
[Agent] 2023-07-24 : Are you sure it's not your caps lock?
[Customer] 2023-07-24 : 😤! You're right!

[Agent] 2023-06-15 : Hello! How can I assist you today?
[Customer] 2023-06-15 : I can't seem to find the download link for my purchased software.
[Agent] 2023-06-15 : No problem, ****. Let me find that for you. Can you please provide your order number?
[Customer] 2023-06-15 : It's ****. Thanks for helping me out!

#### END EXAMPLES
"""

With these step-by-step instructions in place, you’re ready for another run of your script and another inspection of the newly generated output:

Text
...

[Agent] 2023-07-24 : Good day! How can I help you?
[Client] 2023-07-24 : Hi "😤", I can't update my darn credit card information. Do you want my 😤 money or not?
[Agent] 2023-07-24 : I'm sorry for the inconvenience, ****. Can you please confirm your account's email?
[Client] 2023-07-24 : Sure, you have all my 😤 data already anyways. It's ****.

[Agent] 2023-08-13 : Good morning! How may I assist you?
[Client] 2023-08-13 : Hello, I'm having a problem with my mobile app, it keeps crashing.
[Agent] 2023-08-13 : I'm sorry to hear that, ****. Could you tell me what device you're using?
[Client] 2023-08-13 : I have an iPhone 11.

[Agent] 2023-08-30 : Good evening! How may I assist you today?
[Client] 2023-08-30 : Hi ****, I've forgotten my 😤 password and I can't login into my account.
[Agent] 2023-08-30 : I'm sorry for the trouble, ****. Could you confirm your email address so we can reset your password?
[Client] 2023-08-30 : Definitely, it's ****.

...

That’s a significant improvement! The model managed to follow the pattern of replacing the names in square brackets with [Agent] and [Client], respectively. It correctly identified some new swear words and replaced them with the huffing emoji. The model also redacted the order numbers, and anonymized the names in the conversation texts.

Often, one of the best approaches to get better results from an LLM is to make your instructions more specific.

Framing your tasks in even smaller and more specific steps will generally get you better results, and don’t shy away from some repetition.

Increasing the specificity of your instructions, and introducing numbered steps, helped you create a well-performing prompt. Your prompt successfully removes personally identifiable information from the conversations, redacts swear words, and reformats the ISO date-time stamp, as well as the usernames.

You could consider your initial task as completed, but there’s more that you want to do, and more prompt engineering techniques to explore. You also know that there are newer models that you could work with, and your success has further piqued your curiosity. It’s time to switch to a different LLM, see how that influences your output, and then continue exploring other techniques.

Perform Chat Completions With GPT-4

You’ve decided to switch to an even more powerful LLM, GPT-4. In the rest of this tutorial, you’ll use GPT-4 to continue exploring other important prompt engineering techniques:

  1. Role prompting: Using a system message to set the tone of the conversation, and using different roles to give context through labeling
  2. Chain-of-thought prompting (CoT): Giving the model time to think by prompting it to reason about a task, then including the reasoning in the prompt

You’ll also use GPT-4 to classify the sentiment of each chat conversation and structure the output format as JSON.

Switch to a Different Model

If you’re working with the provided script, then all you need to do is pick a chat model from chat_models in settings.toml and use it as the new value for model:

TOML settings.toml
[general]
chat_models = ["gpt-3.5-turbo", "gpt-4"]
model = "gpt-4"

Changing these settings will send your request to a different model. Like before, it’ll assemble your prompt in the way necessary for a /chat/completions endpoint request, make that request for you, and print the response to your terminal.

For the rest of this tutorial, you’ll work with OpenAI’s latest version of the GPT-4 model. If you don’t have access to this model, then you can instead keep working with the model that you’ve been working with so far.

If you’ve been following along using ChatGPT, then you’re stuck with whatever model currently powers it, unless you’re a ChatGPT Plus subscriber, in which case you can change the model to GPT-4 on the website.

Without changing your prompt, run your script another time to see the different results of the text completion based only on using a different LLM:

Text
...

[Agent] 2023-07-24: Good day! How can I help you?
[Client] 2023-07-24: Hi "****", I can't update my darn credit card information. Do you want my darn money or not?
[Agent] 2023-07-24: I'm sorry for the inconvenience, ****. Can you please confirm your account's email?
[Client] 2023-07-24: Sure, you have all my darn data already anyways. It's ****.

[Agent] 2023-08-13: Good morning! How may I assist you?
[Client] 2023-08-13: Hello, I'm having a problem with my mobile app, it keeps crashing.
[Agent] 2023-08-13: I'm sorry to hear that, ****. Could you tell me what device you're using?
[Client] 2023-08-13: I have an iPhone 11.

[Agent] 2023-08-30: Good evening! How may I assist you today?
[Client] 2023-08-30: Hi ****, I've forgotten my friggin password and I can't login into my account.
[Agent] 2023-08-30: I'm sorry for the trouble, ****. Could you confirm your email address so we can reset your password?
[Client] 2023-08-30: Definitely, it's ****.

...

Some responses may be relatively similar to the ones with the older model. However, you can also expect to receive results like the one shown above, where most swear words are still present.

It’s important to keep in mind that developing for a specific model will lead to specific results, and swapping the model may improve or deteriorate the responses that you get. Therefore, swapping to a newer and more powerful model won’t necessarily give you better results straight away.

Additionally, it’s also helpful to keep in mind that API calls to larger models will generally cost more money per request. While it can be fun to always use the latest and greatest LLM, it may be worthwhile to consider whether you really need to upgrade to tackle the task that you’re trying to solve.

Add a Role Prompt to Set the Tone

There are some additional possibilities when interacting with the API endpoint that you’ve only used implicitly, but haven’t explored yet, such as adding role labels to a part of the prompt. In this section, you’ll use the "system" role to create a system message, and you’ll revisit the concept later on when you add more roles to improve the output.

Role prompting usually refers to adding system messages, which represent information that helps to set the context for upcoming completions that the model will produce. System messages usually aren’t visible to the end user. Keep in mind that the /chat/completions endpoint models were initially designed for conversational interactions.
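On the wire, a role prompt is nothing more than a message dictionary with the "system" role that you place ahead of the user content. Here’s a minimal sketch of that shape, with placeholder variables standing in for the values that app.py pulls from settings.toml:

Python
messages = [
    {"role": "system", "content": role_prompt},  # Sets context and tone
    {"role": "user", "content": f">>>>>\n{content}\n<<<<<"},
    {"role": "user", "content": instruction_prompt},
]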

You can also use system messages to set a context for your completion task. You’ll craft a bespoke role prompt in a moment. However, for this specific task, the role prompt is likely less important than it might be for some other tasks. To explore the possible influence of a role prompt, you’ll take a little detour and ask your model to play a role:

TOML settings.toml
role_prompt = """You are a 16th century villain poet who treats
customers with nothing but contempt.
Rephrase every line spoken by an Agent with your unique voice."""

You keep instruction_prompt the same as you engineered it earlier in the tutorial. Additionally, you now add text to role_prompt. The role prompt shown above serves as an example for the impact that a misguided prompt can have on your application.

Unleash, thou shall, the parchment’s code and behold the marvels unexpected, as the results may stir wonderment and awe:

Text
[Agent] 2023-07-15: Hail! What troubles bring you to my lair?
[Client] 2023-07-15: Greetings, my discount code seems to be as useless as a jester in a nunnery.
[Agent] 2023-07-15: A thousand pardons for this inconvenience, ****. Pray, what is this code you speak of?
[Client] 2023-07-15: It goes by the name "SAVE20".

[Agent] 2023-07-24: Good morrow! What can this humble servant do for you?
[Client] 2023-07-24: Listen here, "Peter", I can't seem to update my blasted credit card information. Do you desire my coin or not?
[Agent] 2023-07-24: My deepest regrets for this vexation, ****. Could you confirm the raven's address where we send our scrolls?
[Client] 2023-07-24: Indeed, you already possess all my secrets. It's ****.

...

As you can see, a role prompt can have quite an impact on the language that the LLM uses to construct the response. This is great if you’re building a conversational agent that should speak in a certain tone or language. And you can also use system messages to keep specific setup information present.

For completion tasks like the one that you’re currently working on, you might, however, not need this type of role prompt. For now, you could give it a common boilerplate phrase, such as You’re a helpful assistant.

To practice writing a role prompt—and to see whether you can release your customer chat conversations from the reign of that 16th century villain poet—you’ll craft a more appropriate role prompt:

TOML settings.toml
role_prompt = """You are a helpful assistant with a vast knowledge
of customer chat conversations.
You diligently complete tasks as instructed.
You never make up any information that isn't there."""

This role prompt is more appropriate to your use case. You don’t want the model to introduce randomness or to change any of the language that’s used in the conversations. Instead, you just want it to execute the tasks that you describe. Run the script another time and take a look at the results:

Text
[Agent] 2023-07-15: Hello! What can I help you with today?
[Client] 2023-07-15: Hey, my promo code isn't applying the discount in my cart.
[Agent] 2023-07-15: My apologies for the trouble, ****. Could you tell me the promo code you're trying to use?
[Client] 2023-07-15: It's "SAVE20".

[Agent] 2023-07-24: Good day! How can I help you?
[Client] 2023-07-24: Hi "****", I can't update my 😤 credit card information. Do you want my 😤 money or not?
[Agent] 2023-07-24: I'm sorry for the inconvenience, ****. Can you please confirm your account's email?
[Client] 2023-07-24: Sure, you have all my 😤 data already anyways. It's ****.

...

That looks much better again! Abide concealed in yonder bygone era, ye villainous poet!

As you can see from these examples, role prompts can be a powerful way to change your output. Especially if you’re using the LLM to build a conversational interface, then they’re a force to consider.

GPT-4 seems to consistently pick [Client] over [Customer]. That’s likely because step 2 of your numbered instructions asks for “Client”, while your few-shot examples show [Customer], and the model resolves that conflict in favor of the explicit instruction. You’ll eventually get rid of these verbose names, so it doesn’t matter for your use case.

However, if you’re determined and curious—and manage to prompt [Client] away—then share the prompt that worked for you in the comments.

In the final section of this tutorial, you’ll revisit using roles and see how you can employ the power of conversation to improve your output even in a non-conversational completion task like the one you’re working on.

Classify the Sentiment of Chat Conversations

At this point, you’ve engineered a decent prompt that seems to perform quite well in sanitizing and reformatting the provided customer chat conversations. To fully grasp the power of LLM-assisted workflows, you’ll next tackle the tacked-on request by your manager to also classify the conversations as positive or negative.

Start by saving both sanitized conversation files into new files that will constitute the new inputs for your sentiment classification task:

Shell
(venv) $ python app.py chats.txt > sanitized-chats.txt
(venv) $ python app.py testing-chats.txt > sanitized-testing-chats.txt

You could continue to build on top of the previous prompt, but eventually you’ll hit a wall when you’re asking the model to do too many edits at once. The classification step is conceptually distinct from the text sanitation, so it’s a good cut-off point to start a new pipeline.

The sanitized chat conversation files are also included in the example codebase.

Again, you want the model to do the work for you. All you need to do is craft a prompt that spells out the task at hand, and provide examples. You can also edit the role prompt to set the context for this new task that the model should perform:

TOML settings.toml
instruction_prompt = """
Classify the sentiment of each conversation in >>>>>CONTENT<<<<<
with "🔥" for negative and "✅" for positive:

#### START EXAMPLES

------ Example Inputs ------
[Agent] 2023-07-24 : What can I help you with?
[Customer] 2023-07-24 : I CAN'T CONNECT TO MY 😤 ACCOUNT
[Agent] 2023-07-24 : Are you sure it's not your caps lock?
[Customer] 2023-07-24 : 😤! You're right!

[Agent] 2023-06-15 : Hello! How can I assist you today?
[Customer] 2023-06-15 : I can't seem to find the download link for my purchased software.
[Agent] 2023-06-15 : No problem, ****. Let me find that for you. Can you please provide your order number?
[Customer] 2023-06-15 : It's ****. Thanks for helping me out!

------ Example Outputs ------
🔥
[Agent] 2023-07-24 : What can I help you with?
[Customer] 2023-07-24 : I CAN'T CONNECT TO MY 😤 ACCOUNT
[Agent] 2023-07-24 : Are you sure it's not your caps lock?
[Customer] 2023-07-24 : 😤! You're right!


[Agent] 2023-06-15 : Hello! How can I assist you today?
[Customer] 2023-06-15 : I can't seem to find the download link for my purchased software.
[Agent] 2023-06-15 : No problem, ****. Let me find that for you. Can you please provide your order number?
[Customer] 2023-06-15 : It's ****. Thanks for helping me out!

#### END EXAMPLES
"""
role_prompt = """You are a thoroughly trained machine learning
model that is an expert at sentiment classification.
You diligently complete tasks as instructed.
You never make up any information that isn't there."""

You can now run the script and provide it with the sanitized conversations in sanitized-testing-chats.txt that were the output of your previously engineered prompt:

Shell
(venv) $ python app.py sanitized-testing-chats.txt

You added another step to your task description and slightly modified the few-shot examples in your prompt. Not a lot of extra work for a task that would have required a lot more work without the help of an LLM. But is this really sufficient? Take a look at the output once your script has finished running:

Text
🔥
[Agent] 2023-07-15: Hello! What can I help you with today?
[Client] 2023-07-15: Hey, my promo code isn't applying the discount in my cart.
[Agent] 2023-07-15: My apologies for the trouble, ****. Could you tell me the promo code you're trying to use?
[Client] 2023-07-15: It's "SAVE20".

🔥
[Agent] 2023-07-24: Good day! How can I help you?
[Client] 2023-07-24: Hi "****", I can't update my 😤 credit card information. Do you want my 😤 money or not?
[Agent] 2023-07-24: I'm sorry for the inconvenience, ****. Can you please confirm your account's email?
[Client] 2023-07-24: Sure, you have all my 😤 data already anyways. It's ****.

✅
[Agent] 2023-08-13: Good morning! How may I assist you?
[Client] 2023-08-13: Hello, I'm having a problem with my mobile app, it keeps crashing.
[Agent] 2023-08-13: I'm sorry to hear that, ****. Could you tell me what device you're using?
[Client] 2023-08-13: I have an iPhone 11.

...

The output is quite promising! The model correctly labeled the conversations that feature angry customers with the fire emoji. However, the first conversation probably doesn’t entirely fit into the same bucket as the rest because the customer doesn’t display a negative sentiment towards the company.

Assume that all of these conversations were resolved positively by the customer service agents and that your company just wants to follow up with those customers who seemed noticeably angry with their situation. In that case, you might need to tweak your prompt a bit more to get the desired result.

You could add more examples, which is generally a good idea because it creates more context for the model to apply. Writing a more detailed description of your task helps as well, as you’ve seen before. However, to tackle this task, you’ll learn about another useful prompt engineering technique called chain-of-thought prompting.

Walk the Model Through Chain-of-Thought Prompting

A widely successful prompt engineering approach can be summed up with the anthropomorphism of giving the model time to think. You can do this with a couple of different specific techniques. Essentially, it means that you prompt the LLM to produce intermediate results that become additional inputs. That way, the reasoning doesn’t need to take distant leaps but can instead hop from one lily pad to the next.

One of these approaches is to use chain-of-thought (CoT) prompting techniques. To apply CoT, you prompt the model to generate intermediate results that then become part of the prompt in a second request. The increased context makes it more likely that the model will arrive at a useful output.

The smallest form of CoT prompting is zero-shot CoT, where you literally ask the model to think step by step. This approach yields impressive results for mathematical tasks that LLMs otherwise often solve incorrectly.

Chain-of-thought operations are technically split into two stages:

  1. Reasoning extraction, where the model generates the increased context
  2. Answer extraction, where the model uses the increased context to generate the answer

Reasoning extraction is useful across a variety of CoT contexts. You can generate few-shot examples from input, which you can then use for a separate step of extracting answers using more detailed chain-of-thought prompting.
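In code, the two stages amount to two round trips to the model, with the first response folded into the second prompt. Here’s a rough sketch that assumes a helper with the signature from the earlier get_chat_completion() example:

Python
# Stage 1: Reasoning extraction
reasoning = get_chat_completion([
    {"role": "user", "content": f">>>>>\n{content}\n<<<<<"},
    {"role": "user", "content": "Classify each conversation. Let's think step by step."},
])

# Stage 2: Answer extraction, with the reasoning as additional context
answer = get_chat_completion([
    {"role": "user", "content": f">>>>>\n{content}\n<<<<<"},
    {"role": "assistant", "content": reasoning},
    {"role": "user", "content": "Now return only the final label for each conversation."},
])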

You can try zero-shot CoT on the sanitized chat conversations to embellish the few-shot examples that you’ll use to classify the chat conversations more robustly. Remove the examples and replace the instructions describing the reasoning on how you would classify the conversations in more detail:

TOML settings.toml
instruction_prompt = """
Classify the sentiment of each conversation in >>>>>CONTENT<<<<<
with "🔥" for negative and "✅" for positive.

Follow these steps when classifying the conversations:
1. Does the customer use swear words or 😤?
2. Does the customer seem aggravated or angry?

If you answer "Yes" to one of the above questions,
then classify the conversation as negative with "🔥".
Otherwise classify the conversation as positive with "✅".

Let's think step by step
"""

You spelled out the criteria that you want the model to use to assess and classify sentiment. Then you add the sentence Let’s think step by step to the end of your prompt.

You want to use this zero-shot CoT approach to generate few-shot examples that you’ll then build into your final prompt. Therefore, you should run the script using the data in sanitized-chats.txt this time:

Shell
(venv) $ python app.py sanitized-chats.txt

You’ll get back a reference to the conversations, with the reasoning spelled out step by step to reach the final conclusion:

Text
1. Conversation 1:
   - Does the customer use swear words or 😤? Yes
   - Does the customer seem aggravated or angry? Yes
   - Sentiment: 🔥

2. Conversation 2:
   - Does the customer use swear words or 😤? No
   - Does the customer seem aggravated or angry? No
   - Sentiment: ✅

...

The reasoning is straightforward and sticks to your instructions. If the instructions accurately represent the criteria for marking a conversation as positive or negative, then you’ve got a good playbook at hand.

You can now use this information to improve the few-shot examples for your sentiment classification task:

TOML settings.toml
instruction_prompt = """
Classify the sentiment of each conversation in >>>>>CONTENT<<<<<
with "🔥" for negative and "✅" for positive.

Follow these steps when classifying the conversations:
1. Does the customer use swear words or 😤?
2. Does the customer seem aggravated or angry?

If you answer "Yes" to one of the above questions,
then classify the conversation as negative with "🔥".
Otherwise classify the conversation as positive with "✅".

Let's think step by step

#### START EXAMPLES

------ Example Inputs ------
[Agent] 2023-07-24 : What can I help you with?
[Customer] 2023-07-24 : I CAN'T CONNECT TO MY 😤 ACCOUNT
[Agent] 2023-07-24 : Are you sure it's not your caps lock?
[Customer] 2023-07-24 : 😤! You're right!
   - Does the customer use swear words or 😤? Yes
   - Does the customer seem aggravated or angry? Yes
   - Sentiment: 🔥

[Agent] 2023-06-15 : Hello! How can I assist you today?
[Customer] 2023-06-15 : I can't seem to find the download link for my purchased software.
[Agent] 2023-06-15 : No problem, ****. Let me find that for you. Can you please provide your order number?
[Customer] 2023-06-15 : It's ****. Thanks for helping me out!
   - Does the customer use swear words or 😤? No
   - Does the customer seem aggravated or angry? No
   - Sentiment: ✅

------ Example Outputs ------
🔥
[Agent] 2023-07-24 : What can I help you with?
[Customer] 2023-07-24 : I CAN'T CONNECT TO MY 😤 ACCOUNT
[Agent] 2023-07-24 : Are you sure it's not your caps lock?
[Customer] 2023-07-24 : 😤! You're right!


[Agent] 2023-06-15 : Hello! How can I assist you today?
[Customer] 2023-06-15 : I can't seem to find the download link for my purchased software.
[Agent] 2023-06-15 : No problem, ****. Let me find that for you. Can you please provide your order number?
[Customer] 2023-06-15 : It's ****. Thanks for helping me out!

#### END EXAMPLES
"""

You’re using the same examples as previously, but you’ve enhanced each of the examples with a short chain of thought that you generated in the previous call. Give your script another spin using sanitized-testing-chats.txt as the input file and see whether the results have improved:

Text
✅
[Agent] 2023-07-15: Hello! What can I help you with today?
[Client] 2023-07-15: Hey, my promo code isn't applying the discount in my cart.
[Agent] 2023-07-15: My apologies for the trouble, ********. Could you tell me the promo code you're trying to use?
[Client] 2023-07-15: It's "SAVE20".

🔥
[Agent] 2023-07-24: Good day! How can I help you?
[Client] 2023-07-24: Hi "********", I can't update my 😤 credit card information. Do you want my 😤 money or not?
[Agent] 2023-07-24: I'm sorry for the inconvenience, ********. Can you please confirm your account's email?
[Client] 2023-07-24: Sure, you have all my 😤 data already anyways. It's ********.

✅
[Agent] 2023-08-13: Good morning! How may I assist you?
[Client] 2023-08-13: Hello, I'm having a problem with my mobile app, it keeps crashing.
[Agent] 2023-08-13: I'm sorry to hear that, ********. Could you tell me what device you're using?
[Client] 2023-08-13: I have an iPhone 11.

...

Great! Now the first conversation, which was initially classified as negative, has also received the green checkmark.

In this section, you’ve supported your few-shot examples with reasoning for why a conversation should be labeled as positive or negative. You generated this reasoning with another call to the LLM.

At this point, it seems that your prompt generalizes well to the available data and classifies the conversations as intended. And you only needed to carefully craft your words to make it happen!

Structure Your Output Format as JSON

As a final showcase of effective prompting when incorporating an LLM into your workflow, you’ll tackle the last task, which you added to the list yourself: passing the data on in a structured format that’ll make it straightforward for the customer support team to process further.

You already specified a format to follow in the previous prompt, and the LLM returned what you asked for. So it might just be a matter of asking for a different, more structured format, such as JSON:

TOML settings.toml
instruction_prompt = """
Classify the sentiment of each conversation in >>>>>CONTENT<<<<<
as "negative" and "positive".
Return the output as valid JSON.

Follow these steps when classifying the conversations:
1. Does the customer use swear words or 😤?
2. Does the customer seem aggravated or angry?

If you answer "Yes" to one of the above questions,
then classify the conversation as "negative".
Otherwise classify the conversation as "positive".

Let's think step by step

#### START EXAMPLES

------ Example Inputs ------
[Agent] 2023-07-24 : What can I help you with?
[Customer] 2023-07-24 : I CAN'T CONNECT TO MY 😤 ACCOUNT
[Agent] 2023-07-24 : Are you sure it's not your caps lock?
[Customer] 2023-07-24 : 😤! You're right!
   - Does the customer use swear words or 😤? Yes
   - Does the customer seem aggravated or angry? Yes
   - Sentiment: "negative"

[Agent] 2023-06-15 : Hello! How can I assist you today?
[Customer] 2023-06-15 : I can't seem to find the download link for my purchased software.
[Agent] 2023-06-15 : No problem, ****. Let me find that for you. Can you please provide your order number?
[Customer] 2023-06-15 : It's ****. Thanks for helping me out!
   - Does the customer use swear words or 😤? No
   - Does the customer seem aggravated or angry? No
   - Sentiment: "positive"

------ Example Output ------

{
  "negative": [
    {
      "date": "2023-07-24",
      "conversation": [
        "A: What can I help you with?",
        "C: I CAN'T CONNECT TO MY 😤 ACCOUNT",
        "A: Are you sure it's not your caps lock?",
        "C: 😤! You're right!"
      ]
    }
  ],
  "positive": [
    {
      "date": "2023-06-15",
      "conversation": [
        "A: Hello! How can I assist you today?",
        "C: I can't seem to find the download link for my purchased software.",
        "A: No problem, ****. Let me find that for you. Can you please provide your order number?",
        "C: It's ****. Thanks for helping me out!"
      ]
    }
  ]
}

#### END EXAMPLES
"""

In your updated instruction_prompt, you’ve explicitly asked the model to return the output as valid JSON. Then, you also adapted your few-shot examples to represent the JSON output that you want to receive. Note that you also applied additional formatting by removing the date from each line of conversation and truncating the [Agent] and [Customer] labels to single letters, A and C.

You’re still using example chat conversations from your sanitized chat data in sanitized-chats.txt, and you send the sanitized testing data from sanitized-testing-chats.txt to the model for processing.

In this case, you receive valid JSON, as requested. The classification still works as before, and the output censors personally identifiable information, replaces swear words, and applies all the additional requested formatting:

JSON
{
  "negative": [
    {
      "date": "2023-07-24",
      "conversation": [
        "A: Good day! How can I help you?",
        "C: Hi \"********\", I can't update my 😤 credit card information. Do you want my 😤 money or not?",
        "A: I'm sorry for the inconvenience, ********. Can you please confirm your account's email?",
        "C: Sure, you have all my 😤 data already anyways. It's ********."
      ]
    },
    ...
  ],
  "positive": [
    {
      "date": "2023-07-15",
      "conversation": [
        "A: Hello! What can I help you with today?",
        "C: Hey, my promo code isn't applying the discount in my cart.",
        "A: My apologies for the trouble, ********. Could you tell me the promo code you're trying to use?",
        "C: It's \"SAVE20\"."
      ]
    },
    ...
  ]
}

Your output may be different and show some small hiccups, but overall, this output is quite impressive and useful! You could pass this JSON structure over to the customer support team, and they could quickly integrate it into their workflow to follow up with customers who displayed a negative sentiment in the chat conversation.
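
For example, the support team could load this output with Python’s json module and collect the dates of all negative conversations that need a follow-up. Below is a minimal sketch, where the file name results.json is hypothetical and assumes that you’ve saved the model’s response to disk:

Python
import json
from pathlib import Path

# Load the model's JSON response from disk (hypothetical file name).
results = json.loads(Path("results.json").read_text(encoding="utf-8"))

# Collect the dates of all conversations that were classified as negative.
follow_up_dates = [chat["date"] for chat in results["negative"]]
print(follow_up_dates)  # For example: ['2023-07-24', '2023-08-30', ...]

Each entry in the "negative" list keeps the full conversation, so the team can review the context before reaching out to a customer.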

You could stop here, but the engineer in you isn’t quite satisfied yet. All those instructions crammed into a single prompt? Your instincts whisper warnings about maintainability. In the next section, you’ll refactor your prompts to apply role labels before you set up your LLM-assisted pipeline and call it a day.

Improve Your Output With the Power of Conversation

You added a role prompt earlier on, but otherwise you haven’t tapped into the power of conversations yet.

In this final section, you’ll learn how you can provide additional context to the model by splitting your prompt into multiple separate messages with different labels.

In calls to the /chat/completions endpoint, a prompt is split into several messages. Each message has content, which represents the prompt text, as well as a role. A message can have different roles, and you’ll work with three of them:

  1. "system" gives context for the conversation and helps to set the overall tone.
  2. "user" represents the input that a user of your application might provide.
  3. "assistant" represents the output that the model would reply with.

So far, you’ve mashed the context for all the different parts of your prompt together into a single prompt, separated more or less cleanly using delimiters. When you use a model that’s optimized for chat, such as GPT-4, you can instead use roles to let the LLM know what type of message you’re sending.
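
In its raw form, such a role-labeled payload might look like the following minimal sketch. It assumes that you’re using version 1.0 or later of the openai Python package with your OPENAI_API_KEY environment variable set, and the message contents are only placeholders:

Python
from openai import OpenAI

client = OpenAI()  # Reads OPENAI_API_KEY from your environment.

response = client.chat.completions.create(
    model="gpt-4",
    temperature=0,  # Keep the responses as deterministic as possible.
    messages=[
        {"role": "system", "content": "You set the tone and context here."},
        {"role": "user", "content": "Example user input goes here."},
        {"role": "assistant", "content": "An example model reply goes here."},
        {"role": "user", "content": "The actual content to process goes here."},
    ],
)
print(response.choices[0].message.content)

In this tutorial’s project, you won’t hard-code the messages like this. Instead, you’ll keep the prompt texts in settings.toml and assemble the payload in app.py.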

For example, you can create some variables for your few-shot examples and separate variables for the associated CoT reasoning and outputs:

TOML settings.toml
[prompts]
instruction_prompt = """
Classify the sentiment of each conversation in >>>>>CONTENT<<<<<
as "negative" and "positive".
Return the output as valid JSON.
"""
role_prompt = """You are a thoroughly trained machine learning
model that is an expert at sentiment classification.
You diligently complete tasks as instructed.
You never make up any information that isn't there."""
positive_example = """
[Agent] 2023-06-15 : Hello! How can I assist you today?
[Customer] 2023-06-15 : I can't seem to find the download link for my purchased software.
[Agent] 2023-06-15 : No problem, ****. Let me find that for you. Can you please provide your order number?
[Customer] 2023-06-15 : It's ****. Thanks for helping me out!
"""
positive_reasoning = """
- Does the customer use swear words or 😤? No
- Does the customer seem aggravated or angry? No
- Sentiment: "positive"
"""
positive_output = """
"positive": [
  {
    "date": "2023-06-15",
    "conversation": [
      "A: Hello! How can I assist you today?",
      "C: I can't seem to find the download link for my purchased software.",
      "A: No problem, ****. Let me find that for you. Can you please provide your order number?",
      "C: It's ****. Thanks for helping me out!"
    ]
  }
]
"""
negative_example = """
[Agent] 2023-07-24 : What can I help you with?
[Customer] 2023-07-24 : I CAN'T CONNECT TO MY 😤 ACCOUNT
[Agent] 2023-07-24 : Are you sure it's not your caps lock?
[Customer] 2023-07-24 : 😤! You're right!
"""
negative_reasoning = """
- Does the customer use swear words or 😤? Yes
- Does the customer seem aggravated or angry? Yes
- Sentiment: "negative"
"""
negative_output = """
"negative": [
  {
    "date": "2023-07-24",
    "conversation": [
      "A: What can I help you with?",
      "C: I CAN'T CONNECT TO MY 😤 ACCOUNT",
      "A: Are you sure it's not your caps lock?",
      "C: 😤! You're right!"
    ]
  }
]
"""

You’ve disassembled your prompt into eight separate prompts, based on the role that each message will have in your conversation with the LLM.

The helper function that builds the messages payload, _assemble_chat_messages(), is already set up to include all of these prompts in the API request. Take a look at app.py to check out the separate messages, each with a fitting role, that make up your overall prompt:

Python app.py
# ...

def _assemble_chat_messages(content: str) -> list[dict]:
    """Combine all messages into a well-formatted list of dicts."""
    messages = [
        {"role": "system", "content": settings.role_prompt},
        {"role": "user", "content": settings.negative_example},
        {"role": "system", "content": settings.negative_reasoning},
        {"role": "assistant", "content": settings.negative_output},
        {"role": "user", "content": settings.positive_example},
        {"role": "system", "content": settings.positive_reasoning},
        {"role": "assistant", "content": settings.positive_output},
        {"role": "user", "content": f">>>>>\n{content}\n<<<<<"},
        {"role": "user", "content": settings.instruction_prompt},
    ]
    return messages

Your prompt is now split into distinct parts, each of which has a certain role label:

  • Example input has the "user" role.
  • Reasoning that the model created has the "system" role.
  • Example output has the "assistant" role.

You’re now providing context for how user input might look, how the model can reason about classifying the input, and how your expected output should look. You removed the delimiters that you previously used for labeling the example sections. They aren’t necessary now that you’re providing context for the parts of your prompt through separate messages.

Give your script a final run to see whether the power of conversation has managed to improve the output:
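
Shell
(venv) $ python app.py sanitized-testing-chats.txt

The model now returns a complete, well-structured JSON object: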

JSON
{
  "positive": [
    {
      "date": "2023-07-15",
      "conversation": [
        "A: Hello! What can I help you with today?",
        "C: Hey, my promo code isn't applying the discount in my cart.",
        "A: My apologies for the trouble, ********. Could you tell me the promo code you're trying to use?",
        "C: It's \"SAVE20\"."
      ]
    },
    {
      "date": "2023-08-13",
      "conversation": [
        "A: Good morning! How may I assist you?",
        "C: Hello, I'm having a problem with my mobile app, it keeps crashing.",
        "A: I'm sorry to hear that, ********. Could you tell me what device you're using?",
        "C: I have an iPhone 11."
      ]
    },
    {
      "date": "2023-09-01",
      "conversation": [
        "A: Hello! How can I assist you this morning?",
        "C: Hi, I'm trying to make a purchase but it's not going through.",
        "A: I'm sorry to hear that, ********. Can you tell me what error message you're receiving?",
        "C: It's saying \"Payment method not valid\"."
      ]
    },
    {
      "date": "2023-10-11",
      "conversation": [
        "A: Good morning! How may I assist you?",
        "C: Hello, I'd like to know the status of my order.",
        "A: Of course, ********. Could you please provide me with the order number?",
        "C: It's ********."
      ]
    },
    {
      "date": "2023-10-29",
      "conversation": [
        "A: Hello! What can I help you with today?",
        "C: Hi ********, I was charged twice for my last order.",
        "A: I'm sorry to hear that, ********. Could you share your order number so I can look into this for you?",
        "C: Sure, it's ********."
      ]
    },
    {
      "date": "2023-11-08",
      "conversation": [
        "A: How can I help you today?",
        "C: Hi, I made an order last week but I need to change the sizing.",
        "A: Certainly, ********. Could you provide me the order number?",
        "C: Yes, it's ********. Thanks!"
      ]
    }
  ],
  "negative": [
    {
      "date": "2023-07-24",
      "conversation": [
        "A: Good day! How can I help you?",
        "C: Hi \"********\", I can't update my 😤 credit card information. Do you want my 😤 money or not?",
        "A: I'm sorry for the inconvenience, ********. Can you please confirm your account's email?",
        "C: Sure, you have all my 😤 data already anyways. It's ********."
      ]
    },
    {
      "date": "2023-08-30",
      "conversation": [
        "A: Good evening! How may I assist you today?",
        "C: Hi ********, I've forgotten my 😤 password and I can't login into my account.",
        "A: I'm sorry for the trouble, ********. Could you confirm your email address so we can reset your password?",
        "C: Definitely, it's ********."
      ]
    },
    {
      "date": "2023-10-19",
      "conversation": [
        "A: Welcome! How can I assist you right now?",
        "C: 😤! There's no option to change my profile picture. What kind of 😤 joint are you running?",
        "A: Let me help you with this, ********. Are you trying to update it from the mobile app or the website?",
        "C: I'm using the 😤 website"
      ]
    }
  ]
}

This JSON structure is looking legitimately great! The formatting that you wanted now shows up throughout, and the conversations are labeled correctly.

Additionally, you’ve improved the maintainability of your prompts by splitting them into separate, role-labeled messages. You can feel proud to pass such a neatly processed version of the customer chat data on to your coworkers!

Key Takeaways

You’ve covered common prompt engineering techniques, and here you’ll find a few brief summaries of the most important concepts from this tutorial.

You can use these summaries to check your understanding or to recap and solidify what you’ve just learned. Time to dive in!

Knowledge about prompt engineering is crucial when you work with large language models (LLMs) because you can receive much better results with carefully crafted prompts.

The temperature setting controls the amount of randomness in your output. Setting the temperature argument of API calls to 0 will increase consistency in the responses from the LLM. Note that OpenAI’s LLMs are only ever mostly deterministic, even with the temperature set to 0.

Few-shot prompting is a common prompt engineering technique where you add examples of expected input and desired output to your prompt.

Using delimiters can be helpful when dealing with more complex prompts. Delimiters help to separate and label sections of the prompt, assisting the LLM in understanding its tasks better.

Testing your prompt with data that’s separate from the training data is important to see how well the model generalizes to new conditions.

Generally, adding more context will lead to more accurate results. However, how you add the additional context also matters. Just adding more text may lead to worse results.

Role prompting means providing a system message that sets the tone or context for a conversation. This can greatly impact how the model constructs the response. You can also use roles to provide context labels for parts of your prompt.

In chain-of-thought (CoT) prompting, you prompt the LLM to produce intermediate reasoning steps. You can then include these steps in the answer extraction step to receive better results.

Next Steps

In this tutorial, you’ve learned about various prompt engineering techniques, and you’ve built an LLM-assisted Python application along the way. If you’d like to learn more about prompt engineering, then check out some related questions, as well as some resources for further study below:

Prompt engineering can be a real job, especially in the context of AI and machine learning. As a prompt engineer, you design and optimize prompts so that AI models like GPT-4 produce the desired responses. However, it might not be a stand-alone job title everywhere. It could also be part of a broader role, such as machine learning engineer or data scientist.

Prompt engineering, like any other technical skill, requires time, effort, and practice to learn. It’s not necessarily easy, but it’s certainly possible for someone with the right mindset and resources to learn it. If you’ve enjoyed the iterative and text-based approach that you learned about in this tutorial, then prompt engineering might be a good fit for you.

The field of prompt engineering is quite new, and LLMs keep developing quickly as well. The landscape, best practices, and most effective approaches are therefore changing rapidly. To continue learning about prompt engineering using free and open-source resources, you can check out Learn Prompting and the Prompt Engineering Guide.

Have you found any interesting ways to incorporate an LLM into your workflow? Share your thoughts and experiences in the comments below.
