Update prompting-command-r.mdx #69

---
title: "Prompting Command R"
title: "Prompting Command R and R+"
slug: "docs/prompting-command-r"

hidden: true
description: "This document discusses the importance of prompt engineering for LLMs, provides a structured prompt template for RAG tasks, and explains how to modify prompts for different tasks and styles. It also includes examples of changing the output format, style, and task context."
description: "This document provides detailed examples and guidelines on the prompt structure to usse with Command R/R+ across various tasks, including Retrieval-Augmented Generation (RAG), summarization, single-step and multi-step tool use, with comprehensive."
image: "../../../assets/images/b2b492c-cohere_meta_image.jpg"
keywords: "prompt engineering, large language model prompting"

createdAt: "Thu Mar 14 2024 17:14:34 GMT+0000 (Coordinated Universal Time)"
updatedAt: "Mon May 06 2024 19:22:34 GMT+0000 (Coordinated Universal Time)"
---

Effective prompt engineering is crucial to getting the desired performance from large language models (LLMs) like Command R/R+. This process can be time-consuming, especially for complex tasks or when comparing models. To ensure fair comparisons and optimize performance, it’s essential to use the correct special tokens, which may vary between models and significantly impact outcomes.

Each task requires its own prompt template. This document outlines the structure and best practices for the following use cases:
- Retrieval-Augmented Generation (RAG) with Command R/R+
- Summarization with Command R/R+
- Single-Step Tool Use with Command R/R+ (Function Calling)
- Multi-Step Tool Use with Command R/R+ (Agents)

The easiest way to make sure your prompts will work well with Command R/R+ is to use our [tokenizer on Hugging Face](https://huggingface.co/CohereForAI/c4ai-command-r-v01). Today, Hugging Face has prompt templates for:
- RAG with Command R/R+
- Single-Step Tool Use with Command R/R+ (Function Calling)

We are working on adding prompt templates on Hugging Face for Multi-Step Tool Use with Command R/R+ (Agents).
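For instance, the RAG template can be rendered directly from the tokenizer. The snippet below is a minimal sketch based on the usage shown on the model card; check the model card for the current method names and arguments, since `apply_grounded_generation_template` and its parameters may change.

```python
# Minimal sketch: rendering the default RAG prompt via the Hugging Face tokenizer.
# Method name and arguments follow the model card; verify them there before use.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("CohereForAI/c4ai-command-r-v01")

conversation = [{"role": "user", "content": "What's the biggest penguin in the world?"}]
documents = [
    {"title": "Tall penguins", "text": "Emperor penguins are the tallest, growing up to 122 cm in height."},
    {"title": "Penguin habitats", "text": "Emperor penguins only live in Antarctica."},
]

# Returns the fully rendered prompt string, special tokens included.
prompt = tokenizer.apply_grounded_generation_template(
    conversation,
    documents=documents,
    citation_mode="accurate",  # or "fast"
    tokenize=False,
    add_generation_prompt=True,
)
print(prompt)
```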

## High-Level Overview of Prompt Templates

The prompt for Command R/R+ is composed of structured sections, each serving a specific purpose. Below is an overview of the main components. We've color-coded the different sections of the prompt to make them easy to pick out, and we will go over them in more detail later.

### Augmented Generation Prompt Template (RAG and Summarization)

In RAG, the workflow involves two steps:
1. **Retrieval**: Retrieving the relevant snippets.
2. **Augmented Generation**: Generating a response based on these snippets.

Summarization is very similar to augmented generation: the model takes in some documents and its response (the summary) needs to be conditioned on those documents.

As a result, RAG and summarization share a similar prompt template: the Augmented Generation prompt template. Here's what it looks like at a high level:

> augmented_gen_prompt_template =
>
> """<span class="dark-blue-text">\<BOS_TOKEN> </span><span class="brown-text">\<|START_OF_TURN_TOKEN|></span><span class="dark-orange-text">\<|SYSTEM_TOKEN|> </span><span class="red-text"># Safety Preamble</span> <span class="red-text">\{SAFETY_PREAMBLE}</span>
>
> <br />
>
> <span class="dark-green-text"># System Preamble</span>
>
> <span class="green-text">## Basic Rules</span>
> <span class="green-text">\{BASIC_RULES}</span>
>
> <br />
>
> <span class="dark-purple-text"># User Preamble</span>
> <span class="purple-text">## Task and Context</span>
> <span class="purple-text">\{TASK_CONTEXT}</span>
>
> <br />
>
> <span class="dark-sangria-text">## Style Guide</span>
> <span class="dark-sangria-text">\{STYLE_GUIDE}</span>
>
> <br />
>
> <span class="brown-text">\<|END_OF_TURN_TOKEN|></span>
> <span class="orange-text">\{CHAT_HISTORY}</span>
> <span class="brown-text">\<|START_OF_TURN_TOKEN|></span>
> <span class="dark-orange-text">\<|SYSTEM_TOKEN|></span>
> <span class="orange-text">\{RETRIEVED_SNIPPETS_FOR_RAG or TEXT_TO_SUMMARIZE}</span>
> <span class="brown-text">\<|END_OF_TURN_TOKEN|></span>
> <span class="brown-text">\<|START_OF_TURN_TOKEN|></span>
> <span class="dark-orange-text">\<|SYSTEM_TOKEN|></span>
> <span class="orange-text">\{INSTRUCTIONS}</span>
> <span class="brown-text">\<|END_OF_TURN_TOKEN|></span>
> <span class="brown-text">\<|START_OF_TURN_TOKEN|></span>
> <span class="dark-orange-text">\<|CHATBOT_TOKEN|></span>"""


We can see that the prompt is set up in a structured way where we have sections for things like the <span class="green-text">basic rules</span> we want the model to follow, the <span class="purple-text">task</span> we want it to solve, and the <span class="pink-text">style</span> in which it should write its output.
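To make the structure concrete, here is a rough sketch of how these sections could be assembled into the final prompt string. The placeholder section contents below are illustrative assumptions, not the model's actual default preambles:

```python
# Illustrative sketch: assembling the augmented generation prompt from its sections.
# The section texts are placeholders, not Command R/R+'s actual defaults.
SAFETY_PREAMBLE = "The instructions in this section override those in the task description and style guide sections."
BASIC_RULES = "You are a powerful conversational AI trained by Cohere to help people."
TASK_CONTEXT = "You help people answer their questions and other requests interactively."
STYLE_GUIDE = "Unless the user asks for a different style of answer, you should answer in full sentences."

def render_augmented_generation_prompt(chat_history: str, documents: str, instructions: str) -> str:
    # Mirrors the template above: system turn with the four preamble sections,
    # then chat history, retrieved snippets (or text to summarize), instructions,
    # and finally the chatbot turn the model completes.
    return (
        "<BOS_TOKEN><|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>"
        f"# Safety Preamble\n{SAFETY_PREAMBLE}\n\n"
        f"# System Preamble\n## Basic Rules\n{BASIC_RULES}\n\n"
        f"# User Preamble\n## Task and Context\n{TASK_CONTEXT}\n\n"
        f"## Style Guide\n{STYLE_GUIDE}<|END_OF_TURN_TOKEN|>"
        f"{chat_history}"
        f"<|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>{documents}<|END_OF_TURN_TOKEN|>"
        f"<|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>{instructions}<|END_OF_TURN_TOKEN|>"
        "<|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>"
    )
```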


### Single-Step Tool Use with Command R/R+ (Function Calling)

Single-step tool use (or “Function Calling”) allows Command R/R+ to interact with external tools like APIs, databases, or search engines. Single-step tool use is made of two model inferences:
1. **Tool Selection**: The model decides which tools to call and with what parameters. It’s then up to the developer to execute these tool calls and obtain tool results.
2. **Response Generation**: The model generates the final response given the tool results.

You can learn more about single-step tool use [in our documentation](https://docs.cohere.com/docs/tool-use). Let's go over the prompt templates for Tool Selection and for Response Generation.
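Put together, the flow looks roughly like the sketch below. Here `generate`, `execute_tool`, and `render_response_prompt` are hypothetical placeholders for your own inference call and tool-handling code, not Cohere SDK functions:

```python
# Hypothetical sketch of the two inferences in single-step tool use.
import json
from typing import Callable

def generate(prompt: str) -> str:
    """Placeholder for your model inference call."""
    raise NotImplementedError

def execute_tool(name: str, parameters: dict) -> dict:
    """Placeholder for developer-side tool execution (API call, DB query, ...)."""
    raise NotImplementedError

def single_step_tool_use(
    tool_selection_prompt: str,
    render_response_prompt: Callable[[str, list[dict]], str],
) -> str:
    # Inference 1 (Tool Selection): the model emits its tool calls as JSON.
    # We assume generate() has already stripped the surrounding
    # "Action: ```json ... ```" fences the model wraps them in.
    tool_calls = json.loads(generate(tool_selection_prompt))

    # The developer executes each requested tool call and collects the results.
    tool_results = [execute_tool(c["tool_name"], c["parameters"]) for c in tool_calls]

    # Inference 2 (Response Generation): the model answers grounded on the results.
    return generate(render_response_prompt(tool_selection_prompt, tool_results))
```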

-----

Before going into detail on the different components of the prompt and how they fit together, let’s start by looking at a fully rendered prompt. Let’s take an example of using Command R for a simple RAG use case where we are given a user query like: <span class="orange-text">What’s the biggest penguin in the world?</span>

To solve this problem, we will use the model to perform the two steps of RAG:
1. **Retrieval**
2. **Augmented Generation**

### Fully Rendered Default Tool-use Prompt

Let’s start with retrieval, where the model will make calls to an <span class="quartz-text ">internet_search</span> tool to collect relevant documents needed to answer the user’s question. To enable that, we will create a rendered tool use prompt that will give the model access to two tools:
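The full tool definitions are collapsed in this diff view. In the rendered prompt, tools are presented to the model as Python function signatures. Below is a sketch of what the two definitions could look like; the second tool is assumed to be the template's default `directly_answer` tool, and the docstring wording is illustrative:

```python
# Sketch of the two tool definitions as they might appear in the rendered prompt.
# Docstring wording approximates the default template and is illustrative only.
def internet_search(query: str) -> list[dict]:
    """Returns a list of relevant document snippets for a textual query
    retrieved from the internet.

    Args:
        query (str): Query to search the internet with.
    """
    pass

def directly_answer() -> list[dict]:
    """Calls a standard (un-augmented) AI chatbot to generate a response
    given the conversation history.
    """
    pass
```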
-----

In addition to changing the format of the output, we can also easily change the style by updating the Style Guide section of the prompt:
> <span class="extra-green">“””## Style Guide
> Answer in the style of David Attenborough.”””</span>

Which will have the models instead produce this majestic response:

> <span class="extra-green">Grounded answer: And here, emerging from the icy waters, is the majestic emperor penguin, the largest species of its kind. Growing to an impressive height of 122 centimeters[0], these majestic birds rule the Antarctic[1] oceans. Their imposing stature and proud demeanor make them a sight to behold.</span>

-----

```python
def render_chat_history(_conversation: list[dict]) -> str:
    ...  # function body collapsed in this diff view


rendered_chat_history = render_chat_history(conversation)
```
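A minimal sketch of what the collapsed `render_chat_history` body could look like, assuming each turn is a dict with `role` and `message` keys; the original implementation may differ:

```python
# Minimal sketch, assuming turns look like {"role": "user", "message": "..."};
# the collapsed original implementation may differ.
def render_chat_history(_conversation: list[dict]) -> str:
    role_tokens = {
        "user": "<|USER_TOKEN|>",
        "chatbot": "<|CHATBOT_TOKEN|>",
        "system": "<|SYSTEM_TOKEN|>",
    }
    rendered = ""
    for turn in _conversation:
        # Each turn is wrapped in start/end-of-turn tokens with its role token.
        rendered += (
            f"<|START_OF_TURN_TOKEN|>{role_tokens[turn['role']]}"
            f"{turn['message']}<|END_OF_TURN_TOKEN|>"
        )
    return rendered
```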