docs: update context length limits to 16384 for chat finetunes
hemant-co committed Oct 3, 2024
1 parent fb85b13 commit 73bf834
Showing 4 changed files with 10 additions and 14 deletions.
@@ -63,8 +63,7 @@ To pass the validation tests Cohere performs on uploaded data, ensure that:

 - You have the proper roles. There are only three acceptable values for the `role` field: `System`, `Chatbot` or `User`. There should be at least one instance of `Chatbot` and `User` in each conversation. If your dataset includes other roles, an error will be thrown.
 - A preamble should be uploaded as the first message in the conversation, with `role: System`. All other messages with `role: System` will be treated as speakers in the conversation.
-- The "System" preamble message is not longer than 4096 tokens, which is half the maximum training sequence length.
-- Each turn in the conversation should be within the training context length of 8192 tokens to avoid being dropped from the dataset. We explain a turn in the "Chat Customization Best Practices" section below.
+- Each turn in the conversation should be within the training context length of 16384 tokens to avoid being dropped from the dataset. We explain a turn in the "Chat Customization Best Practices" section below.
 - Your data is encoded in UTF-8.

 ### Evaluation Datasets
@@ -126,7 +125,7 @@ A turn includes all messages up to the Chatbot speaker. The following conversati

 A few things to bear in mind:

-- The preamble is always kept within the context window. This means that the preamble and _all turns within the context window_ should be within 8192 tokens.
+- The preamble is always kept within the context window. This means that the preamble and _all turns within the context window_ should be within 16384 tokens.
 - To check how many tokens your data is, you can use the [co.tokenize() api](/reference/tokenize).
-- If any turns are above the context length of 8192 tokens, we will drop them from the training data.
+- If any turns are above the context length of 16384 tokens, we will drop them from the training data.
 - If an evaluation file is not uploaded, we will make our best effort to automatically split your uploaded conversations into an 80/20 split. In other words, if you upload a training dataset containing only the minimum of two conversations, we'll randomly put one of them in the training set, and the other in the evaluation set.
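The role and preamble requirements in the hunks above can be sketched as a minimal validation pass over one training conversation. The JSONL shape below (a top-level `messages` key, one conversation per line) is an assumption for illustration and is not confirmed by this diff; the role names and ordering rules come from the bullets above.

```python
import json

# One training conversation in the assumed JSONL shape: the preamble comes
# first with role "System", followed by at least one "User" and one
# "Chatbot" turn. The top-level "messages" key is an assumption.
conversation = {
    "messages": [
        {"role": "System", "content": "You are a concise support assistant."},
        {"role": "User", "content": "How do I reset my password?"},
        {"role": "Chatbot", "content": "Go to Settings > Security and choose 'Reset password'."},
    ]
}

ALLOWED_ROLES = {"System", "Chatbot", "User"}

def validate(conv):
    """Check the three rules from the docs: allowed roles only, at least one
    User and one Chatbot turn, and a System preamble as the first message."""
    roles = [m["role"] for m in conv["messages"]]
    assert set(roles) <= ALLOWED_ROLES, "only System, Chatbot, User are allowed"
    assert "User" in roles and "Chatbot" in roles, "need at least one User and one Chatbot turn"
    assert roles[0] == "System", "preamble should be the first message, with role: System"
    return True

line = json.dumps(conversation)  # one conversation per line in a .jsonl file
print(validate(conversation))  # True
```

Conversations that use any other role name, or lack a `User`/`Chatbot` turn, would trip the same kind of assertion the upload validation reports as an error.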
@@ -63,8 +63,7 @@ There are certain requirements for the data you use to fine-tune a model for Chat

 - There are only three acceptable values for the `role` field: `System`, `Chatbot` or `User`. There should be at least one instance of `Chatbot` and `User` in each conversation. If your dataset includes other roles, a validation error will be thrown.
 - A preamble should be uploaded as the first message in the conversation, with `role: System`. All other messages with `role: System` will be treated as speakers in the conversation.
-- Preambles should have a context length no longer than 4096 tokens.
-- What's more, each turn in the conversation should be within the context length of 4096 tokens to avoid being dropped from the dataset. We explain a turn in the ['Chat Customization Best Practices'](/docs/chat-preparing-the-data#:~:text=.await_validation()) section.
+- What's more, each turn in the conversation should be within the context length of 16384 tokens to avoid being dropped from the dataset. We explain a turn in the ['Chat Customization Best Practices'](/docs/chat-preparing-the-data#:~:text=.await_validation()) section.

 If you need more information, see ['Preparing the Data'](/docs/chat-preparing-the-data).

@@ -180,7 +179,7 @@ Below is a table of errors or warnings you may receive and how to fix them.
 | Error | 'extra speaker in example: \<extra_speaker_name> (line : X)' | This means that the uploaded training dataset has speakers which are not one of the allowed roles: `System`,`User` or `Chatbot` | Rename or remove the extra speaker and re-upload the dataset. |
 | Error | 'missing Chatbot in example' \nOR \n'missing User in example' | This means the uploaded training dataset is missing either `Chatbot` or `User` speaker, both of which are required. | Upload your dataset with required speakers `Chatbot` and `User` |
 | Warning | 'dataset has 0 valid eval rows. dataset will be auto-split' | This error is thrown when eval data was not uploaded, in which case the dataset will be auto-split with 80% going to training and 20% to evaluation. | None |
-| Warning | 'train dataset has conversations with too many tokens. conversation number: number of turns with too many tokens is as follows, x:y' \nOR \n'eval dataset has conversations with too many tokens. conversation number: number of turns with too many tokens is as follows, x:y' | This means the train and/or eval dataset has turns which exceed the context length of 4096 tokens, and will be dropped for training. The message specifies the conversation index x (which starts at 0), as well as the number of turns over the context length in that conversation, y. | If you do not want any turns dropped, consider shortening turns. |
+| Warning | 'train dataset has conversations with too many tokens. conversation number: number of turns with too many tokens is as follows, x:y' \nOR \n'eval dataset has conversations with too many tokens. conversation number: number of turns with too many tokens is as follows, x:y' | This means the train and/or eval dataset has turns which exceed the context length of 16384 tokens, and will be dropped for training. The message specifies the conversation index x (which starts at 0), as well as the number of turns over the context length in that conversation, y. | If you do not want any turns dropped, consider shortening turns. |

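The 'dataset will be auto-split' warning described in the table can be approximated with a short sketch. The exact split logic Cohere uses is not documented in this diff; what follows is an illustrative approximation of the stated behaviour (roughly 80% training, 20% evaluation, with a two-conversation dataset yielding one of each).

```python
import random

# Sketch of the auto-split applied when no eval file is uploaded:
# shuffle the conversations, send ~80% to training and the rest to eval.
def auto_split(conversations, train_fraction=0.8, seed=0):
    shuffled = list(conversations)
    random.Random(seed).shuffle(shuffled)
    # Keep at least one conversation in each split, matching the note that
    # a minimal two-conversation dataset yields one train and one eval.
    cut = min(max(1, int(len(shuffled) * train_fraction)), len(shuffled) - 1)
    return shuffled[:cut], shuffled[cut:]

train, eval_set = auto_split([f"conv-{i}" for i in range(10)])
print(len(train), len(eval_set))  # 8 2
```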
@@ -62,8 +62,7 @@ To pass the validation tests Cohere performs on uploaded data, ensure that:

 - You have the proper roles. There are only three acceptable values for the `role` field: `System`, `Chatbot` or `User`. There should be at least one instance of `Chatbot` and `User` in each conversation. If your dataset includes other roles, an error will be thrown.
 - A preamble should be uploaded as the first message in the conversation, with `role: System`. All other messages with `role: System` will be treated as speakers in the conversation.
-- The "System" preamble message is not longer than 4096 tokens, which is half the maximum training sequence length.
-- Each turn in the conversation should be within the training context length of 8192 tokens to avoid being dropped from the dataset. We explain a turn in the "Chat Customization Best Practices" section below.
+- Each turn in the conversation should be within the training context length of 16384 tokens to avoid being dropped from the dataset. We explain a turn in the "Chat Customization Best Practices" section below.
 - Your data is encoded in UTF-8.

 ### Evaluation Datasets
@@ -125,7 +124,7 @@ A turn includes all messages up to the Chatbot speaker. The following conversati

 A few things to bear in mind:

-- The preamble is always kept within the context window. This means that the preamble and _all turns within the context window_ should be within 8192 tokens.
+- The preamble is always kept within the context window. This means that the preamble and _all turns within the context window_ should be within 16384 tokens.
 - To check how many tokens your data is, you can use the [Tokenize API](/reference/tokenize).
-- If any turns are above the context length of 8192 tokens, we will drop them from the training data.
+- If any turns are above the context length of 16384 tokens, we will drop them from the training data.
 - If an evaluation file is not uploaded, we will make our best effort to automatically split your uploaded conversations into an 80/20 split. In other words, if you upload a training dataset containing only the minimum of two conversations, we'll randomly put one of them in the training set, and the other in the evaluation set.
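Before uploading, turns can be pre-screened against the 16384-token limit the hunk above introduces. The authoritative count comes from the Tokenize API; the ~4-characters-per-token ratio below is only a coarse heuristic (an assumption, reasonable for English text) for flagging suspicious turns early.

```python
# Rough pre-check for the 16384-token turn limit. The real check should use
# Cohere's Tokenize API; this heuristic just flags turns worth inspecting.
CONTEXT_LENGTH = 16384
CHARS_PER_TOKEN = 4  # assumption: rough average for English text

def estimated_tokens(text: str) -> int:
    return max(1, len(text) // CHARS_PER_TOKEN)

def flag_long_turns(turns):
    """Return (index, estimated_tokens) for turns likely over the limit.

    Each element of `turns` is the concatenated text of all messages up to
    and including a Chatbot reply, matching the definition of a turn above.
    """
    flagged = []
    for i, turn_text in enumerate(turns):
        est = estimated_tokens(turn_text)
        if est > CONTEXT_LENGTH:
            flagged.append((i, est))
    return flagged

turns = ["short turn", "x" * 100_000]  # second turn is far over the limit
print(flag_long_turns(turns))  # [(1, 25000)]
```

Any turn this flags would otherwise be silently dropped from the training data, per the bullet above.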
@@ -65,8 +65,7 @@ There are certain requirements for the data you use to fine-tune a model for Chat

 - There are only three acceptable values for the `role` field: `System`, `Chatbot` or `User`. There should be at least one instance of `Chatbot` and `User` in each conversation. If your dataset includes other roles, a validation error will be thrown.
 - A preamble should be uploaded as the first message in the conversation, with `role: System`. All other messages with `role: System` will be treated as speakers in the conversation.
-- Preambles should have a context length no longer than 4096 tokens.
-- What's more, each turn in the conversation should be within the context length of 4096 tokens to avoid being dropped from the dataset. We explain a turn in the ['Chat Customization Best Practices'](/v2/docs/chat-preparing-the-data#chat-customization-best-practices) section.
+- What's more, each turn in the conversation should be within the context length of 16384 tokens to avoid being dropped from the dataset. We explain a turn in the ['Chat Customization Best Practices'](/v2/docs/chat-preparing-the-data#chat-customization-best-practices) section.

 If you need more information, see ['Preparing the Data'](/v2/docs/chat-preparing-the-data).

@@ -182,7 +181,7 @@ Below is a table of errors or warnings you may receive and how to fix them.
 | Error | 'extra speaker in example: \<extra_speaker_name> (line : X)' | This means that the uploaded training dataset has speakers which are not one of the allowed roles: `System`,`User` or `Chatbot` | Rename or remove the extra speaker and re-upload the dataset. |
 | Error | 'missing Chatbot in example' \nOR \n'missing User in example' | This means the uploaded training dataset is missing either `Chatbot` or `User` speaker, both of which are required. | Upload your dataset with required speakers `Chatbot` and `User` |
 | Warning | 'dataset has 0 valid eval rows. dataset will be auto-split' | This error is thrown when eval data was not uploaded, in which case the dataset will be auto-split with 80% going to training and 20% to evaluation. | None |
-| Warning | 'train dataset has conversations with too many tokens. conversation number: number of turns with too many tokens is as follows, x:y' \nOR \n'eval dataset has conversations with too many tokens. conversation number: number of turns with too many tokens is as follows, x:y' | This means the train and/or eval dataset has turns which exceed the context length of 4096 tokens, and will be dropped for training. The message specifies the conversation index x (which starts at 0), as well as the number of turns over the context length in that conversation, y. | If you do not want any turns dropped, consider shortening turns. |
+| Warning | 'train dataset has conversations with too many tokens. conversation number: number of turns with too many tokens is as follows, x:y' \nOR \n'eval dataset has conversations with too many tokens. conversation number: number of turns with too many tokens is as follows, x:y' | This means the train and/or eval dataset has turns which exceed the context length of 16384 tokens, and will be dropped for training. The message specifies the conversation index x (which starts at 0), as well as the number of turns over the context length in that conversation, y. | If you do not want any turns dropped, consider shortening turns. |

