diff --git a/fern/assets/images/multi-modal-guide-header.png b/fern/assets/images/multi-modal-guide-header.png
new file mode 100644
index 00000000..5866bb32
Binary files /dev/null and b/fern/assets/images/multi-modal-guide-header.png differ
diff --git a/fern/pages/changelog/2024-10-22-Embed-v3-is-multimodal.mdx b/fern/pages/changelog/2024-10-22-Embed-v3-is-multimodal.mdx
new file mode 100644
index 00000000..17556d30
--- /dev/null
+++ b/fern/pages/changelog/2024-10-22-Embed-v3-is-multimodal.mdx
@@ -0,0 +1,36 @@
+---
+title: "Embed v3.0 Models are now Multimodal"
+slug: "changelog/embed-v3-is-multimodal"
+createdAt: "Tue Oct 22 2024 05:30:00 (MST)"
+hidden: false
+description: >-
+  Launch of multimodal embeddings for our Embed models, plus some code to help get started.
+---
+
+Today we’re announcing updates to our `embed-v3.0` family of models. These models can now process images into embeddings. There is no change to existing text capabilities, which means there is no need to re-embed texts you have already processed with our `embed-v3.0` models.
+
+In the rest of these release notes, we’ll provide more details about technical enhancements, new features, and new pricing.
+

## Technical Details
### API Changes:
The Embed API has two major changes:
- Introduced a new `input_type` called `image`
- Introduced a new parameter called `images`

Here is an example request showing how to embed an image:

```Text cURL
POST https://api.cohere.ai/v1/embed
{
  "model": "embed-multilingual-v3.0",
  "input_type": "image",
  "embedding_types": ["float"],
  "images": [enc_img]
}
```
### Restrictions:
- The API only accepts images in the following formats: `png`, `jpeg`, `webp`, and `gif`
- Image embeddings do not currently support batching, so the maximum number of images per request is 1
- The maximum image size is `5MB`
- The `images` parameter only accepts a base64-encoded image formatted as a Data URL
+
diff --git a/fern/pages/deployment-options/cohere-on-aws/amazon-sagemaker-setup-guide.mdx b/fern/pages/deployment-options/cohere-on-aws/amazon-sagemaker-setup-guide.mdx
index 55e7d33b..9b2a475a 100644
--- a/fern/pages/deployment-options/cohere-on-aws/amazon-sagemaker-setup-guide.mdx
+++ b/fern/pages/deployment-options/cohere-on-aws/amazon-sagemaker-setup-guide.mdx
@@ -77,6 +77,8 @@ result = co.embed(
print(result)
```

+Note that we've released multimodal embedding models that can handle images in addition to text. Find [more information here](https://docs.cohere.com/docs/multimodal-embeddings).
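Since the `images` parameter expects a base64-encoded image formatted as a Data URL, here is a minimal Python sketch of preparing one before building the request body. This is an illustrative sketch, not part of the SDK: the `to_data_url` helper is a made-up name, and a few PNG magic bytes stand in for a real image file.

```python
import base64

def to_data_url(image_bytes: bytes, mime_type: str = "image/png") -> str:
    # Base64-encode raw image bytes and wrap them in the Data URL
    # format that the `images` parameter expects.
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"

# Illustrative placeholder bytes; in practice, read a real image file.
enc_img = to_data_url(b"\x89PNG\r\n\x1a\n", "image/png")

# The request body shown earlier would then look like:
payload = {
    "model": "embed-multilingual-v3.0",
    "input_type": "image",
    "embedding_types": ["float"],
    "images": [enc_img],
}
```

The same Data URL string can be passed directly to the `images` field of the Embed endpoint, or to the `images` argument of the SDKs that support it.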
+ ## Text Generation You can use this code to invoke Cohere's Command models on Amazon SageMaker: diff --git a/fern/pages/deployment-options/cohere-on-microsoft-azure.mdx b/fern/pages/deployment-options/cohere-on-microsoft-azure.mdx index ab454424..58e769d9 100644 --- a/fern/pages/deployment-options/cohere-on-microsoft-azure.mdx +++ b/fern/pages/deployment-options/cohere-on-microsoft-azure.mdx @@ -99,6 +99,7 @@ Though this section is called "Text Generation", it's worth pointing out that th We expose two routes for Embed v3 - English and Embed v3 - Multilingual inference: - `v1/embeddings` adheres to the Azure AI Generative Messages API schema; + - Use `v1/images/embeddings` if you want to use one of our [multimodal embeddings models](/docs/multimodal-embeddings). - ` v1/embed` supports Cohere's native API schema. You can find more information about Azure's API [here](https://learn.microsoft.com/en-us/azure/ai-studio/how-to/deploy-models-cohere-embed#embed-api-reference-for-cohere-embed-models-deployed-as-a-service). diff --git a/fern/pages/models/cohere-embed.mdx b/fern/pages/models/cohere-embed.mdx index 9e88bccf..9d95cdb2 100644 --- a/fern/pages/models/cohere-embed.mdx +++ b/fern/pages/models/cohere-embed.mdx @@ -14,22 +14,22 @@ Embed models can be used to generate embeddings from text or classify it based o ## English Models -| Latest Model | Description | Dimensions | Max Tokens (Context Length) | Similarity Metric | Endpoints | -|-----------------------------|------------------------------------------------------------------------------------------------------|------------|-----------------------------|-------------------|-------------------------------------------------------------------------------------------| -| `embed-english-v3.0` | A model that allows for text to be classified or turned into embeddings. English only. | 1024 | 512 | Cosine Similarity, Dot Product Similarity, Euclidean Distance | [Embed](/reference/embed),
[Embed Jobs](/reference/embed-jobs) | -| `embed-english-light-v3.0` | A smaller, faster version of `embed-english-v3.0`. Almost as capable, but a lot faster. English only.| 384 | 512 | Cosine Similarity, Dot Product Similarity, Euclidean Distance | [Embed](/reference/embed),
[Embed Jobs](/reference/embed-jobs) | -| `embed-english-v2.0` | Our older embeddings model that allows for text to be classified or turned into embeddings. English only | 4096 | 512 | Cosine Similarity | [Classify](/reference/classify), [Embed](/reference/embed) | -| `embed-english-light-v2.0` | A smaller, faster version of embed-english-v2.0. Almost as capable, but a lot faster. English only. | 1024 | 512 | Cosine Similarity | [Classify](/reference/classify), [Embed](/reference/embed) | +| Latest Model | Description | Modality | Dimensions | Max Tokens (Context Length) | Similarity Metric | Endpoints | +|-----------------------------|----------------------------------------------------------------------------------------------------------|--------------|------------|-----------------------------|---------------------------------------------------------------|------------------------------------------------------------------------------------| +| `embed-english-v3.0` | A model that allows for text to be classified or turned into embeddings. English only. | Text, Images | 1024 | 512 | Cosine Similarity, Dot Product Similarity, Euclidean Distance | [Embed](/reference/embed),
[Embed Jobs](/reference/embed-jobs) | +| `embed-english-light-v3.0` | A smaller, faster version of `embed-english-v3.0`. Almost as capable, but a lot faster. English only. | Text, Images | 384 | 512 | Cosine Similarity, Dot Product Similarity, Euclidean Distance | [Embed](/reference/embed),
[Embed Jobs](/reference/embed-jobs) | +| `embed-english-v2.0` | Our older embeddings model that allows for text to be classified or turned into embeddings. English only.| Text | 4096 | 512 | Cosine Similarity | [Classify](/reference/classify), [Embed](/reference/embed) | +| `embed-english-light-v2.0` | A smaller, faster version of embed-english-v2.0. Almost as capable, but a lot faster. English only. | Text | 1024 | 512 | Cosine Similarity | [Classify](/reference/classify), [Embed](/reference/embed) | ## Multi-Lingual Models -| Latest Model | Description | Dimensions | Max Tokens (Context Length) | Similarity Metric | Endpoints | -|----------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------|------------|-----------------------------|-------------------------|---------------------------------------------------------------------------------------------------| -| `embed-multilingual-v3.0` | Provides multilingual classification and embedding support. [See supported languages here.](/docs/supported-languages) | 1024 | 512 | Cosine Similarity, Dot Product Similarity, Euclidean Distance | [Embed](/reference/embed), [Embed Jobs](/reference/embed-jobs) | -| `embed-multilingual-light-v3.0` | A smaller, faster version of `embed-multilingual-v3.0`. Almost as capable, but a lot faster. Supports multiple languages. | 384 | 512 | Cosine Similarity, Dot Product Similarity, Euclidean Distance | [Embed](/reference/embed),
[Embed Jobs](/reference/embed-jobs) | -| `embed-multilingual-v2.0` | Provides multilingual classification and embedding support. [See supported languages here.](/docs/supported-languages) | 768 | 256 | Dot Product Similarity | [Classify](/reference/classify), [Embed](/reference/embed) | +| Latest Model | Description | Modality | Dimensions | Max Tokens (Context Length) | Similarity Metric | Endpoints | +|----------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------|-------------------|------------|-----------------------------|---------------------------------------------------------------------|------------------------------------------------------------------------| +| `embed-multilingual-v3.0` | Provides multilingual classification and embedding support. [See supported languages here.](/docs/supported-languages) | Text, Images | 1024 | 512 | Cosine Similarity, Dot Product Similarity, Euclidean Distance | [Embed](/reference/embed), [Embed Jobs](/reference/embed-jobs) | +| `embed-multilingual-light-v3.0` | A smaller, faster version of `embed-multilingual-v3.0`. Almost as capable, but a lot faster. Supports multiple languages. | Text, Images | 384 | 512 | Cosine Similarity, Dot Product Similarity, Euclidean Distance | [Embed](/reference/embed),
[Embed Jobs](/reference/embed-jobs) | +| `embed-multilingual-v2.0` | Provides multilingual classification and embedding support. [See supported languages here.](/docs/supported-languages) | Text | 768 | 256 | Dot Product Similarity | [Classify](/reference/classify), [Embed](/reference/embed) | diff --git a/fern/pages/models/models.mdx b/fern/pages/models/models.mdx index af58a2e9..0107ee3b 100644 --- a/fern/pages/models/models.mdx +++ b/fern/pages/models/models.mdx @@ -36,21 +36,23 @@ In this section, we'll provide some high-level context on Cohere's offerings, an Command is Cohere's default generation model that takes a user instruction (or command) and generates text following the instruction. Our Command models also have conversational capabilities which means that they are well-suited for chat applications. -| Model Name | Description | Context Length | Maximum Output Tokens | Endpoints | -|--------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------|-----------------------|-------------------------------------------------------------------------------------------| -| `command-r-plus-08-2024` | `command-r-plus-08-2024` is an update of the Command R+ model, delivered in August 2024. Find more information [here](https://docs.cohere.com/changelog/command-gets-refreshed) | 128k | 4k | [Chat](/reference/chat) | -| `command-r-plus-04-2024` | Command R+ is an instruction-following conversational model that performs language tasks at a higher quality, more reliably, and with a longer context than previous models. It is best suited for complex RAG workflows and multi-step tool use. 
| 128k | 4k | [Chat](/reference/chat) | -| `command-r-plus` | `command-r-plus` is an alias for `command-r-plus-04-2024`, so if you use `command-r-plus` in the API, that's the model you're pointing to. | 128k | 4k | [Chat](/reference/chat) | -| `command-r-08-2024` | `command-r-08-2024` is an update of the Command R model, delivered in August 2024. Find more information [here](https://docs.cohere.com/changelog/command-gets-refreshed) | 128k | 4k | [Chat](/reference/chat) | -| `command-r-03-2024` | Command R is an instruction-following conversational model that performs language tasks at a higher quality, more reliably, and with a longer context than previous models. It can be used for complex workflows like code generation, retrieval augmented generation (RAG), tool use, and agents. | 128k | 4k | [Chat](/reference/chat) | -| `command-r` | `command-r` is an alias for `command-r-03-2024`, so if you use `command-r` in the API, that's the model you're pointing to. | 128k | 4k | [Chat](/reference/chat) | -| | | | | | -| `command` | An instruction-following conversational model that performs language tasks with high quality, more reliably and with a longer context than our base generative models. | 4k | 4k | [Chat](/reference/chat),
[Summarize](/reference/summarize) | -| `command-nightly` | To reduce the time between major releases, we put out nightly versions of command models. For `command`, that is `command-nightly`.

Be advised that `command-nightly` is the latest, most experimental, and (possibly) unstable version of its default counterpart. Nightly releases are updated regularly, without warning, and are not recommended for production use. | 128k | 128k | [Chat](/reference/chat) | -| `command-light` | A smaller, faster version of `command`. Almost as capable, but a lot faster. | 4k | 4k | [Chat](/reference/chat),
[Summarize](/reference/summarize-2) | -| `command-light-nightly` | To reduce the time between major releases, we put out nightly versions of command models. For `command-light`, that is `command-light-nightly`.

Be advised that `command-light-nightly` is the latest, most experimental, and (possibly) unstable version of its default counterpart. Nightly releases are updated regularly, without warning, and are not recommended for production use. | 4k | 4k | [Chat](/reference/chat) | -| `c4ai-aya-23-35b` | The 35B version of the [Aya 23 model](https://huggingface.co/CohereForAI/aya-23-35B). Pairs a highly performant pre-trained Command family of models with the [Aya Collection](https://huggingface.co/datasets/CohereForAI/aya_collection). Serves 23 languages. | 8k | 8k | [Chat](/reference/chat) | -| `c4ai-aya-23-8b` | The 8B version of the [Aya 23 model](https://huggingface.co/CohereForAI/aya-23-8B). Pairs a highly performant pre-trained Command family of models with the [Aya Collection](https://huggingface.co/datasets/CohereForAI/aya_collection). Serves 23 languages. | 8k | 8k | [Chat](/reference/chat) | + +| Model Name | Description | Modality | Context Length | Maximum Output Tokens | Endpoints | +|--------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------|----------------|-----------------------|-------------------------------------------------------------------------------------------| +| `command-r-plus-08-2024` | `command-r-plus-08-2024` is an update of the Command R+ model, delivered in August 2024. Find more information [here](https://docs.cohere.com/changelog/command-gets-refreshed) | Text | 128k | 4k | [Chat](/reference/chat) | +| `command-r-plus-04-2024` | Command R+ is an instruction-following conversational model that performs language tasks at a higher quality, more reliably, and with a longer context than previous models. 
It is best suited for complex RAG workflows and multi-step tool use. | Text | 128k | 4k | [Chat](/reference/chat) | +| `command-r-plus` | `command-r-plus` is an alias for `command-r-plus-04-2024`, so if you use `command-r-plus` in the API, that's the model you're pointing to. | Text | 128k | 4k | [Chat](/reference/chat) | +| `command-r-08-2024` | `command-r-08-2024` is an update of the Command R model, delivered in August 2024. Find more information [here](https://docs.cohere.com/changelog/command-gets-refreshed) | Text | 128k | 4k | [Chat](/reference/chat) | +| `command-r-03-2024` | Command R is an instruction-following conversational model that performs language tasks at a higher quality, more reliably, and with a longer context than previous models. It can be used for complex workflows like code generation, retrieval augmented generation (RAG), tool use, and agents. | Text | 128k | 4k | [Chat](/reference/chat) | +| `command-r` | `command-r` is an alias for `command-r-03-2024`, so if you use `command-r` in the API, that's the model you're pointing to. | Text | 128k | 4k | [Chat](/reference/chat) | +| | | | | | | +| `command` | An instruction-following conversational model that performs language tasks with high quality, more reliably and with a longer context than our base generative models. | Text | 4k | 4k | [Chat](/reference/chat),
[Summarize](/reference/summarize) | +| `command-nightly` | To reduce the time between major releases, we put out nightly versions of command models. For `command`, that is `command-nightly`.

Be advised that `command-nightly` is the latest, most experimental, and (possibly) unstable version of its default counterpart. Nightly releases are updated regularly, without warning, and are not recommended for production use. | Text | 128k | 128k | [Chat](/reference/chat) | +| `command-light` | A smaller, faster version of `command`. Almost as capable, but a lot faster. | Text | 4k | 4k | [Chat](/reference/chat),
[Summarize](/reference/summarize-2) | +| `command-light-nightly` | To reduce the time between major releases, we put out nightly versions of command models. For `command-light`, that is `command-light-nightly`.

Be advised that `command-light-nightly` is the latest, most experimental, and (possibly) unstable version of its default counterpart. Nightly releases are updated regularly, without warning, and are not recommended for production use. | Text | 4k | 4k | [Chat](/reference/chat) | +| `c4ai-aya-23-35b` | The 35B version of the [Aya 23 model](https://huggingface.co/CohereForAI/aya-23-35B). Pairs a highly performant pre-trained Command family of models with the [Aya Collection](https://huggingface.co/datasets/CohereForAI/aya_collection). Serves 23 languages. | Text | 8k | 8k | [Chat](/reference/chat) | +| `c4ai-aya-23-8b` | The 8B version of the [Aya 23 model](https://huggingface.co/CohereForAI/aya-23-8B). Pairs a highly performant pre-trained Command family of models with the [Aya Collection](https://huggingface.co/datasets/CohereForAI/aya_collection). Serves 23 languages. | Text | 8k | 8k | [Chat](/reference/chat) | + ### Using Command Models on Different Platforms @@ -71,16 +73,16 @@ In this table, we provide some important context for using Cohere Command models These models can be used to generate embeddings from text or classify it based on various parameters. Embeddings can be used for estimating semantic similarity between two sentences, choosing a sentence which is most likely to follow another sentence, or categorizing user feedback, while outputs from the Classify endpoint can be used for any classification or analysis task. The Representation model comes with a variety of helper functions, such as for detecting the language of an input. 
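Each similarity metric listed in the tables below scores how close two embedding vectors are. As a concrete illustration, here is a minimal cosine-similarity sketch; the four-dimensional vectors are toy stand-ins for real Embed outputs, which have 384, 768, 1024, or 4096 dimensions depending on the model.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Dot product divided by the product of magnitudes;
    # 1.0 means the vectors point in the same direction.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real embedding outputs.
query = [0.1, 0.3, -0.2, 0.4]
doc_a = [0.1, 0.3, -0.2, 0.4]   # identical direction, similarity ~1.0
doc_b = [-0.4, 0.2, 0.3, -0.1]

print(cosine_similarity(query, doc_a))
print(cosine_similarity(query, doc_b))
```

Dot product similarity skips the normalization step, which is why it is only listed for models whose output vectors are already unit-length or whose scale is not expected to vary.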
-| Model Name | Description | Dimensions | Context Length | Similarity Metric | Endpoints | -|-------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------|----------------|---------------------|-------------------------------------------------------------------------------------------------------------| -| `embed-english-v3.0` | A model that allows for text to be classified or turned into embeddings. English only. | 1024 | 512 | Cosine Similarity | [Embed](/reference/embed),
[Embed Jobs](/reference/embed-jobs) | -| `embed-english-light-v3.0` | A smaller, faster version of `embed-english-v3.0`. Almost as capable, but a lot faster. English only. | 384 | 512 | Cosine Similarity | [Embed](/reference/embed),
[Embed Jobs](/reference/embed-jobs) | -| `embed-multilingual-v3.0` | Provides multilingual classification and embedding support. [See supported languages here.](/docs/supported-languages) | 1024 | 512 | Cosine Similarity | [Embed](/reference/embed), [Embed Jobs](/reference/embed-jobs) | -| `embed-multilingual-light-v3.0` | A smaller, faster version of `embed-multilingual-v3.0`. Almost as capable, but a lot faster. Supports multiple languages. | 384 | 512 | Cosine Similarity | [Embed](/reference/embed),
[Embed Jobs](/reference/embed-jobs) | -| | | | | | | -| `embed-english-v2.0` | Our older embeddings model that allows for text to be classified or turned into embeddings. English only | 4096 | 512 | Cosine Similarity | [Classify](/reference/classify), [Embed](/reference/embed) | -| `embed-english-light-v2.0` | A smaller, faster version of embed-english-v2.0. Almost as capable, but a lot faster. English only. | 1024 | 512 | Cosine Similarity | [Classify](/reference/classify), [Embed](/reference/embed) | -| `embed-multilingual-v2.0` | Provides multilingual classification and embedding support. [See supported languages here.](/docs/supported-languages) | 768 | 256 | Dot Product Similarity | [Classify](/reference/classify), [Embed](/reference/embed) | +| Model Name | Description | Modalities | Dimensions | Context Length | Similarity Metric | Endpoints | +|-------------------------------|-------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------|------------|----------------|---------------------|----------------------------------------------------------------------| +| `embed-english-v3.0` | A model that allows for text to be classified or turned into embeddings. English only. | Text, Images | 1024 | 512 | Cosine Similarity | [Embed](/reference/embed),
[Embed Jobs](/reference/embed-jobs) | +| `embed-english-light-v3.0` | A smaller, faster version of `embed-english-v3.0`. Almost as capable, but a lot faster. English only. | Text, Images | 384 | 512 | Cosine Similarity | [Embed](/reference/embed),
[Embed Jobs](/reference/embed-jobs) | +| `embed-multilingual-v3.0` | Provides multilingual classification and embedding support. [See supported languages here.](/docs/supported-languages) | Text, Images | 1024 | 512 | Cosine Similarity | [Embed](/reference/embed), [Embed Jobs](/reference/embed-jobs) | +| `embed-multilingual-light-v3.0` | A smaller, faster version of `embed-multilingual-v3.0`. Almost as capable, but a lot faster. Supports multiple languages. | Text, Images | 384 | 512 | Cosine Similarity | [Embed](/reference/embed),
[Embed Jobs](/reference/embed-jobs) | +| | | | | | | | +| `embed-english-v2.0` | Our older embeddings model that allows for text to be classified or turned into embeddings. English only | Text | 4096 | 512 | Cosine Similarity | [Classify](/reference/classify), [Embed](/reference/embed) | +| `embed-english-light-v2.0` | A smaller, faster version of embed-english-v2.0. Almost as capable, but a lot faster. English only. | Text | 1024 | 512 | Cosine Similarity | [Classify](/reference/classify), [Embed](/reference/embed) | +| `embed-multilingual-v2.0` | Provides multilingual classification and embedding support. [See supported languages here.](/docs/supported-languages) | Text | 768 | 256 | Dot Product Similarity | [Classify](/reference/classify), [Embed](/reference/embed) | In this table we've listed older `v2.0` models alongside the newer `v3.0` models, but we recommend you use the `v3.0` versions. @@ -103,13 +105,13 @@ In this table, we provide some important context for using Cohere Embed models o The Rerank model can improve created models by re-organizing their results based on certain parameters. This can be used to improve search algorithms. -| Model Name | Description | Context Length | Endpoints | -| -------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------- | -------------------------------------------------- | -| `rerank-english-v3.0` | A model that allows for re-ranking English Language documents and semi-structured data (JSON). This model has a context length of 4096 tokens. | 4k | [Rerank](/reference/rerank) | -| `rerank-multilingual-v3.0` | A model for documents and semi-structure data (JSON) that are not in English. Supports the same languages as embed-multilingual-v3.0. This model has a context length of 4096 tokens. 
| 4k | [Rerank](/reference/rerank) |
-| | | | |
-| `rerank-english-v2.0` | A model that allows for re-ranking English language documents. | 512 | [Rerank](/reference/rerank) |
-| `rerank-multilingual-v2.0` | A model for documents that are not in English. Supports the same languages as `embed-multilingual-v3.0`. | 512 | [Rerank](/reference/rerank) |
+| Model Name | Description | Modalities | Context Length | Endpoints |
+| -------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------------- | ---------------|---------------------------- |
+| `rerank-english-v3.0` | A model that allows for re-ranking English language documents and semi-structured data (JSON). This model has a context length of 4096 tokens. | Text | 4k | [Rerank](/reference/rerank) |
+| `rerank-multilingual-v3.0` | A model for documents and semi-structured data (JSON) that are not in English. Supports the same languages as `embed-multilingual-v3.0`. This model has a context length of 4096 tokens. | Text | 4k | [Rerank](/reference/rerank) |
+| | | | | |
+| `rerank-english-v2.0` | A model that allows for re-ranking English language documents. | Text | 512 | [Rerank](/reference/rerank) |
+| `rerank-multilingual-v2.0` | A model for documents that are not in English. Supports the same languages as `embed-multilingual-v3.0`. | Text | 512 | [Rerank](/reference/rerank) |

### Using Rerank Models on Different Platforms

diff --git a/fern/pages/models/rerank-2.mdx b/fern/pages/models/rerank-2.mdx
index 7cfc22c2..6cd64da3 100644
--- a/fern/pages/models/rerank-2.mdx
+++ b/fern/pages/models/rerank-2.mdx
@@ -13,12 +13,12 @@ updatedAt: "Mon Apr 08 2024 17:42:11 GMT+0000 (Coordinated Universal Time)"
---

Rerank models sort text inputs by semantic relevance to a specified query. They are often used to sort search results returned from an existing search solution.
Learn more about using Rerank in the [best practices guide](/docs/reranking-best-practices).

-| Latest Model | Description | Max Tokens | Endpoints |
-| -------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ---------- | -------------------------------------------------- |
-| `rerank-english-v3.0` | A model that allows for re-ranking English Language documents and semi-structured data (JSON). This model has a context length of 4096 tokens.. | N/A | [Rerank](/reference/rerank) |
-| `rerank-multilingual-v3.0` | A model for documents and semi-structure data (JSON) that are not in English. Supports the same languages as `embed-multilingual-v3.0`. This model has a context length of 4096 tokens. | N/A | [Rerank](/reference/rerank) |
-| `rerank-english-v2.0` | A model that allows for re-ranking English language documents. This model has a context length of 512 tokens. | N/A | [Rerank](/reference/rerank) |
-| `rerank-multilingual-v2.0` | A model for documents that are not in English. Supports the same languages as `embed-multilingual-v3.0`. This model has a context length of 512 tokens. | N/A | [Rerank](/reference/rerank) |
+| Latest Model | Description | Modality | Max Tokens | Endpoints |
+| -------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| ---------|------------|-------------------|
+| `rerank-english-v3.0` | A model that allows for re-ranking English language documents and semi-structured data (JSON). This model has a context length of 4096 tokens. | Text | N/A | [Rerank](/reference/rerank) |
+| `rerank-multilingual-v3.0` | A model for documents and semi-structured data (JSON) that are not in English. Supports the same languages as `embed-multilingual-v3.0`. This model has a context length of 4096 tokens. | Text | N/A | [Rerank](/reference/rerank) |
+| `rerank-english-v2.0` | A model that allows for re-ranking English language documents. This model has a context length of 512 tokens. | Text | N/A | [Rerank](/reference/rerank) |
+| `rerank-multilingual-v2.0` | A model for documents that are not in English. Supports the same languages as `embed-multilingual-v3.0`. This model has a context length of 512 tokens. | Text | N/A | [Rerank](/reference/rerank) |

Rerank accepts full strings rather than tokens, so the token limit works a little differently. Rerank will automatically chunk documents longer than 4096 tokens, and there is therefore no explicit limit to how long a document can be when using rerank. See our [best practice guide](/docs/reranking-best-practices) for more info about formatting documents for the Rerank endpoint.

diff --git a/fern/pages/models/the-command-family-of-models/command-beta.mdx b/fern/pages/models/the-command-family-of-models/command-beta.mdx
index ebcc5e20..e12b98fc 100644
--- a/fern/pages/models/the-command-family-of-models/command-beta.mdx
+++ b/fern/pages/models/the-command-family-of-models/command-beta.mdx
@@ -16,12 +16,12 @@ updatedAt: 'Tue Jun 04 2024 18:34:22 GMT+0000 (Coordinated Universal Time)'
---

-| Latest Model | Description | Context Length | Maximum Output Tokens | Endpoints |
-|---------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------|-----------------------|-------------------------------------------------------------------------------------------|
-| `command` | An 
instruction-following conversational model that performs language tasks with high quality, more reliably and with a longer context than our base generative models. | 4k | 4k | [Chat](/reference/chat),
[Summarize](/reference/summarize) | -| `command-light` | A smaller, faster version of `command`. Almost as capable, but a lot faster. | 4k | 4k | [Chat](/reference/chat),
[Summarize](/reference/summarize-2) | -| `command-nightly` | To reduce the time between major releases, we put out nightly versions of command models. For `command`, that is `command-nightly`.

Be advised that `command-nightly` is the latest, most experimental, and (possibly) unstable version of its default counterpart. Nightly releases are updated regularly, without warning, and are not recommended for production use. | 128K | 4k | [Chat](/reference/chat) | -| `command-light-nightly` | To reduce the time between major releases, we put out nightly versions of command models. For `command-light`, that is `command-light-nightly`.

Be advised that `command-light-nightly` is the latest, most experimental, and (possibly) unstable version of its default counterpart. Nightly releases are updated regularly, without warning, and are not recommended for production use. | 4k | 4k | [Chat](/reference/chat) | +| Latest Model | Description | Modality | Context Length | Maximum Output Tokens | Endpoints | +|---------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------|----------------|-----------------------|-------------------------------------------------------------------| +| `command` | An instruction-following conversational model that performs language tasks with high quality, more reliably and with a longer context than our base generative models. | Text | 4k | 4k | [Chat](/reference/chat),
[Summarize](/reference/summarize) | +| `command-light` | A smaller, faster version of `command`. Almost as capable, but a lot faster. | Text | 4k | 4k | [Chat](/reference/chat),
[Summarize](/reference/summarize-2) |
+| `command-nightly` | To reduce the time between major releases, we put out nightly versions of command models. For `command`, that is `command-nightly`.

Be advised that `command-nightly` is the latest, most experimental, and (possibly) unstable version of its default counterpart. Nightly releases are updated regularly, without warning, and are not recommended for production use. | Text | 128k | 4k | [Chat](/reference/chat) |
+| `command-light-nightly` | To reduce the time between major releases, we put out nightly versions of command models. For `command-light`, that is `command-light-nightly`.

Be advised that `command-light-nightly` is the latest, most experimental, and (possibly) unstable version of its default counterpart. Nightly releases are updated regularly, without warning, and are not recommended for production use. | Text | 4k | 4k | [Chat](/reference/chat) | diff --git a/fern/pages/models/the-command-family-of-models/command-r-plus.mdx b/fern/pages/models/the-command-family-of-models/command-r-plus.mdx index dab8e58c..2670f9fb 100644 --- a/fern/pages/models/the-command-family-of-models/command-r-plus.mdx +++ b/fern/pages/models/the-command-family-of-models/command-r-plus.mdx @@ -17,11 +17,11 @@ Command R+ is Cohere's newest large language model, optimized for conversational We recommend using Command R+ for those workflows that lean on complex RAG functionality and [multi-step tool use (agents)](/docs/multi-hop-tool-use). Command R, on the other hand, is great for simpler [retrieval augmented generation](/docs/retrieval-augmented-generation-rag) (RAG) and [single-step tool use](/docs/tool-use) tasks, as well as applications where price is a major consideration. ### Model Details -| Model Name | Description | Context Length | Maximum Output Tokens | Endpoints| -|--------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------|-----------------------|----------| -| `command-r-plus-08-2024` | `command-r-plus-08-2024` is an update of the Command R+ model, delivered in August 2024. 
| 128k | 4k | [Chat](/reference/chat) | |
-| `command-r-plus-04-2024` | Command R+ is an instruction-following conversational model that performs language tasks at a higher quality, more reliably, and with a longer context than previous models. It is best suited for complex RAG workflows and multi-step tool use. | 128k | 4k | [Chat](/reference/chat) | |
-| `command-r-plus` | `command-r-plus` is an alias for `command-r-plus-04-2024`, so if you use `command-r-plus` in the API, that's the model you're pointing to. | 128k | 4k | [Chat](/reference/chat) | |
+| Model Name | Description | Modality | Context Length | Maximum Output Tokens | Endpoints |
+|---|---|---|---|---|---|
+| `command-r-plus-08-2024` | `command-r-plus-08-2024` is an update of the Command R+ model, delivered in August 2024. | Text | 128k | 4k | [Chat](/reference/chat) |
+| `command-r-plus-04-2024` | Command R+ is an instruction-following conversational model that performs language tasks at a higher quality, more reliably, and with a longer context than previous models. It is best suited for complex RAG workflows and multi-step tool use. | Text | 128k | 4k | [Chat](/reference/chat) |
+| `command-r-plus` | `command-r-plus` is an alias for `command-r-plus-04-2024`, so if you use `command-r-plus` in the API, that's the model you're pointing to. | Text | 128k | 4k | [Chat](/reference/chat) |
 
 ## Command R+ August 2024 Release
 Cohere's flagship text-generation models, Command R and Command R+, received a substantial update in August 2024. We chose to designate these models with time stamps, so in the API Command R+ 08-2024 is accessible with `command-r-plus-08-2024`.
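The alias rows in the table can be pictured as a simple lookup. The snippet below is a hypothetical illustration of how a model name resolves to a dated snapshot — the mapping mirrors the tables in these pages, and is not how the API itself is implemented:

```python
# Aliases taken from the model tables; a dated snapshot name passes
# through unchanged. Illustrative only -- not Cohere SDK code.
MODEL_ALIASES = {
    "command-r-plus": "command-r-plus-04-2024",
    "command-r": "command-r-03-2024",
}

def resolve_model(name: str) -> str:
    """Return the dated snapshot that a model name points to."""
    return MODEL_ALIASES.get(name, name)

print(resolve_model("command-r-plus"))          # command-r-plus-04-2024
print(resolve_model("command-r-plus-08-2024"))  # command-r-plus-08-2024
```

In other words, the August 2024 snapshots must be requested explicitly by their dated names, while the bare aliases continue to point at the earlier releases.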
diff --git a/fern/pages/models/the-command-family-of-models/command-r.mdx b/fern/pages/models/the-command-family-of-models/command-r.mdx index d6135876..0a881dfe 100644 --- a/fern/pages/models/the-command-family-of-models/command-r.mdx +++ b/fern/pages/models/the-command-family-of-models/command-r.mdx @@ -18,13 +18,13 @@ Command R is a large language model optimized for conversational interaction and Command R boasts high precision on [retrieval augmented generation](/docs/retrieval-augmented-generation-rag) (RAG) and tool use tasks, low latency and high throughput, a long 128,000-token context length, and strong capabilities across 10 key languages. ### Model Details -| Model Name | Description | Context Length | Maximum Output Tokens | Endpoints| -|--------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------|-----------------------|----------| -| `command-r-08-2024` | `command-r-08-2024` is an update of the Command R model, delivered in August 2024. | 128k | 4k | [Chat](/reference/chat) | | -| `command-r-03-2024` | Command R is an instruction-following conversational model that performs language tasks at a higher quality, more reliably, and with a longer context than previous models. It can be used for complex workflows like code generation, retrieval augmented generation (RAG), tool use, and agents. | 128k | 4k | [Chat](/reference/chat) | | -| `command-r` | `command-r` is an alias for `command-r-03-2024`, so if you use `command-r` in the API, that's the model you're pointing to. 
| 128k | 4k | [Chat](/reference/chat) | |
-| `c4ai-aya-23-35b` | The 35B version of the [Aya 23 model](https://huggingface.co/CohereForAI/aya-23-35B). Pairs a highly performant pre-trained Command family of models with the [Aya Collection](https://huggingface.co/datasets/CohereForAI/aya_collection). Serves 23 languages. | 8k | 8k | [Chat](/reference/chat) |
-| `c4ai-aya-23-8b` | The 8B version of the [Aya 23 model](https://huggingface.co/CohereForAI/aya-23-8B). Pairs a highly performant pre-trained Command family of models with the [Aya Collection](https://huggingface.co/datasets/CohereForAI/aya_collection). Serves 23 languages. | 8k | 8k | [Chat](/reference/chat) |
+| Model Name | Description | Modality | Context Length | Maximum Output Tokens | Endpoints |
+|---|---|---|---|---|---|
+| `command-r-08-2024` | `command-r-08-2024` is an update of the Command R model, delivered in August 2024. | Text | 128k | 4k | [Chat](/reference/chat) |
+| `command-r-03-2024` | Command R is an instruction-following conversational model that performs language tasks at a higher quality, more reliably, and with a longer context than previous models. It can be used for complex workflows like code generation, retrieval augmented generation (RAG), tool use, and agents. | Text | 128k | 4k | [Chat](/reference/chat) |
+| `command-r` | `command-r` is an alias for `command-r-03-2024`, so if you use `command-r` in the API, that's the model you're pointing to.
| Text | 128k | 4k | [Chat](/reference/chat) |
+| `c4ai-aya-23-35b` | The 35B version of the [Aya 23 model](https://huggingface.co/CohereForAI/aya-23-35B). Pairs a highly performant pre-trained Command family of models with the [Aya Collection](https://huggingface.co/datasets/CohereForAI/aya_collection). Serves 23 languages. | Text | 8k | 8k | [Chat](/reference/chat) |
+| `c4ai-aya-23-8b` | The 8B version of the [Aya 23 model](https://huggingface.co/CohereForAI/aya-23-8B). Pairs a highly performant pre-trained Command family of models with the [Aya Collection](https://huggingface.co/datasets/CohereForAI/aya_collection). Serves 23 languages. | Text | 8k | 8k | [Chat](/reference/chat) |
 
 ## Command R August 2024 Release
 Cohere's flagship text-generation models, Command R and Command R+, received a substantial update in August 2024. We chose to designate these models with time stamps, so in the API Command R 08-2024 is accessible with `command-r-08-2024`.
diff --git a/fern/pages/text-embeddings/embeddings.mdx b/fern/pages/text-embeddings/embeddings.mdx
index 528f6c17..6b14be43 100644
--- a/fern/pages/text-embeddings/embeddings.mdx
+++ b/fern/pages/text-embeddings/embeddings.mdx
@@ -46,7 +46,12 @@ calculate_similarity(soup1, london) # 0.16 - not similar!
 
 ## The `input_type` parameter
 
-Cohere embeddings are optimized for different types of inputs. For example, when using embeddings for semantic search, the search query should be embedded by setting `input_type="search_query"` whereas the text passages that are being searched over should be embedded with `input_type="search_document"`. You can find more details and a code snippet in the [Semantic Search guide](/docs/semantic-search). Similarly, the input type can be set to `classification` ([example](/page/text-classification-using-embeddings)) and `clustering` to optimize the embeddings for those use cases.
+Cohere embeddings are optimized for different types of inputs.
+
+- When using embeddings for [semantic search](/docs/semantic-search), the search query should be embedded by setting `input_type="search_query"`.
+- When using embeddings for semantic search, the text passages that are being searched over should be embedded with `input_type="search_document"`.
+- When using embeddings for [classification](/docs/text-classification-with-embed) and clustering tasks, set `input_type` to `classification` or `clustering` to optimize the embeddings for those use cases.
+- When `input_type="image"`, the expected input to be embedded is an image instead of text.
 
 ## Multilingual Support
 
@@ -73,6 +78,51 @@
 
 print(embeddings[0][:5]) # Print embeddings for the first text
 ```
 
+## Image Embeddings
+
+The Cohere embedding platform supports image embeddings for the entire `embed-v3.0` family. To use this functionality:
+
+- Pass `image` to the `input_type` parameter (as discussed above).
+- Pass your base64-encoded image Data URL to the new `images` parameter.
+
+Be aware that image embedding has the following restrictions:
+
+- If `input_type="image"`, the `texts` field must be empty.
+- The original image file type must be `png` or `jpeg`.
+- The image must be base64 encoded and sent as a Data URL to the `images` parameter.
+- Our API currently does not support batch image embeddings, so each request may contain at most one image.
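Putting the restrictions above into practice, here is a minimal pre-flight check you might run before calling the endpoint. This is an illustrative sketch, not part of the Cohere SDK: the helper name is our own, the accepted extensions follow the restriction list above, and the 5 MB cap comes from the release notes.

```python
import base64
import os

# Accepted source formats and size cap, per the restrictions above
# (assumptions; adjust if the API's limits change).
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg"}
MAX_IMAGE_BYTES = 5 * 1024 * 1024  # 5 MB

def validate_and_encode(image_path: str) -> str:
    """Check an image against the documented limits, then return a Data URL."""
    ext = os.path.splitext(image_path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported image type: {ext or '(none)'}")
    if os.path.getsize(image_path) > MAX_IMAGE_BYTES:
        raise ValueError("image exceeds the 5 MB size limit")
    media_type = "jpeg" if ext in {".jpg", ".jpeg"} else ext.lstrip(".")
    with open(image_path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    # The `images` parameter expects a single base64 Data URL per request
    return f"data:image/{media_type};base64,{encoded}"
```

The resulting string can be passed directly in the `images` list of an embed call.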
+
+```python PYTHON
+import base64
+from io import BytesIO
+
+import cohere
+from PIL import Image
+
+co = cohere.Client(api_key="")
+
+# The model accepts input in base64 as a Data URL
+def image_to_base64_data_url(image_path):
+    # Open the image file
+    with Image.open(image_path) as img:
+        # Create a BytesIO object to hold the image data in memory
+        buffered = BytesIO()
+        # Save the image as PNG to the BytesIO object
+        img.save(buffered, format="PNG")
+        # Encode the image data in base64
+        img_base64 = base64.b64encode(buffered.getvalue()).decode("utf-8")
+
+    # Create the Data URL; the image was re-encoded as PNG above
+    data_url = f"data:image/png;base64,{img_base64}"
+    return data_url
+
+processed_image = image_to_base64_data_url("")
+
+ret = co.embed(
+    images=[processed_image],
+    model="embed-english-v3.0",
+    embedding_types=["float"],
+    input_type="image",
+)
+
+print(ret.embeddings.float)
+```
+
 ## Compression Levels
 
 The Cohere embeddings platform now supports compression. The Embed API features an `embedding_types` parameter which allows the user to specify various ways of compressing the output.
diff --git a/fern/pages/text-embeddings/multimodal-embeddings.mdx b/fern/pages/text-embeddings/multimodal-embeddings.mdx
new file mode 100644
index 00000000..9ff75a50
--- /dev/null
+++ b/fern/pages/text-embeddings/multimodal-embeddings.mdx
@@ -0,0 +1,77 @@
+---
+title: "Multimodal Embeddings"
+slug: "docs/multimodal-embeddings"
+
+hidden: false
+description: "Multimodal embeddings convert text and images into embeddings for search and classification."
+image: "../../assets/images/fa074c3-cohere_docs_preview_image_1200x630_copy.jpg"
+keywords: "vector embeddings, image embeddings, images, multimodal, multimodal embeddings, embeddings, natural language processing"
+
+createdAt: "Tues Sep 17 2024 00:00:00 GMT+0000 (Coordinated Universal Time)"
+updatedAt: "Tues Sep 17 2024 00:00:00 GMT+0000 (Coordinated Universal Time)"
+---
+
+ You can find the API reference for the Embed endpoint [here](/reference/embed).
+
+ Image capabilities are only compatible with our Embed v3.0 models.
+
+In this guide, we show you how to use the Embed endpoint to embed a series of images. The guide uses a simple dataset of graphs to illustrate how semantic search can be done over images with Cohere. To see an end-to-end example of retrieval, check out this [notebook](https://github.com/cohere-ai/notebooks/blob/main/notebooks/Multimodal_Semantic_Search.ipynb).
+
+### Introduction to Multimodal Embeddings
+
+Information is often represented in multiple modalities. A document, for instance, may contain text, images, and graphs, while a product can be described through images, its title, and a written description. This combination of elements often leads to a comprehensive semantic understanding of the subject matter. Traditional embedding models have been limited to a single modality, and even multimodal embedding models often suffer from degradation in `text-to-text` or `text-to-image` retrieval tasks. The `embed-v3.0` series of models, however, is fully multimodal, enabling it to embed both images and text effectively. We have achieved state-of-the-art performance without compromising text-to-text retrieval capabilities.
+
+### How to use Multimodal Embeddings
+
+#### 1\. Prepare your Image for Embeddings
+
+The Embed API accepts images in the following file formats: `png`, `jpeg`, `webp`, and `gif`. The image must then be formatted as a base64 Data URL.
+
+```python PYTHON
+# Import the necessary packages
+import os
+import base64
+
+# Define a function that converts an image into a base64 Data URL
+def image_to_base64_data_url(image_path):
+    _, file_extension = os.path.splitext(image_path)
+    file_type = file_extension[1:]
+
+    with open(image_path, "rb") as f:
+        enc_img = base64.b64encode(f.read()).decode("utf-8")
+        enc_img = f"data:image/{file_type};base64,{enc_img}"
+    return enc_img
+
+image_path = ''
+processed_image = image_to_base64_data_url(image_path)
+```
+
+#### 2\. Call the Embed Endpoint
+
+```python PYTHON
+# Import the necessary packages
+import cohere
+co = cohere.Client(api_key="")
+
+ret = co.embed(
+    model="embed-english-v3.0",
+    images=[processed_image],
+    input_type="image",
+)
+```
+
+## Sample Output
+
+Below is a sample of what the output looks like if you pass in a `jpeg` with original dimensions of `1080x1350` and a standard bit depth of 24.
+
+```json JSON
+{
+  "id": "0d9bb922-f15f-4b8b-9a2f-72577324528f",
+  "texts": [],
+  "images": [{"width": 1080, "height": 1350, "format": "jpeg", "bit_depth": 24}],
+  "embeddings": {"float": [[-0.035369873, 0.040740967, 0.008262634, -0.008766174, .....]]},
+  "meta": {
+    "api_version": {"version": "1"},
+    "billed_units": {"images": 1}
+  },
+  "response_type": "embeddings_by_type"
+}
+```
diff --git a/fern/v1.yml b/fern/v1.yml
index 48ffd9bd..33f814b6 100644
--- a/fern/v1.yml
+++ b/fern/v1.yml
@@ -137,8 +137,8 @@ navigation:
     contents:
       - page: Introduction to Embeddings at Cohere
         path: pages/text-embeddings/embeddings.mdx
-      - page: Semantic Search with Embeddings
-        path: pages/text-embeddings/semantic-search-embed.mdx
+      - page: Multimodal Embeddings
+        path: pages/text-embeddings/multimodal-embeddings.mdx
       - page: Batch Embedding Jobs
         path: pages/text-embeddings/embed-jobs-api.mdx
       - section: Reranking
diff --git a/fern/v2.yml b/fern/v2.yml
index 5e6031f6..60b5fa62 100644
--- a/fern/v2.yml
+++ b/fern/v2.yml
@@ -117,9 +117,9 @@ navigation:
       - section: Text Embeddings
(Vectors, Search, Retrieval) contents: - page: Introduction to Embeddings at Cohere - path: pages/v2/text-embeddings/embeddings.mdx - - page: Semantic Search with Embeddings - path: pages/v2/text-embeddings/semantic-search-embed.mdx + path: pages/text-embeddings/embeddings.mdx + - page: Multimodal Embeddings + path: pages/text-embeddings/multimodal-embeddings.mdx - page: Batch Embedding Jobs path: pages/v2/text-embeddings/embed-jobs-api.mdx - section: Reranking