Update embeddings docs (#217)

* update embeddings docs
* fix path image
cohere-ai · Oct 28, 2024 · 737741c
1 parent ecc1faf
commit 737741c

Showing 4 changed files with 169 additions and 22 deletions.
diff --git a/fern/pages/text-embeddings/embeddings.mdx b/fern/pages/text-embeddings/embeddings.mdx
@@ -50,7 +50,7 @@ Cohere embeddings are optimized for different types of inputs.
 
 - When using embeddings for [semantic search](/docs/semantic-search), the search query should be embedded by setting `input_type="search_query"`
 - When using embeddings for semantic search, the text passages that are being searched over should be embedded with `input_type="search_document"`.
-- When using embedding for [`classification`](/docs/text-classification-with-embed) and `clustering` tasks, you can set `input_type` to either 'classification' or 'clustering' to optimize the embeddings appropriately.
+- When using embeddings for `classification` and `clustering` tasks, you can set `input_type` to either 'classification' or 'clustering' to optimize the embeddings appropriately.
 - When `input_type='image'`, the expected input to be embedded is an image instead of text.
 
 ## Multilingual Support
@@ -93,7 +93,11 @@ Be aware that image embedding has the following restrictions:
 - Our API currently does not support batch image embeddings.
 
 ```python PYTHON
-import cohere  
+import cohere
+from PIL import Image
+from io import BytesIO
+import base64 
+
 co = cohere.Client(api_key="<YOUR API KEY>")
 
 # The model accepts input in base64 as a Data URL
@@ -125,9 +129,9 @@ ret.embeddings.float
 
 ## Compression Levels
 
-The Cohere embeddings platform now supports compression. The Embed API features an `embeddings_types` parameter which allows the user to specify various ways of compressing the output.  
+The Cohere embeddings platform supports compression. The Embed API features an `embedding_types` parameter, which allows the user to specify various ways of compressing the output.  
 
-The following embedding types are now supported: 
+The following embedding types are supported: 
 
 - `float`
 - `int8`
@@ -145,7 +149,7 @@ ret = co.embed(texts=phrases,
 ret.embeddings # This contains the float embeddings
 ```
 
-However we recommend being explicit about the `embedding type(s)`. To specify an `embedding type`, pass one of the types from the list above in as list containing a string:
+However, we recommend being explicit about the `embedding type(s)`. To specify an embedding type, pass one of the types from the list above as a list containing a string:
 
 ```python PYTHON
 ret = co.embed(texts=phrases,

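Note on the Compression Levels section above: the practical effect of each embedding type is the storage cost per vector. A minimal sketch of that arithmetic follows. It rests on assumptions not stated in this diff: 1024 dimensions per embedding, 32-bit floats for `float`, one byte per dimension for `int8`/`uint8`, and eight dimensions packed per byte for `binary`/`ubinary`. The helper `bytes_per_embedding` is hypothetical and not part of the Cohere SDK.

```python
# Bits used per dimension for each embedding type.
# Assumption: float = 32-bit, int8/uint8 = 8-bit, binary/ubinary = 1 bit
# per dimension (eight dimensions packed into each byte).
BITS_PER_DIMENSION = {
    "float": 32,
    "int8": 8,
    "uint8": 8,
    "binary": 1,
    "ubinary": 1,
}


def bytes_per_embedding(embedding_type: str, dimensions: int = 1024) -> int:
    """Return the storage size in bytes of one embedding vector.

    `dimensions` defaults to 1024, which is an assumption about the model.
    """
    bits = BITS_PER_DIMENSION[embedding_type] * dimensions
    return (bits + 7) // 8  # round up to whole bytes


for name in BITS_PER_DIMENSION:
    print(f"{name:>8}: {bytes_per_embedding(name)} bytes per embedding")
```

Under these assumptions, `int8` is a quarter the size of `float` and `binary` is a thirty-second of it, which is the trade-off to weigh against retrieval quality.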
diff --git a/fern/pages/v2/text-embeddings/embeddings.mdx b/fern/pages/v2/text-embeddings/embeddings.mdx
@@ -5,7 +5,7 @@ slug: "v2/docs/embeddings"
 hidden: false
 description: >-
   Embeddings transform text into numerical data, enabling language-agnostic
-  similarity searches and efficient storage with compression.
+  similarity searches and efficient storage with compression (API v2).
 image: "../../../assets/images/fa074c3-cohere_docs_preview_image_1200x630_copy.jpg"  
 keywords: "vector embeddings, embeddings, natural language processing"
 
@@ -48,11 +48,16 @@ calculate_similarity(soup1, london) # 0.16 - not similar!
 
 ## The `input_type` parameter
 
-Cohere embeddings are optimized for different types of inputs. For example, when using embeddings for semantic search, the search query should be embedded by setting `input_type="search_query"` whereas the text passages that are being searched over should be embedded with `input_type="search_document"`.  You can find more details and a code snippet in the [Semantic Search guide](/v2/docs/semantic-search). Similarly, the input type can be set to `classification` ([example](/v2/docs/text-classification-with-cohere)) and `clustering` to optimize the embeddings for those use cases.
+Cohere embeddings are optimized for different types of inputs.
+
+- When using embeddings for [semantic search](/docs/semantic-search), the search query should be embedded by setting `input_type="search_query"`.
+- When using embeddings for semantic search, the text passages that are being searched over should be embedded with `input_type="search_document"`.
+- When using embeddings for `classification` and `clustering` tasks, you can set `input_type` to either 'classification' or 'clustering' to optimize the embeddings appropriately.
+- When `input_type='image'`, the expected input to be embedded is an image instead of text.
 
 ## Multilingual Support
 
-In addition to `embed-english-v3.0` we offer a best-in-class multilingual model [embed-multilingual-v3.0](/v2/docs/embed-2#multi-lingual-models)  with support for over 100 languages, including Chinese, Spanish, and French. This model can be used with the Embed API, just like its English counterpart:
+In addition to `embed-english-v3.0` we offer a best-in-class multilingual model [embed-multilingual-v3.0](/docs/embed-2#multi-lingual-models)  with support for over 100 languages, including Chinese, Spanish, and French. This model can be used with the Embed API, just like its English counterpart:
 
 ```python PYTHON
 import cohere  
@@ -75,44 +80,90 @@ print(embeddings[0][:5]) # Print embeddings for the first text
 
 ```
 
+## Image Embeddings
+
+The Cohere embedding platform supports image embeddings for the entire `embed-v3.0` family. This functionality can be used with the following steps:
+
+- Pass `image` to the `input_type` parameter (as discussed above). 
+- Pass your image URL to the new `images` parameter.
+
+Be aware that image embedding has the following restrictions:
+
+- If `input_type='image'`, the `texts` field must be empty.
+- The original image file type must be `png` or `jpeg`.
+- The image must be base64 encoded and sent as a Data URL to the `images` parameter. 
+- Our API currently does not support batch image embeddings.
+
+```python PYTHON
+import cohere
+from PIL import Image
+from io import BytesIO
+import base64 
+
+co = cohere.ClientV2(api_key="<YOUR API KEY>")
+
+# The model accepts input in base64 as a Data URL
+
+def image_to_base64_data_url(image_path):
+    # Open the image file
+    with Image.open(image_path) as img:
+        # Create a BytesIO object to hold the image data in memory
+        buffered = BytesIO()
+        # Save the image as PNG to the BytesIO object
+        img.save(buffered, format="PNG")
+        # Encode the image data in base64
+        img_base64 = base64.b64encode(buffered.getvalue()).decode("utf-8")
+
+    # Create the Data URL; the media type is png because the image was re-encoded as PNG above
+    data_url = f"data:image/png;base64,{img_base64}"
+    return data_url
+
+processed_image = image_to_base64_data_url("<PATH_TO_IMAGE>")
+
+ret = co.embed(images=[processed_image],
+               model='embed-english-v3.0',
+               embedding_types= ["float"],
+               input_type='image')
+
+ret.embeddings.float
+```
+
 ## Compression Levels
 
-The Cohere embeddings platform supports compression. The Embed API features a required parameter, `embeddings_types`, which allows the user to specify various ways of compressing the output.  
+The Cohere embeddings platform supports compression. The Embed API features an `embedding_types` parameter, which allows the user to specify various ways of compressing the output.
 
-The following embedding types are now supported: 
+In the v2 API, this is a required parameter for calling the Embed endpoint.
+
+The following embedding types are supported:
 
 - `float`
 - `int8`
 - `uint8`
 - `binary`
 - `ubinary`
 
-To specify an `embedding type`, pass one of the types from the list above in as list containing a string:
+To specify an embedding type, pass one of the types from the list above as a list containing a string:
 
 ```python PYTHON
 ret = co.embed(texts=phrases,
                model=model,
                input_type=input_type,
-               embedding_types=['int8'])
+               embedding_types= ["float"])
 
-ret.embeddings.int8 # This contains your int8 embeddings
-ret.embeddings.float # This will be empty
-ret.embeddings.uint8 # This will be empty
-ret.embeddings.ubinary # This will be empty
-ret.embeddings.binary # This will be empty
+ret.embeddings # This contains the float embeddings
 ```
 
-Finally, you can also pass several `embedding_types` in as a list, in which case the endpoint will return a dictionary with both types available:
+You can specify multiple embedding types in a single call. For example, the following call will return both `int8` and `float` embeddings:
 
 ```python PYTHON
 ret = co.embed(texts=phrases,
                model=model,
                input_type=input_type,
-               embedding_types=['int8', 'float'])
+               embedding_types=["int8", "float"])
 
 ret.embeddings.int8 # This contains your int8 embeddings
 ret.embeddings.float # This contains your float embeddings
 ret.embeddings.uint8 # This will be empty
 ret.embeddings.ubinary # This will be empty
 ret.embeddings.binary # This will be empty
-```
+```
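Note on the Image Embeddings section in this file: the listed restrictions can be checked client-side before the request is sent. The sketch below assumes the rules are exactly as written in this file (empty `texts`, a single image, a base64 Data URL for a `png` or `jpeg` file). `check_image_embed_args` is a hypothetical helper for illustration, not part of the Cohere SDK, and it does not verify that the payload is a valid image.

```python
# Data URL prefixes allowed by the restrictions listed in this file.
ALLOWED_PREFIXES = ("data:image/png;base64,", "data:image/jpeg;base64,")


def check_image_embed_args(images, texts=None):
    """Raise ValueError if the arguments break the documented restrictions.

    Hypothetical helper: it only inspects the arguments and never calls the API.
    """
    if texts:
        raise ValueError("`texts` must be empty when input_type='image'")
    if len(images) != 1:
        raise ValueError("batch image embeddings are not supported; pass exactly one image")
    if not images[0].startswith(ALLOWED_PREFIXES):
        raise ValueError("image must be a base64 Data URL for a png or jpeg file")


# A well-formed single-image request passes silently.
check_image_embed_args(["data:image/png;base64,iVBORw0KGgo="])
```

Failing early like this gives a readable local error instead of a rejected request, though the server remains the source of truth for what is accepted.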
diff --git a/fern/pages/v2/text-embeddings/multimodal-embeddings.mdx b/fern/pages/v2/text-embeddings/multimodal-embeddings.mdx
@@ -0,0 +1,92 @@
+---
+title: "Multimodal Embeddings"
+slug: "v2/docs/multimodal-embeddings"
+
+hidden: false
+description: "Multimodal embeddings convert text and images into embeddings for search and classification (API v2)."
+image: "../../../assets/images/fa074c3-cohere_docs_preview_image_1200x630_copy.jpg"  
+keywords: "vector embeddings, image embeddings, images, multimodal, multimodal embeddings, embeddings, natural language processing"
+
+createdAt: "Tues Sep 17 2024 00:00:00 GMT+0000 (Coordinated Universal Time)"
+updatedAt: "Tues Sep 17 2024 00:00:00 GMT+0000 (Coordinated Universal Time)"
+---
+<img src='../../../assets/images/multi-modal-guide-header.png' alt='embeddings.' />
+
+<Note title="This Guide Uses the Embed API.">  
+ You can find the API reference for the Embed API [here](/reference/embed).
+
+ Image capabilities are only compatible with our embed v3.0 models.
+</Note>
+
+In this guide, we show you how to use the embed endpoint to embed a series of images. This guide uses a simple dataset of graphs to illustrate how semantic search can be done over images with Cohere. To see an end-to-end example of retrieval, check out this [notebook](https://github.com/cohere-ai/notebooks/blob/main/notebooks/Multimodal_Semantic_Search.ipynb).
+
+### Introduction to Multimodal Embeddings
+
+Information is often represented in multiple modalities. A document, for instance, may contain text, images, and graphs, while a product can be described through images, its title, and a written description. This combination of elements often leads to a comprehensive semantic understanding of the subject matter. Traditional embedding models have been limited to a single modality, and even multimodal embedding models often suffer from degradation in `text-to-text` or `text-to-image` retrieval tasks. The `embed-v3.0` series of models, however, is fully multimodal, enabling it to embed both images and text effectively. We have achieved state-of-the-art performance without compromising text-to-text retrieval capabilities.
+
+### How to use Multimodal Embeddings
+
+#### 1\. Prepare your Image for Embeddings
+
+The Embed API accepts images in the following file formats: `png`, `jpeg`, `webp`, and `gif`. The images must then be formatted as a Data URL.
+
+```python PYTHON
+# Import the necessary packages
+import os
+import base64
+
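+# Define a function that converts an image file to a base64 Data URL
+def image_to_base64_data_url(image_path):
+  _, file_extension = os.path.splitext(image_path)
+  # Drop the leading dot and normalize case; "jpg" is not a valid media subtype, so map it to "jpeg"
+  file_type = file_extension[1:].lower()
+  if file_type == "jpg":
+    file_type = "jpeg"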
+
+  with open(image_path, "rb") as f:
+    enc_img = base64.b64encode(f.read()).decode('utf-8')
+    enc_img = f"data:image/{file_type};base64,{enc_img}"
+  return enc_img
+
+image_path='<YOUR IMAGE PATH>'
+processed_image=image_to_base64_data_url(image_path)
+```
+#### 2\. Call the Embed Endpoint
+```python PYTHON
+# Import the necessary packages
+import cohere
+co = cohere.ClientV2(api_key="<YOUR API KEY>")
+
+co.embed(
+    model='embed-english-v3.0',
+    images=[processed_image],
+    input_type='image',
+    embedding_types=['float']
+)
+```
+## Sample Output
+Below is a sample of what the output would look like if you passed in a `jpeg` with original dimensions of `1080x1350` with a standard bit-depth of 24.
+```json JSON
+{
+    "id": "d8f2b461-79a4-44ee-82e4-be601bbb07be",
+    "embeddings": {
+        "float_": [[-0.025604248, 0.0154418945, ...]],
+        "int8": null,
+        "uint8": null,
+        "binary": null,
+        "ubinary": null
+    },
+    "texts": [],
+    "meta": {
+        "api_version": {"version": "2", "is_deprecated": null, "is_experimental": null},
+        "billed_units": {
+            "input_tokens": null,
+            "output_tokens": null,
+            "search_units": null,
+            "classifications": null,
+            "images": 1
+        },
+        "tokens": null,
+        "warnings": null
+    },
+    "images": [{"width": 1080, "height": 1080, "format": "jpeg", "bit_depth": 24}],
+    "response_type": "embeddings_by_type"
+}
+```
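Note on the multimodal guide above: it describes semantic search over images but stops at the embed call. One common way to use the returned vectors is to rank image embeddings by cosine similarity to a query embedding. The sketch below uses made-up 3-dimensional vectors and hypothetical file names. `cosine_similarity` and `rank_images` are illustrative helpers, not Cohere SDK functions, and the linked notebook may use a different approach.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_images(query_embedding, image_embeddings):
    """Return image names sorted from most to least similar to the query."""
    scores = {
        name: cosine_similarity(query_embedding, embedding)
        for name, embedding in image_embeddings.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy vectors for illustration only; real embeddings come from co.embed(...).
image_embeddings = {
    "revenue_chart.png": [0.9, 0.1, 0.0],
    "org_chart.png": [0.0, 0.2, 0.9],
}
query_embedding = [0.8, 0.2, 0.1]

print(rank_images(query_embedding, image_embeddings))
```

In practice the query would be a text embedding produced with `input_type="search_query"` and the candidates would be image embeddings produced with `input_type='image'`, as described in the guide.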
diff --git a/fern/v2.yml b/fern/v2.yml
@@ -117,9 +117,9 @@ navigation:
       - section: Text Embeddings (Vectors, Search, Retrieval)
         contents:
           - page: Introduction to Embeddings at Cohere
-            path: pages/text-embeddings/embeddings.mdx
+            path: pages/v2/text-embeddings/embeddings.mdx
           - page: Multimodal Embeddings
-            path: pages/text-embeddings/multimodal-embeddings.mdx
+            path: pages/v2/text-embeddings/multimodal-embeddings.mdx
           - page: Batch Embedding Jobs
             path: pages/v2/text-embeddings/embed-jobs-api.mdx
           - section: Reranking