Updating the embeddings doc.

cohere-ai · Sep 2, 2024 · 8205162
1 parent b673768
commit 8205162
Showing 1 changed file with 47 additions and 1 deletion.
diff --git a/fern/pages/text-embeddings/embeddings.mdx b/fern/pages/text-embeddings/embeddings.mdx
@@ -46,7 +46,12 @@ calculate_similarity(soup1, london) # 0.16 - not similar!
 
 ## The `input_type` parameter
 
-Cohere embeddings are optimized for different types of inputs. For example, when using embeddings for semantic search, the search query should be embedded by setting `input_type="search_query"` whereas the text passages that are being searched over should be embedded with `input_type="search_document"`.  You can find more details and a code snippet in the [Semantic Search guide](/docs/semantic-search). Similarly, the input type can be set to `classification` ([example](/docs/text-classification-with-embed)) and `clustering` to optimize the embeddings for those use cases.
+Cohere embeddings are optimized for different types of inputs.
+
+- When using embeddings for [semantic search](/docs/semantic-search), the search query should be embedded by setting `input_type="search_query"`.
+- When using embeddings for semantic search, the text passages that are being searched over should be embedded with `input_type="search_document"`.
+- When using embeddings for classification ([example](/docs/text-classification-with-embed)) or clustering tasks, set `input_type` to `classification` or `clustering`, respectively, to optimize the embeddings for that task.
+- When `input_type="image"`, the input to be embedded is expected to be an image rather than text.
 
 ## Multilingual Support
 
@@ -73,6 +78,47 @@ print(embeddings[0][:5]) # Print embeddings for the first text
 
 ```
 
+## Image Embeddings
+
+The Cohere embedding platform supports image embeddings for `embed-english-v3.0` and `embed-multilingual-v3.0`. To use this functionality:
+
+- Pass `image` to the `input_type` parameter (as discussed above).
+- Pass your image, as a base64-encoded Data URL, to the new `images` parameter.
+
+Be aware that image embedding has the following restrictions:
+
+- If `input_type='image'`, the `texts` field must be empty.
+- The original image file type must be `png` or `jpeg`.
+- The image must be base64 encoded and sent as a Data URL to the `images` parameter.
+- Our API currently does not support batch image embeddings.
+
+```python PYTHON
+import base64
+from io import BytesIO
+
+from PIL import Image
+
+def image_to_base64_data_url(image_path):
+    # Re-encode the image as PNG in memory and base64-encode the bytes
+    with Image.open(image_path) as img:
+        buffered = BytesIO()
+        img.save(buffered, format="PNG")
+        img_base64 = base64.b64encode(buffered.getvalue()).decode("utf-8")
+    # The model accepts input as a base64 Data URL
+    return f"data:image/png;base64,{img_base64}"
+
+data_url = image_to_base64_data_url("<PATH_TO_IMAGE>")
+
+# `co` is assumed to be an already initialized Cohere client
+ret = co.embed(
+    images=[data_url],
+    model="embed-english-v3.0",
+    input_type="image",
+)
+
+ret.embeddings
+```
+
 ## Compression Levels
 
 The Cohere embeddings platform now supports compression. The Embed API features an `embeddings_types` parameter which allows the user to specify various ways of compressing the output.