diff --git a/fern/pages/text-embeddings/embeddings.mdx b/fern/pages/text-embeddings/embeddings.mdx
index 7c30f6bc..fb8aa22a 100644
--- a/fern/pages/text-embeddings/embeddings.mdx
+++ b/fern/pages/text-embeddings/embeddings.mdx
@@ -46,7 +46,12 @@ calculate_similarity(soup1, london) # 0.16 - not similar!
 
 ## The `input_type` parameter
 
-Cohere embeddings are optimized for different types of inputs. For example, when using embeddings for semantic search, the search query should be embedded by setting `input_type="search_query"` whereas the text passages that are being searched over should be embedded with `input_type="search_document"`. You can find more details and a code snippet in the [Semantic Search guide](/docs/semantic-search). Similarly, the input type can be set to `classification` ([example](/docs/text-classification-with-embed)) and `clustering` to optimize the embeddings for those use cases.
+Cohere embeddings are optimized for different types of inputs.
+
+- When using embeddings for [semantic search](/docs/semantic-search), embed the search query with `input_type="search_query"`.
+- When using embeddings for semantic search, embed the text passages being searched over with `input_type="search_document"`.
+- When using embeddings for classification ([example](/docs/text-classification-with-embed)) or clustering tasks, set `input_type` to `classification` or `clustering` respectively to optimize the embeddings for that use case.
+- When `input_type="image"`, the expected input to be embedded is an image instead of text.
 
 ## Multilingual Support
 
@@ -73,6 +78,47 @@
 print(embeddings[0][:5]) # Print embeddings for the first text
 ```
 
+## Image Embeddings
+
+The Cohere embeddings platform supports image embeddings for `embed-english-v3.0` and `embed-multilingual-v3.0`. To use this functionality:
+
+- Pass `image` to the `input_type` parameter (as discussed above).
+- Pass your image URL to the new `images` parameter.
+
+Be aware that image embedding has the following restrictions:
+
+- If `input_type='image'`, the `texts` field must be empty.
+- The original image file type must be `png` or `jpeg`.
+- The image must be base64 encoded and sent as a Data URL to the `images` parameter.
+- Our API currently does not support batch image embeddings.
+
+```python PYTHON
+# The model accepts images as base64-encoded Data URLs
+import base64
+from io import BytesIO
+from PIL import Image
+
+def image_to_base64_data_url(image_path):
+    # Open the image and re-encode it as PNG in an in-memory buffer
+    with Image.open(image_path) as img:
+        buffered = BytesIO()
+        img.save(buffered, format="PNG")
+        # Encode the PNG bytes in base64
+        img_base64 = base64.b64encode(buffered.getvalue()).decode("utf-8")
+
+    # Create the Data URL
+    return f"data:image/png;base64,{img_base64}"
+
+data_url = image_to_base64_data_url("")  # pass the path to your image file
+
+ret = co.embed(
+    images=[data_url],
+    model='embed-english-v3.0',
+    input_type='image')
+
+ret.embeddings
+```
+
 ## Compression Levels
 
 The Cohere embeddings platform now supports compression. The Embed API features an `embeddings_types` parameter which allows the user to specify various ways of compressing the output.
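
Note on the image restrictions added in this change (PNG or JPEG only, base64 Data URL, no batching): the sketch below is one possible way to enforce them on the client side before calling the API. It is not part of the diff. The names `file_to_data_url`, `embed_images_one_by_one`, and the `embed_one` callable are hypothetical, and the sketch assumes the original file bytes can be sent as-is rather than re-encoded to PNG.

```python
import base64
from pathlib import Path

# Assumption: only PNG and JPEG are accepted, per the restrictions in the diff.
_MIME_BY_SUFFIX = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
}


def file_to_data_url(image_path):
    """Return a base64 Data URL for a PNG or JPEG file without re-encoding it."""
    path = Path(image_path)
    mime = _MIME_BY_SUFFIX.get(path.suffix.lower())
    if mime is None:
        raise ValueError(f"unsupported image type: {path.suffix!r}")
    encoded = base64.b64encode(path.read_bytes()).decode("utf-8")
    return f"data:{mime};base64,{encoded}"


def embed_images_one_by_one(image_paths, embed_one):
    """Call embed_one(data_url) once per image, since batches are not supported.

    embed_one is a hypothetical callable supplied by the caller, for example a
    small wrapper around co.embed(images=[data_url], ...).
    """
    return [embed_one(file_to_data_url(p)) for p in image_paths]
```

Passing the API call in as `embed_one` keeps the helper independent of any particular client version; a caller could supply something like `lambda url: co.embed(images=[url], model='embed-english-v3.0', input_type='image')`.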