Commit

Updating the embeddings doc.
Trent Fowler authored and Trent Fowler committed Sep 2, 2024
1 parent b673768 commit 8205162
Showing 1 changed file with 47 additions and 1 deletion.
48 changes: 47 additions & 1 deletion fern/pages/text-embeddings/embeddings.mdx
@@ -46,7 +46,12 @@ calculate_similarity(soup1, london) # 0.16 - not similar!

## The `input_type` parameter

Cohere embeddings are optimized for different types of inputs. For example, when using embeddings for semantic search, the search query should be embedded by setting `input_type="search_query"` whereas the text passages that are being searched over should be embedded with `input_type="search_document"`. You can find more details and a code snippet in the [Semantic Search guide](/docs/semantic-search). Similarly, the input type can be set to `classification` ([example](/docs/text-classification-with-embed)) and `clustering` to optimize the embeddings for those use cases.
Cohere embeddings are optimized for different types of inputs.

- When using embeddings for [semantic search](/docs/semantic-search), the search query should be embedded by setting `input_type="search_query"`.
- When using embeddings for semantic search, the text passages being searched over should be embedded with `input_type="search_document"`.
- When using embeddings for classification ([example](/docs/text-classification-with-embed)) or clustering tasks, set `input_type` to `"classification"` or `"clustering"`, respectively, to optimize the embeddings for that use case.
- When `input_type="image"`, the expected input to be embedded is an image instead of text.
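
As a rough sketch of how the two search input types fit together, the snippet below ranks documents against a query by cosine similarity. The `cosine_similarity` and `rank_documents` helpers and the toy vectors are illustrative assumptions, not part of the Cohere SDK. In practice the query vector would come from a `co.embed` call with `input_type="search_query"` and the document vectors from a call with `input_type="search_document"`.

```python
import math


def cosine_similarity(a, b):
    # Cosine of the angle between two equal-length vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(query_embedding, doc_embeddings):
    # Hypothetical helper: document indices, most similar to the query first
    scores = [cosine_similarity(query_embedding, d) for d in doc_embeddings]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Toy stand-ins for real embeddings (assumed values, for illustration only)
query_vec = [1.0, 0.0]               # would be embedded as "search_query"
doc_vecs = [[0.0, 1.0], [0.9, 0.1], [0.5, 0.5]]  # would be "search_document"

ranking = rank_documents(query_vec, doc_vecs)
print(ranking)
```

Keeping the ranking logic separate from the embedding call makes it easy to swap in real embeddings once both sides have been embedded with their matching input types.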

## Multilingual Support

@@ -73,6 +78,47 @@ print(embeddings[0][:5]) # Print embeddings for the first text

```

## Image Embeddings

The Cohere embedding platform supports image embeddings for `embed-english-v3.0` and `embed-multilingual-v3.0`. To use this functionality:

- Pass `image` to the `input_type` parameter (as discussed above).
- Pass your image, as a base64-encoded Data URL, to the new `images` parameter.

Be aware that image embedding has the following restrictions:

- If `input_type='image'`, the `texts` field must be empty.
- The original image file type must be `png` or `jpeg`.
- The image must be base64 encoded and sent as a Data URL to the `images` parameter.
- Our API currently does not support batch image embeddings.

```python PYTHON
# The model accepts input in base64
import base64
from io import BytesIO

from PIL import Image


def image_to_base64_data_url(image_path):
    # Open the image file
    with Image.open(image_path) as img:
        # Create a BytesIO object to hold the image data in memory
        buffered = BytesIO()
        # Save the image as PNG to the BytesIO object
        img.save(buffered, format="PNG")
        # Encode the image data in base64
        img_base64 = base64.b64encode(buffered.getvalue()).decode("utf-8")

    # Create the Data URL
    data_url = f"data:image/png;base64,{img_base64}"
    return data_url


data_url = image_to_base64_data_url("<PATH_TO_IMAGE>")

# `co` is assumed to be the Cohere client created earlier in this doc
ret = co.embed(
    images=[data_url],
    model="embed-english-v3.0",
    input_type="image",
)

ret.embeddings
```
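
The restrictions listed in this section can also be checked on the client before a request is sent. The following is only a sketch: `build_image_embed_kwargs` is a hypothetical helper, not part of the Cohere SDK, and the accepted Data URL prefixes are an assumption based on the `png` and `jpeg` requirement.

```python
import base64

# Assumed Data URL prefixes for the two supported file types
ALLOWED_PREFIXES = ("data:image/png;base64,", "data:image/jpeg;base64,")


def build_image_embed_kwargs(data_urls, model="embed-english-v3.0", texts=None):
    # Hypothetical helper: validate inputs and return keyword arguments
    # that could be passed on as co.embed(**kwargs)
    if texts:
        raise ValueError("`texts` must be empty when input_type is 'image'")
    if len(data_urls) != 1:
        raise ValueError("send exactly one image per request (no batching)")
    url = data_urls[0]
    if not url.startswith(ALLOWED_PREFIXES):
        raise ValueError("image must be a base64 Data URL of type png or jpeg")
    # Raises binascii.Error (a ValueError subclass) if the payload is not base64
    base64.b64decode(url.split(",", 1)[1], validate=True)
    return {"images": [url], "model": model, "input_type": "image"}


# Placeholder bytes stand in for real image data in this illustration
sample = "data:image/png;base64," + base64.b64encode(b"placeholder").decode("utf-8")
kwargs = build_image_embed_kwargs([sample])
print(kwargs["input_type"])
```

Failing early with a `ValueError` gives a clearer message than waiting for the API to reject the request, though the server-side checks remain the source of truth.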

## Compression Levels

The Cohere embeddings platform now supports compression. The Embed API features an `embedding_types` parameter which allows the user to specify various ways of compressing the output.