Update hybrid example (#2861)
databyjp authored Dec 10, 2024
1 parent 9cf4160 commit 6301f5c
Showing 3 changed files with 66 additions and 5 deletions.
25 changes: 23 additions & 2 deletions _includes/code/howto/search.hybrid.py
@@ -302,10 +302,10 @@
# End test

# =========================================
# ===== Hybrid with vector similarity =====
# ===== Hybrid with vector parameters =====
# =========================================

# START VectorSimilarityPython
# START VectorParametersPython
from weaviate.classes.query import HybridVector, Move, HybridFusion

jeopardy = client.collections.get("JeopardyQuestion")
@@ -321,6 +321,27 @@
alpha=0.75,
limit=5,
)
# END VectorParametersPython

assert len(response.objects) <= 5
assert len(response.objects) > 0

# =========================================
# ===== Hybrid with vector similarity threshold =====
# =========================================

# START VectorSimilarityPython
from weaviate.classes.query import HybridVector, Move, HybridFusion

jeopardy = client.collections.get("JeopardyQuestion")
response = jeopardy.query.hybrid(
query="California",
# highlight-start
max_vector_distance=0.4, # Maximum threshold for the vector search component
# highlight-end
alpha=0.75,
limit=5,
)
# END VectorSimilarityPython

assert len(response.objects) <= 5
14 changes: 14 additions & 0 deletions developers/weaviate/concepts/search/hybrid-search.md
@@ -164,6 +164,20 @@ The alpha value determines the weight of the vector search results in the final
- `alpha > 0.5`: More weight to vector search
- `alpha < 0.5`: More weight to keyword search

## Search Thresholds

Hybrid search supports a maximum vector distance threshold through the `max vector distance` parameter.

This threshold applies only to the vector search component of the hybrid search, allowing you to filter out results that are too dissimilar in vector space, regardless of their keyword search scores.

For example, consider a maximum vector distance of `0.3`. This means objects with a vector distance higher than `0.3` will be excluded from the hybrid search results, even if they have high keyword search scores.

This can be useful when you want to ensure semantic similarity meets a minimum standard while still taking advantage of keyword matching.

There is no equivalent threshold parameter for the keyword (BM25) component of hybrid search or the final combined scores.

This is because BM25 scores are not normalized or bounded like vector distances, making a universal threshold less meaningful.
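
The behavior described above can be illustrated with a minimal, self-contained sketch. It does not call Weaviate and is not Weaviate's implementation: the `Candidate` class, the `hybrid_with_threshold` function, and the simple weighted-sum fusion are all hypothetical stand-ins (Weaviate's actual fusion algorithms differ). The sketch only shows the idea that the distance threshold is checked before any score is combined, so a high keyword score cannot rescue an object that is too far away in vector space.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical search candidate, for illustration only."""
    id: str
    vector_distance: float  # smaller means more similar
    keyword_score: float    # unbounded, BM25-style score


def hybrid_with_threshold(candidates, alpha, max_vector_distance, limit):
    """Toy hybrid ranking with a vector distance threshold.

    This is an assumed, simplified fusion (a weighted sum), not the
    algorithm Weaviate uses.
    """
    # Apply the threshold first: objects that are too dissimilar are
    # dropped regardless of their keyword score.
    kept = [c for c in candidates if c.vector_distance <= max_vector_distance]
    if not kept:
        return []

    # Keyword scores are unbounded, so scale them by the best remaining
    # score before mixing them with the vector component.
    max_keyword = max(c.keyword_score for c in kept) or 1.0

    def combined(c):
        vector_part = 1.0 - c.vector_distance
        keyword_part = c.keyword_score / max_keyword
        return alpha * vector_part + (1.0 - alpha) * keyword_part

    return sorted(kept, key=combined, reverse=True)[:limit]


candidates = [
    Candidate("a", vector_distance=0.10, keyword_score=2.0),
    Candidate("b", vector_distance=0.50, keyword_score=9.0),  # best keyword match
    Candidate("c", vector_distance=0.25, keyword_score=0.0),
]

results = hybrid_with_threshold(
    candidates, alpha=0.75, max_vector_distance=0.3, limit=5
)
print([c.id for c in results])
```

In this toy data set, `b` has the strongest keyword score but a vector distance of `0.5`, so it is removed by the `0.3` threshold before ranking.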

### Further resources

- [How-to: Search](../../search/index.md)
32 changes: 29 additions & 3 deletions developers/weaviate/search/hybrid.md
@@ -623,14 +623,14 @@ The output is like this:
:::info Added in `v1.25`
:::

You can specify [vector similarity search](/developers/weaviate/search/similarity) parameters similar to [near text](/developers/weaviate/search/similarity.md#search-with-text) or [near vector](/developers/weaviate/search/similarity.md#search-with-a-vector) searches, such as `group by` and `move to` / `move away`. An equvalent `distance` [threshold for vector search](./similarity.md#set-a-similarity-threshold) can be specified with the `max vector distance` parameter.
You can specify [vector similarity search](/developers/weaviate/search/similarity) parameters similar to [near text](/developers/weaviate/search/similarity.md#search-with-text) or [near vector](/developers/weaviate/search/similarity.md#search-with-a-vector) searches, such as `group by` and `move to` / `move away`. An equivalent `distance` [threshold for vector search](./similarity.md#set-a-similarity-threshold) can be specified with the `max vector distance` parameter.

<Tabs groupId="languages">
<TabItem value="py" label="Python Client v4">
<FilteredTextBlock
text={PyCode}
startMarker="# START VectorSimilarityPython"
endMarker="# END VectorSimilarityPython"
startMarker="# START VectorParametersPython"
endMarker="# END VectorParametersPython"
language="python"
/>
</TabItem>
@@ -667,6 +667,32 @@ The output is like this:

</details>

## Hybrid search thresholds

:::info Added in `v1.25`
:::

The only available search threshold is `max vector distance`, which will set the maximum allowable distance for the vector search component.

<Tabs groupId="languages">
<TabItem value="py" label="Python Client v4">
<FilteredTextBlock
text={PyCode}
startMarker="# START VectorSimilarityPython"
endMarker="# END VectorSimilarityPython"
language="python"
/>
</TabItem>

<TabItem value="js" label="JS/TS Client v3">

```ts
// TS support coming soon
```

</TabItem>
</Tabs>

## Group results

:::info Added in `v1.25`