RFC: Haystack OPEA Integration #222

59 changes: 59 additions & 0 deletions community/rfcs/24-10-20-OPEA-001-Haystack-Integration.md
@@ -0,0 +1,59 @@
# 24-10-20-OPEA-001-Haystack-Integration

## Author

[gadmarkovits](https://github.com/gadmarkovits)

## Status

Under Review

## Objective

Create a Haystack integration for OPEA that will enable the use of OPEA components within a Haystack pipeline.

## Motivation

Haystack is a production-ready open source AI framework used by many AI practitioners. It has over 70 integrations with GenAI components such as document stores, model providers, and evaluation frameworks from companies including Amazon, Microsoft, and NVIDIA. An OPEA integration will allow Haystack users to include OPEA components in their pipelines. This RFC presents a high-level overview of the Haystack integration.

## Design Proposal

The idea is to create thin wrappers for OPEA components that communicate with them through their existing REST API. The wrappers will match Haystack's API so that they can be used within Haystack pipelines. This will allow developers to use OPEA components alongside other Haystack components.
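
A minimal sketch of what such a thin wrapper might look like, using only the Python standard library. The class name `OPEATextEmbedder`, the default endpoint URL, and the request/response field names (`text`, `embedding`) are assumptions made for illustration; they are not taken from the OPEA or Haystack APIs. In a real integration the class would presumably be registered with Haystack's `@component` decorator, which is omitted here so the sketch runs without Haystack installed.

```python
import json
from typing import Any, Callable, Dict, List
from urllib import request


def http_post_json(url: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    """POST a JSON payload and return the decoded JSON response."""
    req = request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


class OPEATextEmbedder:
    """Hypothetical wrapper: forwards text to an embedding microservice over REST."""

    def __init__(
        self,
        api_url: str = "http://localhost:6000/v1/embeddings",  # assumed endpoint
        post: Callable[[str, Dict[str, Any]], Dict[str, Any]] = http_post_json,
    ) -> None:
        self.api_url = api_url
        self._post = post  # injectable so the wrapper can be exercised offline

    def run(self, text: str) -> Dict[str, List[float]]:
        # Haystack components return a dict of named outputs from run().
        response = self._post(self.api_url, {"text": text})  # assumed request schema
        return {"embedding": response["embedding"]}  # assumed response schema


def fake_post(url: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    """Offline stand-in for the microservice: a toy 2-dimensional 'embedding'."""
    return {"embedding": [float(len(payload["text"])), 0.0]}


embedder = OPEATextEmbedder(post=fake_post)
result = embedder.run("hello")
```

Keeping the HTTP call behind an injectable function is one way to keep the wrapper thin and testable; the actual integration may structure this differently.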

The integration will be implemented as a Python package (similar to other Haystack integrations). The source code will be hosted in OPEA's GenAIComps repo under a new directory called Integrations. The package itself will be uploaded to [PyPI](https://pypi.org/) to allow for easy installation.

Following a discussion with Haystack's technical team, it was agreed that a ChatQnA example, using this OPEA integration, would be a good way to showcase its capabilities. To support this, several component wrappers need to be implemented in the first version of the integration (other wrappers will be added gradually):

1. OPEA Document Embedder

Review comment (Collaborator): Any inclusions/exclusions with respect to document types? Word, pdf, ppt, images, ..?

Reply (Author): Embedding documents that are not purely textual is beyond the scope of this integration. We can think about adding document parsers/preprocessors as additional wrappers to OPEA's dataprep components at a later stage.


This component will receive a Haystack Document and embed it using an OPEA embedding microservice.

2. OPEA Text Embedder

This component will receive text input and embed it using an OPEA embedding microservice.

Review comment (Collaborator): What is the difference between the text embedder and the document embedder? If the text is long, might it also need chunking?

Reply (Author): They're very similar. The split is mainly there to conform with similar Haystack integrations and to allow embedding of both raw text and Document objects.


3. OPEA Generator

This component will receive a text prompt and generate a response using an OPEA LLM microservice.

4. OPEA Retriever

This component will receive an embedding and retrieve documents with similar embeddings using an OPEA retrieval microservice.

5. GenAIEval

The evaluation, benchmark, and scorecard suite for OPEA, targeting performance (throughput and latency), accuracy on popular evaluation harnesses, safety, and hallucination.

Review comment (Collaborator): This feels like a dangling sentence. What, if anything, will be delivered as part of the integration here?

Reply (Author): You're correct, this shouldn't be here. Removed.
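
To illustrate how the embedder, retriever, and generator wrappers listed above might fit together in a ChatQnA-style flow, here is a rough, self-contained sketch. Everything in it is hypothetical: the `Document` stand-in, the class names, the URLs, the payload fields, and the in-memory `fake_opea` function that replaces the real microservices. The output keys (`embedding`, `documents`, `replies`) follow the naming used by comparable Haystack components. In an actual Haystack pipeline the components would be wired with `Pipeline.add_component` and `Pipeline.connect` rather than called by hand.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

PostFn = Callable[[str, Dict[str, Any]], Dict[str, Any]]


@dataclass
class Document:
    """Stand-in for Haystack's Document: text content plus an optional embedding."""
    content: str
    embedding: Optional[List[float]] = None


class OPEATextEmbedder:
    def __init__(self, post: PostFn, url: str = "http://opea/v1/embeddings"):
        self._post, self._url = post, url

    def run(self, text: str) -> Dict[str, List[float]]:
        return {"embedding": self._post(self._url, {"text": text})["embedding"]}


class OPEADocumentEmbedder:
    def __init__(self, post: PostFn, url: str = "http://opea/v1/embeddings"):
        self._post, self._url = post, url

    def run(self, documents: List[Document]) -> Dict[str, List[Document]]:
        # Same service as the text embedder; the result is stored on each Document.
        for doc in documents:
            doc.embedding = self._post(self._url, {"text": doc.content})["embedding"]
        return {"documents": documents}


class OPEARetriever:
    def __init__(self, post: PostFn, url: str = "http://opea/v1/retrieval"):
        self._post, self._url = post, url

    def run(self, query_embedding: List[float]) -> Dict[str, List[Document]]:
        hits = self._post(self._url, {"embedding": query_embedding})["retrieved_docs"]
        return {"documents": [Document(content=hit["text"]) for hit in hits]}


class OPEAGenerator:
    def __init__(self, post: PostFn, url: str = "http://opea/v1/completions"):
        self._post, self._url = post, url

    def run(self, prompt: str) -> Dict[str, List[str]]:
        return {"replies": [self._post(self._url, {"query": prompt})["text"]]}


def fake_opea(url: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    """In-memory replacement for the microservices, for offline illustration only."""
    if url.endswith("/embeddings"):
        text = payload["text"]
        return {"embedding": [float(len(text)), float(text.count(" "))]}
    if url.endswith("/retrieval"):
        return {"retrieved_docs": [{"text": "OPEA is an open platform for enterprise AI."}]}
    if url.endswith("/completions"):
        return {"text": "Answer based on: " + payload["query"]}
    raise ValueError(f"unexpected URL: {url}")


# Indexing side: embed Document objects before they are stored.
indexed = OPEADocumentEmbedder(fake_opea).run([Document("OPEA overview")])["documents"]

# Query side: embed the question, retrieve context, then generate a reply.
question = "What is OPEA?"
query_embedding = OPEATextEmbedder(fake_opea).run(question)["embedding"]
docs = OPEARetriever(fake_opea).run(query_embedding)["documents"]
context = "\n".join(doc.content for doc in docs)
replies = OPEAGenerator(fake_opea).run(f"{context}\n\nQuestion: {question}")["replies"]
```

The sketch also shows the distinction raised in review: the document embedder attaches embeddings to `Document` objects at indexing time, while the text embedder returns a bare vector for a query string.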


## Alternatives Considered

n/a

## Compatibility

n/a

## Miscs

Once implemented, the Haystack team will list the OPEA integration on their [integrations page](https://haystack.deepset.ai/integrations), which will allow for easier discovery. Haystack, in collaboration with Intel, will also publish a technical blog post showcasing a ChatQnA example using this integration (similar to this [NVIDIA NIM post](https://haystack.deepset.ai/blog/haystack-nvidia-nim-rag-guide)).