ArangoDB Integration #3

Draft: wants to merge 16 commits into main

@aMahanna (Member) commented Oct 23, 2024

This PR tracks the in-progress and completed ArangoDB microservices for GenAIComps.

Status:

  1. Dataprep (ArangoDB: Dataprep #12)
  2. Retriever (ArangoDB: Retriever #2)
  3. Chat History (ArangoDB: Chathistory #10)
  4. Feedback Management (ArangoDB: Feedback management #11)
  5. Prompt Registry (ArangoDB: PromptRegistry #8)
  6. Vector Stores (ArangoDB: Vector Store #13)

Development Setup for using new LangChain functionality

Depends on arangoml/langchain#1

  1. Clone this repository

  2. Switch to the arangodb branch

  3. Create a virtual environment:

python -m venv .venv
source .venv/bin/activate

  4. Install the required packages:

pip install python-arango
pip install langchain_openai
pip install git+https://github.com/arangoml/langchain.git@arangodb#subdirectory=libs/community

Note: See arangoml/langchain#1 for details on the three LangChain classes used in this repo (ArangoGraph, ArangoGraphQAChain, and ArangoVector).

  5. Provision the ArangoDB with Vector Index image:

For ARM:

docker create --name arangodb -p 8529:8529 -e ARANGO_ROOT_PASSWORD=test jbajic/arangodb-arm:vector-index-preview

docker start arangodb

For AMD:

docker create --name arangodb -p 8529:8529 -e ARANGO_ROOT_PASSWORD=test jbajic/arangodb:vector-index-preview

docker start arangodb

Note: This image is built from an ArangoDB PR that introduces Vector Indexing and Vector Similarity support via FAISS. Ask Anthony for more details.

  6. Set your OPENAI_API_KEY environment variable (contact Anthony for access)

  7. Run the test script to confirm LangChain is working:

python langchain_test.py
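
If the test script fails, a quick preflight check can help tell a setup problem from a LangChain problem. The sketch below is not part of this PR. It assumes the container from the provisioning step is reachable at http://localhost:8529 with the root password `test`, uses only the Python standard library, and queries ArangoDB's `/_api/version` HTTP endpoint. The function names (`build_version_request`, `missing_env`, `preflight`) are hypothetical.

```python
import base64
import json
import os
import urllib.request


def build_version_request(host="http://localhost:8529", user="root", password="test"):
    """Build a GET request for ArangoDB's /_api/version endpoint with HTTP Basic auth.

    The default host and credentials are assumptions taken from the docker commands above.
    """
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"{host.rstrip('/')}/_api/version",
        headers={"Authorization": f"Basic {token}"},
    )


def missing_env(names=("OPENAI_API_KEY",), env=None):
    """Return the names of required environment variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


def preflight():
    """Report missing variables, then ask the running container for its version."""
    for name in missing_env():
        print(f"{name} is not set")
    with urllib.request.urlopen(build_version_request(), timeout=5) as resp:
        print("ArangoDB version:", json.load(resp).get("version"))
```

Calling `preflight()` with the container running should print the server version. A connection error points at the Docker step, and a "not set" line points at the API key step.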

@aMahanna aMahanna mentioned this pull request Oct 23, 2024
@aMahanna aMahanna marked this pull request as draft October 23, 2024 12:54
ajaykallepalli and others added 3 commits November 25, 2024 14:28
* initial commit

* updating feedback management readme to match arango

* Removing comments above import

* Working API test and updated readme

* Working docker compose file

* Docker compose creating network and docker image

* code review

* update readme & dev yaml

* delete dev files

* Delete arango_store.py

---------

Co-authored-by: Anthony Mahanna <[email protected]>
* Initial commit

* remove unnecessary files

* code review

* update: `prompt_search`

* new: `ARANGO_PROTOCOL`

* README

* cleanup

---------

Co-authored-by: lasyasn <[email protected]>
Co-authored-by: Anthony Mahanna <[email protected]>
aMahanna pushed a commit that referenced this pull request Nov 26, 2024
* Adds an endpoint for image ingestion

Signed-off-by: Melanie Buehler <[email protected]>

* Combined image and video endpoint

Signed-off-by: Melanie Buehler <[email protected]>

* Add test and update README

Signed-off-by: Melanie Buehler <[email protected]>

* fixed variable name for embedding model (#1)

Signed-off-by: okhleif-IL <[email protected]>

* Fixed test script

Signed-off-by: Melanie Buehler <[email protected]>

* Remove redundant function

Signed-off-by: Melanie Buehler <[email protected]>

* get_videos, delete_videos --> get_files, delete_files (#3)

Signed-off-by: okhleif-IL <[email protected]>

* Updates test per review feedback

Signed-off-by: Melanie Buehler <[email protected]>

* Fixed test

Signed-off-by: Melanie Buehler <[email protected]>

* Add support for audio files multimodal data ingestion (#4)

* Add support for audio files multimodal data ingestion

Signed-off-by: dmsuehir <[email protected]>

* Update function name

Signed-off-by: dmsuehir <[email protected]>

---------

Signed-off-by: dmsuehir <[email protected]>

* Change videos_with_transcripts to ingest_with_text

Signed-off-by: Melanie Buehler <[email protected]>

* Add image support to video ingestion with transcript functionality

Signed-off-by: Melanie Buehler <[email protected]>

* Update test and README

Signed-off-by: Melanie Buehler <[email protected]>

* Updated for review suggestions

Signed-off-by: Melanie Buehler <[email protected]>

* Add two tests for ingest_with_text

Signed-off-by: Melanie Buehler <[email protected]>

* LVM TGI Gaudi update for prompts without images (#7)

* LVM Gaudi TGI update for prompts without images

Signed-off-by: dmsuehir <[email protected]>

* Wording

Signed-off-by: dmsuehir <[email protected]>

* Add a test

Signed-off-by: dmsuehir <[email protected]>

---------

Signed-off-by: dmsuehir <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Change dummy image to be b64 encoded instead of the url (#9)

Signed-off-by: dmsuehir <[email protected]>

* Updates based on review feedback (#10)

Signed-off-by: dmsuehir <[email protected]>

* Test fix (#11)

Signed-off-by: dmsuehir <[email protected]>

---------

Signed-off-by: Melanie Buehler <[email protected]>
Signed-off-by: okhleif-IL <[email protected]>
Signed-off-by: dmsuehir <[email protected]>
Co-authored-by: dmsuehir <[email protected]>
Co-authored-by: Omar Khleif <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Abolfazl Shahbazi <[email protected]>
aMahanna and others added 7 commits November 26, 2024 18:12
* initial commit

* updating feedback management readme to match arango

* Removing comments above import

* Working API test and updated readme

* Working docker compose file

* Docker compose creating network and docker image

* code review

* update readme & dev yaml

* delete dev files

* Delete arango_store.py

---------

Co-authored-by: Anthony Mahanna <[email protected]>
* Initial commit

* remove unnecessary files

* code review

* update: `prompt_search`

* new: `ARANGO_PROTOCOL`

* README

* cleanup

---------

Co-authored-by: lasyasn <[email protected]>
Co-authored-by: Anthony Mahanna <[email protected]>
* Initial chat history implementation without API and docker implementation

* make copy and remove async

* API functionality matching MongoDB implementation

Working API functionality, update to dockerfile required, and additional checks when updating document required.

* Delete temp.py

* Push changes and reset repo

* Async definitions working in curl calls, updated read me to ArangoDB setup

* Working docker container with network

* Removing need for network to be created before docker compose

* Cleanup async files and backup files

* code review

* fix: typo

* revert mongo changes

---------

Co-authored-by: Anthony Mahanna <[email protected]>
Comment on lines -6 to +13
-  feedbackmanagement:
+  feedbackmanagement-mongo-server:
     build:
       dockerfile: comps/feedback_management/mongo/Dockerfile
-    image: ${REGISTRY:-opea}/feedbackmanagement:${TAG:-latest}
+    image: ${REGISTRY:-opea}/feedbackmanagement-mongo-server:${TAG:-latest}
+  feedbackmanagement-arango-server:
+    build:
+      dockerfile: comps/feedback_management/arango/Dockerfile
+    image: ${REGISTRY:-opea}/feedbackmanagement-arango-server:${TAG:-latest}
@aMahanna (Member, Author) commented Nov 26, 2024

Need to double-check with the OPEA team whether it's okay to add the -mongo-server suffix to feedbackmanagement.
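
For anyone weighing the rename, it may help to see the image names the new entries resolve to. The sketch below is illustrative and not part of this PR. It mimics Compose's `${VAR:-default}` interpolation in plain Python, where the default applies when the variable is unset or empty. `expand_defaults` and `image_names` are hypothetical helpers.

```python
import re

# Matches ${VAR:-default} placeholders as used in the compose file above.
_PLACEHOLDER = re.compile(r"\$\{(\w+):-([^}]*)\}")

# Service names taken from the diff under review.
SERVICES = ("feedbackmanagement-mongo-server", "feedbackmanagement-arango-server")


def expand_defaults(template, env):
    """Replace each ${VAR:-default} with env[VAR] if set and non-empty, else the default."""
    return _PLACEHOLDER.sub(lambda m: env.get(m.group(1)) or m.group(2), template)


def image_names(env):
    """Resolve the image reference for each service under the given environment."""
    return [
        expand_defaults(f"${{REGISTRY:-opea}}/{name}:${{TAG:-latest}}", env)
        for name in SERVICES
    ]


if __name__ == "__main__":
    for image in image_names({}):
        print(image)
```

With no overrides, the mongo image becomes `opea/feedbackmanagement-mongo-server:latest` rather than `opea/feedbackmanagement:latest`, so anything that pulls the old name would need updating.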

@aMahanna aMahanna marked this pull request as ready for review November 27, 2024 13:56
@aMahanna aMahanna marked this pull request as draft November 27, 2024 13:56