Dataprep updated readme and parametrized prepare_doc_arango.py #1036

Closed


ajaykallepalli

Changes Made

  • Updated Dockerfile to include git for dependency installation
  • Updated Docker compose with required environment variables

Testing Done

  • Ran the service with Docker Compose and directly from the Python files.
  • Tested a PDF with table parsing
    -- takes 25-30 minutes with a chunk size of 500 and an overlap of 50, and 10 minutes with a chunk size of 2000 and an overlap of 200.
  • Measured the time required to parse sample PDF files
  • Embeddings work and can be disabled by setting embeddings to False
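
One plausible factor behind the timing difference above is the number of chunks each setting produces. The following is a minimal sketch, not the splitter the service actually uses: it assumes chunk size and overlap are measured in characters (the PR does not state the unit), and `chunk_text` is a hypothetical helper introduced only for illustration.

```python
def chunk_text(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into fixed-size windows; consecutive windows share `overlap` characters.

    Hypothetical helper for illustration; units (characters) are an assumption.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    sample = "x" * 100_000  # stand-in for the extracted text of a PDF
    print(len(chunk_text(sample, 500, 50)))    # 223 chunks
    print(len(chunk_text(sample, 2000, 200)))  # 56 chunks
```

Under these assumptions the smaller setting yields roughly four times as many chunks for the same input, so any per-chunk work (such as embedding or graph extraction) is repeated correspondingly more often.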

Setup Instructions

  1. Set environment variables:
export ARANGO_URL="http://localhost:8529"
export ARANGO_USERNAME="root"
export ARANGO_PASSWORD="password"
export ARANGO_DB_NAME="opea"
export PYTHONPATH='{path to comps file}'
  2. Start services:
python prepare_doc_arango.py
  3. Curl command to run:
curl -X POST \
    -H "Content-Type: multipart/form-data" \
    -F "files=@./your_file.pdf" \
    -F "graph_name=${your_graph_name}" \
    http://localhost:6007/v1/dataprep
    
curl -X POST \
    -H "Content-Type: multipart/form-data" \
    -F "files=@./file1.txt" \
    -F "graph_name=${your_graph_name}" \
    http://localhost:6007/v1/dataprep
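
The environment variables from step 1 could be read on the Python side roughly as follows. This is a sketch under stated assumptions: the PR does not show how `prepare_doc_arango.py` consumes these variables, `ArangoSettings` and `load_arango_settings` are hypothetical names, and the fallback values simply mirror the example exports above.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ArangoSettings:
    """Connection settings for ArangoDB (hypothetical container, for illustration)."""
    url: str
    username: str
    password: str
    db_name: str


def load_arango_settings(env: Mapping[str, str] = os.environ) -> ArangoSettings:
    """Build settings from environment variables.

    The fallbacks are assumptions copied from the example exports,
    not defaults confirmed by the service's code.
    """
    return ArangoSettings(
        url=env.get("ARANGO_URL", "http://localhost:8529"),
        username=env.get("ARANGO_USERNAME", "root"),
        password=env.get("ARANGO_PASSWORD", "password"),
        db_name=env.get("ARANGO_DB_NAME", "opea"),
    )


if __name__ == "__main__":
    settings = load_arango_settings()
    # Avoid printing the password when logging the effective configuration.
    print(settings.url, settings.username, settings.db_name)
```

Passing the mapping as a parameter keeps the function easy to exercise with a plain dictionary instead of the real process environment.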

Notes

  • Services that require the TGI Gaudi service were not tested; where one was required, it was commented out before running.
  • More detailed instructions are in the README.

Testing

  • Tested standalone deployment
  • Tested docker-compose deployment

aMahanna and others added 25 commits November 26, 2024 18:12
* initial commit

* updating feedback management readme to match arango

* Removing comments above import

* Working API test and updated readme

* Working docker compose file

* Docker compose creating network and docker image

* code review

* update readme & dev yaml

* delete dev files

* Delete arango_store.py

---------

Co-authored-by: Anthony Mahanna <[email protected]>
* Initial commit

* remove unnecessary files

* code review

* update: `prompt_search`

* new: `ARANGO_PROTOCOL`

* README

* cleanup

---------

Co-authored-by: lasyasn <[email protected]>
Co-authored-by: Anthony Mahanna <[email protected]>
* Initial chat history implementation without API and docker implementation

* make copy and remove async

* API functionality matching MongoDB implementation

Working API functionality, update to dockerfile required, and additional checks when updating document required.

* Delete temp.py

* Push changes and reset repo

* Async definitions working in curl calls, updated read me to ArangoDB setup

* Working docker container with network

* Removing need for network to be created before docker compose

* Cleanup async files and backup files

* code review

* fix: typo

* revert mongo changes

---------

Co-authored-by: Anthony Mahanna <[email protected]>
This reverts commit 8f750e4.
…arametrized variables in prepare_doc_arango.py
@ajaykallepalli
Author

ajaykallepalli commented Dec 16, 2024

Will be merging changes into Arango branch first, apologies

3 participants