Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PR: OpenSearch integration components for OPEA #908

Open
wants to merge 31 commits into
base: main
Choose a base branch
from

Conversation

cameronmorin
Copy link

@cameronmorin cameronmorin commented Nov 15, 2024

Description

These proposed changes create the OPEA components necessary to integrate OpenSearch with GenAIExamples, specifically tested & validated with ChatQnA

Issues

n/a

Type of change

List the type of change like below. Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds new functionality)
  • Breaking change (fix or feature that would break existing design and interface)
  • Others (enhancement, documentation, validation, etc.)

Dependencies

opensearch-py

Tests

I tested the components independently, as well as end-to-end by creating my own ChatQnA cluster replacing the redis microservices (dataprep, retriever, and vectorstore) with the OpenSearch components I've created here.

@lvliang-intel
Copy link
Collaborator

@cameronmorin,
Thanks for your contribution.
Please check the pre-commit CI issues, looks like the yaml file has format issue.
image

comps/dataprep/opensearch/README.md Outdated Show resolved Hide resolved
comps/retrievers/opensearch/langchain/ingest.py Outdated Show resolved Hide resolved
comps/dataprep/opensearch/README.md Outdated Show resolved Hide resolved
comps/dataprep/opensearch/README.md Show resolved Hide resolved
@letonghan
Copy link
Collaborator

Hi @cameronmorin , thanks for your contribution. Please check out the comments above, and remember to add CI test scripts like this for each of the new components. Please name the test scripts following the schema of test/${component}/test_${component}_${storage}_${framework}.sh.
Thank you.

@cameronmorin
Copy link
Author

Hi @letonghan ! Thank you so much for reviewing my PR and leaving your helpful comments. I've addressed the simple ones of removing unnecessary comments & files, and I'm in the process of adding the test scripts you mentioned that are required.

Copy link
Collaborator

@ashahba ashahba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @cameronmorin and thanks for this PR!
I just did a quick look and it needs those 2 files fixed for formatting.

I need to do deeper review once those changes are made.

Thanks.

tests/dataprep/test_dataprep_opensearch_langchain.sh Outdated Show resolved Hide resolved
tests/retrievers/test_retrievers_opensearch_langchain.sh Outdated Show resolved Hide resolved
@ashahba ashahba self-assigned this Nov 22, 2024
@joshuayao joshuayao added this to the v1.1 milestone Nov 23, 2024
@joshuayao
Copy link
Collaborator

Hi @cameronmorin

Thanks for your PR. Could you please check the failures in the CI test? Thanks.

@ftian1 ftian1 modified the milestones: v1.1, v1.2 Nov 25, 2024
Comment on lines 12 to 16
apt update
apt install default-jre
apt-get install tesseract-ocr -y
apt-get install libtesseract-dev -y
apt-get install poppler-utils -y
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not combine some of these commands together like:

apt-get update
apt-get install default-jre libtesseract-dev poppler-utils tesseract-ocr

Copy link

@smguggen smguggen Nov 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind linking me to the OPEA style guidelines because I'm confused as to what the standards are. This snippet is identical to existing code:

image (35)

comps/dataprep/opensearch/README.md Show resolved Hide resolved
comps/dataprep/opensearch/README.md Outdated Show resolved Hide resolved
comps/dataprep/opensearch/README.md Outdated Show resolved Hide resolved
comps/dataprep/opensearch/README.md Outdated Show resolved Hide resolved
comps/retrievers/opensearch/langchain/Dockerfile Outdated Show resolved Hide resolved
@cameronmorin
Copy link
Author

Lastly, I've tried to address the DCO comment by rebasing my branch in order to sign all of my commits, however now I'm seeing issues with commits that I didn't author after addressing & signing all of my commits. I'll try to see if I can fix these as well, but if it comes to it after I address all of the comments and get the tests working, can I move my changes to a new PR with a single signed commit from me to be merged instead?

If not, some help or advice with the commit signing would also be appreciated :) Thanks!

Copy link

codecov bot commented Dec 2, 2024

Codecov Report

Attention: Patch coverage is 33.33333% with 2 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
comps/cores/mega/gateway.py 0.00% 1 Missing ⚠️
comps/version.py 0.00% 1 Missing ⚠️
Files with missing lines Coverage Δ
comps/cores/proto/api_protocol.py 96.12% <100.00%> (+0.06%) ⬆️
comps/cores/mega/gateway.py 28.14% <0.00%> (ø)
comps/version.py 0.00% <0.00%> (ø)

@chensuyue
Copy link
Collaborator

Lastly, I've tried to address the DCO comment by rebasing my branch in order to sign all of my commits, however now I'm seeing issues with commits that I didn't author after addressing & signing all of my commits. I'll try to see if I can fix these as well, but if it comes to it after I address all of the comments and get the tests working, can I move my changes to a new PR with a single signed commit from me to be merged instead?

If not, some help or advice with the commit signing would also be appreciated :) Thanks!

The rebase seems not correct, the diff file including some history change.

@chensuyue
Copy link
Collaborator

cameronmorin and others added 10 commits December 2, 2024 20:16
Signed-off-by: Cameron Morin <[email protected]>
Signed-off-by: Cameron Morin <[email protected]>
Signed-off-by: Cameron Morin <[email protected]>
* Pass down model id for ChatQnA

Signed-off-by: lvliang-intel <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* update logic

Signed-off-by: lvliang-intel <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: lvliang-intel <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Cameron Morin <[email protected]>
* Add outputs.

Signed-off-by: ZePan110 <[email protected]>

* Add empty list check

Signed-off-by: ZePan110 <[email protected]>

* test CI.

Signed-off-by: ZePan110 <[email protected]>

* Remove test files

Signed-off-by: ZePan110 <[email protected]>

* remove debug code

Signed-off-by: chensuyue <[email protected]>

---------

Signed-off-by: ZePan110 <[email protected]>
Signed-off-by: chensuyue <[email protected]>
Co-authored-by: chensuyue <[email protected]>
Signed-off-by: Cameron Morin <[email protected]>
Signed-off-by: Cameron Morin <[email protected]>
Signed-off-by: Cameron Morin <[email protected]>
cameronmorin and others added 21 commits December 2, 2024 20:16
Signed-off-by: ZePan110 <[email protected]>
Signed-off-by: Cameron Morin <[email protected]>
…roject#915)

* fix retriever and reranker to process chat completion request

Signed-off-by: minmin-intel <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: minmin-intel <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: Cameron Morin <[email protected]>
* fix html content loading problem

Signed-off-by: letonghan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add empty list check (opea-project#914)

* Add outputs.

Signed-off-by: ZePan110 <[email protected]>

* Add empty list check

Signed-off-by: ZePan110 <[email protected]>

* test CI.

Signed-off-by: ZePan110 <[email protected]>

* Remove test files

Signed-off-by: ZePan110 <[email protected]>

* remove debug code

Signed-off-by: chensuyue <[email protected]>

---------

Signed-off-by: ZePan110 <[email protected]>
Signed-off-by: chensuyue <[email protected]>
Co-authored-by: chensuyue <[email protected]>

* Fix hardware tag retrieval issue (opea-project#916)

Signed-off-by: ZePan110 <[email protected]>

* fix html content loading problem

Signed-off-by: letonghan <[email protected]>

* fix milvus connection issue

Signed-off-by: letonghan <[email protected]>

* update parse_html function for all dbs

Signed-off-by: letonghan <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Signed-off-by: letonghan <[email protected]>
Signed-off-by: ZePan110 <[email protected]>
Signed-off-by: chensuyue <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: ZePan110 <[email protected]>
Co-authored-by: chensuyue <[email protected]>
Signed-off-by: Cameron Morin <[email protected]>
* Add PREDICTIONGUARD_API_KEY

Signed-off-by: ZePan110 <[email protected]>

* Increase timeout

Signed-off-by: ZePan110 <[email protected]>

* Fix log name issue

Signed-off-by: ZePan110 <[email protected]>

* Remove useless code.

Signed-off-by: ZePan110 <[email protected]>

---------

Signed-off-by: ZePan110 <[email protected]>
Signed-off-by: Cameron Morin <[email protected]>
Signed-off-by: chensuyue <[email protected]>
Signed-off-by: Cameron Morin <[email protected]>
…#928)

* Update requirements to pin protobuf version and fix grpc conflict, and limit vdms version

Signed-off-by: Lacewell, Chaunte W <[email protected]>

* Update fix by removing grpcio pin and pinning opentelemetry-proto to 1.23.0

Signed-off-by: Lacewell, Chaunte W <[email protected]>

---------

Signed-off-by: Lacewell, Chaunte W <[email protected]>
Signed-off-by: Cameron Morin <[email protected]>
Signed-off-by: chensuyue <[email protected]>
Signed-off-by: Cameron Morin <[email protected]>
Signed-off-by: isaacncz <[email protected]>
Signed-off-by: Cameron Morin <[email protected]>
Signed-off-by: Cameron Morin <[email protected]>
Signed-off-by: Cameron Morin <[email protected]>
Signed-off-by: Cameron Morin <[email protected]>
Signed-off-by: Cameron Morin <[email protected]>
Signed-off-by: Cameron Morin <[email protected]>
@cameronmorin
Copy link
Author

Lastly, I've tried to address the DCO comment by rebasing my branch in order to sign all of my commits, however now I'm seeing issues with commits that I didn't author after addressing & signing all of my commits. I'll try to see if I can fix these as well, but if it comes to it after I address all of the comments and get the tests working, can I move my changes to a new PR with a single signed commit from me to be merged instead?
If not, some help or advice with the commit signing would also be appreciated :) Thanks!

The rebase seems not correct, the diff file including some history change.

I agree, I thought the rebase seemed off when I initially performed it because I saw a lot of historical changes when I rebased to sign all of the commits in the log. I tried to rebase manually and only sign the commits that I authored, but I still saw the historical changes being added to my PR in addition to the DCO still failing. I just pushed a change that fixed the DCO, however I still see the historical changes added to the PR. Would it be a bad idea to resolve all of the reviewer comments, and split my changes into a fresh branch / PR that doesn't have to rebase historical changes in order to merge that instead of this PR? I'd want to link this PR so that all conversations and reviews persist and are tracked historically, but I'm not sure how to clear up the other changes flowing into my branch.

@cameronmorin
Copy link
Author

I've moved the changes over to a new PR in order to resolve the history and DCO issues. Please review & approve that PR instead since all tests are passing with the same file changes and new git log.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature] CloudFormation stack for deploying ChatQnA using Amazon OpenSearch