Adding additional default use cases #731

Merged
merged 3 commits into from
Jun 8, 2024

Conversation

amitgalitz

Description

  • Adding two additional default use cases that use a pretrained model: one for hybrid search and one for semantic search
  • Removing the id field from some of the use case templates
  • Removing weights from the hybrid search template
  • Making the text and image type dynamic for multimodal search, since most customers don't use only binary for images
  • Adding an end-to-end integration test for the local semantic search template, all the way through neural search

Issues Resolved

part of #666

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions github-actions bot added the backport 2.x backport PRs to 2.x branch label Jun 6, 2024
@amitgalitz amitgalitz force-pushed the default-improvements branch 2 times, most recently from 7ce4099 to bca3abe Compare June 6, 2024 21:59

codecov bot commented Jun 6, 2024

Codecov Report

Attention: Patch coverage is 61.53846% with 5 lines in your changes missing coverage. Please review.

Project coverage is 74.35%. Comparing base (a941b68) to head (50b44eb).

Files Patch % Lines
...opensearch/flowframework/workflow/ReindexStep.java 28.57% 2 Missing and 3 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main     #731      +/-   ##
============================================
- Coverage     74.41%   74.35%   -0.06%     
  Complexity      744      744              
============================================
  Files            84       84              
  Lines          3811     3822      +11     
  Branches        330      333       +3     
============================================
+ Hits           2836     2842       +6     
- Misses          823      825       +2     
- Partials        152      155       +3     



joshpalis commented Jun 6, 2024

Interesting that the Windows single-node integration tests are failing with:

REPRODUCE WITH: gradlew ':integTest' --tests "org.opensearch.flowframework.rest.FlowFrameworkRestApiIT.testSemanticSearchWithLocalModelEndToEnd" -Dtests.seed=F9C1F84B83F9D1C6 -Dtests.security.manager=false -Dtests.locale=mk -Dtests.timezone=PLT -Druntime.java=11

org.opensearch.flowframework.rest.FlowFrameworkRestApiIT > testSemanticSearchWithLocalModelEndToEnd FAILED
    java.lang.AssertionError: expected:<0> but was:<1>
Suite: Test class org.opensearch.flowframework.rest.FlowFrameworkRestApiIT
        at __randomizedtesting.SeedInfo.seed([F9C1F84B83F9D1C6:9469A614D9123AA7]:0)
  2> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
        at org.junit.Assert.fail(Assert.java:89)

Looking at the test, it's failing at this line: assertEquals(neuralSearchResponse.getHits().getHits().length, 1);
Given that the other integration tests passed, perhaps this failure is due to the GHA runner's memory limit. I see that the model used in the semantic search local model template is msmarco-distilbert-base-tas-b, which is about 254 MB.

Can we use a smaller model here for this test? I think that should help mitigate the flakiness.

Edit:
The smallest pretrained sentence transformer model is huggingface/sentence-transformers/paraphrase-MiniLM-L3-v2, which is about 67 MB.
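
A note on reading the failure message above: JUnit 4's assertEquals takes its arguments as (expected, actual), while the quoted line passes the hit count first and the literal 1 second. Under that ordering, "expected:<0> but was:<1>" would mean the search returned 0 hits rather than 1. The sketch below illustrates this with a stand-in that only mimics JUnit's message format; the class name AssertOrderSketch, the helper messageFor, and the hitCount value are assumptions for illustration, not code from this PR.

```java
final class AssertOrderSketch {

    // Hypothetical stand-in mimicking the message format of JUnit 4's
    // assertEquals(long expected, long actual); not the real JUnit method.
    static void assertEquals(long expected, long actual) {
        if (expected != actual) {
            throw new AssertionError("expected:<" + expected + "> but was:<" + actual + ">");
        }
    }

    // Returns the failure message, or null when the two values match.
    static String messageFor(long expected, long actual) {
        try {
            assertEquals(expected, actual);
            return null;
        } catch (AssertionError e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        int hitCount = 0; // assumption: the neural search returned no hits

        // Argument order as in the quoted test line: hit count first, literal second.
        System.out.println(messageFor(hitCount, 1)); // expected:<0> but was:<1>

        // Conventional order: expected value first, observed value second.
        System.out.println(messageFor(1, hitCount)); // expected:<1> but was:<0>
    }
}
```

Swapping the arguments does not change whether the test passes or fails; it only makes the failure message read the right way round.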

Signed-off-by: Amit Galitzky <[email protected]>
@amitgalitz amitgalitz force-pushed the default-improvements branch from c7b533e to 1f9d215 Compare June 7, 2024 23:32

@ohltyler ohltyler left a comment


At a high level LGTM.

Signed-off-by: Amit Galitzky <[email protected]>
@amitgalitz amitgalitz force-pushed the default-improvements branch from 81e761e to 50b44eb Compare June 8, 2024 00:21
@amitgalitz amitgalitz merged commit 53126cb into opensearch-project:main Jun 8, 2024
19 of 20 checks passed
@opensearch-trigger-bot

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/flow-framework/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/flow-framework/backport-2.x
# Create a new branch
git switch --create backport/backport-731-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 53126cbe1a7b6bb6bf41196039b811d5436328a5
# Push it to GitHub
git push --set-upstream origin backport/backport-731-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/flow-framework/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-731-to-2.x.

amitgalitz added a commit to amitgalitz/opensearch-ai-flow-framework that referenced this pull request Jun 10, 2024
* adding pretrained model templates

Signed-off-by: Amit Galitzky <[email protected]>

* adding reindex

Signed-off-by: Amit Galitzky <[email protected]>

* changing file structure for bwc

Signed-off-by: Amit Galitzky <[email protected]>

---------

Signed-off-by: Amit Galitzky <[email protected]>
amitgalitz added a commit that referenced this pull request Jun 10, 2024
Adding additional default use cases (#731)

* adding pretrained model templates

* adding reindex

* changing file structure for bwc

---------

Signed-off-by: Amit Galitzky <[email protected]>