OpenSearch migration for Alegre, including dedicated migration tasks in ECS #364

Open
wants to merge 27 commits into
base: develop

Conversation

sonoransun
Contributor

Description

This change migrates Alegre to OpenSearch, with limited backwards compatibility for the legacy model name elasticsearch in certain queries.
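
A minimal sketch of what this kind of backwards compatibility could look like, assuming the legacy name is normalized when a query arrives. The constant and function names here are hypothetical and the new model name is an assumption; only the legacy name elasticsearch comes from this description.

```python
# Hypothetical names for illustration; the actual Alegre constants may differ.
LEGACY_MODEL_NAME = "elasticsearch"  # name used by existing clients and stored data
CURRENT_MODEL_NAME = "opensearch"    # assumed new name, not confirmed by this PR text


def normalize_model_name(name):
    """Map the legacy model name to the current one; leave other names unchanged."""
    return CURRENT_MODEL_NAME if name == LEGACY_MODEL_NAME else name


def normalize_models(names):
    """Normalize a list of model names, dropping duplicates while keeping order."""
    normalized = []
    for name in names:
        candidate = normalize_model_name(name)
        if candidate not in normalized:
            normalized.append(candidate)
    return normalized
```

With this shape, a query that still asks for elasticsearch resolves to the same model as one that asks for the new name, so callers do not have to change at the same time as the service.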

References:
https://meedan.atlassian.net/browse/DEVOPS-517
https://meedan.atlassian.net/browse/CV2-3255

How has this been tested?

This has been tested extensively in QA to verify the new migration task process, the updated OpenSearch bindings, and the configuration and naming changes.

Have you considered secure coding practices when writing this code?

I am mostly concerned with breaking existing behavior. I tried to cover as much as possible, and will see how automated tests look when triggered for this pull request.

@sonoransun changed the title from "Bugfix/no migrations test" to "OpenSearch migration for Alegre, including dedicated migration tasks in ECS" on Dec 4, 2023
Contributor

@skyemeedan left a comment

Wow, this is an amazing, huge amount of work!

  • Do we actually need to change the name of the model? That is, could 'ELASTICSEARCH_SIMILARITY' still be used to describe the 'model' (indexing structure) even if it is hosted in OpenSearch?
  • If we are changing the name we use to refer to doing a lookup using Open/ElasticSearch full-text indexes, maybe we should change it to something like 'FULLTEXT_INDEX' so that it describes the kind of index (vs. the hosting technology).

.gitlab-ci.yml (outdated, resolved review thread)
@sonoransun
Contributor Author

  • Do we actually need to change the name of the model? That is, could 'ELASTICSEARCH_SIMILARITY' still be used to describe the 'model' (indexing structure) even if it is hosted in OpenSearch?

For sanity's sake, I wanted to keep things consistent once we migrate. I did wonder about naming this something more generic, like SIMILARITY_MODEL_PREFIX, because of the way it is used. Hmm...

  • If we are changing the name we use to refer to doing a lookup using Open/ElasticSearch full-text indexes, maybe we should change it to something like 'FULLTEXT_INDEX' so that it describes the kind of index (vs. the hosting technology).

FULLTEXT_INDEX is a better name :) This brings to mind a discussion about how the language-specific indices are generated (_fr, _de, _en, etc.), but that's not a refactoring to tackle here...

Contributor

@computermacgyver left a comment

We need to decide the exact scope of the changes here.

  • We definitely want to shift to using the OpenSearch Docker image.
  • Do we want to rename all the secrets? E.g., change ELASTICSEARCH_URL to OPENSEARCH_URL? I don't have a strong preference. We just have to make sure that not only the .env_files but also the actual secrets in AWS get changed if we do so.
  • I do NOT think we should change the Alegre API. The Alegre API currently has a model called elasticsearch. If we change the name of that model, it will break the Alegre API and all current uses of it across Check, Timpani, etc. We would also need to migrate existing data to rename the model, which is painful. Per my comments in this review, I think the Alegre API should not change. This should be a change internal to Alegre, not something that breaks the API for services external to Alegre.
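
On the secrets question in the second bullet, here is a minimal sketch of one way a rename could be rolled out without having to change every environment at once, assuming the application reads the URL from environment variables. `get_search_url` is a hypothetical helper, not existing Alegre code; only the two variable names come from the discussion above.

```python
import os


def get_search_url(env=None):
    """Return the search cluster URL, preferring the new variable name.

    Falls back to the legacy ELASTICSEARCH_URL so that environments whose
    secrets have not been renamed yet keep working.
    """
    env = os.environ if env is None else env
    for key in ("OPENSEARCH_URL", "ELASTICSEARCH_URL"):
        value = env.get(key)
        if value:
            return value
    raise KeyError("Neither OPENSEARCH_URL nor ELASTICSEARCH_URL is set")
```

With a fallback like this, the AWS secrets and the .env_files could be renamed one environment at a time, and the legacy lookup removed once nothing sets ELASTICSEARCH_URL any more.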

Outdated, resolved review threads:

  • app/main/controller/similarity_async_controller.py (1 thread)
  • app/main/controller/similarity_controller.py (3 threads)
  • app/main/lib/graph_writer.py (1 thread)
  • app/main/lib/text_similarity.py (5 threads)
3 participants