feat: incremental reindex_studio management command [FC-0062] #35864
Thanks for the pull request, @DanielVZ96!
What's next?
Please work through the following steps to get your changes ready for engineering review:
🔘 Get product approval
If you haven't already, check this list to see if your contribution needs to go through the product review process.
🔘 Provide context
To help your reviewers and other members of the community understand the purpose and larger context of your changes, feel free to add as much of the following information to the PR description as you can:
🔘 Get a green build
If one or more checks are failing, continue working on your changes until this is no longer the case and your build turns green.
🔘 Let us know that your PR is ready for review:
Who will review my changes?
This repository is currently maintained by
Where can I find more information?
If you'd like to get more details on all aspects of the review process for open source pull requests (OSPRs), check out the following resources:
When can I expect my changes to be merged?
Our goal is to get community contributions seen and reviewed as efficiently as possible. However, the amount of time that it takes to review and merge a PR can vary significantly based on factors such as:
💡 As a result it may take up to several weeks or months to complete a review and merge your PR.
5ffe2d5
to
bd37e4c
assert IncrementalIndexCompleted.objects.all().count() == 1
api.rebuild_index(incremental=True)
assert IncrementalIndexCompleted.objects.all().count() == 0
assert mock_meilisearch.return_value.index.return_value.add_documents.call_count == 7
@DanielVZ96 could you add a comment here? It is not easy at first glance to realize why this call_count is 7.
I modified the mocking mechanism to make it easier to understand and also added some comments.
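As a rough illustration of why such a call_count can be hard to read at a glance, here is a minimal sketch using only unittest.mock. The helper index_documents and the index names are hypothetical and are not taken from this PR. The point it shows is that a MagicMock hands back the same return_value no matter which arguments it is called with, so add_documents calls made against different indexes all accumulate on a single counter.

```python
from unittest.mock import MagicMock

# Hypothetical stand-in for the patched Meilisearch client class.
mock_meilisearch = MagicMock()


def index_documents(client_cls, batches):
    """Hypothetical helper: one add_documents call per (index name, docs) batch."""
    client = client_cls()
    for index_name, docs in batches:
        client.index(index_name).add_documents(docs)


index_documents(
    mock_meilisearch,
    [
        ("studio", [{"id": 1}]),
        ("studio", [{"id": 2}]),
        ("studio_tmp", [{"id": 3}]),
    ],
)

# index() returns the same mock for every index name, so the counter below
# covers calls made against "studio" and "studio_tmp" together.
add_documents = mock_meilisearch.return_value.index.return_value.add_documents
print(add_documents.call_count)
```

A comment next to the assertion that lists which batches contribute to the total usually makes the expected number easier to verify.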
status_cb = log.info

status_cb("Creating new empty index...")
with _using_temp_index(status_cb) as temp_index_name:
@DanielVZ96 @bradenmacdonald @pomegranited I am concerned about the functionality of this reset.
If we need to create and populate a new index on a large instance, we need to run --reset, which removes the old index, and then run --incremental. However, the search results will be broken while the new index is being populated.
Wouldn't it be better to add an option to --incremental so that it creates a temporary index and swaps it in, as the non-incremental form does, so as not to break the search results?
@ChrisChV I'm assuming that most times an administrator is running this incremental build, they either have no existing index or their existing index is from an old version and is missing some configuration/columns/etc. (so it actually is broken). In that case, removing the old index and starting an incremental build will result in incomplete search results for a while, but the search will be working without errors. And, the incremental index rebuilds the newest courses first, so the results should fill in relatively quickly.
I think for large instances (where the reindex can take several days), it's better to have a working search with incomplete results than to have a totally broken search (one that displays errors because the old index does not exist or has the wrong configuration).
For Teak I would like to find a way to simplify this, maybe by only having the incremental option and allowing it to be either using a temporary index or not. But I think we'll need to see how this works first and hear from people testing it out.
OK that's fine for me 👍
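For readers unfamiliar with the temporary-index approach discussed in this thread, the following is a minimal sketch of the idea, assuming an in-memory dictionary in place of a real search server. The name using_temp_index, its signature, and the "_new" suffix are hypothetical; the PR's own _using_temp_index takes a status callback and is not reproduced here.

```python
from contextlib import contextmanager

# Hypothetical in-memory stand-in for a search server: index name -> documents.
indexes = {"studio": [{"id": "old"}]}


@contextmanager
def using_temp_index(live_name):
    """Build into a temporary index and swap it in only if the build succeeds."""
    temp_name = f"{live_name}_new"
    indexes[temp_name] = []
    try:
        yield temp_name
    except BaseException:
        # The build failed or was interrupted: drop the temporary index and
        # leave the live index untouched.
        indexes.pop(temp_name, None)
        raise
    else:
        # The build finished: replace the live index with the new one.
        indexes[live_name] = indexes.pop(temp_name)


with using_temp_index("studio") as temp_index_name:
    indexes[temp_index_name].append({"id": "new-1"})
    # While the rebuild is running, searches still see the old documents.
    live_during_build = list(indexes["studio"])
```

The trade-off raised above follows from this shape: nothing written to the temporary index is visible until the swap, whereas a reset followed by an incremental run writes straight into the live index, so results appear gradually.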
24da2c3
to
776b1d1
@DanielVZ96 Looks good 👍 Some nits:
- I tested this: I followed the testing instructions
- I read through the code and considered the security, stability and performance implications of the changes.
- Includes tests for bugfixes and/or features added.
- Includes documentation
When running ./manage.py cms reindex_studio --experimental --init with an existing index, the message gets lost in the logs. Is it possible to give the output message a different color, such as yellow or red?
Nice nit. I'll send it to stderr then
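A minimal sketch of that idea in plain Python, assuming a hypothetical helper warn_index_exists and an invented message text (neither comes from this PR). Writing to stderr keeps the notice separate from the regular progress output on stdout.

```python
import sys


def warn_index_exists(index_name, stream=None):
    """Hypothetical helper: report an already-existing index on stderr."""
    # Resolve sys.stderr at call time so a redirected stderr is honored.
    stream = stream if stream is not None else sys.stderr
    # The message wording is illustrative only.
    print(f"WARNING: index {index_name} already exists; nothing to do.", file=stream)


warn_index_exists("studio")
```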
@@ -473,10 +535,16 @@ def add_with_children(block):
status_cb(
    f"{num_contexts_done + 1}/{num_contexts}. Now indexing course {course.display_name} ({course.id})"
)
if course.id in keys_indexed:
    num_contexts_done += 1
Skipped courses are still included in the total count and in num_contexts_done. I like that. I think we should do the same for skipped (already-indexed) libraries. Currently, they're excluded altogether and won't be included in the num_contexts_done count.
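The counting behavior requested here can be sketched as follows. This is a simplified illustration, not the PR's code: index_contexts and its arguments are hypothetical, and courses and libraries are treated uniformly as context keys so that skipped entries of either kind still advance the progress counter.

```python
def index_contexts(context_keys, keys_indexed, status_cb=print):
    """Hypothetical loop: skipped contexts still count toward progress."""
    num_contexts = len(context_keys)
    num_contexts_done = 0
    newly_indexed = []
    for key in context_keys:
        status_cb(f"{num_contexts_done + 1}/{num_contexts}. Now indexing {key}")
        if key in keys_indexed:
            # Already indexed in an earlier run: skip the work, keep the count.
            num_contexts_done += 1
            continue
        newly_indexed.append(key)  # stand-in for the real indexing work
        keys_indexed.add(key)
        num_contexts_done += 1
    return num_contexts_done, newly_indexed


messages = []
done, newly_indexed = index_contexts(
    ["course-v1:a", "lib:b", "course-v1:c"],
    keys_indexed={"lib:b"},
    status_cb=messages.append,
)
```

With this shape the progress lines run from 1 up to the total without gaps, regardless of how many contexts were already indexed.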
@DanielVZ96 Very nice work here! I have a couple requests but I'm very happy with how this is looking.
0bf5368
to
73957b6
Nice, thanks! I like that approach you took. Just a few more tweaks I'm suggesting.
bbcf4d5
to
c41c44d
@bradenmacdonald this is ready for a re-review!
Thanks for those changes! I tested the various modes (normal, init, and incremental) and it seems to be working well!
I'll wait to see if @ormsbee wants to comment before we merge, but can you please open the backport PR in the meantime? I think this version is ready to go.
@bradenmacdonald here's the backport: #35981
2U Release Notice: This PR has been deployed to the edX staging environment in preparation for a release to production.
2U Release Notice: This PR has been deployed to the edX production environment.
Description
Adds an incremental mode to the reindex_studio management command, along with a few other utilities for managing the index: reset and init.
Supporting information
Testing instructions
- First open the home page of any library, e.g. http://apps.local.openedx.io:2001/course-authoring/library/lib:test:2test
- tutor dev exec cms bash
- ./manage.py cms reindex_studio: Should do nothing, and only reindex if you pass the --experimental flag.
- ./manage.py cms reindex_studio --experimental --init: Should do nothing since the index already exists.
- ./manage.py cms reindex_studio --experimental --reset: Should recreate an empty index. This means that searching in the Studio page should not return any results.
- ./manage.py cms reindex_studio --experimental --incremental: Interrupt it right after it finishes indexing collections. Run it again and assert it continues from where it was interrupted. Try searching content now.
- ./manage.py cms reindex_studio --experimental: Verify the index is recreated by searching blocks after it finishes. Also assert that there are no records of a current incremental update lingering: ./manage.py cms shell -c 'from openedx.core.djangoapps.content.search.models import IncrementalIndexCompleted; print(IncrementalIndexCompleted.objects.all())'
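The interrupt-and-resume behavior exercised by these steps can be sketched roughly as below. This is an assumption-laden illustration, not the PR's implementation: the set named completed stands in for IncrementalIndexCompleted rows, and incremental_reindex, Interrupted, and index_one are hypothetical names.

```python
# Hypothetical stand-in for the IncrementalIndexCompleted table.
completed = set()


class Interrupted(Exception):
    """Hypothetical signal simulating an operator stopping the command."""


def incremental_reindex(context_keys, index_one):
    """Index only contexts not yet recorded; clear the records once all are done."""
    indexed_now = []
    for key in context_keys:
        if key in completed:
            continue  # finished in an earlier, interrupted run
        index_one(key)
        completed.add(key)  # record progress so a later run can resume
        indexed_now.append(key)
    completed.clear()  # full pass finished: no lingering progress records
    return indexed_now


calls = []
state = {"interrupted_once": False}


def index_one(key):
    # Simulate an interruption the first time "course:b" is reached.
    if key == "course:b" and not state["interrupted_once"]:
        state["interrupted_once"] = True
        raise Interrupted
    calls.append(key)


keys = ["lib:a", "course:b", "course:c"]
try:
    incremental_reindex(keys, index_one)
except Interrupted:
    pass
records_after_interrupt = set(completed)
second_run = incremental_reindex(keys, index_one)
```

The second run picks up at the first context without a progress record, and the records are cleared once a pass completes, which matches the check in the last testing step.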
Deadline
Asap
Other information
Private-Ref: https://tasks.opencraft.com/browse/FAL-3902