migration: performance issues #1640
Comments
By manually adding a cache:
>>> from inspirehep.modules.cache import current_cache
>>> app.extensions['invenio-collections'].cache = current_cache
we were able to work around the performance issue. This workaround has to be translated into a proper solution.
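To illustrate why plugging in a cache helps, here is a minimal, self-contained sketch. It does not use the invenio-collections API: `DictCache`, `build_collection_queries` and `get_collection_queries` are hypothetical names invented for this example, and the only assumption is that the cache object exposes `get` and `set`.

```python
class DictCache:
    """Hypothetical stand-in for a cache backend exposing get/set."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        return self._store.get(key)

    def set(self, key, value):
        self._store[key] = value


# Counts how often the expensive step actually runs.
calls = {"build": 0}


def build_collection_queries():
    """Hypothetical expensive step: pretend to load every collection query."""
    calls["build"] += 1
    return {"Literature": "collection:literature", "Authors": "collection:authors"}


def get_collection_queries(cache=None):
    """Return the queries, rebuilding them only when no cached copy exists."""
    if cache is None:
        return build_collection_queries()
    cached = cache.get("collection_queries")
    if cached is None:
        cached = build_collection_queries()
        cache.set("collection_queries", cached)
    return cached


# With a cache, 1000 lookups trigger a single build.
cache = DictCache()
for _ in range(1000):
    get_collection_queries(cache=cache)
cached_builds = calls["build"]

# Without a cache, every lookup rebuilds.
calls["build"] = 0
for _ in range(1000):
    get_collection_queries()
uncached_builds = calls["build"]

print(cached_builds, uncached_builds)  # prints "1 1000"
```

When the lookup runs once per migrated record, the difference between one build and one build per record is what dominates the total run time.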
@jirikuncar suggests:
Despite what the docstring says, setting Now I just manually started a nightly build. Let's see how many records are going to be there at 9.
The problem with this is that the percolator puts too much pressure on ES, and record inserts are lost, so we cannot go this route. I outlined an alternative fix here: inveniosoftware/invenio-collections#72
For the past week the nightly migration has been unusually slow. Previously the migration of all records finished around 3:00 AM; now, by the time we get back to the office, only half of it is done.
Here's a profile of migrate('dumps/all_XXX.xml.gz', wait_for_results=True), obtained with the techniques described in 9b0f4a8: https://cernbox.cern.ch/index.php/s/D0rPrCM68FaqGPh
It's clear that the problem lies in get_record_collections, part in _build_cache, and part in _find_matching_collections_internally.
- _build_cache (ext: configurable cache addition inveniosoftware/invenio-collections#69)
- _find_matching_collections_internally (ext: configurable matcher addition inveniosoftware/invenio-collections#73)
- invenio-collections (WIP release: v1.0.0a4 inveniosoftware/invenio-collections#74)
- inspire-next
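For anyone who wants to reproduce this kind of measurement, the following is a rough sketch of profiling a call with the standard library's `cProfile` and `pstats`. It is not necessarily the technique from 9b0f4a8, and `migrate_stub` and `find_matching` are hypothetical stand-ins rather than the real `migrate` or invenio-collections functions.

```python
import cProfile
import io
import pstats


def find_matching(record):
    """Hypothetical hot spot: stands in for an expensive per-record lookup."""
    return sum(i * i for i in range(2000)) + record


def migrate_stub(n_records):
    """Hypothetical stand-in for a migration loop over n_records records."""
    return [find_matching(r) for r in range(n_records)]


# Collect profiling data only around the call of interest.
profiler = cProfile.Profile()
profiler.enable()
results = migrate_stub(200)
profiler.disable()

# Render the five most expensive entries, sorted by cumulative time.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

Sorting by cumulative time makes it easy to see which callee accounts for most of the time spent under the top-level call, which is how hot spots such as the ones listed above would show up.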