NameLookup can end up in a weird state with two pods, where restarting the Solr pod causes it to download the database again #897

Open
gaurav opened this issue May 3, 2024 · 0 comments

gaurav commented May 3, 2024

When NameRes is run with LOAD_DATA=yes, we create three pods:

  • A web pod, which acts independently of the others
  • A restore job, which completes its work and goes to "Succeeded"
  • A Solr pod, which has:
    • An init-container that downloads the Solr database from RENCI, and then
    • A Solr database that the restore job will talk to in order to start the restoration process
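The layout above can be written down as a small model, which makes it easier to compare against what is actually running. This is only a sketch: the names `PodSpec`, `expected_pods`, `"web"`, `"restore"`, `"solr"` and `"download"` are illustrative labels, not the real resource names, and the LOAD_DATA=no branch (two pods, no download init-container) is an assumption inferred from this issue rather than taken from the Helm chart.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PodSpec:
    name: str                              # illustrative label, not the real pod name
    init_containers: Tuple[str, ...] = ()  # init-containers that run before the main container
    runs_to_completion: bool = False       # True for a job that ends in "Succeeded"


def expected_pods(load_data: bool) -> List[PodSpec]:
    """Return the pods we would expect to see for a given LOAD_DATA setting."""
    web = PodSpec("web")
    if not load_data:
        # Assumption: LOAD_DATA=no gives only web and Solr, with no download step.
        return [web, PodSpec("solr")]
    return [
        web,
        PodSpec("restore", runs_to_completion=True),
        PodSpec("solr", init_containers=("download",)),
    ]
```

Under these assumptions, a Solr pod that downloads the database on restart should only ever appear alongside a restore job.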

Somehow, on ITRB CI on May 2-3, 2024, we ran into the following situation:

  • On May 2, 2024, NameRes was restarted with LOAD_DATA=yes
  • Some updates might have caused it to restart -- it's unclear whether it was restarted in LOAD_DATA=yes or LOAD_DATA=no mode, but let's assume LOAD_DATA=no, as that's the default for
  • On May 3, 2024, @pabbathreddya2 and I found that there were two pods (solr and web), with around 158G of data in the Solr pod. Updating it with LOAD_DATA=no did not change the pods, so @pabbathreddya2 did a helm uninstall and then restarted it with LOAD_DATA=yes, which resulted in the Solr pod becoming essentially empty of data. My theory is that the 158G of data was the download, which is deleted at the start of a new download (since I think #842, "Rewrite NameRes script to delete the database later in the download process", is fixed now?), so the database had in fact been wiped previously -- but how? We would see this if the PVC was wiped, but it's unclear how that would happen.

Essentially, this boils down to: how can the pods be in the LOAD_DATA=no state (with two pods instead of three), while restarting the Solr pod causes it to start the download as if it were in the LOAD_DATA=yes state?
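One way to spot this mismatch the next time it happens might be to inspect the pod list directly. The sketch below reads the standard JSON structure printed by `kubectl get pods -o json` (`items[].metadata.name`, `items[].status.phase`, `items[].spec.initContainers[].name`). The helper names, and the assumption that the restore pod has "restore" in its name, are hypothetical and would need to be adjusted to the real deployment.

```python
from typing import Dict, List, Tuple


def summarize_pods(pod_list: dict) -> Dict[str, Tuple[str, List[str]]]:
    """Map pod name -> (phase, init-container names) from `kubectl get pods -o json`."""
    summary = {}
    for item in pod_list.get("items", []):
        name = item["metadata"]["name"]
        phase = item.get("status", {}).get("phase", "Unknown")
        inits = [c["name"] for c in item.get("spec", {}).get("initContainers", [])]
        summary[name] = (phase, inits)
    return summary


def looks_inconsistent(summary: Dict[str, Tuple[str, List[str]]]) -> bool:
    """True when no restore pod is present but some pod still has an init-container.

    Assumption: the restore pod can be recognised by "restore" in its name.
    """
    has_restore = any("restore" in name for name in summary)
    has_init = any(inits for _phase, inits in summary.values())
    return has_init and not has_restore
```

The dictionary passed to `summarize_pods` can be obtained by running `json.load` on the output of `kubectl get pods -n <namespace> -o json`. A `True` result is only a hint: a missing restore pod could also mean that the finished job's pod was cleaned up, so it does not by itself explain how the database was wiped.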

@gaurav gaurav self-assigned this May 3, 2024