Decommissioning a broken backend takes too long #549
The alba version on functional cluster A is 1.3.0. @domsj asked me to upgrade it to 1.3.2 because of improvements in how alba handles disk/data loss (https://github.com/openvstorage/alba/releases/tag/1.3.2).
After updating from alba 1.3.0 to 1.3.1, the decommissioned alba backend is gone and the proxy no longer tries to connect to the old backend. To try to reproduce this issue with alba 1.3.1, I will recreate the situation with the current OVH setup: shut down one backend and remove it.
Please reproduce with the latest alba.
I've tried to reproduce the issue, and today we observed the following.
Steps to reproduce
Conclusion: the maintenance agent should notify the namespace sooner that the old bucket is gone for good.
I discussed this with @toolslive; we can (and will) make an improvement here in the near future.
Is that near future already over? Near future sounds like days or weeks, not 3-4 months :)
Sorry, I can't recall what improvements we had in mind. @toolslive, perhaps you can remember?
We have two clusters, A and B. Cluster A is connected to cluster B through a global backend and an external local backend (with a 2,1,2,1 preset).
We noticed that cluster B was broken, so we unlinked the external local backend from the global backend. However, when we listed the OSDs on the global backend an hour later, the backend was still shown in decommissioned mode on the global proxies.
I investigated this together with @domsj: the maintenance process is not doing anything significant and is not consuming many resources. What we do see is a large number of connection attempts still going to the old backend (all refused, because cluster B is completely dead).
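The symptom above, where the proxies keep opening connections to a backend that refuses every one of them, is the textbook case for exponential backoff combined with a give-up threshold. The sketch below illustrates only that generic technique; it is not how alba's proxy or maintenance agent actually works. `BackendHealth`, `give_up_after`, `attempts_until_gone` and the delay values are all hypothetical, and time is simulated rather than read from a real clock.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class BackendHealth:
    """Hypothetical per-backend reconnect state (illustration only, not alba code)."""
    base_delay: float = 1.0    # seconds to wait after the first refused connection
    max_delay: float = 300.0   # upper bound on any single backoff interval
    give_up_after: int = 10    # consecutive failures before the backend is treated as gone
    failures: int = 0
    next_attempt: float = 0.0
    gone: bool = False

    def should_try(self, now: float) -> bool:
        # A backend marked as gone is never retried; otherwise wait out the backoff.
        return not self.gone and now >= self.next_attempt

    def record_failure(self, now: float) -> None:
        self.failures += 1
        if self.failures >= self.give_up_after:
            self.gone = True  # signal that the backend should be dropped, not retried
            return
        # Exponential backoff: 1 s, 2 s, 4 s, ... capped at max_delay.
        delay = min(self.base_delay * 2 ** (self.failures - 1), self.max_delay)
        self.next_attempt = now + delay

    def record_success(self) -> None:
        self.failures = 0
        self.next_attempt = 0.0


def attempts_until_gone(health: BackendHealth,
                        connect: Callable[[], bool]) -> Tuple[int, float]:
    """Drive a simulated clock; return (attempts made, time of the last attempt)."""
    now = 0.0
    attempts = 0
    while not health.gone:
        now = max(now, health.next_attempt)  # jump ahead to the next permitted attempt
        attempts += 1
        if connect():
            health.record_success()
            break
        health.record_failure(now)
    return attempts, now


# A completely dead backend: every connection attempt is refused.
health = BackendHealth()
print(attempts_until_gone(health, lambda: False))  # (10, 511.0)
```

With these made-up parameters, a dead backend receives ten attempts spread over about 8.5 minutes instead of a continuous stream, after which it is flagged as gone. A real fix would presumably still need the maintenance agent to tell the namespaces that the backend is gone, as the conclusion in this thread suggests; backoff only limits the noise in the meantime.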