Decommissioning a broken backend takes too long #549

Open
kinvaris opened this issue Jan 5, 2017 · 7 comments

kinvaris commented Jan 5, 2017

We have clusters A & B. In this setup, cluster A is connected to cluster B through a global backend and an external local backend (with a 2,1,2,1 preset).
We saw that cluster B was broken, so we unlinked the external local backend from the global backend. But when listing the osds on the global backend an hour later, we still saw the backend in decommissioned mode on the global proxies.

I investigated together with @domsj: the maintenance agent is not doing anything important and is not consuming many resources. What we do see is a lot of connections still going to the old backend (connections refused, because cluster B is completely dead).
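
For reference, checking the proxy looked roughly like the polling sketch below. This is only a sketch: the command used to list the OSDs known to a proxy, the `--to-json` flag and the JSON shape are placeholders/assumptions, not the exact alba invocation we used.

```python
#!/usr/bin/env python
"""Hypothetical polling check: ask a proxy which OSDs it still knows about
and report any that are still marked as decommissioned.

LIST_OSDS_CMD is a placeholder -- substitute the command you actually use to
list the OSDs known to a proxy; the JSON layout below is an assumption too.
"""
import json
import subprocess
import time

LIST_OSDS_CMD = ["alba", "proxy-list-osds",   # placeholder subcommand
                 "--host", "127.0.0.1", "--port", "10000",
                 "--to-json"]                 # assumed JSON output flag


def decommissioned_osds():
    out = subprocess.check_output(LIST_OSDS_CMD)
    osds = json.loads(out)["result"]          # assumed {"result": [...]} wrapper
    return [o for o in osds if o.get("decommissioned")]


if __name__ == "__main__":
    while True:
        stale = decommissioned_osds()
        if not stale:
            print("proxy no longer reports decommissioned OSDs")
            break
        print("still decommissioned: %s" % [o.get("long_id") for o in stale])
        time.sleep(60)  # re-check every minute
```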

kinvaris commented Jan 5, 2017

The alba version on the functional cluster A is 1.3.0; @domsj asked me to upgrade to 1.3.2 because of improvements in how alba handles disk/data loss (https://github.com/openvstorage/alba/releases/tag/1.3.2).

kinvaris commented Jan 5, 2017

After updating from alba 1.3.0 to 1.3.1, the decommissioned alba backend is gone and the proxy no longer tries to connect to the old backend.

To try to reproduce this issue with alba 1.3.1, I will recreate the situation on the current OVH setup, shut down one backend and remove it.

wimpers commented Jan 16, 2017

Please reproduce with the latest alba.

kinvaris commented Feb 14, 2017

I've tried to reproduce the issue, and today we observed the following:

Steps to reproduce

  • Create a global backend.
  • Add 2 local backends & 1 external local backend with policy (1, 2, 1, 3).
  • Create some vdisks and write some data to them (in my case approx. 10 GB).
  • Break 1 external local backend (lazy umount of the asd mountpoints).
  • Delete the external local backend (succeeded).
  • Check the proxy's list of osds to see whether the osd is gone; after 15 min. it was still present, but in decommissioned state.
  • After discussion with @domsj we saw that the old bucket was still present in some namespaces.
  • Once the old bucket was gone (after about 30 min.) the OSD disappeared from the proxy as well (see the sketch after this list).
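
Roughly how we checked for leftover buckets in the last two steps; a minimal sketch assuming there are commands to list namespaces and to show a namespace's bucket counts as JSON. The exact alba subcommands, flags, config path and output layout here are placeholders, not confirmed against this alba version.

```python
#!/usr/bin/env python
"""Hypothetical helper: for every namespace on the global backend, report
which buckets (policies) still hold fragments, so you can see whether the
old bucket is still around. Command names, flags and JSON layout are
placeholders; substitute the commands you actually use.
"""
import json
import subprocess

ABM_CONFIG = "/path/to/abm-config.ini"  # placeholder config path


def run_json(args):
    # All placeholder commands are assumed to support JSON output.
    return json.loads(subprocess.check_output(args))["result"]


def namespaces():
    return run_json(["alba", "list-namespaces",
                     "--config", ABM_CONFIG, "--to-json"])


def bucket_count(namespace):
    stats = run_json(["alba", "show-namespace", namespace,
                      "--config", ABM_CONFIG, "--to-json"])
    return stats.get("bucket_count", [])


if __name__ == "__main__":
    for ns in namespaces():
        name = ns["name"] if isinstance(ns, dict) else ns
        buckets = bucket_count(name)
        if buckets:
            print("%s still has buckets: %s" % (name, buckets))
```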

Conclusion

The maintenance agent should notify the namespaces more quickly that the old bucket is gone for good.

domsj commented Feb 14, 2017

I discussed this with @toolslive; we can (and will) make an improvement here in the near future.

@wimpers wimpers added this to the Gilbert milestone Feb 23, 2017

wimpers commented May 30, 2017

Is that near future already over? Near future sounds like days or weeks, not 3-4 months :)

@wimpers wimpers removed this from the G milestone May 30, 2017

domsj commented May 30, 2017

Sorry, I can't recall what improvements we had in mind. @toolslive, perhaps you remember?
Looking at the release notes, I don't see it there either.

@wimpers wimpers added this to the I milestone Jun 15, 2017
@wimpers wimpers modified the milestones: I, J Oct 19, 2017
@wimpers wimpers modified the milestones: J, Roadmap Mar 6, 2018