fix(autoheal): attempt healing complex asymmetric partitions #240

Merged 10 commits on Dec 18, 2024

@keynslug (Contributor) commented on Dec 13, 2024

Fixes EMQX-13693.

Also introduce additional retry mechanisms:
* On autoheal process crash.
* On unreachable nodes while gathering cluster views.

A complex asymmetric cluster partition may need more than one
iteration to heal completely. Therefore, ensure that autoheal kicks in
again when only part of the partition has been healed.
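
The retry behaviour described above can be sketched roughly as follows. This is an illustration only, written in Python rather than Erlang, and it is not the code in `src/ekka_autoheal.erl`. The names `gather_views`, `is_healed`, `autoheal`, `FakeCluster`, and the `max_attempts` parameter are all hypothetical. The sketch assumes a coordinator that retries when a node is unreachable during view gathering, and runs another pass when the previous one healed only part of the partition.

```python
from typing import Callable, Dict, FrozenSet, List, Optional

View = FrozenSet[str]


def gather_views(nodes: List[str],
                 fetch_view: Callable[[str], Optional[View]]) -> Optional[Dict[str, View]]:
    """Collect each node's view of the cluster; None if any node is unreachable."""
    views = {}
    for node in nodes:
        view = fetch_view(node)
        if view is None:
            return None
        views[node] = view
    return views


def is_healed(nodes: List[str], views: Dict[str, View]) -> bool:
    """The cluster counts as healed when every node sees every other node."""
    full = frozenset(nodes)
    return all(view == full for view in views.values())


def autoheal(nodes, fetch_view, heal_step, max_attempts: int = 5) -> Optional[int]:
    """Return the attempt number on which the cluster was seen healed, else None."""
    for attempt in range(1, max_attempts + 1):
        views = gather_views(nodes, fetch_view)
        if views is None:
            continue  # a node was unreachable: retry on the next attempt
        if is_healed(nodes, views):
            return attempt
        heal_step(views)  # may heal only part of the partition
    return None


class FakeCluster:
    """Hypothetical in-memory cluster used only to exercise the loop."""

    def __init__(self):
        self.nodes = ["a", "b", "c"]
        # Asymmetric partition: a sees everyone, b misses c, c sees only itself.
        self.views = {"a": {"a", "b", "c"}, "b": {"a", "b"}, "c": {"c"}}
        self.unreachable_once = {"c"}

    def fetch_view(self, node):
        if node in self.unreachable_once:
            self.unreachable_once.discard(node)
            return None
        return frozenset(self.views[node])

    def heal_step(self, views):
        # Simulates a heal pass that only partially succeeds: fixes one node.
        full = set(self.nodes)
        for node in self.nodes:
            if views[node] != full:
                self.views[node] = set(full)
                return


cluster = FakeCluster()
rounds = autoheal(cluster.nodes, cluster.fetch_view, cluster.heal_step)
```

In this simulated run the first attempt is skipped because one node is unreachable, and the next two passes each heal only part of the partition, so the loop needs several iterations before every view agrees.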
@keynslug force-pushed the fix/EEC-112/autoheal-asymm branch from e9cf087 to 9c98202 on December 13, 2024 17:40
@keynslug force-pushed the fix/EEC-112/autoheal-asymm branch from 9c98202 to f93e5e2 on December 13, 2024 17:50
@keynslug marked this pull request as ready for review on December 13, 2024 17:54
thalesmg previously approved these changes Dec 13, 2024
Specifically, do not perform autoheal in small steps because it may lead
to cluster inconsistencies. Sometimes Mnesia decides too early that
cluster partitions are healed, before the coordinator has a chance to
reboot all of the nodes that were partitioned off the cluster.
Specifically, stop using locality as a sorting criterion and avoid
deduplicating entries for better diagnostics.
@keynslug merged commit 31017f9 into emqx:main-emqx-4.3 on Dec 18, 2024
2 checks passed
@keynslug deleted the fix/EEC-112/autoheal-asymm branch on December 18, 2024 14:02
3 participants