If an existing cluster crashes before a new leader could be elected, and the leader's volume is corrupted beyond recovery, a cold start after the infrastructure is back up can get stuck.
Let's say we have nodes 1, 2, 3 and node 1 is the current leader.
If all nodes crash at the exact same time, the volume of node 1 gets corrupted so badly that it cannot be recovered, and the crash happened in the middle of a log replication so that not all nodes are on the exact same log id, then nodes 2 and 3 can end up stuck trying to re-connect to node 1 (which is dead), because they are lagging behind in log id.
If a situation like this occurs, we need a way to force another node to become the new leader, basically ignoring any log ids it has not received yet, even though it knows the old leader had a higher one just before the crash.
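To illustrate the idea, here is a minimal, self-contained Rust sketch of what such a forced recovery could look like conceptually. All names (`NodeState`, `force_recover`, the log-id fields) are hypothetical and not part of the project's actual API; the point is only that the most up-to-date survivor is chosen as the new leader and every remaining node discards knowledge of log ids that only the lost leader ever held.

```rust
use std::collections::BTreeMap;

// Hypothetical sketch; field and function names are illustrative only.
#[derive(Debug, Clone)]
struct NodeState {
    id: u64,
    /// Highest log id actually persisted on this node's own volume.
    last_log_id: u64,
    /// Highest log id this node has heard of from the old leader; it may be
    /// higher than anything it ever stored locally.
    highest_seen_log_id: u64,
}

/// Pick the most up-to-date survivor as the forced new leader and make every
/// node forget about log ids that only the permanently lost leader ever held,
/// so the cluster can make progress instead of waiting for the dead node.
fn force_recover(survivors: &mut BTreeMap<u64, NodeState>) -> Option<u64> {
    // The best candidate is the survivor with the highest persisted log id.
    let (new_leader_id, new_leader_log) = survivors
        .values()
        .map(|n| (n.id, n.last_log_id))
        .max_by_key(|&(_, log)| log)?;

    for node in survivors.values_mut() {
        // Discard knowledge of entries that nobody alive can serve anymore.
        node.highest_seen_log_id = node.highest_seen_log_id.min(new_leader_log);
    }
    Some(new_leader_id)
}

fn main() {
    // Scenario from this issue: node 1 (the old leader, log id 42) is gone;
    // nodes 2 and 3 lag behind but still know the leader had reached id 42.
    let mut survivors = BTreeMap::new();
    survivors.insert(2, NodeState { id: 2, last_log_id: 41, highest_seen_log_id: 42 });
    survivors.insert(3, NodeState { id: 3, last_log_id: 40, highest_seen_log_id: 42 });

    let leader = force_recover(&mut survivors).expect("at least one survivor");
    println!("forced new leader: node {leader}"); // node 2, the most up-to-date survivor
    assert_eq!(survivors[&3].highest_seen_log_id, 41);
}
```

In practice this would probably be exposed as a manual admin operation rather than something automatic, since only an operator can know that the old leader's volume is truly unrecoverable.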
This situation is super rare, but I have been able to reproduce it in manual testing, even though it took a few tries to get into it on purpose. Still, a solution for something like this should exist.