Recovering from a failed node

ragsns edited this page Oct 11, 2021 · 9 revisions

3. Auto-Recovery from Failures

With K8ssandra, recovering from a failed node is easy because K8ssandra takes care of the recovery operations for you. Because the process is fully automated, this exercise is the shortest one, as simple as 1, 2, 3:

✅ Step 1: Get the list of pods

kubectl get pods
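
If the namespace holds many pods, it can help to narrow the list to the Cassandra ones. The sketch below assumes the Cassandra pods follow the `-sts-` naming pattern used in this walkthrough (for example `k8ssandra-dc1-default-sts-0`); `list_cassandra_pods` is a hypothetical helper name, not a kubectl or K8ssandra command.

```shell
# List only pods whose name contains "-sts-", the StatefulSet naming
# pattern seen in this walkthrough (assumption: your release uses it too).
# list_cassandra_pods is a hypothetical helper, not part of kubectl.
list_cassandra_pods() {
  kubectl get pods -o name | grep -- '-sts-'
}
```

Calling `list_cassandra_pods` prints one `pod/<name>` line per matching pod, which makes it easier to pick the pod to terminate in the next step.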

✅ Step 2: Terminate a Cassandra pod

Terminate a Cassandra pod abruptly (for example, the one numbered 0 in the stateful set):

kubectl delete pod k8ssandra-dc1-default-sts-0 --grace-period 5

Five seconds is not enough for Cassandra to shut down gracefully, so the deletion results in an emergency stop.
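
To see how much shutdown time the pod would normally get, you can read the grace period from its spec before deleting it. This is a minimal sketch: `pod_grace_period` is a hypothetical helper name, while `.spec.terminationGracePeriodSeconds` is a standard Pod spec field. The value it prints depends on your installation.

```shell
# Print the termination grace period (in seconds) configured in a pod's spec.
# pod_grace_period is a hypothetical helper; $1 is the pod name.
pod_grace_period() {
  kubectl get pod "$1" -o jsonpath='{.spec.terminationGracePeriodSeconds}'
}
```

For example, `pod_grace_period k8ssandra-dc1-default-sts-0` prints the configured number of seconds, which you can compare with the value passed to `--grace-period` above.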

✅ Step 3: Self Healing

Watch K8ssandra handle this emergency by bringing the current state back in line with the desired state.

watch kubectl get pods

You will see a new pod being scheduled to replace the deleted one until the current state matches the desired state again.
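
If you would rather block until recovery finishes than watch the list, `kubectl wait` can do that. The sketch below is one possible approach, not part of the original exercise: `wait_for_pod_ready` is a hypothetical helper name, and the 600s default timeout is an arbitrary assumption.

```shell
# Block until the named pod reports the Ready condition, or give up
# after the timeout. wait_for_pod_ready is a hypothetical helper;
# $1 is the pod name, $2 an optional timeout (assumed default: 600s).
wait_for_pod_ready() {
  kubectl wait --for=condition=Ready "pod/$1" --timeout="${2:-600s}"
}
```

For example, `wait_for_pod_ready k8ssandra-dc1-default-sts-0` returns once the replacement pod is Ready. Note that `kubectl wait` fails immediately if the pod does not exist at that moment, which can happen briefly between deletion and re-creation; in that case, run it again.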

Hint: To see the whole process, including termination, run the deletion in the background and watch the pods at the same time:

kubectl delete pod k8ssandra-dc1-default-sts-0 --grace-period 10 &
watch kubectl get pods

Next Step

Proceed to Step IV