I was following the instructions for multi-node K3s from your documentation: I installed the tigera-operator resource manifests for version 3.29.1 and then deployed my custom resource manifests of kind=Installation and kind=APIServer.
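For context, the install steps from the documentation boil down to applying the operator manifest and then the custom resources. The sketch below is illustrative rather than my exact manifests (the pool CIDR and encapsulation are placeholders):

```sh
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.29.1/manifests/tigera-operator.yaml
kubectl create -f custom-resources.yaml
```

```yaml
# custom-resources.yaml (sketch; values are illustrative)
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  calicoNetwork:
    ipPools:
      - cidr: 192.168.0.0/16   # placeholder pool CIDR
        encapsulation: VXLAN   # placeholder; depends on the cluster setup
---
apiVersion: operator.tigera.io/v1
kind: APIServer
metadata:
  name: default
```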
In my test setup, there is an edge node which fails the readiness probe for its calico-node container because it can't use ipset for some reason:
2024-11-28 12:21:04.487 [ERROR][147075] felix/ipsets.go 671: Bad return code from 'ipset list -name'. error=exit status 1 family="inet" stderr="ipset v7.11: Kernel error received: Invalid argument\n"
2024-11-28 12:21:04.487 [ERROR][147075] felix/ipsets.go 409: Failed to get the list of ipsets error=exit status 1 family="inet"
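For anyone trying to reproduce or triage this, a minimal way to check ipset support directly on such a node might look like the following (the test set name felix-probe is made up; whether /proc/config.gz exists depends on the vendor kernel):

```sh
# on the edge node
ipset --version                                      # userspace version (v7.11 here)
sudo ipset create felix-probe hash:ip family inet    # felix-probe is an arbitrary test set name
sudo ipset list -name                                # the call that fails in the Felix log above
sudo ipset destroy felix-probe

# check whether the kernel was built with ip_set support
lsmod | grep ip_set
zcat /proc/config.gz | grep CONFIG_IP_SET            # only if the kernel exposes its config
```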
In turn, this blocks the tigera-operator's deployment, because it waits for ALL DaemonSet pods to report a ready state. As a result, Calico is not rolled out and the CNI on the cluster appears to be completely broken.
Expected Behavior
From my point of view this is a bug: in a distributed system there is always a chance that a node is not working for some reason. In our case, the node in question is an IoT device where we have only limited control over the Linux kernel (it is part of a distribution provided by the hardware vendor). The tigera-operator should tolerate the outage of some nodes and continue with the deployment.
In my case, I had to delete the edge node from the cluster before the tigera-operator would advance with the rollout on the cloud nodes, and then add the edge node back later.
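Roughly, that workaround was equivalent to the following (edge-node-1 is a placeholder node name; on K3s, restarting the agent makes the node re-register):

```sh
kubectl drain edge-node-1 --ignore-daemonsets --delete-emptydir-data
kubectl delete node edge-node-1
# wait for the tigera-operator rollout to finish on the remaining nodes,
# then restart the K3s agent on the edge node so it re-joins the cluster
sudo systemctl restart k3s-agent
```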
Your Environment