calico-node frequently enters the Completed state and restarts #9524

Open
sunminming opened this issue Nov 24, 2024 · 0 comments

root@worker-01:/home/sunminming# kubectl -n kube-system describe pod calico-node-cbd5s
Name:                 calico-node-cbd5s
Namespace:            kube-system
Priority:             2000001000
Priority Class Name:  system-node-critical
Service Account:      calico-node
Node:                 k8s-10-10-40-33/10.10.40.33
Start Time:           Sun, 24 Nov 2024 15:42:12 +0800
Labels:               controller-revision-hash=56f9dcc8f
                      k8s-app=calico-node
                      pod-template-generation=2
Annotations:          <none>
Status:               Running
IP:                   10.10.40.33
IPs:
  IP:           10.10.40.33
Controlled By:  DaemonSet/calico-node
Init Containers:
  install-cni:
    Container ID:  containerd://3ae90d7f54c60c7e647000b001264bc6886fbe27d6fb3085d0e3bbf98574a767
    Image:         easzlab.io.local:5000/calico/cni:v3.26.4
    Image ID:      sha256:17d35f5bad38f1d00ee41111d6655540797ec5011740a733b706b4717d300ede
    Port:          <none>
    Host Port:     <none>
    Command:
      /opt/cni/bin/install
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Sun, 24 Nov 2024 16:06:49 +0800
      Finished:     Sun, 24 Nov 2024 16:06:50 +0800
    Ready:          True
    Restart Count:  7
    Environment Variables from:
      kubernetes-services-endpoint  ConfigMap  Optional: true
    Environment:
      CNI_CONF_NAME:       10-calico.conflist
      CNI_NETWORK_CONFIG:  <set to the key 'cni_network_config' of config map 'calico-config'>  Optional: false
      ETCD_ENDPOINTS:      <set to the key 'etcd_endpoints' of config map 'calico-config'>      Optional: false
      CNI_MTU:             <set to the key 'veth_mtu' of config map 'calico-config'>            Optional: false
      SLEEP:               false
    Mounts:
      /calico-secrets from etcd-certs (rw)
      /host/etc/cni/net.d from cni-net-dir (rw)
      /host/opt/cni/bin from cni-bin-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-zrfvj (ro)
  mount-bpffs:
    Container ID:  containerd://b322c5f181a565125ce28171c209dbfa9cc11f14b7b4f66853f5d0cc95f01f64
    Image:         easzlab.io.local:5000/calico/node:v3.26.4
    Image ID:      sha256:ded66453eb630bd4d4efddee2ccf290cbca4c67bca07c2d53c35c35dd0251136
    Port:          <none>
    Host Port:     <none>
    Command:
      calico-node
      -init
      -best-effort
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Sun, 24 Nov 2024 16:06:51 +0800
      Finished:     Sun, 24 Nov 2024 16:06:51 +0800
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /nodeproc from nodeproc (ro)
      /sys/fs from sys-fs (rw)
      /var/run/calico from var-run-calico (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-zrfvj (ro)
Containers:
  calico-node:
    Container ID:   containerd://bac41554456221baba6f963d8c84fccb7bf9cf5fc38d744263761e5f84858d98
    Image:          easzlab.io.local:5000/calico/node:v3.26.4
    Image ID:       sha256:ded66453eb630bd4d4efddee2ccf290cbca4c67bca07c2d53c35c35dd0251136
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Sun, 24 Nov 2024 16:03:51 +0800
      Finished:     Sun, 24 Nov 2024 16:06:48 +0800
    Ready:          False
    Restart Count:  6
    Limits:
      cpu:     500m
      memory:  1Gi
    Requests:
      cpu:      500m
      memory:   1Gi
    Liveness:   exec [/bin/calico-node -felix-live -bird-live] delay=10s timeout=10s period=10s #success=1 #failure=6
    Readiness:  exec [/bin/calico-node -felix-ready -bird-ready] delay=0s timeout=10s period=10s #success=1 #failure=3
    Environment Variables from:
      kubernetes-services-endpoint  ConfigMap  Optional: true
    Environment:
      ETCD_ENDPOINTS:                     <set to the key 'etcd_endpoints' of config map 'calico-config'>  Optional: false
      ETCD_CA_CERT_FILE:                  <set to the key 'etcd_ca' of config map 'calico-config'>         Optional: false
      ETCD_KEY_FILE:                      <set to the key 'etcd_key' of config map 'calico-config'>        Optional: false
      ETCD_CERT_FILE:                     <set to the key 'etcd_cert' of config map 'calico-config'>       Optional: false
      CALICO_K8S_NODE_REF:                 (v1:spec.nodeName)
      CALICO_NETWORKING_BACKEND:          <set to the key 'calico_backend' of config map 'calico-config'>  Optional: false
      CLUSTER_TYPE:                       k8s,bgp
      IP:                                 autodetect
      IP_AUTODETECTION_METHOD:            can-reach=10.10.40.34
      CALICO_IPV4POOL_IPIP:               Always
      FELIX_IPINIPMTU:                    <set to the key 'veth_mtu' of config map 'calico-config'>  Optional: false
      FELIX_VXLANMTU:                     <set to the key 'veth_mtu' of config map 'calico-config'>  Optional: false
      FELIX_WIREGUARDMTU:                 <set to the key 'veth_mtu' of config map 'calico-config'>  Optional: false
      CALICO_IPV4POOL_CIDR:               172.20.0.0/16
      CALICO_DISABLE_FILE_LOGGING:        true
      FELIX_DEFAULTENDPOINTTOHOSTACTION:  ACCEPT
      FELIX_IPV6SUPPORT:                  false
      FELIX_HEALTHENABLED:                true
      FELIX_KUBENODEPORTRANGES:           30000:32767
      FELIX_PROMETHEUSMETRICSENABLED:     false
    Mounts:
      /calico-secrets from etcd-certs (rw)
      /host/etc/cni/net.d from cni-net-dir (rw)
      /lib/modules from lib-modules (ro)
      /run/xtables.lock from xtables-lock (rw)
      /sys/fs/bpf from bpffs (rw)
      /var/lib/calico from var-lib-calico (rw)
      /var/log/calico/cni from cni-log-dir (ro)
      /var/run/calico from var-run-calico (rw)
      /var/run/nodeagent from policysync (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-zrfvj (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True
  Initialized                 True
  Ready                       False
  ContainersReady             False
  PodScheduled                True
Volumes:
  lib-modules:
    Type:          HostPath (bare host directory volume)
    Path:          /lib/modules
    HostPathType:
  var-run-calico:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/calico
    HostPathType:
  var-lib-calico:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/calico
    HostPathType:
  xtables-lock:
    Type:          HostPath (bare host directory volume)
    Path:          /run/xtables.lock
    HostPathType:  FileOrCreate
  sys-fs:
    Type:          HostPath (bare host directory volume)
    Path:          /sys/fs/
    HostPathType:  DirectoryOrCreate
  bpffs:
    Type:          HostPath (bare host directory volume)
    Path:          /sys/fs/bpf
    HostPathType:  Directory
  nodeproc:
    Type:          HostPath (bare host directory volume)
    Path:          /proc
    HostPathType:
  cni-bin-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /opt/cni/bin
    HostPathType:
  cni-net-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/cni/net.d
    HostPathType:
  cni-log-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/log/calico/cni
    HostPathType:
  etcd-certs:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  calico-etcd-secrets
    Optional:    false
  policysync:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/nodeagent
    HostPathType:  DirectoryOrCreate
  kube-api-access-zrfvj:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 :NoSchedule op=Exists
                             :NoExecute op=Exists
                             CriticalAddonsOnly op=Exists
                             node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/network-unavailable:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists
                             node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                             node.kubernetes.io/unreachable:NoExecute op=Exists
                             node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type     Reason     Age   From               Message
  ----     ------     ----  ----               -------
  Normal   Scheduled  24m   default-scheduler  Successfully assigned kube-system/calico-node-cbd5s to k8s-10-10-40-33
  Warning  Unhealthy  24m   kubelet            Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/calico/bird.ctl: connect: connection refused
W1124 07:42:16.221537      48 feature_gate.go:241] Setting GA feature gate ServiceInternalTrafficPolicy=true. It will be removed in a future release.
  Warning  Unhealthy  24m  kubelet  Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/calico/bird.ctl: connect: connection refused
W1124 07:42:18.006339     124 feature_gate.go:241] Setting GA feature gate ServiceInternalTrafficPolicy=true. It will be removed in a future release.
  Warning  Unhealthy  24m  kubelet  Readiness probe failed: 2024-11-24 07:42:22.723 [INFO][253] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 10.10.40.34
W1124 07:42:22.715553     253 feature_gate.go:241] Setting GA feature gate ServiceInternalTrafficPolicy=true. It will be removed in a future release.
  Normal   Killing         23m                kubelet  Stopping container calico-node
  Normal   Pulled          23m (x2 over 24m)  kubelet  Container image "easzlab.io.local:5000/calico/cni:v3.26.4" already present on machine
  Normal   Created         23m (x2 over 24m)  kubelet  Created container install-cni
  Normal   Started         23m (x2 over 24m)  kubelet  Started container install-cni
  Normal   SandboxChanged  23m                kubelet  Pod sandbox changed, it will be killed and re-created.
  Normal   Pulled          23m (x2 over 24m)  kubelet  Container image "easzlab.io.local:5000/calico/node:v3.26.4" already present on machine
  Normal   Started         23m (x2 over 24m)  kubelet  Started container mount-bpffs
  Normal   Created         23m (x2 over 24m)  kubelet  Created container mount-bpffs
  Normal   Started         23m (x2 over 24m)  kubelet  Started container calico-node
  Normal   Created         23m (x2 over 24m)  kubelet  Created container calico-node
  Warning  Unhealthy       23m                kubelet  Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/calico/bird.ctl: connect: connection refused
W1124 07:43:43.579392      99 feature_gate.go:241] Setting GA feature gate ServiceInternalTrafficPolicy=true. It will be removed in a future release.
  Warning  Unhealthy  23m  kubelet  Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/calico/bird.ctl: connect: connection refused
W1124 07:43:44.185882     142 feature_gate.go:241] Setting GA feature gate ServiceInternalTrafficPolicy=true. It will be removed in a future release.
  Normal   Pulled     19m (x4 over 24m)     kubelet  Container image "easzlab.io.local:5000/calico/node:v3.26.4" already present on machine
  Warning  Unhealthy  9m57s (x37 over 19m)  kubelet  (combined from similar events): Readiness probe failed: 2024-11-24 07:57:12.867 [INFO][717] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 10.10.40.34
W1124 07:57:12.779037     717 feature_gate.go:241] Setting GA feature gate ServiceInternalTrafficPolicy=true. It will be removed in a future release.
  Warning  BackOff  4m48s (x24 over 23m)  kubelet  Back-off restarting failed container calico-node in pod calico-node-cbd5s_kube-system(588856b9-3c97-4630-bd0b-39e66b0c24e8)
2024-11-24 08:04:54.984 [INFO][100] monitor-addresses/reachaddr.go 47: Auto-detected address by connecting to remote Destination="10.10.40.34" IP=10.10.40.33
2024-11-24 08:04:54.985 [INFO][100] monitor-addresses/autodetection_methods.go 143: Using autodetected IPv4 address 10.10.40.33/22, detected by connecting to 10.10.40.34
2024-11-24 08:04:59.161 [INFO][104] felix/summary.go 100: Summarising 20 dataplane reconciliation loops over 1m4s: avg=50ms longest=516ms (resync-filter-v4,resync-ipsets-v4,resync-mangle-v4,resync-nat-v4,resync-raw-v4,resync-routes-v4,resync-routes-v4,resync-rules-v4,update-filter-v4,update-ipsets-4,update-mangle-v4,update-nat-v4,update-raw-v4)
2024-11-24 08:05:16.243 [INFO][104] felix/int_dataplane.go 1289: Linux interface state changed. ifIndex=74 ifaceName="nodelocaldns" state=""
2024-11-24 08:05:16.243 [INFO][104] felix/int_dataplane.go 1325: Linux interface addrs changed. addrs=<nil> ifaceName="nodelocaldns"
2024-11-24 08:05:16.243 [INFO][104] felix/iface_monitor.go 235: Netlink address update but interface isn't yet known.  Will handle when interface is signalled. addr="169.254.20.10" exists=false ifIndex=74
2024-11-24 08:05:16.243 [INFO][104] felix/int_dataplane.go 1893: Received interface update msg=&intdataplane.ifaceStateUpdate{Name:"nodelocaldns", State:"", Index:74}
2024-11-24 08:05:16.243 [INFO][104] felix/int_dataplane.go 1913: Received interface addresses update msg=&intdataplane.ifaceAddrsUpdate{Name:"nodelocaldns", Addrs:set.Set[string](nil)}
2024-11-24 08:05:16.243 [INFO][104] felix/hostip_mgr.go 84: Interface addrs changed. update=&intdataplane.ifaceAddrsUpdate{Name:"nodelocaldns", Addrs:set.Set[string](nil)}
2024-11-24 08:05:16.243 [INFO][104] felix/ipsets.go 130: Queueing IP set for creation family="inet" setID="this-host" setType="hash:ip"
2024-11-24 08:05:16.245 [INFO][104] felix/ipsets.go 778: Doing full IP set rewrite family="inet" numMembersInPendingReplace=5 setID="this-host"
2024-11-24 08:05:36.394 [INFO][104] felix/int_dataplane.go 1289: Linux interface state changed. ifIndex=77 ifaceName="nodelocaldns" state="down"
2024-11-24 08:05:36.394 [INFO][104] felix/int_dataplane.go 1325: Linux interface addrs changed. addrs=set.Set{169.254.20.10} ifaceName="nodelocaldns"
2024-11-24 08:05:36.394 [INFO][104] felix/iface_monitor.go 238: Netlink address update for known interface.  addr="169.254.20.10" exists=true ifIndex=77
2024-11-24 08:05:36.394 [INFO][104] felix/int_dataplane.go 1893: Received interface update msg=&intdataplane.ifaceStateUpdate{Name:"nodelocaldns", State:"down", Index:77}
2024-11-24 08:05:36.394 [INFO][104] felix/int_dataplane.go 1913: Received interface addresses update msg=&intdataplane.ifaceAddrsUpdate{Name:"nodelocaldns", Addrs:set.Typed[string]{"169.254.20.10":set.v{}}}
2024-11-24 08:05:36.394 [INFO][104] felix/hostip_mgr.go 84: Interface addrs changed. update=&intdataplane.ifaceAddrsUpdate{Name:"nodelocaldns", Addrs:set.Typed[string]{"169.254.20.10":set.v{}}}
2024-11-24 08:05:36.395 [INFO][104] felix/ipsets.go 130: Queueing IP set for creation family="inet" setID="this-host" setType="hash:ip"
2024-11-24 08:05:36.396 [INFO][104] felix/ipsets.go 778: Doing full IP set rewrite family="inet" numMembersInPendingReplace=6 setID="this-host"
2024-11-24 08:05:54.988 [INFO][100] monitor-addresses/reachaddr.go 47: Auto-detected address by connecting to remote Destination="10.10.40.34" IP=10.10.40.33
2024-11-24 08:05:54.990 [INFO][100] monitor-addresses/autodetection_methods.go 143: Using autodetected IPv4 address 10.10.40.33/22, detected by connecting to 10.10.40.34
bird: Mesh_10_10_40_34: State changed to stop
bird: Mesh_10_10_40_34: State changed to down
bird: Mesh_10_10_40_34: Starting
bird: Mesh_10_10_40_34: State changed to start
2024-11-24 08:06:03.090 [INFO][104] felix/summary.go 100: Summarising 13 dataplane reconciliation loops over 1m3.9s: avg=20ms longest=70ms (resync-filter-v4)

Expected Behavior

The calico-node container should stay in the Running state.

Current Behavior

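The calico-node container repeatedly terminates with reason Completed (exit code 0) and is restarted, which leaves the pod in CrashLoopBackOff.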
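
To put a number on the restart cycle, the lifetime of the last calico-node container can be computed from the `Started` and `Finished` timestamps in the `kubectl describe` output above. This is only a minimal sketch: the helper name `container_lifetime` is made up for illustration, and the timestamp format is assumed from the output shown in this issue.

```python
from datetime import datetime

# Timestamp layout as printed by `kubectl describe` in this issue,
# e.g. "Sun, 24 Nov 2024 16:03:51 +0800" (assumed from the output above).
DESCRIBE_TIME_FORMAT = "%a, %d %b %Y %H:%M:%S %z"


def container_lifetime(started: str, finished: str) -> float:
    """Return the number of seconds between the Started and Finished fields."""
    start = datetime.strptime(started, DESCRIBE_TIME_FORMAT)
    end = datetime.strptime(finished, DESCRIBE_TIME_FORMAT)
    return (end - start).total_seconds()


if __name__ == "__main__":
    # Values copied from "Last State: Terminated" of the calico-node container.
    seconds = container_lifetime(
        "Sun, 24 Nov 2024 16:03:51 +0800",
        "Sun, 24 Nov 2024 16:06:48 +0800",
    )
    print(f"calico-node ran for {seconds:.0f}s before exiting")
```

With the values from the output above this gives 177 seconds, so the container runs for roughly three minutes before it exits and is restarted.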

Possible Solution

Steps to Reproduce (for bugs)

Context

Your Environment

  • Calico version 3.26.4
  • Calico dataplane (iptables, windows etc.) dataplane
  • Orchestrator version (e.g. kubernetes, mesos, rkt): 1.30.1
  • Operating System and version: Linux worker-01 6.5.0-18-generic #18~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Wed Feb 7 11:40:03 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
  • Link to your project (optional):