Extend replicaStatus object with information about evicted pods caused by exceeding emptyDir sizeLimit #621

nilsgstrabo · 2024-04-23T12:04:16Z

Related to use ReadOnlyFileSystem

If an emptyDir in a container exceeds its sizeLimit, Kubernetes forcefully kills the container and sets the Pod phase to Failed. Kubernetes then creates a new Pod to run the container instead of reusing the existing Pod.

We should include info about these events (stored in pod.status) in the replicaList returned by radix-api.

Another issue is that these failed Pods interfere with the calculation of the component status. A Pod in the Failed phase due to an emptyDir violation causes the component status to be Reconciling. I'm not sure what the status should be. The user should be able to easily see that there are issues, but I feel that Reconciling is wrong. A component can be in one of the following statuses: "Stopped", "Consistent", "Reconciling", "Restarting", "Outdated". I'm not sure any of them fit this situation.
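One possible direction, sketched under loud assumptions: keep evicted Pods out of the status calculation and report them separately, so the component can still be Consistent while the user sees that evictions happened. The function name `summarize_component`, the dict-shaped pod statuses, and the simplified rules below are hypothetical and do not reflect how radix-api actually computes the status; "Restarting" and "Outdated" are left out entirely.

```python
def is_evicted(pod_status: dict) -> bool:
    """True for a pod in phase Failed with reason Evicted (as in pod.status)."""
    return pod_status.get("phase") == "Failed" and pod_status.get("reason") == "Evicted"


def is_ready(pod_status: dict) -> bool:
    """True when the pod is Running and its Ready condition is "True"."""
    if pod_status.get("phase") != "Running":
        return False
    return any(
        c.get("type") == "Ready" and c.get("status") == "True"
        for c in pod_status.get("conditions") or []
    )


def summarize_component(pod_statuses: list, desired_replicas: int) -> tuple:
    """Return (status, evicted_count) using a simplified, hypothetical rule set."""
    evicted_count = sum(1 for p in pod_statuses if is_evicted(p))
    # Assumption: evicted pods are ignored when deciding the component status.
    active = [p for p in pod_statuses if not is_evicted(p)]
    ready_count = sum(1 for p in active if is_ready(p))

    if desired_replicas == 0:
        status = "Stopped"
    elif ready_count == desired_replicas and len(active) == desired_replicas:
        status = "Consistent"
    else:
        status = "Reconciling"
    return status, evicted_count


running = {"phase": "Running", "conditions": [{"type": "Ready", "status": "True"}]}
evicted = {"phase": "Failed", "reason": "Evicted"}
result = summarize_component([running, evicted], desired_replicas=1)
print(result)
```

With this rule, one ready replacement Pod plus one evicted Pod yields a Consistent component with an eviction count of 1, which the UI could surface as a warning. Whether that is the right behavior is exactly the open question above.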

Also, it would be useful for the user to be able to clean up (delete?) Pods in a failed state.

An example of a failed Pod:

status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2024-04-23T11:26:40Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2024-04-23T11:28:42Z"
    reason: PodFailed
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2024-04-23T11:28:42Z"
    reason: PodFailed
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2024-04-23T11:26:40Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: containerd://ae1a03c55355b5f5bc29b6c6990727ee87a6d8b845e2d8a795487d78bc193712
    image: radixdev.azurecr.io/oauth-demo-dev-simple:e2pmj
    imageID: radixdev.azurecr.io/oauth-demo-dev-simple@sha256:ce0827dd93e2dc2d96ac7a941cc2bbf9687ec5e93a8c8218a1d90784c8484859
    lastState: {}
    name: simple
    ready: false
    restartCount: 0
    started: false
    state:
      terminated:
        containerID: containerd://ae1a03c55355b5f5bc29b6c6990727ee87a6d8b845e2d8a795487d78bc193712
        exitCode: 137
        finishedAt: "2024-04-23T11:28:42Z"
        reason: Error
        startedAt: "2024-04-23T11:26:41Z"
  hostIP: 10.5.3.108
  message: 'Usage of EmptyDir volume "radix-vm-tmp" exceeds the limit "5M". '
  phase: Failed
  podIP: 10.5.3.130
  podIPs:
  - ip: 10.5.3.130
  qosClass: Burstable
  reason: Evicted
  startTime: "2024-04-23T11:26:40Z"
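Given a status like the one above, the eviction details that could be added to a replica entry might be extracted roughly as follows. This is a minimal sketch: the function name `eviction_info` and the shape of the returned dict are assumptions for illustration, not the actual replicaStatus schema in radix-api, and the status is modeled as a plain dict mirroring the YAML fields.

```python
from typing import Optional


def eviction_info(pod_status: dict) -> Optional[dict]:
    """Return eviction details for a Failed/Evicted pod, or None otherwise."""
    if pod_status.get("phase") != "Failed" or pod_status.get("reason") != "Evicted":
        return None

    containers = []
    for cs in pod_status.get("containerStatuses") or []:
        terminated = (cs.get("state") or {}).get("terminated") or {}
        containers.append({
            "name": cs.get("name"),
            "exitCode": terminated.get("exitCode"),
            "finishedAt": terminated.get("finishedAt"),
        })

    return {
        "reason": pod_status["reason"],
        # The message from Kubernetes may carry trailing whitespace.
        "message": (pod_status.get("message") or "").strip(),
        "containers": containers,
    }


# Subset of the failed Pod status shown above.
example_status = {
    "phase": "Failed",
    "reason": "Evicted",
    "message": 'Usage of EmptyDir volume "radix-vm-tmp" exceeds the limit "5M". ',
    "containerStatuses": [{
        "name": "simple",
        "state": {"terminated": {"exitCode": 137, "finishedAt": "2024-04-23T11:28:42Z"}},
    }],
}

info = eviction_info(example_status)
print(info)
```

Note that the container-level reason is only `Error` with exit code 137; the emptyDir explanation is available only in the pod-level `reason` and `message` fields, so those are the ones worth exposing.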

emirgens · 2024-09-10T12:54:40Z

Investigate whether a Pod retention period can be used:
https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/
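As a rough illustration of what a retention period could mean for cleanup, the sketch below selects Failed Pods whose containers finished longer ago than a given period. Everything here is an assumption for discussion: the function names, the dict layout (`name` plus `status`), the example Pod name, and the idea of applying retention at the application level. It makes no claim about how Kubernetes or Radix handles retention.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def parse_k8s_time(value: str) -> datetime:
    """Parse a timestamp such as "2024-04-23T11:28:42Z" as UTC."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def latest_finished_at(pod_status: dict) -> Optional[datetime]:
    """Return the latest finishedAt among terminated containers, if any."""
    times = []
    for cs in pod_status.get("containerStatuses") or []:
        terminated = (cs.get("state") or {}).get("terminated") or {}
        if terminated.get("finishedAt"):
            times.append(parse_k8s_time(terminated["finishedAt"]))
    return max(times) if times else None


def expired_failed_pods(pods: list, retention: timedelta, now: datetime) -> list:
    """Return names of Failed pods that finished more than `retention` ago."""
    expired = []
    for pod in pods:
        status = pod.get("status") or {}
        if status.get("phase") != "Failed":
            continue
        finished = latest_finished_at(status)
        if finished is not None and now - finished > retention:
            expired.append(pod["name"])
    return expired


pods = [{
    "name": "simple-hypothetical-pod",  # hypothetical Pod name
    "status": {
        "phase": "Failed",
        "containerStatuses": [
            {"state": {"terminated": {"finishedAt": "2024-04-23T11:28:42Z"}}},
        ],
    },
}]
now = datetime(2024, 4, 23, 12, 28, 43, tzinfo=timezone.utc)
to_delete = expired_failed_pods(pods, retention=timedelta(hours=1), now=now)
print(to_delete)
```

The returned names are only candidates; the actual deletion, and whether it happens automatically or on user request, is left open.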

nilsgstrabo added the refinement needed label Apr 23, 2024

emirgens added refinement needed and removed refinement needed labels Sep 10, 2024
