We are running BuildKit rootless in a Kubernetes installation and have defined a GC policy with keepBytes:
[[worker.oci.gcpolicy]]
  all = true
  keepBytes = "250GB" # 50GB less than the PVC size for /home/user/.local/share/buildkit
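keepBytes is given as a string, so the threshold in bytes depends on how the suffix is read. We have not confirmed whether BuildKit treats "GB" as 10^9 or 2^30 bytes, so this minimal sketch (parse_size is a hypothetical helper, not part of BuildKit) computes both. The difference matters when comparing the limit against buildctl du, whose sizes are shown in GiB below.

```python
import re

# Two common readings of a size suffix. Assumption: we have not checked
# which one BuildKit's config parser applies to keepBytes.
DECIMAL = {"B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}
BINARY = {"B": 1, "KB": 2**10, "MB": 2**20, "GB": 2**30, "TB": 2**40}

def parse_size(text: str, binary: bool) -> int:
    """Convert a size string such as "250GB" into a byte count."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMGT]?B)?\s*", text.upper())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number, unit = match.groups()
    multipliers = BINARY if binary else DECIMAL
    # A bare number (no suffix) is taken as plain bytes.
    return int(float(number) * multipliers[unit or "B"])

GIB = 2**30
print(f'{parse_size("250GB", binary=False) / GIB:.2f} GiB')  # 232.83 GiB
print(f'{parse_size("250GB", binary=True) / GIB:.2f} GiB')   # 250.00 GiB
```

Under either reading the limit is far above the ~110 GiB that buildctl du reports in this issue.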
However, the rule is not always triggered when we hit the limit. We have already tried to pin down the issue; here is everything we have found so far.
GC Triggered based on Disk Usage
Most of the time, the GC works fine and removes cached data above the configured limit, but from time to time the GC is not triggered and a BuildKit instance runs out of storage, failing with the following error:
error: failed to solve: ResourceExhausted: failed to prepare k4ovv028ht6dewfcgpus32fn7 as q40z7n0str2xd0ec1u7mjz1r7: copying of parent failed: failed to copy files: write /home/user/.local/share/buildkit/runc-native/snapshots/snapshots/new-19411978/usr/lib/x86_64-linux-gnu/libperl.so.5.36.0: copy_file_range: no space left on device
After some tests, it looked like the buildctl disk usage command (buildctl du) did not report the correct amount compared with the actual disk usage (du). Since the disk usage reported by buildctl was lower than the keepBytes value set in the GC policy, the GC was not triggered.
Disk Usage Reported by Buildkit based on type
| Record type | Size |
|---|---|
| source.local | 27.36 MiB |
| regular | 110.78 GiB |
| Total | 110.80 GiB |
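A simplified model of the keepBytes check makes the consequence concrete. As we understand it (an assumption, not taken from BuildKit's source), a keepBytes rule only prunes when BuildKit's own accounting of cache records exceeds the limit, so bytes on disk that BuildKit does not account for never enter the comparison. gc_would_trigger is a hypothetical name; the 110.80 GiB total comes from the table above.

```python
GIB = 2**30

# Total reported by `buildctl du` in this issue.
REPORTED_TOTAL = 110.80 * GIB

def gc_would_trigger(accounted_bytes: float, keep_bytes: int) -> bool:
    """Simplified model (assumption): prune only when BuildKit's own
    accounting of cache records exceeds keepBytes."""
    return accounted_bytes > keep_bytes

# keepBytes = "250GB", under both possible unit readings.
for label, keep in (("decimal", 250 * 10**9), ("binary", 250 * GIB)):
    print(label, gc_would_trigger(REPORTED_TOTAL, keep))
# decimal False
# binary False
```

In this model the volume can fill up completely while the accounted total stays below the limit, which matches what we observe.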
When we run the GC manually via buildctl prune, it does clean up all the space. The garbage collector itself therefore seems to work; this looks more like an issue with how disk usage is measured.
Wrong Permissions within the Cache Folder
What we also noticed during the analysis was that the permissions for some folders within the cache were not set as we would expect them to be.
Running du fails because of these permissions:
du: can't open '/home/user/.local/share/buildkit/runc-native/snapshots/snapshots/1762/var/cache/apt/archives/partial': Permission denied
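When du cannot open a directory, its total misses everything inside it. A rough alternative is sketched below; disk_usage is a hypothetical helper, not a BuildKit tool. It walks the tree, sums allocated blocks the way du does, and collects unreadable paths instead of stopping, so the result is a lower bound plus a list of what it could not see.

```python
import os
from typing import List, Set, Tuple

def disk_usage(root: str) -> Tuple[int, List[str]]:
    """Sum allocated bytes under root (like `du -s`, excluding root itself)
    and return the paths that could not be read."""
    total = 0
    unreadable: List[str] = []
    seen: Set[Tuple[int, int]] = set()  # (st_dev, st_ino): count hard links once

    def on_error(err: OSError) -> None:
        # os.walk calls this when it cannot list a directory, e.g. EACCES.
        unreadable.append(str(err.filename))

    for dirpath, dirnames, filenames in os.walk(root, onerror=on_error):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # do not follow symlinks
            except OSError:
                unreadable.append(path)
                continue
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue
            seen.add(key)
            total += st.st_blocks * 512  # st_blocks is in 512-byte units
    return total, unreadable
```

Running it on /home/user/.local/share/buildkit as the same user would still skip directories like partial, but it reports them rather than hiding the gap.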
Permissions for the folder:
~/.local/share/buildkit/runc-native/snapshots/snapshots/1762/var/cache/apt/archives $ ls -la
total 12
drwxr-xr-x 3 user user 4096 Oct 8 13:21 .
drwxr-xr-x 3 user user 4096 Oct 8 13:21 ..
-rw-r----- 1 user user 0 Aug 13 00:43 lock
drwx------ 2 100041 user 4096 Aug 13 00:43 partial
We also see other folders with permissions similar to var/cache/apt/archives/partial, so this does not seem to be specific to the apt package manager.
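The owner 100041 on partial (instead of user) may come from rootless UID mapping. Our reading, which is an assumption not verified against this pod: container UID 0 maps to the host user and container UIDs 1..65536 map to subordinate UIDs starting at 100000, so host UID 100041 would be container UID 42. On Debian-based images that is commonly the _apt user, which would fit apt creating partial with mode 0700. The mapping values and the to_inside helper below are hypothetical; the real ones can be read from /proc/<pid>/uid_map of the buildkitd process.

```python
from typing import List, NamedTuple, Optional

class IdMapping(NamedTuple):
    """One line of /proc/<pid>/uid_map: inside ID, outside ID, range length."""
    inside: int
    outside: int
    length: int

# Hypothetical mapping: host user is UID 1000 and /etc/subuid grants
# 65536 IDs starting at 100000.
UID_MAP = [IdMapping(0, 1000, 1), IdMapping(1, 100000, 65536)]

def to_inside(outside_uid: int, mapping: List[IdMapping]) -> Optional[int]:
    """Translate a host (outside) UID to the UID seen inside the namespace."""
    for m in mapping:
        if m.outside <= outside_uid < m.outside + m.length:
            return m.inside + (outside_uid - m.outside)
    return None  # not mapped into the namespace

print(to_inside(100041, UID_MAP))  # 42
print(to_inside(1000, UID_MAP))    # 0
```

If this holds, the host user cannot read such directories, so both du and any size accounting done from outside the namespace can miss their contents.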
However, my volume mounted on /home/user/.local/share/buildkit, which is used only by BuildKit, is 96% full, causing a "no space left" error when I try to run a build task.
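One way to quantify the mismatch is to compare the filesystem's used bytes with the total that buildctl du reports. A minimal sketch, assuming the volume holds nothing but BuildKit state and that the reported total is copied by hand from buildctl du; untracked_bytes and fill_ratio are hypothetical helper names.

```python
import shutil

def untracked_bytes(volume_path: str, reported_bytes: int) -> int:
    """Bytes used on the filesystem backing volume_path that BuildKit's
    reported total does not cover (assumes BuildKit is the only user)."""
    used = shutil.disk_usage(volume_path).used
    return max(used - reported_bytes, 0)

def fill_ratio(volume_path: str) -> float:
    """used / total from statvfs; df's Use% divides by used + available,
    so it can read slightly higher when blocks are reserved."""
    usage = shutil.disk_usage(volume_path)
    return usage.used / usage.total
```

For example, untracked_bytes("/home/user/.local/share/buildkit", int(110.80 * 2**30)) would show how much of the volume lies outside BuildKit's accounting.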
EDIT:
Another observation: after restarting the pod (and resizing the volume), it seems that the cleanup was performed.
@devthejo, which version of BuildKit do you see this on? The original issue seems to be on v0.16 in rootless mode; is that the same setup you have?
@jedevc It was v0.13.0. I have now upgraded to v0.17.1 and am waiting to see whether it is reproducible on the new version (I needed to fix the problem quickly and didn't have time to investigate further). I'm not sure it's the same issue, but it looked like it.
Setup
We currently use version 0.16 of the rootless image (https://hub.docker.com/layers/moby/buildkit/v0.16.0-rootless) in a Kubernetes setup.
StatefulSet:
ConfigMap: