-
Notifications
You must be signed in to change notification settings - Fork 178
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Install wget to avoid the busybox wget leaking defunct processes #159
Conversation
Please sign your commit. |
When running busybox wget v1.31.x to request an SSL page, there is a leak of defunct processes that eventually could hit the limit for fork syscalls, leading to the unavailability of main registry process. Installing wget from the alpine repository resolves the issue as this only affects the busybox distribution. Moreover, this should transparently resolve the issue for deployments using wget as healthcheck/readiness/liveness probe. xref: https://bugs.busybox.net/show_bug.cgi?id=15967 Closes distribution#158 Signed-off-by: aleskandro <[email protected]>
Done! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is busybox binary packed into alpine image kinda breaks my solar system brain -- I guess the galaxy brains that made this decision see something I can't.
Thanks for the fix. PTAL @thaJeztah
This All of that is custom, so I don't think we must install wget by default; it's not part of the main requirements of the image to run, and if needed should be installed by the user (extending the image) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(see comment above)
Hi @theJeztah
Hi @thaJeztah, thanks for the quick reply. I get it, and the best would be to have the wget busybox applet fixed. However, as of today, there's no way for this image to execute a healthcheck of the registry as no tool is available to run HTTP(s) requests, except the bugged Is there another way users can exploit to execute an in-container healthcheck without extending the image? The following are leaked zombie processes left by the wget currently available in the image:
|
Is there any reason why you do in-container health checks and not leveraging some "higher leve/external/orchestration" means? I suspect the lure is the |
Docker/Compose health checks are similar to |
Podman healthchecks are similar, which is what we use in https://github.com/openshift-qe/baremetal-qe-infra/pull/60/files#diff-546a6f0a9c42e667f953f658b9b1de750fe5ae500cf753a04c4871e24f45d08cR21-R27 In k8s, for sure, one can use an httpGet liveness/readiness probe and there's no need for executing a command within the container. |
@schildbach oh, I'm not saying there is anything wrong with the request - in fact I've approved the PR - I'm just curious. @thaJeztah the current wget binary is clearly DOS-ing the service when used in healthchecks. This needs fixing. |
Yeah, the busybox case should be fixed; it looks like it's not properly reaping (when running as PID1 ?). My issue is that we should try to keep the image minimal, and only install things that the image itself uses by default. While users may decide to set a healthcheck, such a healthcheck may use "any" tooling, so could be Most http-healthchecks I run into tend to only check on the loopback interface (as it's only an indicator if the software is ready to accept connections); TLS termination may even be outside of the container itself, and checking the container's external IP may not even be a good indicator to check from within the container itself, but of course there's many setups / scenarios possible. A quick fix is to start the container with docker run --init --name mycontainer -d --rm alpine:3.19 /bin/sleep inf
docker exec mycontainer wget --spider https://google.com
docker exec mycontainer wget --spider https://google.com
docker exec mycontainer wget --spider https://google.com
docker exec -it mycontainer ps aux
PID USER TIME COMMAND
1 root 0:00 /sbin/docker-init -- /bin/sleep inf
7 root 0:00 /bin/sleep inf
40 root 0:00 ps aux But for more specific cases, the recommendation would be to configure the image with the right options, which can be a minimal Dockerfile that installs the tools needed for the user's needs, and to configure the healthcheck, e.g.; # syntax=docker/dockerfile:1
FROM registry:2
RUN apk add --no-cache wget
HEALTHCHECK CMD ["wget", "--spider", "http://localhost:5000"] |
Before I forget to mention; I think Alpine ships with its own fork of busybox (I recall that they carry various patches which may not (yet) have been accepted in upstream: https://git.alpinelinux.org/aports/tree/main/busybox?h=master); so it may also be worth opening a ticket in Alpine's issue tracker for this.
I think the busybox binary in alpine is purely as a most-minimal "bootstrap" for the image; provide a minimal userland (a shell, and some common utilities provided by busybox), but adding Alpines package-manager to install the actual tools you may need after that. |
Thanks @thaJeztah, in the specific case of the registry, it is the registry itself to be the PID1. Should we consider a fix in the registry also to handle the related signals? |
Heh, just had a lengthy chat in our internal Slack discussing this case because it's really a bit of a grey area, and to some amount a matter of "semantics";
From your earlier descriptions of busybox, I still feel it's a bug in busybox (which, after all, spins up the processes); pending that to be fixed, If the busybox maintainer consider it to not be a bug (perhaps it was not designed to be invoked this way?), and IF using the default |
Agreed 100%. All the discussion here is to have a solution until the bug is fixed upstream and downstream by the distros (I'd say it's in the upstream busybox, if not already fixed in the next versions, as the same issue happens on the ubuntu's build).
yeah, the point for the registry is that it's actually used as PID1 in this image's case. Although I agree, it would still be like a 'safety' mechanism when PID1 inherits a defunct process, as in this case. Should I close this PR or would we change it to something that is acceptable? I think some of the thoughts in this discussion should go in the issue at least, to inform other potential users making use of wget for healthchecking an https registry. |
I think we should close it as I dont think we'll reach a consensus that will lead to merging this in. Moving some of the discussed points to the original issue sounds like a good idea but I think it might not be needed as the links to this PR are displayed in the GH UI so I'd just leave it as is. |
When running busybox wget v1.31.x to request an SSL page, there is a leak of defunct processes that eventually could hit the limit for fork syscalls, leading to the unavailability of main registry process.
Installing wget from the alpine repository resolves the issue as this only affects the busybox distribution. Moreover, this should transparently resolve the issue for deployments
using wget as healthcheck/readiness/liveness probe.
xref: https://bugs.busybox.net/show_bug.cgi?id=15967
Closes #158