Install wget to avoid the busybox wget leaking defunct processes #159

aleskandro · 2024-03-04T14:53:31Z

When running busybox wget v1.31.x to request an SSL page, there is a leak of defunct processes that eventually could hit the limit for fork syscalls, leading to the unavailability of main registry process.

Installing wget from the alpine repository resolves the issue as this only affects the busybox distribution. Moreover, this should transparently resolve the issue for deployments
using wget as healthcheck/readiness/liveness probe.

xref: https://bugs.busybox.net/show_bug.cgi?id=15967

Closes #158

milosgajdos · 2024-03-04T17:39:38Z

Please sign your commit.

When running busybox wget v1.31.x to request an SSL page, there is a leak of defunct processes that eventually could hit the limit for fork syscalls, leading to the unavailability of main registry process. Installing wget from the alpine repository resolves the issue as this only affects the busybox distribution. Moreover, this should transparently resolve the issue for deployments using wget as healthcheck/readiness/liveness probe. xref: https://bugs.busybox.net/show_bug.cgi?id=15967 Closes distribution#158 Signed-off-by: aleskandro <[email protected]>

aleskandro · 2024-03-04T17:44:36Z

Done!

milosgajdos

Why is busybox binary packed into alpine image kinda breaks my solar system brain -- I guess the galaxy brains that made this decision see something I can't.

Thanks for the fix. PTAL @thaJeztah

thaJeztah · 2024-03-04T19:21:07Z

This wget binary is only used during build of the Dockerfile; as far as I can see, it's not used in the final image, unless someone decides to use it (I see some issues linked that use it in a healthcheck).

All of that is custom, so I don't think we must install wget by default; it's not part of the main requirements of the image to run, and if needed should be installed by the user (extending the image)

thaJeztah

(see comment above)

aleskandro · 2024-03-04T20:16:09Z

Hi @theJeztah

unless someone decides to use it (I see some issues linked that use it in a healthcheck).

All of that is custom, so I don't think we must install wget by default; it's not part of the main requirements of the image to run, and if needed should be installed by the user (extending the image)

Hi @thaJeztah, thanks for the quick reply. I get it, and the best would be to have the wget busybox applet fixed. However, as of today, there's no way for this image to execute a healthcheck of the registry as no tool is available to run HTTP(s) requests, except the bugged wget.

Is there another way users can exploit to execute an in-container healthcheck without extending the image?

The following are leaked zombie processes left by the wget currently available in the image:

└ $ docker exec -it myreg ps aux                                       
PID   USER     TIME  COMMAND
    1 root      0:00 registry serve /etc/docker/registry/config.yml
   40 root      0:00 [ssl_client]
   41 root      0:00 [ssl_client]
   74 root      0:00 [ssl_client]
   75 root      0:00 [ssl_client]
   88 root      0:00 [ssl_client]
   89 root      0:00 [ssl_client]
  102 root      0:00 [ssl_client]
  103 root      0:00 [ssl_client]
  104 root      0:00 ps aux

milosgajdos · 2024-03-04T20:47:58Z

Is there any reason why you do in-container health checks and not leveraging some "higher leve/external/orchestration" means? I suspect the lure is the HEALTH Dockerfile instruction https://docs.docker.com/reference/dockerfile/#healthcheck?

schildbach · 2024-03-04T21:07:39Z

Is there any reason why you do in-container health checks and not leveraging some "higher leve/external/orchestration" means? I suspect the lure is the HEALTH Dockerfile instruction https://docs.docker.com/reference/dockerfile/#healthcheck?

Docker/Compose health checks are similar to docker exec; the referenced executable needs to be available inside the container.

aleskandro · 2024-03-04T21:27:36Z

Is there any reason why you do in-container health checks and not leveraging some "higher leve/external/orchestration" means? I suspect the lure is the HEALTH Dockerfile instruction https://docs.docker.com/reference/dockerfile/#healthcheck?

Docker/Compose health checks are similar to docker exec; the referenced executable needs to be available inside the container.

Podman healthchecks are similar, which is what we use in https://github.com/openshift-qe/baremetal-qe-infra/pull/60/files#diff-546a6f0a9c42e667f953f658b9b1de750fe5ae500cf753a04c4871e24f45d08cR21-R27

In k8s, for sure, one can use an httpGet liveness/readiness probe and there's no need for executing a command within the container.

milosgajdos · 2024-03-04T21:28:20Z

@schildbach oh, I'm not saying there is anything wrong with the request - in fact I've approved the PR - I'm just curious.

@thaJeztah the current wget binary is clearly DOS-ing the service when used in healthchecks. This needs fixing.

thaJeztah · 2024-03-05T13:16:56Z

Yeah, the busybox case should be fixed; it looks like it's not properly reaping (when running as PID1 ?). My issue is that we should try to keep the image minimal, and only install things that the image itself uses by default. While users may decide to set a healthcheck, such a healthcheck may use "any" tooling, so could be curl or nc, or even a tool not using networking at all to decide if the container is healthy for their use-case.

Most http-healthchecks I run into tend to only check on the loopback interface (as it's only an indicator if the software is ready to accept connections); TLS termination may even be outside of the container itself, and checking the container's external IP may not even be a good indicator to check from within the container itself, but of course there's many setups / scenarios possible.

A quick fix is to start the container with --init, which makes docker insert tini (https://github.com/krallin/tini) as PID1 to handle reaping;

docker run --init --name mycontainer -d --rm alpine:3.19 /bin/sleep inf
docker exec mycontainer wget --spider https://google.com
docker exec mycontainer wget --spider https://google.com
docker exec mycontainer wget --spider https://google.com

docker exec -it mycontainer ps aux
PID   USER     TIME  COMMAND
1 root      0:00 /sbin/docker-init -- /bin/sleep inf
7 root      0:00 /bin/sleep inf
40 root      0:00 ps aux

But for more specific cases, the recommendation would be to configure the image with the right options, which can be a minimal Dockerfile that installs the tools needed for the user's needs, and to configure the healthcheck, e.g.;

# syntax=docker/dockerfile:1

FROM registry:2
RUN apk add --no-cache wget
HEALTHCHECK CMD ["wget", "--spider", "http://localhost:5000"]

thaJeztah · 2024-03-05T13:26:38Z

xref: https://bugs.busybox.net/show_bug.cgi?id=15967

Before I forget to mention; I think Alpine ships with its own fork of busybox (I recall that they carry various patches which may not (yet) have been accepted in upstream: https://git.alpinelinux.org/aports/tree/main/busybox?h=master); so it may also be worth opening a ticket in Alpine's issue tracker for this.

Why is busybox binary packed into alpine image kinda breaks my solar system brain -- I guess the galaxy brains that made this decision see something I can't.

I think the busybox binary in alpine is purely as a most-minimal "bootstrap" for the image; provide a minimal userland (a shell, and some common utilities provided by busybox), but adding Alpines package-manager to install the actual tools you may need after that.

aleskandro · 2024-03-05T13:45:37Z

Thanks @thaJeztah, in the specific case of the registry, it is the registry itself to be the PID1. Should we consider a fix in the registry also to handle the related signals?

thaJeztah · 2024-03-05T15:28:28Z

Should we consider a fix in the registry also to handle the related signals?

Heh, just had a lengthy chat in our internal Slack discussing this case because it's really a bit of a grey area, and to some amount a matter of "semantics";

--init should normally be considered an "escape hatch" for a PID1 in a container misbehaving (if process is running as PID1, it should properly handle cleanup)
The grey area here is that registry (as far as I can see) is not "misbehaving"; it does handle what it's expected to do; run a registry, and handle that. It's not expected to do more than that, or not expected to handle processes it didn't create.
The situation happening here is that something went behind its back and shoved some processes underneath it, and those processes (seemingly) expecting something else to handle reaping.

From your earlier descriptions of busybox, I still feel it's a bug in busybox (which, after all, spins up the processes); pending that to be fixed, --init can be a proper workaround (or installing a different tool for the job, such as the apk add wget). From the perspective of the registry's core responsibility, adding that functionality to the registry code would be an "enhancement", not so much a bugfix, and (TBD) could be considered out of scope of its feature-set.

If the busybox maintainer consider it to not be a bug (perhaps it was not designed to be invoked this way?), and IF using the default wget is expected to be the default way to perform health-checks in this image, then an alternative could be to include tini in the container as PID1 to make sure the image has a reaper by default.

aleskandro · 2024-03-05T15:38:35Z

From your earlier descriptions of busybox, I still feel it's a bug in busybox

Agreed 100%. All the discussion here is to have a solution until the bug is fixed upstream and downstream by the distros (I'd say it's in the upstream busybox, if not already fixed in the next versions, as the same issue happens on the ubuntu's build).

If the busybox maintainer consider it to not be a bug (perhaps it was not designed to be invoked this way?), and IF using the default wget is expected to be the default way to perform health-checks in this image, then an alternative could be to include tini in the container as PID1 to make sure the image has a reaper by default.

yeah, the point for the registry is that it's actually used as PID1 in this image's case. Although I agree, it would still be like a 'safety' mechanism when PID1 inherits a defunct process, as in this case.

Should I close this PR or would we change it to something that is acceptable? I think some of the thoughts in this discussion should go in the issue at least, to inform other potential users making use of wget for healthchecking an https registry.

milosgajdos · 2024-03-06T09:35:10Z

Should I close this PR or would we change it to something that is acceptable? I think some of the thoughts in this discussion should go in the issue at least, to inform other potential users making use of wget for healthchecking an https registry.

I think we should close it as I dont think we'll reach a consensus that will lead to merging this in.

Moving some of the discussed points to the original issue sounds like a good idea but I think it might not be needed as the links to this PR are displayed in the GH UI so I'd just leave it as is.

aleskandro mentioned this pull request Mar 4, 2024

The wget binary leaks defunct ssl_client processes #158

Open

aleskandro force-pushed the fix-wget branch from 0cd8640 to a76a0e5 Compare March 4, 2024 17:44

milosgajdos approved these changes Mar 4, 2024

View reviewed changes

thaJeztah requested changes Mar 4, 2024

View reviewed changes

aleskandro closed this Mar 6, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Install wget to avoid the busybox wget leaking defunct processes #159

Install wget to avoid the busybox wget leaking defunct processes #159

aleskandro commented Mar 4, 2024

milosgajdos commented Mar 4, 2024

aleskandro commented Mar 4, 2024

milosgajdos left a comment

thaJeztah commented Mar 4, 2024

thaJeztah left a comment

aleskandro commented Mar 4, 2024

milosgajdos commented Mar 4, 2024

schildbach commented Mar 4, 2024

aleskandro commented Mar 4, 2024 •

edited

Loading

milosgajdos commented Mar 4, 2024

thaJeztah commented Mar 5, 2024

thaJeztah commented Mar 5, 2024

aleskandro commented Mar 5, 2024

thaJeztah commented Mar 5, 2024

aleskandro commented Mar 5, 2024 •

edited

Loading

milosgajdos commented Mar 6, 2024

Install wget to avoid the busybox wget leaking defunct processes #159

Install wget to avoid the busybox wget leaking defunct processes #159

Conversation

aleskandro commented Mar 4, 2024

milosgajdos commented Mar 4, 2024

aleskandro commented Mar 4, 2024

milosgajdos left a comment

Choose a reason for hiding this comment

thaJeztah commented Mar 4, 2024

thaJeztah left a comment

Choose a reason for hiding this comment

aleskandro commented Mar 4, 2024

milosgajdos commented Mar 4, 2024

schildbach commented Mar 4, 2024

aleskandro commented Mar 4, 2024 • edited Loading

milosgajdos commented Mar 4, 2024

thaJeztah commented Mar 5, 2024

thaJeztah commented Mar 5, 2024

aleskandro commented Mar 5, 2024

thaJeztah commented Mar 5, 2024

aleskandro commented Mar 5, 2024 • edited Loading

milosgajdos commented Mar 6, 2024

aleskandro commented Mar 4, 2024 •

edited

Loading

aleskandro commented Mar 5, 2024 •

edited

Loading