Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: use throttle: 1 for localhost unarchive #498

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

travisdowns
Copy link
Contributor

During the common install tasks, we download and unpack the binary to install on localhost and eventually upload it to each host (among many other things). Currently the "unpack" step, which extracts the gzipped tar archive, is perform N times if there are N hosts in the inventory, but the target directory (something like
/tmp/node_exporter-linux-arm64/1.8.2) is the same for every host with the same architecture.

This means that all the unarchive tasks are extracting in parallel in an unsynchronized manner to the same directory. It a miracle that this works, but at least usually it like it does, but sometimes tar will complain about a missing file, failing the install. I've confirmed locally that this is due to the race described above.

To fix this, we use throttle: 1 on the unpack task. This means that the first task (for each architecture) will do the unpack and the other tasks are effectively no-ops, they are reported as "ok" rather than changed in the output.

Fixes #457.

During the common install tasks, we download and unpack the binary to
install on localhost and eventually upload it to each host (among many
other things). Currently the "unpack" step, which extracts the gzipped
tar archive, is perform N times if there are N hosts in the inventory,
but the target directory (something like
/tmp/node_exporter-linux-arm64/1.8.2) is the same for every host with
the same architecture.

This means that all the unarchive tasks are extracting in parallel in an
unsynchronized manner to the same directory. It a miracle that this
works, but at least usually it like it does, but sometimes tar will
complain about a missing file, failing the install. I've confirmed
locally that this is due to the race described above.

To fix this, we use `throttle: 1` on the unpack task. This means that
the first task (for each architecture) will do the unpack and the other
tasks are effectively no-ops, they are reported as "ok" rather than
changed in the output.

Fixes prometheus-community#457.

Signed-off-by: Travis Downs <[email protected]>
@github-actions github-actions bot added bugfix and removed bugfix labels Dec 20, 2024
@travisdowns
Copy link
Contributor Author

travisdowns commented Dec 20, 2024

I tested this locally with a file that is more likely to trigger the race: a tar of 10,000 small files all at the top level of the tar: this triggers the error 100% of the time w/o the fix and passes 1,000 iterations locally with no failures.

I also verified that deploying prom to a 4 host inventory works, 100 iterations without failure. This inventory has 2 hosts each for 2 different architectures (amd64 and arm64).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment