Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

do not run apt update or simulate apt dist-upgrade #181

Merged
merged 1 commit into from
Oct 16, 2023

Conversation

anarcat
Copy link
Contributor

@anarcat anarcat commented Oct 13, 2023

This is causing all sorts of problems. The first of which is that we're hitting our poor mirrors every time the script is ran, which, in the Debian package configuration, is every 15 minutes (!!).

The second is that this locks the cache and makes this script needlessly stumble upon a possible regression in APT from Debian bookworm and Ubuntu 22.06:

https://bugs.launchpad.net/ubuntu/+source/apt/+bug/2003851

That still has to be confirmed: it's possible that apt update can hang for a long time, but that shouldn't concern us if we delegate this work out of band.

I also do not believe actually performing the dist-upgrade calculations is necessary to compute the pending upgrades at all. I've done work with python-apt for other projects and haven't found that to be required: the cache has the necessary information about pending upgrades.

Closes: #179

@anarcat anarcat force-pushed the no-apt-update-upgrade branch from dc995a9 to 3b17a4d Compare October 13, 2023 15:54
@anarcat
Copy link
Contributor Author

anarcat commented Oct 13, 2023

additional data point, output is similar before/after here:

root@rdsys-test-01:~# /usr/share/prometheus-node-exporter-collectors/apt_info.py
# HELP apt_upgrades_pending Apt packages pending updates by origin.
# TYPE apt_upgrades_pending gauge
apt_upgrades_pending{origin="",arch=""} 0
# HELP apt_upgrades_held Apt packages pending updates but held back.
# TYPE apt_upgrades_held gauge
apt_upgrades_held{origin="",arch=""} 0
# HELP apt_autoremove_pending Apt packages pending autoremoval.
# TYPE apt_autoremove_pending gauge
apt_autoremove_pending 2
# HELP node_reboot_required Node reboot is required for software updates.
# TYPE node_reboot_required gauge
node_reboot_required 0
root@rdsys-test-01:~# time python3 ./apt_info.py.orig
# HELP apt_upgrades_pending Apt packages pending updates by origin.
# TYPE apt_upgrades_pending gauge
apt_upgrades_pending{origin="",arch=""} 0
# HELP apt_upgrades_held Apt packages pending updates but held back.
# TYPE apt_upgrades_held gauge
apt_upgrades_held{origin="",arch=""} 0
# HELP apt_autoremove_pending Apt packages pending autoremoval.
# TYPE apt_autoremove_pending gauge
apt_autoremove_pending 2
# HELP node_reboot_required Node reboot is required for software updates.
# TYPE node_reboot_required gauge
node_reboot_required 0

real	0m1.841s
user	0m1.066s
sys	0m0.179s
root@rdsys-test-01:~# time /usr/share/prometheus-node-exporter-collectors/apt_info.py
# HELP apt_upgrades_pending Apt packages pending updates by origin.
# TYPE apt_upgrades_pending gauge
apt_upgrades_pending{origin="",arch=""} 0
# HELP apt_upgrades_held Apt packages pending updates but held back.
# TYPE apt_upgrades_held gauge
apt_upgrades_held{origin="",arch=""} 0
# HELP apt_autoremove_pending Apt packages pending autoremoval.
# TYPE apt_autoremove_pending gauge
apt_autoremove_pending 2
# HELP node_reboot_required Node reboot is required for software updates.
# TYPE node_reboot_required gauge
node_reboot_required 0

real	0m0.458s
user	0m0.424s
sys	0m0.033s

the script also runs almost 4 times faster, naturally...

of course, the above is a rather dumb example, with no pending upgrades... i'll check fleet wide if there are changes shortly.

update: well. we don't have any pending upgrades right now, so that's zero fleet-wide, but i'm pretty confident this will Just Work, especially given jak's comment in the Debian bug tracker: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1028212#62

@dswarbrick
Copy link
Member

dswarbrick commented Oct 13, 2023

I was never a fan of this script updating the cache, and suggested on at least one occasion that we'd be better off running the script as an unprivileged user by default, just so that it wouldn't try to update the cache.

OTOH, there will be people who run this script without asynchronous updating of the cache (via e.g. apticron or unattended-upgrades). I'd sorta prefer to put this cache-updating code behind a command-line switch (disabled by default), for such cases. And to address the issue of it hitting the repo servers too frequently, how about something like taking modulo 86400 of the current Unix timestamp, to ensure it only hits the repos once per day, regardless of how often it is run?

anarcat added a commit to anarcat/node-exporter-textfile-collector-scripts that referenced this pull request Oct 13, 2023
We use the `pkgcache.bin` modification time as a heuristic, but there
might be a better way to do this.

We also silently ignore errors when pulling that file to avoid broken
systems from breaking the other metrics. I believe this will lead to a
null metric which is probably what we want anyway.

A possible improvement to this would be to individually inspect the
`InRelease` files and report the mirror's timestamps as well, but
that's more involved. It's also not a priority compared to this fix,
because we want the update timestamp if we remove the `apt update`
run (in prometheus-community#181).

Closes: prometheus-community#180

Signed-off-by: Antoine Beaupré <[email protected]>
@anarcat
Copy link
Contributor Author

anarcat commented Oct 13, 2023 via email

@dswarbrick
Copy link
Member

dswarbrick commented Oct 13, 2023

I don't see that it would be too complex to simply ensure that the cache update is not performed unless explicitly opted into (for those who need it). By making the flag disabled by default, no extra changes to the .service would be needed. It would merely be a behaviour change (albeit one that could be reverted, unlike stripping out the functionality entirely).

I suggested such a flag several months ago: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1028212#10

This repo has no tags, so is effectively a rolling release. Therefore what Debian stable or any other distro which packages these scripts does is up to that distro.

@anarcat
Copy link
Contributor Author

anarcat commented Oct 13, 2023 via email

@anarcat
Copy link
Contributor Author

anarcat commented Oct 13, 2023

Also, I did a bit of research and the normal way of ensuring the package list is up to date on APT systems is to enable the APT::Periodic::Update-Package-Lists setting in apt.conf. That's as simple as:

echo 'APT::Periodic::Update-Package-Lists "1";' > /etc/apt/apt.conf.d/99_auto_apt_update.conf

I hardly see why this specific script should do that job instead...

@anarcat
Copy link
Contributor Author

anarcat commented Oct 13, 2023

I think one key concept that I might not have mentioned here is that this script reports on two different tasks: updating the package list and actually upgrading pending packages. The latter can be done pretty much any time an admin passes by a server or unattended-upgrades runs, and it's worth running often to check for that. The former, however, is needless to run repeatedly because mirrors themselves (heck, ftp-master.debian.org even) do not upgrade that often (dinstall runs every 15 minutes if my memory is correct).

So we really shouldn't do any action here: just like this script doesn't (and shouldn't!) perform actual upgrades, I don't think it should update the package list either. I really like the idea of having the script deliberately readonly, and I think it's going to make everything simpler if we keep it like that.

But again, if you want to take this further, feel free... I just don't want to get bogged down in those implementation details right now.

@SuperQ
Copy link
Contributor

SuperQ commented Oct 13, 2023

Yes, I would prefer users update the apt-cache on their own schedule and this script does not. It's much better to have the package cache run async from the script and avoid side effects.

Even behind a flag, I don't think it's good to include it. There are lots of better ways to deal with updating the apt package cache.

@SuperQ SuperQ requested a review from dswarbrick October 13, 2023 19:09
Copy link
Contributor

@SuperQ SuperQ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind adding the apt config notes as a documentation comment to the top of the script?

@anarcat anarcat force-pushed the no-apt-update-upgrade branch from 3b17a4d to fddbd16 Compare October 14, 2023 02:07
@anarcat
Copy link
Contributor Author

anarcat commented Oct 14, 2023

Would you mind adding the apt config notes as a documentation comment to the top of the script?

done.

anarcat added a commit to anarcat/node-exporter-textfile-collector-scripts that referenced this pull request Oct 14, 2023
We use the `pkgcache.bin` modification time as a heuristic, but there
might be a better way to do this.

We also silently ignore errors when pulling that file to avoid broken
systems from breaking the other metrics. I believe this will lead to a
null metric which is probably what we want anyway.

A possible improvement to this would be to individually inspect the
`InRelease` files and report the mirror's timestamps as well, but
that's more involved. It's also not a priority compared to this fix,
because we want the update timestamp if we remove the `apt update`
run (in prometheus-community#181).

Closes: prometheus-community#180

Signed-off-by: Antoine Beaupré <[email protected]>
Co-authored-by: Ben Kochie <[email protected]>
@anarcat anarcat requested a review from SuperQ October 14, 2023 02:33
apt_info.py Outdated Show resolved Hide resolved
apt_info.py Outdated Show resolved Hide resolved
@anarcat anarcat marked this pull request as draft October 15, 2023 01:56
This is causing all sorts of problems. The first of which is that
we're hitting our poor mirrors every time the script is ran, which, in
the Debian package configuration, is *every 15 minutes* (!!).

The second is that this locks the cache and makes this script
needlessly stumble upon a possible regression in APT from Debian
bookworm and Ubuntu 22.06:

https://bugs.launchpad.net/ubuntu/+source/apt/+bug/2003851

That still has to be confirmed: it's possible that `apt update` can
hang for a long time, but that shouldn't concern us if we delegate
this work out of band.

I also do not believe actually performing the `dist-upgrade`
calculations is necessary to compute the pending upgrades at all. I've
done work with python-apt for other projects and haven't found that to
be required: the cache has the necessary information about pending
upgrades.

Closes: prometheus-community#179

Signed-off-by: Antoine Beaupré <[email protected]>
Co-authored-by: Daniel Swarbrick <[email protected]>
@anarcat anarcat force-pushed the no-apt-update-upgrade branch from e3a34c0 to 524dbae Compare October 15, 2023 01:57
@anarcat anarcat marked this pull request as ready for review October 15, 2023 01:58
@anarcat anarcat requested a review from dswarbrick October 15, 2023 01:58
@anarcat
Copy link
Contributor Author

anarcat commented Oct 15, 2023

thanks for the review and (tacit?) approval, @dswarbrick :) i integrated your changes in the commit with credit in the commitlog.

@anarcat
Copy link
Contributor Author

anarcat commented Oct 16, 2023

i think this is now ready to merge, @SuperQ can you push the magic button? :)

@SuperQ SuperQ merged commit 66010f0 into prometheus-community:master Oct 16, 2023
3 checks passed
@anarcat anarcat deleted the no-apt-update-upgrade branch October 17, 2023 20:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

apt_info.py can hang for hours
3 participants