Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleanup action removes packages still in use #30

Closed
dashmage opened this issue Jan 9, 2024 · 4 comments
Closed

Cleanup action removes packages still in use #30

dashmage opened this issue Jan 9, 2024 · 4 comments

Comments

@dashmage
Copy link

dashmage commented Jan 9, 2024

We found that the automatically-run cleanup action on a synchronise can remove packages that are still referenced by existing Packages lists. This was observed by deb files in all repositories except for "main" being removed, after running the synchronize action.

I believe it is due to [1] in the locate_package_indicies function. The line of code correctly iterates over all dists, but it then takes only the first Packages* file it can find, alphabetically. This just happens to be in the main repository, because universe, multiverse, restricted are all alphabetically larger.

[1]

package_indices.append([sorted(distro.glob("**/Packages*"))[0] for distro in d.glob("*")])

On a related note, this destroyed the snapshots as well as the main mirror because they symlink to the pool directory. This was very surprising! I guess the symlink is to avoid the snapshots taking up unnecessary space. Can I suggest an alternative to use hardlinks for the deb files in the snapshots? This way an unexpected deletion wouldn't happen, and no unnecessary disk space would be used.


Imported from Launchpad using lp2gh.

@dashmage
Copy link
Author

dashmage commented Jan 9, 2024

(by vultaire)
I've also reviewed this. I agree with Danny's findings. The code was likely written the way it was because there are often Packages, Packages.bz2 and Packages.gz files (maybe even Packages.xz as well?), and these all have the same real content, thus it makes sense to only parse one copy. However, the code in question is not actually looking just at the packages files in the same immediate directory, but across multiple directories within the same mirrored repository subfolder - thus the code unfortunately ends up discarding far more than it should.

@dashmage
Copy link
Author

dashmage commented Jan 9, 2024

(by vultaire)
I thought I had some code which could be used here, but upon further reflection my repository "health check" code actually ends up parsing the compressed Packages files as well as the uncompressed ones - thus it's not really a good reference for correcting the broken logic here.

I think the key thing is: for each directory which contains package files, at least one Package file should be parsed. Rather than searching for all Packages files within a “dists” folder, it’s necessary to recurse to the directories within the dists folder where the Packages files are directly contained.

Or, alternatively (and perhaps a better long term solution): when we create a snapshot, rather than the current method of symlinking the “pool” subdirs and copying everytihing else, it’s probably better to do a “cp -r --link”-style copy of the pool.. That is, for files within the pool directory, create hard links rather than normal copies. This has the benefit of cleanup being implicit via the filesystem - the hard-linked “copies” don’t take up any extra disk space beyond the little that the hard links themselves consume in the filesystem, and snapshot removal simply becomes a “rm -rf” of the snapshot directory.

…The above all stated, the first method is probably the simpler one to do with the current code.

@dashmage
Copy link
Author

dashmage commented Jan 9, 2024

(by rgildein)
I proposed fix in 1.


@rgildein
Copy link

rgildein commented Apr 8, 2024

Fixed by #28.

@rgildein rgildein closed this as completed Apr 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants