-
Notifications
You must be signed in to change notification settings - Fork 27
Improving the encapsulation (chunking) algorithm #456
Conversation
@@ -51,6 +50,10 @@ pub struct ObjectSourceMeta { | |||
/// One suggested way to generate this number is to have it be in units of hours or days | |||
/// since the earliest changed item. | |||
pub change_time_offset: u32, | |||
/// Change frequency | |||
pub change_frequency: u32, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding the semver break label for this
81c399d
to
6528349
Compare
a308cf6
to
22e27df
Compare
cd0f72c
to
570e993
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A lot to go over here but not seeing anything major to fix so far; nice work!
d043c54
to
a0fc520
Compare
This one needs a rebase, right? |
Moving to draft, can you move out of draft when you think it's ready for another review? |
4f9ad2a
to
4edd5c0
Compare
Does this make sense to you?
? |
Also
? |
Pushed a few clippy lint fixes |
The changes look good to me. Thanks for the help! |
There is a linting error:
|
Ah fun, it's a new lint. OK, I tried another tack for that one. |
How about this?
? |
Do we actually need the content annotation |
No, it was inserted just for consistency and readability |
OK hmm. I think I'd say let's drop it because with it present, we're "mixing namespaces" - the things we put in here are otherwise strings provided externally. So if one was writing a tool to process this data and map it back to package names, they'd need to skip the layer. Which is what we do anyways too! So I don't see an advantage to adding it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK let's land this and try it out and make further improvements after merging!
Although, I think it's probably cleaner to squash everything down to one commit; WDYT? |
yup squashing the commits |
layer deltas using historical builds Revamp basic_packing to follow the prior packing structure if the --prior-build flag exists. This simply modifies existing layers with upgrades/downgrades/removal of packages. The last layer contains any new addition to packages. In the case where --prior-build flag does not exist, the frequency of updates of the packages (frequencyinfo) and size is utilized to segment packages into different partitions (all combinations of low, medium, high frequency and low, medium, high size). The partition that each package falls into is decided by its deviation from mean. Then the packages are alloted to different layers to ensure 1) low frequency packages don't mix with high frequency packages 2) High sized packages are alloted separate bins 3) Low sized packages can be put together in the same bin This problem is aka multi-objective bin packing problem with constraints aka multiple knapsack problem. The objectives are conflicting given our constraints and hence a compromise is taken to minimize layer deltas while respecting the hard limit of overlayfs that the kernel can handle.
OK following up here related to this comment. Basically a core tradeoff here is that in the case where a prior build is not used as a base, this causes the packing to be much more sensitive to the input. And the consequence then is we'll actually cause more data to be downloaded in many cases. For the purposes of gathering some real-world data, I used the scripts linked above to construct two chains of images at Before this PR
So a user following all those updates would download ~5978 MB. After this PR
And after, they download ~5737 which is a bit better even! |
Hum interestingly, b9 and b10 have exactly the same set of RPMs and on further inspection were built from the same config commit. So the difference here comes down to build-level reproducibility (e.g. the initramfs and the rpm database). Oh and fun, looks like authselect writes timestamps into its generated files in e.g.
But for some reason that no-op change caused an almost complete diff with this change. |
Oh. I think something went wrong here...b9 and b10 are different things when repacked. |
What are the fcos version numbers for b9 and b10? |
Yeah, the versions are different:
So something went wrong with my scripting...looking |
OK I think a core bug here may relate to |
Defaulting to using the |
Right. I'll look at doing more numbers with that enabled. But I do think we still need to care about the no-previous-state case; if nothing else, it's what will happen if we ship this code today without updating coreos-assembler e.g. too. Until now users have been able to rely on the "default" rpm-ostree chunking. |
Followup: rpm-ostree/issues/4247
To do: