problem achieving reproducibility #4131
Comments
This proves that there are cases where the last three layers can be different. A layer is essentially a tarball, and note it's also possible that all the file contents inside a layer are identical while the layer bytes still differ (because of metadata, for example).
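(For anyone who wants to inspect the raw layers, here is a sketch of one way to get at them; myapp:a is a placeholder tag, and the exact layout of the saved tar varies by Docker version:)

```sh
# Export the locally built image; each image layer is itself a tar archive inside
docker save myapp:a -o imageA.tar
mkdir imageA && tar -xf imageA.tar -C imageA
# Older Docker versions store layers as <hash>/layer.tar; newer ones use an OCI
# layout with the layer blobs under blobs/sha256/
find imageA -name layer.tar -o -path '*blobs/sha256/*' -type f
```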
I appreciate your suggestion; that was helpful for drilling down on where the differences start!
It seems like this could be the result of dependency upgrades to the gzip library that Jib is using? In this case, it seems like that can be ruled out, since I can build and immediately rebuild, any number of times, and never get a reproducible Jib image. The first layer that's different is the project resources layer; the Build A and Build B layer listings differ starting there.
Yes. Since you're using Jib on the same machine, it's very likely that you do put different files into the layers every time, and you seem to say that those project resources are different somehow. And I forgot to say this: Jib doesn't put a jar into the image, so your jar task configuration for producing a reproducible jar is irrelevant.
The resources directory across the two builds should be identical, though. I think I proved that with the steps shown above.
You only prove that the contents of the files are identical; a file can also have different sorts of attributes. I'd first verify that each individual file is byte-to-byte identical, and make sure you didn't miss any hidden or special files when tarring or untarring.
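(A rough way to do both checks at once, assuming GNU find and that the two layer tarballs have been extracted into layerA/ and layerB/; the directory names are just placeholders:)

```sh
# Byte-for-byte comparison of every file, including hidden ones
diff -r layerA/ layerB/

# Compare file attributes (permissions, owner, size, mtime) side by side
( cd layerA && find . -printf '%p %m %u:%g %s %T@\n' | sort ) > /tmp/a.attrs
( cd layerB && find . -printf '%p %m %u:%g %s %T@\n' | sort ) > /tmp/b.attrs
diff /tmp/a.attrs /tmp/b.attrs
```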
Sounds good. It might be worth mentioning that I removed the resources for testing purposes, so there's no resources layer produced, and now the class file layer isn't reproducible. So I don't think it's related to the contents of the resources directory.
It seems like I'm getting timing mismatches in the layer metadata. To keep the diff as small as possible, I'm using a minimal single-resource setup and byte-diffing the resulting layer.
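(For reference, a byte-level diff like this can be produced roughly as follows; the layer file names are placeholders:)

```sh
# Decompress each layer and hex-dump it so the diff points at exact byte offsets
gunzip -k buildA/layer.tar.gz buildB/layer.tar.gz
xxd buildA/layer.tar > a.hex
xxd buildB/layer.tar > b.hex
diff -u a.hex b.hex
```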
That makes sense and is what I suspected. This article explains the relevant tar time metadata. I think the observed value is one of those timestamps.
Sorry, I might be conflating settings here, but I'm not touching these resources at all between reproducible-build attempts, and even if I were, I thought the Jib time settings handle that transparently? I see Jib is setting the single resource file to the default of epoch in the layer on all attempts.

I don't know enough about the format of tar metadata to say what that differing time is.
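(The epoch default is easy to confirm from inside the image; a sketch, where myapp:a is a placeholder tag, the image is assumed to have a shell, and the default /app/resources layout is assumed:)

```sh
# File dates under /app/resources should all show the Unix epoch (Jan 1 1970),
# regardless of when the build actually ran
docker run --rm --entrypoint /bin/sh myapp:a -c 'ls -l /app/resources'
```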
I'm not setting that option explicitly; everything is left at the defaults.
And after getting all the timestamps, can you correlate the observed value with any of them?
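(A sketch of one way to dump all three timestamps on the host, assuming GNU stat and that the resources live under src/main/resources:)

```sh
# %X = atime, %Y = mtime, %Z = ctime (all as epoch seconds)
find src/main/resources -type f -exec stat -c '%n atime=%X mtime=%Y ctime=%Z' {} +
```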
Come to think of it, checking ctime this way won't work, because ctime will be reset anyway when unpacking files. What I want to check first is whether the observed ctime is the ctime of the layer tar or the ctime of the files in the tar at the time of creating the tar. For that matter, I think you can test this out by putting in multiple files, not one, and doing the byte diff to see whether there are multiple diff entries or a single one.
I created two more resources, so now there are three total, and each one just contains a random sentence of text. The layer byte diff now includes three pairs of differences, one for each file, it seems.
One thing to note is that I don't see any changes on my host machine. I created the two additional files and modified the first at 12:11, waited a bit, and then ran the builds.

Converting the epoch values in the tar bytes to timestamps, the times present in the tar coincide with the times that the Gradle tasks ran.
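(For anyone checking the numbers, the conversion can be done with GNU date; the value here is just one of the epoch values that appears later in the thread:)

```sh
# Convert an epoch value from the tar's extended header into a readable timestamp
date -d @1698941831
```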
This is what I expected. I thought it's very likely that Apache Commons Compress is including these times on its own. Anyway, it is what it is. You said earlier that you cannot reproduce this with a hello-world project, and given these observations, what I hypothesize is that your project pulls in a different version of Apache Commons Compress. Is your project a multi-module project? If not, can you force-set the same library version that the hello-world project uses?
For that matter, you can check the Commons Compress version used in the hello-world project by running a dependency report. Note there are cases where you cannot trust that output, though. Then, assuming that your project is not a multi-module one, you can force a version like this: #3564 (comment)
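(One way to inspect which Commons Compress version ends up on the Gradle buildscript classpath; this may or may not be the exact command referred to above, but buildEnvironment is a built-in Gradle task:)

```sh
# List the buildscript (plugin) classpath dependencies and filter for commons-compress
./gradlew buildEnvironment | grep -i commons-compress
```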
FYI, there is a precedent where a new version of Apache Commons Compress produced a different binary:
I forgot about that.
This is a multi-module project. The root level resolves to an older version of Commons Compress. The linked Apache issue is interesting; I wonder if something similar is happening again in 1.22 or 1.23.
As I said, often you cannot trust this output in a multi-module project, so it's very possible that the Jib module uses 1.23.0. You should carefully follow the project setup explained in this FAQ to ensure that Jib uses 1.21. That is, you define all plugins in the root project while selectively applying them. And then force the version 1.21. That said, I wonder what happens if you force 1.23.0 in the hello-world project.
Thank you for all of your assistance! We have reproducibility again when forcing 1.21, so it does seem like 1.22 or 1.23 introduced more breaking changes from a reproducibility perspective. I have not looked at the Jib or Apache Commons Compress code yet, but I'll attempt to take a closer look and see if I can get a change together.
@GoogleContainerTools/cloud-java-team-teamsync this is a bigger issue for Jib, because eventually Jib will have to upgrade the library.
@chanseokoh You're correct. It seems like they added more time fields in 1.22: andrebrait/commons-compress@b7f0cbb. Showing surrounding lines on my byte diff helped clear things up a bit:

@@ -481,7 +481,7 @@
00001e00: 3238 2061 7469 6d65 3d31 3639 3839 3431 28 atime=1698941
-00001e10: 3833 322e 3033 3037 3439 370a 3238 2063 832.0307497.28 c
-00001e20: 7469 6d65 3d31 3639 3839 3431 3833 312e time=1698941831.
-00001e30: 3531 3538 3737 340a 3338 204c 4942 4152 5158774.38 LIBAR
+00001e10: 3836 342e 3530 3338 3436 370a 3238 2063 864.5038467.28 c
+00001e20: 7469 6d65 3d31 3639 3839 3431 3836 332e time=1698941863.
+00001e30: 3737 3733 3135 350a 3338 204c 4942 4152 7773155.38 LIBAR
00001e40: 4348 4956 452e 6372 6561 7469 6f6e 7469 CHIVE.creationti
-00001e50: 6d65 3d31 3639 3839 3431 3833 310a 0000 me=1698941831...
+00001e50: 6d65 3d31 3639 3839 3431 3836 330a 0000 me=1698941863...
00001e60: 0000 0000 0000 0000 0000 0000 0000 0000 ................

In addition to the file-related time fields (atime and ctime), it seems like a LIBARCHIVE.creationtime entry is also being written.
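(A quick, rough way to see which time-related PAX entries a layer tar contains; layer.tar stands in for a decompressed layer:)

```sh
# PAX extended headers are stored as plain text inside the tar,
# so grepping the raw bytes is enough for a quick check
strings layer.tar | grep -E 'atime=|ctime=|mtime=|LIBARCHIVE.creationtime'
```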
Good catch! This needs to be addressed as well.
Another user hit this (#4141), and their PR looks very promising.
PR #4142, which will hopefully address this issue, is currently under review.
jib-gradle-plugin:3.4.2 and jib-maven-plugin:3.4.2 have been released with the fix (#4204)! Marking this issue as complete.
Environment: local docker build
Description of the issue:
I'm trying to produce reproducible builds within Gradle projects that include internal dependencies, so I'm not able to share the project.

I'm using Jib on a Gradle project that contains only a single application, so there are no dependencies pulled in with `project` references. I can include the build block below. Nothing is placed in `src/main/jib`.

With the above, I'm able to `clean build` and verify that the jar is reproducible (the jar checksum is the same). However, when I `clean build jibDockerBuild`, the `build/jib-image.digest` contents are not reproducible.

From what I can tell, Jib does not place any files outside of `/app`, so I did the following and was able to confirm that no files there differ between `clean build jibDockerBuild` calls: `find app/. -type f -exec sha1sum {} +` was run within the separately built images, and diffing the resulting lists shows no differences. I also spot-checked that the date on all of those files is the same (epoch time).

I can also run `find app/. -type f -newermt +3 -print` and verify that the only files that show up are `/proc/.`, `/sys/.`, and `/.dockerenv`. This tells me that the Jib build isn't building any files and placing them on the image with the current time.

I'm at a loss as to what else to check to see what differs between Jib builds that's keeping us from having reproducibility. Are there any suggestions at all that might help with my debugging effort?
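(The comparison described above can be scripted roughly like this; the image tags are placeholders, and a shell inside the image is assumed:)

```sh
# Build the image twice and compare the /app file checksums between the two builds
./gradlew clean build jibDockerBuild --image=myapp:a
docker run --rm --entrypoint /bin/sh myapp:a -c 'find /app -type f -exec sha1sum {} + | sort' > a.sha
./gradlew clean build jibDockerBuild --image=myapp:b
docker run --rm --entrypoint /bin/sh myapp:b -c 'find /app -type f -exec sha1sum {} + | sort' > b.sha
diff a.sha b.sha
```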
Expected behavior:
Reproducible digest
Steps to reproduce:
Unfortunately, I'm not able to reproduce this with a hello-world project, and our build environment contains internal dependencies.
jib-gradle-plugin Configuration: In an effort to debug this problem, I've been using the configuration below so it's easy to `docker run` the image and get dropped into a shell.

Here's an example of the layers produced across builds. The first four layers are always reproducible.
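(The layer digests themselves can be listed like this; myapp:latest is a placeholder for the locally built image:)

```sh
# Print the layer digests of the built image; rerun after each build and compare
docker inspect --format '{{range .RootFS.Layers}}{{println .}}{{end}}' myapp:latest
```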