Should rulesets distribute a pre-built artifact rather than rely on GitHub source/release archive #11
See https://docs.google.com/document/d/1s_8AihGXbYujNWU_VjKNYKQb8_QGGGq7iKwAVQgjbn0/edit?usp=sharing for discussion around the requirements for testing against multiple Bazel versions.
Could you motivate this? It is not clear to me why this should be mandated. If the motivation is that users of a rule set should not depend on dev-dependencies of that rule set, then this can be achieved without a dedicated distribution artifact. E.g. in rules_haskell, dev-dependencies are only pulled in in rules_haskell's own

I think it is a plus that Bazel rule sets can be imported directly from their source repository at any commit, without needing to generate a distribution artifact first. This makes it very easy to pull in a non-release commit of a rule set that contains a needed fix. If rule sets are only intended to be used from distribution artifacts, then this use case is no longer necessarily supported, as a rule set may depend on generated files that are only included in the distribution artifact.

Either way, I don't think this should be mandated without the required tooling being available. See below.

Regarding bazel-in-bazel tests: I agree that this would be useful to have. We have looked into this for rules_haskell, and in this context looked into a Gazelle extension to generate filegroup targets capturing all files required to run the rule set. (The same would be useful for generating distribution artifacts.) We based our efforts on Gazelle's

It would be great to have general purpose versions of
Mostly the pre-built distribution artifact is required to get a stable checksum. If you rely on GitHub's generated .tgz source archives, you get a breakage when GitHub makes OS updates on their servers that create those archives.
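To illustrate the point, here is a minimal sketch of building a release tarball whose checksum does not depend on who packages it or when. All file and directory names are hypothetical, and GNU tar is assumed for the normalization flags:

```shell
# Hypothetical release-packaging sketch (GNU tar assumed): normalize file
# order, ownership, and timestamps so the archive bytes -- and therefore the
# SHA256 that consumers pin -- do not depend on the build machine or time.
set -eu
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p myrules/rules
echo 'def my_rule(): pass' > myrules/rules/defs.bzl

build_archive() {
  tar --sort=name --owner=0 --group=0 --numeric-owner \
      --mtime='UTC 2020-01-01' -cf - myrules | gzip -n
}

# Building twice yields byte-identical output, hence a stable checksum.
h1=$(build_archive | sha256sum | cut -d' ' -f1)
h2=$(build_archive | sha256sum | cut -d' ' -f1)
[ "$h1" = "$h2" ] && echo "stable checksum: $h1"
```

The same normalization is what a dedicated release step buys you that an on-the-fly server-side archive cannot guarantee.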
Hi 👋🏽,
@ittaiz what do you think about the SIG contributing or owning the current integration test repo in bazelbuild org?

Be happy to add contributors and even hand ownership over if you feel that's important
Is this still true? I haven't found official GitHub documentation stating that the archives are reproducible, but I have found this reproducible-builds thread pointing out that Github uses

Just as a quick test I compared the GH archive to a

As you can see, the SHA256 is identical. This suggests that the archive is generated reproducibly. Anecdotally, the only instance where I encountered issues with a changing commit hash in the last couple of years was kubernetes/kubernetes#99376. In this case the change was due to a problematic
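A local check along the lines described above can be sketched as follows; the repository contents are illustrative. Note the caveat: this only demonstrates that `git archive` is deterministic for a fixed git version on one machine, which is exactly why it cannot rule out breakage when git's or GitHub's compressor changes:

```shell
# Sketch of a local archive-reproducibility check. It shows determinism for
# one git version on one machine; cross-version stability is a separate
# question (see the rest of this thread).
set -eu
tmp=$(mktemp -d)
cd "$tmp"
git init -q repo
cd repo
git config user.email ci@example.com
git config user.name ci
echo hello > file.txt
git add file.txt
git commit -q -m 'initial commit'

# Archive the same commit twice and compare checksums.
h1=$(git archive --format=tar.gz --prefix=repo-1.0/ HEAD | sha256sum | cut -d' ' -f1)
h2=$(git archive --format=tar.gz --prefix=repo-1.0/ HEAD | sha256sum | cut -d' ' -f1)
[ "$h1" = "$h2" ] && echo "identical: $h1"
```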
@aherrmann I've followed this guidance ever since Jay Conrod made a big deal out of it in rules_go and bazel_gazelle. bazel-contrib/rules_go#2340 suggests maybe some GitHub URLs are reliable and some are not? There is yet another reason I think rules should build their own distribution archive, which is that you can calculate your own checksum to produce the WORKSPACE snippet in the release process before shipping the commits to GitHub.
Thanks for the pointer, I dug into this a little. I've attached the details in the end, in short: I don't think this was a case of the Github generated source archive changing. Instead, it looks to me as though this was a mixup between the SHA for the Github generated source archive and the release artifact. So, I don't think this is evidence to support the claim that Github source archives are non-reproducible.
The same can be achieved using

To be clear, I'm not saying one should not use release artifacts. But, I am saying that I don't see why it should be mandated that everyone use them without a good technical reason to motivate that mandate. I haven't seen such a reason, yet. As mentioned above, there are upsides to the source archive approach and costs to the release artifact approach.

Details: If we take a look at the changes in the PR we see

```diff
--- a/multirun/deps.bzl
+++ b/multirun/deps.bzl
@@ -4,7 +4,7 @@ def multirun_dependencies():
     _maybe(
         http_archive,
         name = "bazel_skylib",
-        sha256 = "2ef429f5d7ce7111263289644d233707dba35e39696377ebab8b0bc701f7818e",
+        sha256 = "2ea8a5ed2b448baf4a6855d3ce049c4c452a6470b1efd1504fdb7c1c134d220a",
         strip_prefix = "bazel-skylib-0.8.0",
         urls = ["https://github.com/bazelbuild/bazel-skylib/archive/0.8.0.tar.gz"],
     )
```

The 0.8.0 release has a release artifact and of course the generated source archive. If we look at the SHAs of each of these we find
I.e. the old hash was the hash of the release artifact and the new hash is the hash of the generated source archive. If we compare the contents of these two archives we find
I.e. the release artifact has no prefix, while the generated source archive does have the standard

The change is from Jan 2020, and I'm pretty sure Github generated source archives had the

For reference, I can produce an equivalent to the Github generated source archive with the same hash on my machine today:
If I try to reproduce the release artifact I get a different hash than the release artifact uploaded on
But, comparing this generated prefix-less tarball to the release tarball I find
So, the difference comes down to the release artifact containing slightly different headers, including a timestamp.

Great discussion. I think this issue ended up conflating two things. We agree that we need bazel-in-bazel integration testing of rules; let's move that to a new issue, since the bulk of discussion here was about the release archive and that's just one motivation for bazel-in-bazel testing.
I've updated all my repos, as well as the rules-template, to reflect that GitHub produces a stable SHA for the artifacts it serves. |
Sorry to revive this closed issue, but I just encountered a situation in which the SHA of a GitHub-provided archive changed over time and thus ended up breaking the build. Over at https://github.com/CodeIntelligenceTesting/jazzer, we use the following dependency on abseil-cpp:
An hour ago, CI runs started to fail with this error:
I attached both the ZIP file that can currently be obtained from https://github.com/abseil/abseil-cpp/archive/f2dbd918d8d08529800eb72f23bd2829f92104a4.zip (abseil-cpp-f2dbd918d8d08529800eb72f23bd2829f92104a4.github-new.zip) and the ZIP file that was previously generated by GitHub and that I obtained from my local repository cache (abseil-cpp-f2dbd918d8d08529800eb72f23bd2829f92104a4.github-old.zip). Running diffoscope on these files shows that the mtimes hour changed:
@aherrmann Do you have an idea how this could happen and whether tar.gz would not have been prone to this?

Looks like the change has been rolled back, so this might have been an honest bug.

And they said that they would ensure the checksum doesn't change in the future. So I think this might even harden the case that we can rely on the checksum.

@brentleyjones That's great to know. Could you point me to the place where they confirmed that?

So not as strong a guarantee as I originally read it as, but it seems the rollback was related to the checksum change: https://twitter.com/tgummerer/status/1488493440103030787

There is https://twitter.com/tgummerer/status/1488493481874055173 though, so depending on archives for individual commits is unsafe.

Yikes 😕

I think we have to push hard and escalate (like Ulf did) to point out that GH is running a package repo and the world relies on it for supply-chain safety...

/cc @tgummerer
GitHub's stability guarantee for the archive is iffy, and we want metrics on downloads. See bazel-contrib/SIG-rules-authors#11 (comment)
Since git v2.38.0, git archive's tar.gz format default has changed from invoking gzip to an internal gzip compressor implementation. However, the output bitstream is not identical, meaning the resulting tar.gz archive's checksum is different. This causes problems for PGP signing. In order to avoid this issue for both old and new archive generation alike, manually invoke gzip in the mkrelease script, bypassing git archive's internal compression logic completely regardless of version. GitHub and others presumably use a similar method to deal with this change to keep old tag archive checksums from changing.

* git/git@4f4be00
* https://github.blog/changelog/2023-01-30-git-archive-checksums-may-change/
* https://github.com/orgs/community/discussions/45830
* bazel-contrib/SIG-rules-authors#11
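The workaround described above can be sketched as follows (the repository setup is illustrative): ask `git archive` for an uncompressed tar and compress it with the system `gzip -n`, so the compressed bitstream no longer depends on which compressor a given git version uses internally:

```shell
# Sketch of the mkrelease workaround described above: bypass git archive's
# internal gzip by emitting plain tar and compressing with `gzip -n`
# (which omits the filename/timestamp from the gzip header).
set -eu
tmp=$(mktemp -d)
cd "$tmp"
git init -q repo
cd repo
git config user.email ci@example.com
git config user.name ci
echo hi > f.txt
git add f.txt
git commit -q -m init

# Compress outside of git; repeated runs produce identical bitstreams.
git archive --format=tar --prefix=repo-1.0/ HEAD | gzip -n > one.tar.gz
git archive --format=tar --prefix=repo-1.0/ HEAD | gzip -n > two.tar.gz
cmp one.tar.gz two.tar.gz && echo "bitstreams identical"
```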
Using the `/archive/refs/tags/` checksum hash may be more stable. Some conflicting information: [1] bazel-contrib/SIG-rules-authors#11 (comment) [2] https://github.com/orgs/community/discussions/45830#discussioncomment-4823531
Rules ought to distribute an artifact that doesn't contain references to development-time dependencies, and omits testing code and examples.

A downside is that the distribution can be broken if files are accidentally omitted from the artifact.

In addition, rules ought to integration-test against all supported Bazel versions. So there should be some bazel-in-bazel test that consumes the HEAD distribution artifact and tests that the examples work.
Right now there are a few approaches: rules_nodejs and rules_python have a built-in integration test runner, and rules_go has a special go_bazel_test rule.