🚨 Sigstore Signature images do not match across different geo-locations 🚨 #187

Open
BenTheElder opened this issue Mar 18, 2023 · 18 comments

Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. sig/k8s-infra Categorizes an issue or PR as relevant to SIG K8s Infra. sig/release Categorizes an issue or PR as relevant to SIG Release.

@BenTheElder (Member) commented Mar 18, 2023:

Is there an existing issue for this?

  • I have searched the existing issues

What did you expect to happen?

Images should have identical digests no matter what region I pull from.

This does not appear to be the case for some of the sigstore images added by the image-promoter

Thread: https://kubernetes.slack.com/archives/CJH2GBF7Y/p1679166550351119

This issue is for tracking; the underlying fix will happen in the backing registries and in the image promoter (https://github.com/kubernetes-sigs/promo-tools) if we still have an active bug causing this.

To be clear, this is not a bug in the registry application; however, it will be visible to users of the registry, and more visible on registry.k8s.io than on k8s.gcr.io (because k8s.gcr.io has much, much broader backing regions: eu, us, asia).

We'll want to fix the underlying issues, if any remain, in promo-tools and then fix up the backing registry contents somehow.

Debugging Information

I have a script that inspects some important high-bandwidth images. It's a bit slow, and currently it only checks k8s.gcr.io / registry.k8s.io: https://github.com/BenTheElder/registry.k8s.io/blob/check-images/hack/tools/check-images.sh

We'll need to check the backing stores. I noticed a difference between my laptop at home and an SSH session to a cloud workstation.
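
For a single image, the kind of check the script does boils down to comparing manifest digests across endpoints with crane. A minimal sketch (not the actual script; the image name is just an example), where any skew in the output means different clients can see different content:

# Minimal sketch (not the actual check-images.sh): compare one image's digest across endpoints.
IMAGE=kube-proxy:v1.26.3   # example image; substitute any promoted image
for HOST in k8s.gcr.io registry.k8s.io; do
  echo -n "${HOST}: "
  crane digest "${HOST}/${IMAGE}"
done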

Anything else?

/sig release

Code of Conduct

  • I agree to follow this project's Code of Conduct
@BenTheElder BenTheElder added kind/bug Categorizes issue or PR as related to a bug. sig/k8s-infra Categorizes an issue or PR as relevant to SIG K8s Infra. labels Mar 18, 2023
@k8s-ci-robot k8s-ci-robot added the sig/release Categorizes an issue or PR as relevant to SIG Release. label Mar 18, 2023
@BenTheElder (Member Author) commented:

kubernetes-sigs/promo-tools#784 to track resolving any bugs in the image promoter

@BenTheElder BenTheElder pinned this issue Mar 18, 2023
@BenTheElder (Member Author) commented:

So far this seems to only be the sigstore images.

Given that clients will generally fetch these using a tag computed from the digest of the adjacent image that was signed, not from the digest of the signature "images" themselves, this is unlikely to break anyone, but it is worth fixing regardless.
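
For reference, a rough sketch of that lookup: the signature tag is derived from the digest of the signed image (the same crane digest | sed pattern used elsewhere in this thread), so the digest of the signature manifest itself never appears in the client-side reference:

# Rough sketch: derive and fetch the sigstore signature tag for a signed image.
IMAGE=registry.k8s.io/kube-scheduler:v1.26.3   # example image
SIG_TAG="$(crane digest "${IMAGE}" | sed -e 's/:/-/').sig"   # yields sha256-<digest>.sig
crane manifest "registry.k8s.io/kube-scheduler:${SIG_TAG}"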

@BenTheElder BenTheElder added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Mar 18, 2023
@BenTheElder (Member Author) commented:

This could cause a problem if a single image pull (many API calls) somehow gets routed to multiple instances of the registry.k8s.io backend in different regions, because the signature blobs available would not match.

We think this is very unlikely. Still something to fix.

@BenTheElder (Member Author) commented Mar 19, 2023:

So ... I've computed an index of all images like host : partial_ref : digest.

A partial_ref in this case is like kube-proxy-s390x@sha256:8acf368ce46f46b7d02e93cb1bcfb3b43931c0fbb4ee13b3d4d94d058fa727f7, i.e. it's either a digest ref or a tag image ref with the $host prefix trimmed to save space.

This is ~600M of JSON. It took on the order of hours to obtain, given the rate limits on scanning our registries and the volume of images.

I've then filtered this back down to only tag refs and digest refs that have no tags pointing at them. Both types map to the digest they point to. Filtering this way reduces the data set but not the information, it just means we skip the image@digest => digest reference type when we also have a tag pointing to that digest anyhow.

The tradeoff is that to diff you need to check both the refs and the digests between two hosts, but we want to know if tags are different anyhow.
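
A minimal sketch of the kind of diff query this enables, assuming (hypothetically) that the filtered index is a JSON object mapping host to { partial_ref: digest }; the real data file layout may differ:

# Hypothetical sketch: diff refs between two backing registries from the index JSON.
A=us-west1-docker.pkg.dev/k8s-artifacts-prod/images
B=us-west2-docker.pkg.dev/k8s-artifacts-prod/images

# refs present in $A but missing from $B
jq -r --arg a "$A" --arg b "$B" '(.[$a] | keys) - (.[$b] | keys) | .[]' index.json

# refs present in both hosts but pointing at different digests
jq -r --arg a "$A" --arg b "$B" '
  .[$a] as $x | .[$b] as $y
  | ($x | keys[]) | select($y[.] != null and $x[.] != $y[.])' index.json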

I would share but even the filtered and processed version is 353M ...

EDIT: The filtered version is available in gzip compressed JSON here: https://kubernetes.slack.com/archives/CJH2GBF7Y/p1679213042607229?thread_ts=1679166550.351119&cid=CJH2GBF7Y


Anyhow, by running some queries over this data I can see that none of the backing registries have the same number of refs.

If I pick two regions and compute the ref diffs, what I see every time so far is a mix of dangling digest refs with no tag and :sha256-<hash>.sig sigstore tags.

Unfortunately there are a large number of dangling digest refs in the region diffs, so we can't just say "well, it's all sigstore tags" and call it a day. There are also too many of these to quickly fetch and check all of the manifests.

But just checking a random sample of dangling digest refs from the region pairs I inspected, so far 100% of the time crane manifest ${host}/${partial_ref} reveals a sigstore manifest.

I would guess that we pushed signature tags multiple times to images and these dangling refs are previous signature pushes.

ALL of the tag type references so far are :sha256-.*.sig sigstore tags.

@BenTheElder (Member Author) commented Mar 19, 2023:

The tag type references in the diff also suggest that we have signed images whose signature is only published at all in some regions, AFAICT, which is a bit worse than the exact signature varying by region ...

E.g. for us-west1 vs us-west2 AR instances:

Missing:
us-west2-docker.pkg.dev/k8s-artifacts-prod/images/kubernetes/kube-scheduler-ppc64le:sha256-4019c5d5f3a84dbc355b52b5240b645404d5f2541edc392ccb4d2f8acc1deb8b.sig (signature tag)

Available:
us-west2-docker.pkg.dev/k8s-artifacts-prod/images/kubernetes/kube-scheduler-ppc64le@sha256:4019c5d5f3a84dbc355b52b5240b645404d5f2541edc392ccb4d2f8acc1deb8b (the image that should be signed)

us-west1-docker.pkg.dev/k8s-artifacts-prod/images/kubernetes/kube-scheduler-ppc64le:sha256-4019c5d5f3a84dbc355b52b5240b645404d5f2541edc392ccb4d2f8acc1deb8b.sig (signature in the other region)

us-west1-docker.pkg.dev/k8s-artifacts-prod/images/kubernetes/kube-scheduler-ppc64le@sha256:4019c5d5f3a84dbc355b52b5240b645404d5f2541edc392ccb4d2f8acc1deb8b (signed image in the other region)

You can verify that these are really missing / available by running crane manifest $image for each of these.
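
For instance, exercising the refs listed above (expect the first command to fail and the second to succeed):

# Expect a MANIFEST_UNKNOWN error here (signature tag missing in us-west2):
crane manifest us-west2-docker.pkg.dev/k8s-artifacts-prod/images/kubernetes/kube-scheduler-ppc64le:sha256-4019c5d5f3a84dbc355b52b5240b645404d5f2541edc392ccb4d2f8acc1deb8b.sig

# Expect this to succeed (the same signature tag is present in us-west1):
crane manifest us-west1-docker.pkg.dev/k8s-artifacts-prod/images/kubernetes/kube-scheduler-ppc64le:sha256-4019c5d5f3a84dbc355b52b5240b645404d5f2541edc392ccb4d2f8acc1deb8b.sig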

This also applies to k8s.gcr.io with eu/us/asia backing registries. However it's far less visible there as users are far less likely to ever encounter different backing registries given the very broad geographic scopes.

@BenTheElder (Member Author) commented Mar 19, 2023:

Quantifying the scale of missing sigstore tags:

376 missing sigstore tags in australia-southeast1-docker.pkg.dev/k8s-artifacts-prod/images
388 missing sigstore tags in europe-north1-docker.pkg.dev/k8s-artifacts-prod/images
362 missing sigstore tags in europe-southwest1-docker.pkg.dev/k8s-artifacts-prod/images
376 missing sigstore tags in europe-west2-docker.pkg.dev/k8s-artifacts-prod/images
365 missing sigstore tags in europe-west8-docker.pkg.dev/k8s-artifacts-prod/images
421 missing sigstore tags in asia.gcr.io/k8s-artifacts-prod
439 missing sigstore tags in eu.gcr.io/k8s-artifacts-prod
365 missing sigstore tags in europe-west4-docker.pkg.dev/k8s-artifacts-prod/images
381 missing sigstore tags in southamerica-west1-docker.pkg.dev/k8s-artifacts-prod/images
374 missing sigstore tags in us-central1-docker.pkg.dev/k8s-artifacts-prod/images
381 missing sigstore tags in us-west1-docker.pkg.dev/k8s-artifacts-prod/images
363 missing sigstore tags in asia-northeast1-docker.pkg.dev/k8s-artifacts-prod/images
383 missing sigstore tags in asia-south1-docker.pkg.dev/k8s-artifacts-prod/images
356 missing sigstore tags in europe-west9-docker.pkg.dev/k8s-artifacts-prod/images
377 missing sigstore tags in us-east1-docker.pkg.dev/k8s-artifacts-prod/images
387 missing sigstore tags in us-east4-docker.pkg.dev/k8s-artifacts-prod/images
370 missing sigstore tags in us-south1-docker.pkg.dev/k8s-artifacts-prod/images
377 missing sigstore tags in asia-east1-docker.pkg.dev/k8s-artifacts-prod/images
366 missing sigstore tags in asia-northeast2-docker.pkg.dev/k8s-artifacts-prod/images
371 missing sigstore tags in europe-west1-docker.pkg.dev/k8s-artifacts-prod/images
381 missing sigstore tags in us-east5-docker.pkg.dev/k8s-artifacts-prod/images
374 missing sigstore tags in us-west2-docker.pkg.dev/k8s-artifacts-prod/images
430 missing sigstore tags in us.gcr.io/k8s-artifacts-prod

Note that this count includes each manifest, so there's potentially one of these for each architecture within the same image.

The more interesting detail is that we have some other image tags that are only in some backends:

asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.3.0
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.3.0
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.3.0
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.3.0-alpha.1
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.3.0-alpha.1
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.3.0-alpha.1
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.3.1
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.3.1
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.3.1
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.0
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.0
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.0
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.1
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.1
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.1
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.10
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.10
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.10
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.11
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.11
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.11
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.12
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.12
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.12
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.13
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.13
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.13
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.14
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.14
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.14
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.15
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.15
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.15
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.2
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.2
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.2
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.3
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.3
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.3
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.4
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.4
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.4
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.5
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.5
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.5
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.6
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.6
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.6
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.7
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.7
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.7
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.8
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.8
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.8
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.9
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.9
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.4.9
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.5.0
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.5.0
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.5.0
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.5.1
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.5.1
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.5.1
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.5.2
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.5.2
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.5.2
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.5.3
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.5.3
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.5.3
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.0.0
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.0.0
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.0.0
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.0.0-rc.0
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.0.0-rc.0
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.0.0-rc.0
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.0.1
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.0.1
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.0.1
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.0.2
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.0.2
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.0.2
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.1.0
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.1.0
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.1.0
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.1.1
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.1.1
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.1.1
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.1.2
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.1.2
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.1.2
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.1.3
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.1.3
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.1.3
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.2.0
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.2.0
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.2.0
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.2.1
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.2.1
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.2.1
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.3.0
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.3.0
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.3.0
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.3.1
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.3.1
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.3.1
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.3.2
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.3.2
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.3.2
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.4.0
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.4.0
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.4.0
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.4.1
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.4.1
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.4.1
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.4.2
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.4.2
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.4.2
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.4.3
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.4.3
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.4.3
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.4.4
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.4.4
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.4.4
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.5.0
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.5.0
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.5.0
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.5.1
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.5.1
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.5.1
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.5.2
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.5.2
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.5.2
asia.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.5.3
eu.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.5.3
us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v1.5.3

100% of these are only missing from the k8s.gcr.io backing registries, so if I had to guess, I think they were somehow manually cleaned up from k8s.gcr.io but not registry.k8s.io. They all appear to be related to cluster-api-azure.

See below for how this happened: #187 (comment)

You can verify that these are in other backends like this sample:

crane manifest southamerica-west1-docker.pkg.dev/k8s-artifacts-prod/images/cluster-api-aure/cluster-api-azure-controller:v0.3.0

crane manifest us.gcr.io/k8s-artifacts-prod/cluster-api-aure/cluster-api-azure-controller:v0.3.0

Code in BenTheElder@2e32a2c / https://github.com/BenTheElder/registry.k8s.io/tree/check-images, data file in slack linked above.

@BenTheElder (Member Author) commented:

The "cluster-api-aure" tags were partially synced before and led to kubernetes/k8s.io#4368 which should be catching future mis-configuration leading to partial sync on the promoter config side of things.

ref: https://kubernetes.slack.com/archives/CCK68P2Q2/p1666053166103809?thread_ts=1666040894.622279&cid=CCK68P2Q2

We should make sure that test is actually running on the migrated registry.k8s.io/ folder; I see it was copied over, but I'm not sure the scripts run it.

@BenTheElder (Member Author) commented Mar 19, 2023:

kubernetes/k8s.io#4988 will ensure we keep applying the regression test that should prevent misconfiguring subprojects such that they don't promote to all regions (i.e. the cluster-api-aure situation).

@BenTheElder (Member Author) commented Mar 20, 2023:

Confirmed: dangling digests that are not in all regions are 100% either sigstore manifests (containing "dev.cosignproject.cosign/signature" in the manifest) or cluster-api-aure images.

Scanned with BenTheElder@a10201c
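
A rough sketch of that kind of classification for a single ref (REF is a placeholder, not a real dangling digest from the data set):

# Rough sketch: classify one dangling digest ref by its manifest contents.
REF="us-west1-docker.pkg.dev/k8s-artifacts-prod/images/SOME/IMAGE@sha256:..."   # placeholder
if crane manifest "${REF}" | grep -q 'dev.cosignproject.cosign/signature'; then
  echo "sigstore signature manifest"
else
  echo "something else (e.g. a cluster-api-aure image)"
fi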

@BenTheElder (Member Author) commented Mar 20, 2023:

So recapping:

TLDR of backing registry skew after fully checking through all mismatching tags and digests in a snapshot from this weekend.

The following cases appear to exist:

  1. sigstore signature tags not available in all backends
  2. sigstore signature tags available may have different digests in backends
  3. mis-configured promotion for #cluster-api-azure under cluster-api-aure/ to only some backends

These are all known issues. 3) should not get worse, thanks to regression tests (kubernetes/k8s.io#4988).

1 & 2 are being worked on and kubernetes-sigs/promo-tools#784 is probably the best place to track that.

See also for 1&2:
https://groups.google.com/g/kubernetes-announce/c/0_jVjhLvNuI

@puerco (Member) commented Mar 27, 2023:

OK, regarding the diverging .sig images in the registries: I think it is a by-product of the promoter getting rate limited. I did a recap of the findings in Slack, but I'm leaving it here too for the record:


Looking at images before the promoter started breaking due to the rate limits, the .sig layers match. I found a mismatching tag in the images promoted as part of the (failed) v1.26.3 release.

For example, registry.k8s.io/kube-scheduler:v1.26.3 is fully signed and replicated, all match:

TAG=$(crane digest registry.k8s.io/kube-scheduler:v1.26.3 | sed -e 's/:/-/' ); for m in $(cat mirrors); do echo -n "${m}: "; crane digest ${m}/k8s-artifacts-prod/images/kubernetes/kube-scheduler:${TAG}.sig 2>/dev/null || echo " ERROR"; done
asia-east1-docker.pkg.dev: sha256:61d6baae440f4692509db9dd825ef4614a8179a175fc60390cf88830a22f6f6c
asia-south1-docker.pkg.dev: sha256:61d6baae440f4692509db9dd825ef4614a8179a175fc60390cf88830a22f6f6c
asia-northeast1-docker.pkg.dev: sha256:61d6baae440f4692509db9dd825ef4614a8179a175fc60390cf88830a22f6f6c
asia-northeast2-docker.pkg.dev: sha256:61d6baae440f4692509db9dd825ef4614a8179a175fc60390cf88830a22f6f6c
(output of all mirrors matching trimmed)

There are some images which have missing signatures, but the ones that are there all match; for example, kube-controller-manager:

TAG=$(crane digest registry.k8s.io/kube-controller-manager:v1.26.3 | sed -e 's/:/-/' ); for m in $(cat mirrors); do echo -n "${m}: "; crane digest ${m}/k8s-artifacts-prod/images/kubernetes/kube-controller-manager:${TAG}.sig 2>/dev/null || echo " ERROR"; done
asia-east1-docker.pkg.dev:  ERROR
asia-south1-docker.pkg.dev: sha256:ec54ca831d0135d7691fa3cc36cfb5deb5d73eadbb6736edcbb8eb63270f02c3
asia-northeast1-docker.pkg.dev:  ERROR
asia-northeast2-docker.pkg.dev:  ERROR
australia-southeast1-docker.pkg.dev:  ERROR
europe-north1-docker.pkg.dev:  ERROR
europe-southwest1-docker.pkg.dev: sha256:ec54ca831d0135d7691fa3cc36cfb5deb5d73eadbb6736edcbb8eb63270f02c3
europe-west1-docker.pkg.dev:  ERROR
(full output trimmed)

Of all the images we promoted that day, the one that has a different digest is the kube-proxy copy in asia-south1-docker.pkg.dev:

TAG=$(crane digest registry.k8s.io/kube-proxy:v1.26.3 | sed -e 's/:/-/' ); for m in $(cat mirrors); do echo -n "${m}: "; crane digest ${m}/k8s-artifacts-prod/images/kubernetes/kube-proxy:${TAG}.sig 2>/dev/null || echo " ERROR"; done
asia-east1-docker.pkg.dev: sha256:b55c42ada82c11e3d8d176deb6572b53f371f061e19d69baf0f14d6dbc7362ab
asia-south1-docker.pkg.dev: sha256:bb1e7fda66a3bfd41d2dd0b71c275ef24e2386af82102b6c52b2f20233d8b940
asia-northeast1-docker.pkg.dev: sha256:b55c42ada82c11e3d8d176deb6572b53f371f061e19d69baf0f14d6dbc7362ab

Here's what's going on:

When the promoter signs, it stamps the images with its own signer identity:

COSIGN_EXPERIMENTAL=1 cosign-1.13 verify us-east1-docker.pkg.dev:/k8s-artifacts-prod/images/kubernetes/kube-proxy:v1.26.3 | jq  '.[].optional.Subject' 

Verification for us-east1-docker.pkg.dev:/k8s-artifacts-prod/images/kubernetes/kube-proxy:v1.26.3 --
The following checks were performed on each of these signatures:
  - The cosign claims were validated
  - Existence of the claims in the transparency log was verified offline
  - Any certificates were verified against the Fulcio roots.
"[email protected]"

(note the SA ID in the last line: krel-trust@ )

The diverging digest has the identity from the signature we add when the build process runs:

COSIGN_EXPERIMENTAL=1 cosign-1.13 verify asia-south1-docker.pkg.dev:/k8s-artifacts-prod/images/kubernetes/kube-proxy:v1.26.3 | jq  '.[].optional.Subject'  

Verification for asia-south1-docker.pkg.dev:/k8s-artifacts-prod/images/kubernetes/kube-proxy:v1.26.3 --
The following checks were performed on each of these signatures:
  - The cosign claims were validated
  - Existence of the claims in the transparency log was verified offline
  - Any certificates were verified against the Fulcio roots.
"[email protected]"

(note the identity here is krel-staging@ )

This signature is the only one that is different in the release, so we are not re-signing. It is simply that when processing the signatures for this particular image, the promoter got rate limited and died in the middle.

@BenTheElder (Member Author) commented:

This signature is the only one that is different in the release, so we are not re-signing. It is simply that when processing the signatures for this particular image, the promoter got rate limited and died in the middle.

Wait, we're pushing images to prod and then mutating them? Why?

@k8s-triage-robot commented:

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 25, 2023
@ameukam (Member) commented Jun 26, 2023:

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 26, 2023
@BenTheElder BenTheElder added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Sep 13, 2023
@aliok commented May 9, 2024:

Please note that this issue is linked in LFX Mentorship 2024 term 2.
A related issue is kubernetes/release#2962

@BenTheElder (Member Author) commented:

Thanks! This issue is just for tracking / visibility to users of the registry; the necessary changes will be in repos like kubernetes/release, where image publication is managed. If/when it is fixed, we will replicate updates back here for visibility.

@anshikavashistha commented:

@aliok This project seems interesting to me. I really want to work on it. Is there any prerequisite task that needs to be done?
Please share a link to the community channel or any Slack channel.

@BenTheElder (Member Author) commented:

Hi folks, please discuss possibly working on this in kubernetes/release#2962 and let's reserve this issue for indicating to users of the registry when we have progress or more details on the situation.

@dims dims unpinned this issue Jul 26, 2024