Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New version of Bundle API #242

Open
5 tasks
erikgb opened this issue Nov 23, 2023 · 16 comments
Open
5 tasks

New version of Bundle API #242

erikgb opened this issue Nov 23, 2023 · 16 comments
Labels
lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.

Comments

@erikgb
Copy link
Contributor

erikgb commented Nov 23, 2023

We have already discussed some breaking changes we are considering making to the Bundle API. This issue is proposed as an umbrella issue to agree on what to do and how to proceed. Browsing through our currently open issues, I can find the following candidates:

Our first decision should be if we want to do #63 - as it suggests renaming the kind from Bundle to ClusterBundle. And AFAIK it is not possible to have any conversion webhook to convert between kinds. A conversion webhook can only convert between versions of the same kind. Doing #63 will create space to eventually fix #131 in the future. I am not saying we should do #131, but without #63 it will not be possible to do #131 with good naming of resources. If we decide to go for a new kind, I think we should try to keep both kinds around for a couple of releases - to allow users to migrate.

Over to how I think we should introduce a version (or kind). Either way, I suggest starting by introducing a new version (in a dedicated PR) as an exact copy of the existing API version - making the new version unserved. As long as the new version is unserved we can now proceed with changing the new version (and/or kind) in multiple follow-up PRs - without blocking for releases of trust-manager.

@kdubb
Copy link

kdubb commented Dec 5, 2023

@erikgb We created a CertificateBundler CRD and operator that is nearly identical to trust manager's basic Bundle with a couple major differences that help support rotation.

The operator for the bundler reads sources, builds PEM & JKS bundles and then creates ConfigMaps in target namespaces. We support Certificate style Secrets only (tls.crt with PEM) for ease and it fits our use cases.

1. Append Certificates

The status of CertificateBundler caches the current set of certs (as a PEM bundle). When reconciling the CRD (e.g. due to change in a source Secret) we scan the sources and append any new certificates, leaving old ones in place.

2. Filter Expired Certificates

During reconciliation, when building the certificate bundle, any expired certificates are excluded. Additionally, we schedule an appropriately timed reconciliation if any certificates expire sooner than the normal 10hour drift reconcile.

3. Deduplication

Duplicate certificates are removed by comparing Issuer + Subject + Serial Number; although Issuer + Serial Number seems to actually be the right unique key.

Implementation

Our actual implementation goes like this:

  1. Scan sources for certificates
  2. Append current certificates (from cached)
  3. Filter expired certificates
  4. Deduplicate certificates
  5. Cache bundle in status
  6. Generate or update config maps in target namespaces
  7. Schedule reconcile for earliest expiring certificate (if needed)

Taken together these features allow rotation in almost any form without any extra work; specifically simple reissues and key rotations. It works for cross signing as well by using multiple sources (although we currently don't have that setup or test it).

Example

A bundler is created with a single source secret generated by cert-manager's issuance of a Certificate. It will create the matching ConfigMap as expected. When cert-manager reissues the Certificate the associated Secret is updated and that triggers a reconcile for the bundler, which has the net effect of updating the ConfigMap with the previous and new certificates. Services will now trust both. Finally, at some point the bundler will remove the previous certificate when it expires.

Clusters can use Reloader, wave or similar to detect changes in the bundle ConfigMap and ensure services are restarted.

Back to trust-manager

What are your thoughts on including these features in trust-manager?

These features could be behind a feature flag or flags but it seems like sensible default behavior without really affecting current functionality.

@erikgb
Copy link
Contributor Author

erikgb commented Dec 5, 2023

@kdubb This looks very interesting. It's late here, so I will only answer shortly now.

  • Deduplication was added to trust-manger in Add certificates deduplication feature #184.
  • Regarding the caching until certificate expiry, even those removed from the sources, I'll have to think a bit more. It sounds a bit strange to me now - just after reading it. What do you do if you need to remove a CA from the bundle before it expires?
  • I also think it's questionable to remove certificates based on expiry. Why shouldn't targets reflect sources - even if some certificates have expired?

Pretty sure @SgtCoDFish should read ☝️.

@kdubb
Copy link

kdubb commented Dec 5, 2023

What do you do if you need to remove a CA from the bundle before it expires?

This is missing from my explanation. If the source is removed, the certs it contributed are explicitly removed during reconciliation. If you had a compromised, or otherwise invalid, certificate you can immediately reissue by deleting and re-creating the certificate.

Why shouldn't targets reflect sources - even if some certificates have expired?

The goal with the bundler isn't an accurate depiction of sources but rather a "correct" depiction of trust, while supporting rotation.

@kdubb
Copy link

kdubb commented Dec 5, 2023

The idea for this came from production topic cert-manager integration ntentionally copying CA Certificates

If you think about it, we're doing this. We're just copying it to the status field of the bundler and automating it.

@SgtCoDFish
Copy link
Member

Super interesting to read, thanks @kdubb ! The implementation of "remembering" previously trusted CAs has (sort of) been discussed before, some of that in #144 (which proposes a different method - I'm linking that issue for visibility).

I definitely see the benefit of it - it obviously helps with rotation. My immediate concern with "remembering" certs is that it doesn't generalise to multiple clusters in a predictable way. The semantics of a trust-manager Bundle today are crystal clear: you'll get a de-duped list of all CA certificates present in all sources. Different sources means different bundles, but if the sources are identical across 10 clusters the bundle will be identical too.

If we add caching or "remembering", then the exact same bundle spec changes in meaning after it's been deployed. The longer it lives, the more it remembers and if it's deleted and recreated it won't be the same. A bundle created before a rotation trusts more than a bundle created after a rotation.

I think the behaviour that the semantics don't change over time is valuable to preserve, and I'd be nervous about adding this kind of thing to the CRD for that reason. My gut reaction is that an external tool which watches a resource and tracks changes in it in a new resource would be the way to go. E.g. my watcher component is set to watch the key "tls.crt" in Secret A. Every time Secret A is changed, the watcher reads "tls.crt" and appends the value to the ConfigMap "my-history". "my-history" could then by used as a trust-manager source. I'd be interested to hear your thoughts on that!

@SgtCoDFish
Copy link
Member

SgtCoDFish commented Dec 6, 2023

Duplicate certificates are removed by comparing Issuer + Subject + Serial Number; although Issuer + Serial Number seems to actually be the right unique key.

This is a total drive-by comment, but I wouldn't personally go for this approach. Comparing any of these attributes ignores the signature on the cert, which is by far the most important thing for determining uniqueness. It's perfectly possible for two certs to have the same issuer, subject and serial number yet for them to be signed by different private keys.

Comparing on those attributes means that if an attacker can create a CA which looks like yours but with a different private key and they can get it into your trust store, your cert might be deduped rather than theirs. TBH if an attacker gets a cert into your trust bundle it's game over anyway, but at the end of the day you'd be de-duping a cert that wasn't a dupe!

The safest way to de-dupe is to use the fingerprint (hash of the entire cert). That's what trust-manager does: https://github.com/cert-manager/trust-manager/blob/01bd331abb8ee071025e2b8989930a2eb3b1d8e9/pkg/util/cert_pool.go

I'd strongly suggest changing your de-duplication logic as a priority to compare hashes!

@kdubb
Copy link

kdubb commented Dec 6, 2023

First, a bit of background, the name CertificateBundler (notice the verb) was chosen to be explicit that it is not spec'ing a "bundle" but a bundling mechanism. Maybe that changes things a bit, trust-manager could keep Bundle and add a Bundler (or maybe BundleGenerator given the name similarity).

My gut reaction is that an external tool which watches a resource and tracks changes in it in a new resource would be the way to go.

This is exactly what a CertificateBundler is.

E.g. my watcher component is set to watch the key "tls.crt" in Secret A. Every time Secret A is changed, the watcher reads "tls.crt" and appends the value to the ConfigMap "my-history". "my-history" could then by used as a trust-manager source

This seems to be adding a second step with more configuration to achieve a similar result. I'm fine with the idea but it seems like your "watcher" is just our CertificateBundler without the ability to project bundle ConfigMaps into target namespaces; might as well just have the one CRD be able to do everything.

As you eluded to, I can agree some really advanced setups might use a CertificateBundler to generate a dynamic bundle ConfigMap (in the trust namespace) and then use a Bundle to build and project a compound bundle from the dyanamic bundle and some other sources, and project that compound bundle into targets.

That being said, I think a lot of users would be more easily served by not requiring the use of both a CertificateBundler and a Bundle; allowing them the ability to just use a CertificateBundler would ease configuration.

The end result would to have a Bundle and Bundler, each with their specific semantics and each could be used stand alone or piggy backing on each other.

@kdubb
Copy link

kdubb commented Dec 6, 2023

This is a total drive-by comment, but I wouldn't personally go for this approach.

Thank you so much for this information. I was already questioning whether we were using the correct deduplication "key" and your suggestion makes total sense. I will correct this in our implementation.

@kdubb
Copy link

kdubb commented Dec 6, 2023

Alternatively, a preserveHistory and possibly a filterExpiredCertificates options could be added to the Bundle, disabled by default. Enabling both would achieve the semantics I laid out.

Edit: Upon reflection and looking at our code, the name preserveHistory wouldn't really capture what we are doing. Maybe a single preserveUnexpiredVersions is a better name and would enable this new flow.

@nan-coupa
Copy link

@erikgb, we've run into a specific issue with an expired certificate as relates to:
https://www.openssl.org/blog/blog/2021/09/13/LetsEncryptRootCertExpire/

The current cert-manager-package-debian container image includes an expired cert which triggers this problem:

 #<OpenSSL::X509::Certificate
  subject=#<OpenSSL::X509::Name CN=DST Root CA X3,O=Digital Signature Trust Co.>,
  issuer=#<OpenSSL::X509::Name CN=DST Root CA X3,O=Digital Signature Trust Co.>,
  serial=#<OpenSSL::BN 91299735575339953335919266965803778155>,
  not_before=2000-09-30 21:12:19 UTC,
  not_after=2021-09-30 14:01:15 UTC>]

This version of debian does not appear to be maintaining the ca-certificates package:

$ docker run --rm -it docker.io/library/debian:11-slim /bin/bash
root@0035344d151a:/# apt-get -yq update >/dev/null && DEBIAN_FRONTEND=noninteractive apt-get -qy -o=Dpkg::Use-Pty=0 install --no-install-recommends ca-certificates >/dev/null && dpkg-query --show --showformat="\${Version}" ca-certificates
debconf: delaying package configuration, since apt-utils is not installed
Updating certificates in /etc/ssl/certs...
129 added, 0 removed; done.
20210119

We should either upgrade to debian bookworm per #183 (which appears to have much more current ca bundles), or have a way to filter out expired CA certs.

@erikgb
Copy link
Contributor Author

erikgb commented Jan 9, 2024

@nan-coupa, thanks for your comment, but is it relevant for a new API version? We have #183 suggesting a system bundle based on Debian bookworm. Removing expired certificates from the trust bundle as an opt-in sounds good to me - do you mind opening an issue for it? I think this has been mentioned/discussed in another issue/PR in this project, but right now I cannot find it. 🤔 @SgtCoDFish WDYT?

@nan-coupa
Copy link

@erikgb, I commented on this ticket directly because it mentioned how to handle expired certificates and you asked in an earlier comment:
#242 (comment)

I also think it's questionable to remove certificates based on expiry. Why shouldn't targets reflect sources - even if some certificates have expired?

I can create a new issue if you still think it should be separated rather than discussed here.

@erikgb
Copy link
Contributor Author

erikgb commented Jan 9, 2024

I can create a new issue if you still think it should be separated rather than discussed here.

Ahhh, thanks! I wasn't smart enough to search within this issue. 🙈 This comment was added in the context of the CertificateBundler suggested by @kdubb. I think an opt-in to remove expired certificates from a bundle could make sense as a separate feature. Please create an issue for it! TIA!

@SgtCoDFish
Copy link
Member

I've created #272 to track potentially filtering out expired certs - I see the use case!

@cert-manager-bot
Copy link
Contributor

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
/lifecycle stale

@cert-manager-prow cert-manager-prow bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 6, 2024
@erikgb
Copy link
Contributor Author

erikgb commented Nov 6, 2024

/remove-lifecycle stale
/lifecycle frozen

@cert-manager-prow cert-manager-prow bot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.
Projects
None yet
Development

No branches or pull requests

5 participants