Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/letsencrypt #294

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

nicktate
Copy link
Contributor

@nicktate nicktate commented Nov 13, 2019

Description

This is a WIP enhancement to add support for letencrypt certificates on digital ocean loadbalancers. I wanted to get it up for some early review and discussion around caveats.

annDODomain = "service.beta.kubernetes.io/do-loadbalancer-domain" has been added as a supported annotation to specify on Kubernetes services.

It is required to already have a valid root domain on digitalocean. Digital ocean only provides a name server service, it is not itself a domain registrar. I've opted to require this manual step because even though we could automatically generate the root domain entry via the apis, it would be on the user to update the nameservers for their domain to point to digitalocean nameservers.

The domain is parsed into an effective top level domain (eTLD), a root domain, and a subdomain.

If the given domain is mysite.example.com the eTLD is com because this is the top level domain new domains can be issued under. As a second example mysite.example.co.uk, the eTLD is co.uk

The logic generally flows as follows:

  1. Ensure a root domain exists on digital ocean for the given annotation
  2. Find or create a letsencrypt certificate in digital ocean that is valid for the given domain, and then update the service certificate-id annotation to reflect that certificate
  3. Create or update the loadbalancer with the forwarding rules using the certificate-id
    a. Note The digitalocean loadbalancer service also effectively creates an A record for the root domain to point to the load balancer IP
  4. Once the loadbalancer has reached active status, create a A record for the subdomain pointing to the loadbalancer IP if a subdomain exists

Additional Considerations / Caveats

  • The current implementation is not able to successfully cleanup generated certificates or A-records if the user modifies the service to change the certificate-id, or domain. The interface provided through the CCM abstracts away the previous state of the service so you lose reference to what the pre-existing values were.
    • We could attempt to solve this either via finalizers or creating our own reconciler but it seems like we would be stepping too far outside the confines of the ccm interface an duplicating a lot of effort
  • The LB service does not cleanup generated root domain records. It seems like a fix we should push into the LB service versus handling in the ccm We now ensure root A-record in CCM so we don't rely on undocumented side effects of LB service
  • Existing certificate rotation logic will continue to work as expected because during the ensureCertificate step, fetch will fail to locate the existing certificate and then research for a new certificate that is valid for the given domain (which will be the rotated cert)

What does this pull request fix

#92

cloud-controller-manager/do/certificates.go Outdated Show resolved Hide resolved
cloud-controller-manager/do/certificates.go Outdated Show resolved Hide resolved
cloud-controller-manager/do/certificates.go Outdated Show resolved Hide resolved
cloud-controller-manager/do/certificates.go Outdated Show resolved Hide resolved
cloud-controller-manager/do/certificates.go Outdated Show resolved Hide resolved
cloud-controller-manager/do/certificates.go Outdated Show resolved Hide resolved
cloud-controller-manager/do/loadbalancers.go Outdated Show resolved Hide resolved
Copy link
Contributor

@timoreimann timoreimann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did a first pass and left a few comments; will do a more thorough review later, ideally after my initial batch of comments have been discussed / addressed.

Thanks for doing this Nick!


if err != nil {
respErr, ok := err.(*godo.ErrorResponse)
if !ok || respErr.Response.StatusCode != http.StatusNotFound {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't the response code also be available on the more easily accessible response return parameter (i.e., the second one we ignore for now)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had seen a few other examples in this codebase of how I implemented error handling in this section. I agree with you though. I will update and also update the other corresponding sections for consistency.

cloud-controller-manager/do/loadbalancers.go Outdated Show resolved Hide resolved
cloud-controller-manager/do/certificates.go Outdated Show resolved Hide resolved
cloud-controller-manager/do/certificates.go Outdated Show resolved Hide resolved
cloud-controller-manager/do/certificates.go Outdated Show resolved Hide resolved
@@ -339,7 +378,7 @@ func (l *loadBalancers) recordUpdatedLetsEncryptCert(ctx context.Context, servic
return nil
}

func (l *loadBalancers) updateLoadBalancer(ctx context.Context, lb *godo.LoadBalancer, service *v1.Service, nodes []*v1.Node) (*godo.LoadBalancer, error) {
func (l *loadBalancers) updateLoadBalancer(ctx context.Context, service *v1.Service, lb *godo.LoadBalancer, nodes []*v1.Node) (*godo.LoadBalancer, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, was there a particular reason you reversed the order here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It felt like the switch was making parameter ordering more consistent across the code base... For example, these existing functions feel like they are prioritizing service.

(l *loadBalancers) EnsureLoadBalancer(ctx context.Context, clusterName string, service *v1.Service, nodes []*v1.Node) 
recordUpdatedLetsEncryptCert(ctx context.Context, service *v1.Service, lbCertID, serviceCertID string)
(l *loadBalancers) UpdateLoadBalancer(ctx context.Context, clusterName string, service *v1.Service, nodes []*v1.Node) 

I guess from a more abstract level, the service is the driving factor for all of this functionality so seems like it should be prioritized. I also could have been losing my mind while deciding to make that change 😛because it seems so subtle you could go either way so I'm fine reverting.

cloud-controller-manager/do/loadbalancers.go Outdated Show resolved Hide resolved
cloud-controller-manager/do/loadbalancers.go Outdated Show resolved Hide resolved
certName := getCertificateName(domain.full)
cert, resp, err := l.resources.gclient.Certificates.Get(ctx, certID)
if err != nil && resp.StatusCode != http.StatusNotFound {
klog.Errorf("failed to fetch certificate: %s", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a return?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's include the certificate ID in the error message ("failed to fetch certificate by ID %s: %s", certID, err).

Similar below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to take a look upstream as to what happens when you return an error response in this method. It felt like the process shouldn't as long as the loadbalancer was successfully deleted. I will have to think more about whether we want strong guarantees around cleanup of these resources.

cloud-controller-manager/do/loadbalancers.go Outdated Show resolved Hide resolved
@nicktate nicktate force-pushed the feature/letsencrypt branch 3 times, most recently from 1c3ef6b to 90507a8 Compare November 21, 2019 17:49
@nicktate nicktate force-pushed the feature/letsencrypt branch from 90507a8 to 958e64a Compare November 24, 2019 22:58
* Expects root domain to already be created and validated on
DigitalOcean (DO is not a registrar so we assume user has preconfigured
domain)
* Add domain annotation to specify either the root domain or a subdomain
of your choosing to the LoadBalancer service
* Automatically find or generate certificate, and attach to
LoadBalancer
* Automatically generate A-record for your subdomain to point to the
LoadBalancer
@nicktate nicktate force-pushed the feature/letsencrypt branch from 958e64a to 0600670 Compare November 24, 2019 23:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants