Update health checks to use Kube-Proxy when no other configuration is provided #622
base: master
Conversation
This issue is currently awaiting triage. If cloud-provider-aws contributors determine this is a relevant issue, they will accept it by applying the relevant label.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/cc @dims
@dims: GitHub didn't allow me to assign the following users: kmala. Note that only kubernetes members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/retest
To check this out, I created a cluster manually and ran the test suite; it worked for me there, so I think this may be a flake in this case. I could also see that it was creating the health check as expected. I've been testing this change in OpenShift as well, and the CI we run for disruption of workload services is looking good with this patch included.
/retest
Unknown CLA label state. Rechecking for CLA labels. Send feedback to sig-contributor-experience at kubernetes/community.
/check-cla
/test pull-cloud-provider-aws-e2e
/retest
The failure seemed like a test-infra issue; giving it another go before I dig in.
/retest
pkg/providers/v1/aws.go
Outdated
portModified := parseStringAnnotation(svc.Annotations, ServiceAnnotationLoadBalancerHealthCheckPort, &hc.Port)

// For a non-local service, we override the health check to use the kube-proxy port when no other overrides are provided.
// The kube-proxy port should be open on all nodes and allows the health check to check the node's ability to proxy traffic.
The kube-proxy port should be open on all nodes and allows the health check to check the node's ability to proxy traffic.
My guess is that the assumption is not correct and the 10256 port is blocked by kOps.
You can get an idea of what security group rules are created by default:
https://github.com/kubernetes/kops/blob/master/tests/integration/update_cluster/aws-lb-controller/kubernetes.tf
Reviewing the TF you've linked, I can't see where ingress is allowed for any NodePort to the workers. Unless I've misread, it looks like users already have to manually allow the ports to the worker nodes for Services of type LoadBalancer?
My comment here was more that every node will open this port, because every node should be running kube-proxy or an equivalent substitute. Whether the security group rules are configured or not wasn't intended to be captured by that comment.
Nodes open ports to each other, but not to any other IP. I think the AWS CCM currently adds the necessary rules for node ports, so port 10256 should be added there.
We can also open node ports during testing by adding --set cluster.spec.nodePortAccess=0.0.0.0/0
here:
cloud-provider-aws/hack/e2e/run.sh
Line 131 in d2b28cc
--create-args="--dns=none --zones=${ZONES} --node-size=m5.large --master-size=m5.large --override=cluster.spec.cloudControllerManager.cloudProvider=aws --override=cluster.spec.cloudControllerManager.image=${IMAGE_NAME}:${IMAGE_TAG}" \
Ack, that's helpful. I'll add that hack for now and investigate setting up the rules correctly via the CCM.
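For what it's worth, here is a rough sketch (not code from this PR) of what an explicit ingress rule for the kube-proxy healthz port could look like with the aws-sdk-go v1 EC2 client this provider already uses; the security group IDs are placeholders:

```go
package main

import (
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/ec2"
)

func main() {
	sess := session.Must(session.NewSession())
	client := ec2.New(sess)

	// Placeholder IDs: the node security group and the load balancer security group.
	nodeSG := "sg-0123456789abcdef0"
	lbSG := "sg-0fedcba9876543210"

	// Allow the load balancer to reach kube-proxy's healthz endpoint (10256) on the nodes.
	_, err := client.AuthorizeSecurityGroupIngress(&ec2.AuthorizeSecurityGroupIngressInput{
		GroupId: aws.String(nodeSG),
		IpPermissions: []*ec2.IpPermission{{
			IpProtocol: aws.String("tcp"),
			FromPort:   aws.Int64(10256),
			ToPort:     aws.Int64(10256),
			UserIdGroupPairs: []*ec2.UserIdGroupPair{{
				GroupId: aws.String(lbSG),
			}},
		}},
	})
	if err != nil {
		log.Fatalf("authorizing kube-proxy healthz ingress: %v", err)
	}
}
```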
Adding this nodePortAccess seems not to have worked; the latest CI result suggests it is creating a duplicate rule?
Hmm, it seems that the security group for the ELB is creating an allow-all rule anyway, do you agree?
cloud-provider-aws/pkg/providers/v1/aws.go
Lines 4637 to 4641 in d055109
allProtocols := "-1"
permission := &ec2.IpPermission{}
permission.IpProtocol = &allProtocols
permission.UserIdGroupPairs = []*ec2.UserIdGroupPair{sourceGroupID}
I spun up a cluster and can see that all traffic is allowed between the load balancer security group and the worker security group, and the health checks are working, so perhaps there's some other issue at play with the E2E?
Adding this nodePortAccess seems not to have worked; the latest CI result suggests it is creating a duplicate rule?
You are right, not sure how we can work around that.
Hmm, it seems that the security group for the ELB is creating an allow-all rule anyway, do you agree?
I see you are looking at the Classic LB related function. Look for updateInstanceSecurityGroupsForNLB().
I spun up a cluster...
How did you do that?
I looked at the NLB and tested it live; the NLB version does the same thing security-group-wise.
How did you do that?
Not using kOps, but I had an AWS cluster that I could test this all on by other means.
I'll see if I can use the script for the e2e run job to spin up a cluster using kOps.
pkg/providers/v1/aws_loadbalancer.go
Outdated
// We need the port as both string and int.
kubeProxyHealthCheckPort = "10256"
kubeProxyHealthCheckPortInt = int32(10256)
Just a thought, it should be pretty easy to do strconv.Itoa(kubeProxyHealthCheckPort) where needed.
Suggested change:
- // We need the port as both string and int.
- kubeProxyHealthCheckPort = "10256"
- kubeProxyHealthCheckPortInt = int32(10256)
+ kubeProxyHealthCheckPort = int32(10256)
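For reference, a minimal sketch of the single-constant approach the suggestion points at; note that the int32 needs an explicit int conversion before strconv.Itoa:

```go
package main

import (
	"fmt"
	"strconv"
)

// Keep one canonical definition of the kube-proxy healthz port.
const kubeProxyHealthCheckPort = int32(10256)

func main() {
	// Convert to string only at the call sites that need it.
	portStr := strconv.Itoa(int(kubeProxyHealthCheckPort))
	fmt.Println(portStr) // "10256"
}
```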
So on further investigation, it seems the issue right now is that when kops configures Cilium to replace kube-proxy, it doesn't tell it to replace kube-proxy's health check endpoint by default, and Cilium by default doesn't expose the health check. Strikes me as odd that, when you have ... Manually configuring the healthz port on Cilium to ...
@hakman Coming back to this, I think we may want a softer approach to this change. I think it's the correct direction to take, but, having discovered that kops isn't configuring Cilium "correctly" right now, and having worked through the same change with the Azure folks, I'm going to propose another approach. With Azure, we have added a way for users to choose between individual and shared health checks, the default for now being the individual health checks. That maintains the existing behaviour, but those in the know can update their health checks to use the new shared behaviour. The intention there is to then go through a cycle of announcements and change the default over time, giving users the chance to make sure they have the kube-proxy health checks configured, and giving them a way to retain the old, broken behaviour if they desire. Do you agree with moving forward with the same pattern here? If so, I'll get the PR updated to reflect what was done in Azure so there's a consistent experience across the two.
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
/remove-lifecycle rotten
The branch was force-pushed from c1e7910 to 9394abd.
@kmala I've updated this now to reflect the changes that we made in the Azure provider, where the health checks are configurable at the cloud config level, choosing between the new Shared or the old ServiceNodePort health check options. This allows users to configure what they want, but keeps the existing behaviour as the default. I suggest over time we move the default across to the other and make the ServiceNodePort behaviour a more legacy option.
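As a rough illustration of that kind of cloud-config toggle (the field and value names here are hypothetical, not necessarily what this PR or the Azure provider use):

```go
package main

import "fmt"

// Hypothetical cloud-config values for selecting the health check mode.
const (
	// HealthCheckModeServiceNodePort keeps today's behaviour: probe the
	// service's traffic/node port directly.
	HealthCheckModeServiceNodePort = "ServiceNodePort"
	// HealthCheckModeShared probes the kube-proxy healthz endpoint shared
	// by every service on the node.
	HealthCheckModeShared = "Shared"
)

type cloudConfig struct {
	// ServiceHealthCheckMode is a hypothetical field name; an empty value
	// preserves the existing ServiceNodePort behaviour as the default.
	ServiceHealthCheckMode string
}

func healthCheckMode(cfg cloudConfig) string {
	if cfg.ServiceHealthCheckMode == "" {
		return HealthCheckModeServiceNodePort
	}
	return cfg.ServiceHealthCheckMode
}

func main() {
	fmt.Println(healthCheckMode(cloudConfig{}))                                 // ServiceNodePort (default)
	fmt.Println(healthCheckMode(cloudConfig{ServiceHealthCheckMode: "Shared"})) // Shared (opt-in)
}
```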
The branch was force-pushed from 9394abd to 532b956.
Anyone around who can help get this across the line? This adds an opt-in way to move health checks towards an agreed KEP so that the health checks actually make sense; today they don't actually work and don't gracefully drain nodes correctly. In the future we should look to make these health checks the default, but this gives us the switch to let people start using them before we make that change. I'm curious if the ALBO has implemented this KEP yet?
will take a look at this today
/lgtm
I don't think so, @oliviassss can you confirm?
What type of PR is this?
/kind bug
What this PR does / why we need it:
When a service is using the default configuration (i.e. they haven't changed any of port, path or protocol via annotations), this updates the health check to check the kube-proxy health endpoint rather than the traffic port.
This is deliberately done so that if a user has specified any change to the port/path/protocol, we do not change the behaviour of their health checks. But if they have no opinion at all, we give them an improved health check.
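To make the decision concrete, here is a minimal, simplified sketch of the described behaviour (hypothetical helper and type names; the annotation keys are the commonly documented AWS health-check annotations; this is not the exact code in the PR):

```go
package main

import (
	"fmt"
	"strconv"
)

// Hypothetical, simplified representation of an ELB/NLB health check target.
type healthCheck struct {
	Protocol string
	Port     string
	Path     string
}

const kubeProxyHealthzPort = 10256 // kube-proxy's default healthz port

// buildHealthCheck leaves the health check alone if any health-check
// annotation is set (or the service uses externalTrafficPolicy: Local),
// and otherwise falls back to the kube-proxy healthz endpoint.
func buildHealthCheck(annotations map[string]string, trafficPort int32, etpLocal bool) healthCheck {
	// Default: probe the traffic port, as the provider does today.
	hc := healthCheck{Protocol: "TCP", Port: strconv.Itoa(int(trafficPort))}

	_, portSet := annotations["service.beta.kubernetes.io/aws-load-balancer-healthcheck-port"]
	_, pathSet := annotations["service.beta.kubernetes.io/aws-load-balancer-healthcheck-path"]
	_, protoSet := annotations["service.beta.kubernetes.io/aws-load-balancer-healthcheck-protocol"]
	if portSet || pathSet || protoSet || etpLocal {
		// The user expressed an opinion; in the real code those annotations
		// would be parsed into hc. Here we simply don't override anything.
		return hc
	}

	// No opinion expressed: check the node's ability to proxy traffic instead.
	return healthCheck{
		Protocol: "HTTP",
		Port:     strconv.Itoa(kubeProxyHealthzPort),
		Path:     "/healthz",
	}
}

func main() {
	fmt.Println(buildHealthCheck(map[string]string{}, 30080, false)) // kube-proxy override
	fmt.Println(buildHealthCheck(map[string]string{
		"service.beta.kubernetes.io/aws-load-balancer-healthcheck-port": "8080",
	}, 30080, false)) // user-configured, left alone
}
```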
Which issue(s) this PR fixes:
KEP 3458 was added to the AWS provider in release 1.27. This means that a node's presence in the load balancer backends is no longer determined by whether the instance is healthy or not.
We have observed long periods of disruption (15-20s) during upgrades with the AWS provider since this went in. The issue is that the health check is checking the traffic port, and not the node's ability to route traffic to the backend.
This is specifically for externalTrafficPolicy: Cluster. In this topology, all nodes in the cluster accept connections for the service's backend and then route them internally. So, unless all endpoints are down, the traffic port that is exposed will always return a healthy result. When a node is going away, its ability to route traffic from the traffic port to the backend depends on whether kube-proxy is running or not. In fact, so much so that it actually makes more sense to health check kube-proxy.
When kube-proxy is being terminated, assuming you have configured graceful shutdown, it will return bad health checks for a period before shutting down. This allows the AWS health check to observe that the node is going away before it loses the ability to route traffic to the backend.
In the current guise, it only fails health checks after the node can no longer route traffic.
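To illustrate what the load balancer health check effectively does under this change, here is a small standalone probe against the kube-proxy healthz endpoint (the node address is a placeholder; 10256 is kube-proxy's default healthz port):

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

func main() {
	// Placeholder node address; the load balancer probes each registered node.
	url := "http://10.0.1.23:10256/healthz"

	client := &http.Client{Timeout: 2 * time.Second}
	resp, err := client.Get(url)
	if err != nil {
		fmt.Println("unhealthy:", err)
		return
	}
	defer resp.Body.Close()

	// Per the description above, kube-proxy returns 200 while it can proxy
	// traffic and starts failing this endpoint during a graceful shutdown.
	if resp.StatusCode == http.StatusOK {
		fmt.Println("healthy")
	} else {
		fmt.Println("unhealthy, status:", resp.StatusCode)
	}
}
```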
I did a lot of research into this earlier in the year and wrote up kubernetes-sigs/cloud-provider-azure#3499, which explains the problem in a little more depth. Note as well that GCP uses kube-proxy-based health checks in this manner, and we are working to move Azure over too.
There's also KEP-3836, which suggests this is the way to go forward as well.
Special notes for your reviewer:
Does this PR introduce a user-facing change?: