Update health checks to use Kube-Proxy when no other configuration is provided #622
base: master
Conversation
This issue is currently awaiting triage. If cloud-provider-aws contributors determine this is a relevant issue, they will accept it by applying the relevant label.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/cc @dims
@dims: GitHub didn't allow me to assign the following users: kmala. Note that only kubernetes members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/retest
To check this out, I created a cluster manually and ran the test suite; it worked for me there, so I think this may be a flake in this case. I could also see that it was creating the health check as expected. I've been testing this change in OpenShift as well, and the CI we run for disruption of workload services is looking good with this patch included.
/retest
Unknown CLA label state. Rechecking for CLA labels. Send feedback to sig-contributor-experience at kubernetes/community.
/check-cla
/test pull-cloud-provider-aws-e2e
/retest
The failure seemed like a test-infra issue; giving it another go before I dig in.
/retest
pkg/providers/v1/aws.go
Outdated
portModified := parseStringAnnotation(svc.Annotations, ServiceAnnotationLoadBalancerHealthCheckPort, &hc.Port)

// For a non-local service, we override the health check to use the kube-proxy port when no other overrides are provided.
// The kube-proxy port should be open on all nodes and allows the health check to check the node's ability to proxy traffic.
The kube-proxy port should be open on all nodes and allows the health check to check the node's ability to proxy traffic.
My guess is that the assumption is not correct and the 10256 port is blocked by kOps.
You can get an idea of what security group rules are created by default:
https://github.com/kubernetes/kops/blob/master/tests/integration/update_cluster/aws-lb-controller/kubernetes.tf
Reviewing the TF you've linked, I can't see where ingress is allowed for any NodePort to the workers. Unless I've misread, it looks like users already have to manually allow the ports to the worker nodes for Services of type LoadBalancer?
My comment here was more that every node will open this port, because every node should be running kube-proxy or an equivalent substitute. Whether the security group rules are configured or not wasn't intended to be captured by that comment.
Nodes open ports to each other, but not to any other IP. I think the AWS CCM currently adds the necessary rules for node ports, so port 10256 should be added there.
We can also open node ports during testing by adding --set cluster.spec.nodePortAccess=0.0.0.0/0
here:
cloud-provider-aws/hack/e2e/run.sh
Line 131 in d2b28cc
--create-args="--dns=none --zones=${ZONES} --node-size=m5.large --master-size=m5.large --override=cluster.spec.cloudControllerManager.cloudProvider=aws --override=cluster.spec.cloudControllerManager.image=${IMAGE_NAME}:${IMAGE_TAG}" \
Ack, that's helpful. I'll add that hack for now and investigate setting up the rules correctly via the CCM.
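For what it's worth, here is a rough sketch (not code from this PR) of what an explicit ingress rule for the kube-proxy healthz port could look like with the aws-sdk-go v1 EC2 client this provider already uses; the security group IDs are placeholders:

```go
package main

import (
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/ec2"
)

func main() {
	sess := session.Must(session.NewSession())
	client := ec2.New(sess)

	// Placeholder IDs: the node security group and the load balancer security group.
	nodeSG := "sg-0123456789abcdef0"
	lbSG := "sg-0fedcba9876543210"

	// Allow the load balancer to reach kube-proxy's healthz endpoint (10256) on the nodes.
	_, err := client.AuthorizeSecurityGroupIngress(&ec2.AuthorizeSecurityGroupIngressInput{
		GroupId: aws.String(nodeSG),
		IpPermissions: []*ec2.IpPermission{{
			IpProtocol: aws.String("tcp"),
			FromPort:   aws.Int64(10256),
			ToPort:     aws.Int64(10256),
			UserIdGroupPairs: []*ec2.UserIdGroupPair{{
				GroupId: aws.String(lbSG),
			}},
		}},
	})
	if err != nil {
		log.Fatalf("authorizing kube-proxy healthz ingress: %v", err)
	}
}
```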
Adding this nodePortAccess seems not to have worked; the latest CI result suggests it is creating a duplicate rule?
Hmm, it seems that the security group for the ELB is creating an allow-all rule anyway, do you agree?
cloud-provider-aws/pkg/providers/v1/aws.go
Lines 4637 to 4641 in d055109
allProtocols := "-1"
permission := &ec2.IpPermission{}
permission.IpProtocol = &allProtocols
permission.UserIdGroupPairs = []*ec2.UserIdGroupPair{sourceGroupID}
I spun up a cluster and can see that all traffic is allowed between the load balancer security group and the worker security group, and the health checks are working, so perhaps there's some other issue at play with the E2E?
Adding this nodePortAccess seems not to have worked; the latest CI result suggests it is creating a duplicate rule?
You are right, not sure how we can work around that.
Hmm, it seems that the security group for the ELB is creating an allow-all rule anyway, do you agree?
I see you are looking at the Classic LB related function. Look for updateInstanceSecurityGroupsForNLB().
I spun up a cluster...
How did you do that?
I looked at the NLB and tested it live; the NLB version does the same thing security-group-wise.
How did you do that?
Not using kOps, but I had an AWS cluster that I could test this all on by other means.
I'll see if I can use the script for the e2e run job to spin up a cluster using kOps.
pkg/providers/v1/aws_loadbalancer.go
Outdated
// We need the port as both string and int.
kubeProxyHealthCheckPort = "10256"
kubeProxyHealthCheckPortInt = int32(10256)
Just a thought, it should be pretty easy to do strconv.Itoa(kubeProxyHealthCheckPort) where needed.
Suggested change:
- // We need the port as both string and int.
- kubeProxyHealthCheckPort = "10256"
- kubeProxyHealthCheckPortInt = int32(10256)
+ kubeProxyHealthCheckPort = int32(10256)
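For reference, a minimal sketch of the single-constant approach the suggestion points at; note that the int32 needs an explicit int conversion before strconv.Itoa:

```go
package main

import (
	"fmt"
	"strconv"
)

// Keep one canonical definition of the kube-proxy healthz port.
const kubeProxyHealthCheckPort = int32(10256)

func main() {
	// Convert to string only at the call sites that need it.
	portStr := strconv.Itoa(int(kubeProxyHealthCheckPort))
	fmt.Println(portStr) // "10256"
}
```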
So on further investigation, it seems the issue right now is that when kops configures Cilium to replace kube-proxy, it doesn't tell it to replace kube-proxy's health check endpoint by default, and Cilium by default doesn't expose the health check. Strikes me as odd that, when you have ... Manually configuring the healthz port on Cilium to ...
@hakman Coming back to this, I think we may want a softer approach to this change. I think it's the correct direction to take, but, having discovered that kops isn't configuring Cilium "correctly" right now, and having worked through the same change with the Azure folks, I'm going to propose another approach. With Azure, we have added a way for users to choose between individual and shared health checks, the default for now being the individual health checks. That maintains the existing behaviour, but those in the know can update their health checks to use the new shared behaviour. The intention there is to then go through a cycle of announcements and change the default over time, giving users the chance to make sure they have the kube-proxy health checks configured, and giving them a way to retain the old, broken behaviour if they desire. Do you agree with moving forward with the same pattern here? If so, I'll get the PR updated to reflect what was done in Azure so there's a consistent experience across the two.
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs. This bot triages PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
/remove-lifecycle rotten
The branch was force-pushed from c1e7910 to 9394abd.
@kmala I've updated this now to reflect the changes that we made in the Azure provider, where the health checks are configurable at the cloud config level, choosing between the new Shared or the old ServiceNodePort health check options. This allows users to configure what they want, but keeps the existing behaviour as the default. I suggest over time we move the default across to the other and make the ServiceNodePort behaviour a more legacy option.
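As a rough illustration of that kind of cloud-config toggle (the field and value names here are hypothetical, not necessarily what this PR or the Azure provider use):

```go
package main

import "fmt"

// Hypothetical cloud-config values for selecting the health check mode.
const (
	// HealthCheckModeServiceNodePort keeps today's behaviour: probe the
	// service's traffic/node port directly.
	HealthCheckModeServiceNodePort = "ServiceNodePort"
	// HealthCheckModeShared probes the kube-proxy healthz endpoint shared
	// by every service on the node.
	HealthCheckModeShared = "Shared"
)

type cloudConfig struct {
	// ServiceHealthCheckMode is a hypothetical field name; an empty value
	// preserves the existing ServiceNodePort behaviour as the default.
	ServiceHealthCheckMode string
}

func healthCheckMode(cfg cloudConfig) string {
	if cfg.ServiceHealthCheckMode == "" {
		return HealthCheckModeServiceNodePort
	}
	return cfg.ServiceHealthCheckMode
}

func main() {
	fmt.Println(healthCheckMode(cloudConfig{}))                                 // ServiceNodePort (default)
	fmt.Println(healthCheckMode(cloudConfig{ServiceHealthCheckMode: "Shared"})) // Shared (opt-in)
}
```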
The branch was force-pushed from 9394abd to 532b956.
Anyone around who can help get this across the line? This adds an opt-in way to move health checks towards an agreed KEP so that the health checks actually make sense; today they don't actually work and don't gracefully drain nodes correctly. In the future we should look to make these health checks the default, but this gives us the switch to let people start using them before we make that change. I'm curious if the ALBO has implemented this KEP yet?
will take a look at this today
/lgtm
I don't think so, @oliviassss can you confirm?
What type of PR is this?
/kind bug
What this PR does / why we need it:
When a service is using the default configuration (i.e. they haven't changed any of port, path or protocol via annotations), this updates the health check to check the kube-proxy health endpoint rather than the traffic port.
This is deliberately done so that if a user has specified any change to the port/path/protocol, we do not change the behaviour of their health checks. But if they have no opinion at all, we give them an improved health check.
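To make the decision concrete, here is a minimal, simplified sketch of the described behaviour (hypothetical helper and type names; the annotation keys are the commonly documented AWS health-check annotations; this is not the exact code in the PR):

```go
package main

import (
	"fmt"
	"strconv"
)

// Hypothetical, simplified representation of an ELB/NLB health check target.
type healthCheck struct {
	Protocol string
	Port     string
	Path     string
}

const kubeProxyHealthzPort = 10256 // kube-proxy's default healthz port

// buildHealthCheck leaves the health check alone if any health-check
// annotation is set (or the service uses externalTrafficPolicy: Local),
// and otherwise falls back to the kube-proxy healthz endpoint.
func buildHealthCheck(annotations map[string]string, trafficPort int32, etpLocal bool) healthCheck {
	// Default: probe the traffic port, as the provider does today.
	hc := healthCheck{Protocol: "TCP", Port: strconv.Itoa(int(trafficPort))}

	_, portSet := annotations["service.beta.kubernetes.io/aws-load-balancer-healthcheck-port"]
	_, pathSet := annotations["service.beta.kubernetes.io/aws-load-balancer-healthcheck-path"]
	_, protoSet := annotations["service.beta.kubernetes.io/aws-load-balancer-healthcheck-protocol"]
	if portSet || pathSet || protoSet || etpLocal {
		// The user expressed an opinion; in the real code those annotations
		// would be parsed into hc. Here we simply don't override anything.
		return hc
	}

	// No opinion expressed: check the node's ability to proxy traffic instead.
	return healthCheck{
		Protocol: "HTTP",
		Port:     strconv.Itoa(kubeProxyHealthzPort),
		Path:     "/healthz",
	}
}

func main() {
	fmt.Println(buildHealthCheck(map[string]string{}, 30080, false)) // kube-proxy override
	fmt.Println(buildHealthCheck(map[string]string{
		"service.beta.kubernetes.io/aws-load-balancer-healthcheck-port": "8080",
	}, 30080, false)) // user-configured, left alone
}
```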
Which issue(s) this PR fixes:
KEP 3458 was added to the AWS provider in release 1.27. This means that a node's presence in the load balancer backends is no longer determined by whether the instance is healthy or not.
We have observed long periods of disruption (15-20s) during upgrades with the AWS provider since this went in. The issue is that the health check is checking the traffic port, and not the node's ability to route traffic to the backend.
This is specifically for externalTrafficPolicy: Cluster. In this topology, all nodes in the cluster accept connections for the service's backend and then route them internally. So, unless all endpoints are down, the traffic port that is exposed will always return a healthy result. When a node is going away, its ability to route traffic from the traffic port to the backend depends on whether kube-proxy is running or not. In fact, so much so that it actually makes more sense to health check kube-proxy.
When kube-proxy is being terminated, assuming you have configured graceful shutdown, it will return bad health checks for a period before shutting down. This allows the AWS health check to observe that the node is going away before it loses the ability to route traffic to the backend.
In the current guise, it only fails health checks after the node can no longer route traffic.
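To illustrate what the load balancer health check effectively does under this change, here is a small standalone probe against the kube-proxy healthz endpoint (the node address is a placeholder; 10256 is kube-proxy's default healthz port):

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

func main() {
	// Placeholder node address; the load balancer probes each registered node.
	url := "http://10.0.1.23:10256/healthz"

	client := &http.Client{Timeout: 2 * time.Second}
	resp, err := client.Get(url)
	if err != nil {
		fmt.Println("unhealthy:", err)
		return
	}
	defer resp.Body.Close()

	// Per the description above, kube-proxy returns 200 while it can proxy
	// traffic and starts failing this endpoint during a graceful shutdown.
	if resp.StatusCode == http.StatusOK {
		fmt.Println("healthy")
	} else {
		fmt.Println("unhealthy, status:", resp.StatusCode)
	}
}
```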
I did a lot of research into this earlier in the year and wrote up kubernetes-sigs/cloud-provider-azure#3499, which explains the problem in a little more depth. Note as well that GCP uses kube-proxy-based health checks in this manner, and we are working to move Azure over too.
There's also KEP-3836, which suggests this is the way to go forward as well.
Special notes for your reviewer:
Does this PR introduce a user-facing change?: