OCPBUGS-43745: Add support for IdleCloseTerminationPolicy #1166

frobware · 2024-11-19T15:18:48Z

Introduce logic in desiredRouterDeployment to set the environmentvariable ROUTER_IDLE_CLOSE_ON_RESPONSE when the IdleConnectionTerminationPolicy field in the IngressController spec is set to Deferred. This change enables configuring HAProxy with the idle-close-on-response option for better control over idle connection termination behaviour.

Requires:

openshift-ci-robot · 2024-11-19T15:20:57Z

@frobware: This pull request references Jira Issue OCPBUGS-43745, which is invalid:

expected the bug to target either version "4.18." or "openshift-4.18.", but it targets "4.19.0" instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

DO NOT MERGE: github.com/openshift/api vendor bump

DO NOT MERGE: make update (CRD)

DO NOT MERGE: force IngressControllerImage to OCPBUGS-43745: Add support for IdleCloseTerminationPolicy router#639

OCPBUGS-43745: Add support for idle connection termination policy

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

frobware · 2024-11-19T15:22:19Z

/hold

Many commits should never merge.

frobware · 2024-11-20T15:59:43Z

/testwith openshift/cluster-ingress-operator/release-4.18/e2e-aws-operator openshift/router#639

openshift-ci · 2024-11-20T15:59:47Z

@frobware, testwith: could not generate prow job. ERROR:

no ref for requested test included in command

frobware · 2024-11-20T16:08:52Z

/testwith openshift/cluster-ingress-operator/master/e2e-aws-operator openshift/router#639

frobware · 2024-11-21T07:47:56Z

/testwith openshift/cluster-ingress-operator/master/e2e-aws-operator openshift/router#639

frobware · 2024-12-10T13:51:15Z

/testwith openshift/cluster-ingress-operator/master/e2e-aws-operator openshift/router#639

frobware · 2024-12-10T15:38:42Z

/testwith openshift/cluster-ingress-operator/master/e2e-aws-operator openshift/router#639

alebedev87 · 2024-12-11T14:47:15Z

/assign

candita · 2024-12-11T19:28:59Z

/assign @grzpiotrowski

openshift-ci · 2024-12-13T17:51:00Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from alebedev87. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

frobware · 2024-12-13T17:51:44Z

/testwith openshift/cluster-ingress-operator/master/e2e-aws-operator openshift/router#639

frobware · 2024-12-13T18:33:59Z

/testwith openshift/cluster-ingress-operator/master/e2e-aws-operator openshift/router#639

Add a `/custom-response` endpoint that returns a configurable response via the `CUSTOM_RESPONSE` environment variable.

Pickup openshift/api#2102 % git show 27316471eb72fe8fcf0d44fb5a0602f698f253dc commit 27316471eb72fe8fcf0d44fb5a0602f698f253dc Merge: de9de05a8 b7417509c Author: openshift-merge-bot[bot] <148852131+openshift-merge-bot[bot]@users.noreply.github.com> Date: Wed Dec 18 10:31:50 2024 +0000 Merge pull request #2102 from frobware/OCPBUGS-43745-idle-close-on-response OCPBUGS-43745: Add IdleCloseOnResponse field to IngressControllerSpec Vendoring steps: $ go mod edit -replace github.com/openshift/api=github.com/openshift/api@27316471eb72fe8fcf0d44fb5a0602f698f253dc $ go mod tidy $ go mod vendor $ make update

frobware · 2024-12-19T10:21:56Z

openshift/api#2102 has merged. Vendor bump in 6a882ac.

This is ready for review but needs openshift/router#639 for the new e2e test to pass.

/hold cancel

frobware · 2024-12-19T10:22:34Z

/jira refresh

openshift-ci-robot · 2024-12-19T10:22:40Z

@frobware: This pull request references Jira Issue OCPBUGS-43745, which is valid.

3 validation(s) were run on this bug

bug is open, matching expected state (open)
bug target version (4.19.0) matches configured target version for branch (4.19.0)
bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @lihongan

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

alebedev87

The code change LGTM. All the comments are about the e2e.

alebedev87 · 2024-12-20T09:28:40Z

test/e2e/idle_connection_test.go

+	expectedBackendName := fmt.Sprintf("be_http:%s:%s", routeName.Namespace, routeName.Name)
+	expectedServerName := fmt.Sprintf("pod:%s:%s:http:%s:%d", tc.pods[serviceIndex].Name, service.Name, tc.pods[serviceIndex].Status.PodIP, service.Spec.Ports[0].Port)
+
+	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
+	defer cancel()
+	if err := waitForHAProxyConfigUpdate(ctx, t, ic, expectedBackendName, expectedServerName); err != nil {
+		return nil, fmt.Errorf("error waiting for HAProxy configuration update for route %s to point to service %s/%s: %w", routeName, service.Namespace, service.Name, err)
+	}


That adds a lot of moving parts (== code) to the test. Also, will it work in DCM case? The server address may be updated dynamically (I have to check how DCM behaves during a server switch). May be a simple time.Sleep(6 * time.Seconds) would be enough (and would suit the DCM case too).

Yes, all good points. Given this is a parallel test I could just sleep 30s (for busy clusters).

I will rip this out and sleep 30s instead. Why 30s? To avoid flakes if we chose 6s (i.e., a 1s more than the 5s routing update default). There's little point in chasing test flakes when the test is a) parallel and b) runs for a couple of minutes anyway.

alebedev87 · 2024-12-20T09:35:22Z

test/e2e/idle_connection_test.go

+		podList, err := getPods(t, kclient, deployment)
+		if err != nil {
+			return nil, fmt.Errorf("failed to fetch pods for deployment %s/%s: %w", deployment.Namespace, deployment.Name, err)
+		}
+
+		if len(podList.Items) == 0 {
+			return nil, fmt.Errorf("no pods in  deployment %s/%s", deployment.Namespace, deployment.Name)
+		}
+
+		for i := range podList.Items {
+			tc.pods = append(tc.pods, &podList.Items[i])
+		}


Do we need this for the config check? I'm asking because the check for Complete deployment guarantees us available pods.

This was local assertions, or post-conditions. Exiting this function should have setup everything required by the remainder of the test code.

alebedev87 · 2024-12-20T09:36:52Z

test/e2e/idle_connection_test.go

+		if err := waitForDeploymentComplete(t, kclient, deployment, 2*time.Minute); err != nil {
+			return nil, fmt.Errorf("deployment %s/%s is not ready: %w", deployment.Namespace, deployment.Name, err)
+		}


We do a check for the deployment Complete in idleConnectionCreateBackendService already.

alebedev87 · 2024-12-20T09:38:17Z

test/e2e/idle_connection_test.go

+	if len(tc.deployments) != 2 {
+		return nil, fmt.Errorf("expected 2 deployments, but got %d", len(tc.deployments))
+	}
+
+	if len(tc.services) != 2 {
+		return nil, fmt.Errorf("expected 2 services, but got %d", len(tc.services))
+	}
+
+	if len(tc.pods) != 2 {
+		return nil, fmt.Errorf("expected 2 pods, but got %d", len(tc.pods))
+	}


These checks seem redundant, if we reached this point in code it means that the API server assured us that these objects are created in idleConnectionCreateBackendService.

alebedev87 · 2024-12-20T09:44:35Z

test/e2e/idle_connection_test.go

+		if i == 0 && policy == currentIdleTerminationPolicy {
+			t.Logf("Skipping initial policy switch as current policy %q already matches %q", currentIdleTerminationPolicy, policy)
+		} else {
+			if err := idleConnectionSwitchIdleTerminationPolicy(t, ic, icName, policy); err != nil {
+				t.Fatalf("failed to switch to policy %q: %v", policy, err)
+			}
+		}


I'm thinking whether 2 test cases (1 for Immediate and 1 for Deferred) would be a simpler approach. There would be no need 1) for idleConnectionSwitchIdleTerminationPolicy function, 2) to care for the order of the policies.

Seems like a lot of repetition to have two test cases for 10 lines of code.

I believe it’s important to test the case where the termination policy is switched. If we had two private ICs pre-configured with Deferred and Immediate policies, we would never perform an end-to-end test to validate the behaviour of switching between them.

alebedev87 · 2024-12-20T09:52:18Z

test/e2e/idle_connection_test.go

+	return nil
+}
+
+func idleConnectionCreateDeployment(ctx context.Context, namespace string, serviceNumber int, serverResponse, image string) (*appsv1.Deployment, error) {


Do you want your test workload to be a Deployment for a resilience reason? I'm asking because build*Pod functions from utils_test.go (example) are used in most cases and look shorter.

Would the test benefit form just a pod? I don't know. I just chose a deployment as a reasonable abstraction.

alebedev87 · 2024-12-20T10:05:27Z

test/e2e/idle_connection_test.go

+								{Name: "TLS_CERT", Value: "/etc/serving-cert/tls.crt"},
+								{Name: "TLS_KEY", Value: "/etc/serving-cert/tls.key"},


Why do we need all the hustle with the serving cert secret if at the end we hit the plain http port?

Because I rely on the server in test/http2 which requires certificates for the server that listens on port https.

I see now. Does it make sense to make the listening on HTTPS port conditional (== don't listen on HTTPS if envvar is not provided)?

frobware · 2024-12-24T09:57:18Z

/testwith openshift/cluster-ingress-operator/master/e2e-aws-operator openshift/router#639

Introduce logic in desiredRouterDeployment to set the environment variable `ROUTER_IDLE_CLOSE_ON_RESPONSE` when the `IdleConnectionTerminationPolicy` field in the IngressController spec is set to `Deferred`. This change enables configuring HAProxy with the `idle-close-on-response` option for better control over idle connection termination behaviour.

frobware · 2024-12-24T15:48:34Z

/testwith openshift/cluster-ingress-operator/master/e2e-aws-operator openshift/router#639

openshift/router#639

openshift-ci · 2024-12-24T15:57:51Z

@frobware: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/verify	`014b62f`	link	true	`/test verify`

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

frobware · 2024-12-24T15:59:16Z

/testwith openshift/cluster-ingress-operator/master/e2e-aws-operator openshift/router#639

openshift-ci bot requested review from alebedev87 and Miciah November 19, 2024 15:19

frobware changed the title ~~OCPBUGS 43745 idle close on response~~ OCPBUGS:43745: idle close on response Nov 19, 2024

frobware changed the title ~~OCPBUGS:43745: idle close on response~~ OCPBUGS-43745: idle close on response Nov 19, 2024

openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 19, 2024

frobware force-pushed the OCPBUGS-43745-idle-close-on-response branch from a7dea46 to 991d367 Compare November 20, 2024 15:53

frobware changed the title ~~OCPBUGS-43745: idle close on response~~ OCPBUGS-43745: Add support for IdleCloseTerminationPolicy Nov 20, 2024

frobware force-pushed the OCPBUGS-43745-idle-close-on-response branch from 991d367 to 052f6b6 Compare November 20, 2024 19:23

frobware mentioned this pull request Nov 21, 2024

OCPBUGS-43745: Add IdleCloseOnResponse field to IngressControllerSpec openshift/api#2102

Merged

frobware force-pushed the OCPBUGS-43745-idle-close-on-response branch from 052f6b6 to 0aa6244 Compare November 21, 2024 07:47

frobware force-pushed the OCPBUGS-43745-idle-close-on-response branch from 0aa6244 to 1072f8f Compare November 21, 2024 08:04

openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 27, 2024

frobware force-pushed the OCPBUGS-43745-idle-close-on-response branch 2 times, most recently from f89fbb2 to 9707f72 Compare December 10, 2024 13:44

openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 10, 2024

frobware force-pushed the OCPBUGS-43745-idle-close-on-response branch from 9707f72 to 77673b3 Compare December 10, 2024 14:49

frobware force-pushed the OCPBUGS-43745-idle-close-on-response branch from 77673b3 to 3029807 Compare December 10, 2024 15:43

openshift-ci bot assigned alebedev87 Dec 11, 2024

openshift-ci bot assigned grzpiotrowski Dec 11, 2024

frobware force-pushed the OCPBUGS-43745-idle-close-on-response branch from 650e796 to 6449f4c Compare December 13, 2024 18:33

frobware added 2 commits December 19, 2024 09:56

test/http2: Add custom-response endpoint for flexible e2e testing

6083558

Add a `/custom-response` endpoint that returns a configurable response via the `CUSTOM_RESPONSE` environment variable.

frobware force-pushed the OCPBUGS-43745-idle-close-on-response branch from 6449f4c to 60b3714 Compare December 19, 2024 10:19

openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Dec 19, 2024

openshift-ci-robot added the jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. label Dec 19, 2024

openshift-ci-robot removed the jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. label Dec 19, 2024

openshift-ci bot requested a review from lihongan December 19, 2024 10:22

frobware mentioned this pull request Dec 19, 2024

OCPBUGS-43745: Add support for IdleCloseTerminationPolicy openshift/router#639

Open

alebedev87 reviewed Dec 20, 2024

View reviewed changes

frobware force-pushed the OCPBUGS-43745-idle-close-on-response branch from 60b3714 to 1c5e4e9 Compare December 24, 2024 09:55

frobware force-pushed the OCPBUGS-43745-idle-close-on-response branch from 1c5e4e9 to 014b62f Compare December 24, 2024 15:47

frobware added 2 commits December 24, 2024 15:56

DO NOT MERGE: disable serial tests

40d1040

DO NOT MERGE: force IngressControllerImage to openshift/router#639

49e561e

openshift/router#639

frobware force-pushed the OCPBUGS-43745-idle-close-on-response branch from 014b62f to 49e561e Compare December 24, 2024 15:57

		{Name: "TLS_CERT", Value: "/etc/serving-cert/tls.crt"},
		{Name: "TLS_KEY", Value: "/etc/serving-cert/tls.key"},

OCPBUGS-43745: Add support for IdleCloseTerminationPolicy #1166

Are you sure you want to change the base?

OCPBUGS-43745: Add support for IdleCloseTerminationPolicy #1166

Conversation

frobware commented Nov 19, 2024 • edited Loading

openshift-ci-robot commented Nov 19, 2024

frobware commented Nov 19, 2024

frobware commented Nov 20, 2024

openshift-ci bot commented Nov 20, 2024

frobware commented Nov 20, 2024

frobware commented Nov 21, 2024

frobware commented Dec 10, 2024

frobware commented Dec 10, 2024

alebedev87 commented Dec 11, 2024

candita commented Dec 11, 2024

openshift-ci bot commented Dec 13, 2024

frobware commented Dec 13, 2024

frobware commented Dec 13, 2024

frobware commented Dec 19, 2024

frobware commented Dec 19, 2024

openshift-ci-robot commented Dec 19, 2024

alebedev87 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

alebedev87 Dec 20, 2024 • edited Loading

Choose a reason for hiding this comment

frobware commented Dec 24, 2024

frobware commented Dec 24, 2024

openshift-ci bot commented Dec 24, 2024

frobware commented Dec 24, 2024

frobware commented Nov 19, 2024 •

edited

Loading

alebedev87 Dec 20, 2024 •

edited

Loading