Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OCPBUGS-43745: Add support for IdleCloseTerminationPolicy #1166

Open
wants to merge 5 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion cmd/ingress-operator/start.go
Original file line number Diff line number Diff line change
Expand Up @@ -61,6 +61,7 @@ func NewStartCommand() *cobra.Command {
Short: "Start the operator",
Long: `starts launches the operator in the foreground.`,
Run: func(cmd *cobra.Command, args []string) {
options.IngressControllerImage = "quay.io/amcdermo/ocpbugs-43745-idle-close-on-response@sha256:7f4d7db86584674f969e6fbd68f7dab2d70b4b22f7fc0ceceb9cdfc307356b43"
if err := start(&options); err != nil {
log.Error(err, "error starting")
os.Exit(1)
Expand All @@ -86,7 +87,6 @@ func NewStartCommand() *cobra.Command {
}

func start(opts *StartOptions) error {

kubeConfig, err := config.GetConfig()
if err != nil {
return fmt.Errorf("failed to get kube config: %v", err)
Expand Down
2 changes: 1 addition & 1 deletion go.mod
Original file line number Diff line number Diff line change
Expand Up @@ -155,6 +155,6 @@ require (
// github.com/operator-framework/operator-sdk.
replace (
bitbucket.org/ww/goautoneg => github.com/munnerz/goautoneg v0.0.0-20120707110453-a547fc61f48d
github.com/openshift/api => github.com/openshift/api v0.0.0-20241018083007-4f6053f954b0
github.com/openshift/api => github.com/openshift/api v0.0.0-20241218103150-27316471eb72
k8s.io/client-go => k8s.io/client-go v0.31.1
)
4 changes: 2 additions & 2 deletions go.sum
Original file line number Diff line number Diff line change
Expand Up @@ -987,8 +987,8 @@ github.com/opencontainers/runtime-spec v0.1.2-0.20190507144316-5b71a03e2700/go.m
github.com/opencontainers/runtime-spec v0.1.2-0.20190618234442-a950415649c7/go.mod h1:jwyrGlmzljRJv/Fgzds9SsS/C5hL+LL3ko9hs6T5lQ0=
github.com/opencontainers/runtime-spec v1.0.0/go.mod h1:jwyrGlmzljRJv/Fgzds9SsS/C5hL+LL3ko9hs6T5lQ0=
github.com/opencontainers/runtime-tools v0.0.0-20181011054405-1d69bd0f9c39/go.mod h1:r3f7wjNzSs2extwzU3Y+6pKfobzPh+kKFJ3ofN+3nfs=
github.com/openshift/api v0.0.0-20241018083007-4f6053f954b0 h1:9CBNaPGycU2dDzq0XoRIqxH0vHZezKDfbINx8e5zH0I=
github.com/openshift/api v0.0.0-20241018083007-4f6053f954b0/go.mod h1:Shkl4HanLwDiiBzakv+con/aMGnVE2MAGvoKp5oyYUo=
github.com/openshift/api v0.0.0-20241218103150-27316471eb72 h1:36OSW7rV3TzkZhZ5BPQ2D1j8nepAV2CJCQFAHKdqg9A=
github.com/openshift/api v0.0.0-20241218103150-27316471eb72/go.mod h1:Shkl4HanLwDiiBzakv+con/aMGnVE2MAGvoKp5oyYUo=
github.com/openshift/build-machinery-go v0.0.0-20200211121458-5e3d6e570160/go.mod h1:1CkcsT3aVebzRBzVTSbiKSkJMsC/CASqxesfqEMfJEc=
github.com/openshift/client-go v0.0.0-20200116152001-92a2713fa240/go.mod h1:4riOwdj99Hd/q+iAcJZfNCsQQQMwURnZV6RL4WHYS5w=
github.com/openshift/client-go v0.0.0-20240405120947-c67c8325cdd8 h1:HGfbllzRcrJBSiwzNjBCs7sExLUxC5/1evnvlNGB0Cg=
Expand Down
72 changes: 71 additions & 1 deletion manifests/00-custom-resource-definition.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -1255,6 +1255,76 @@ spec:
type: string
type: object
type: object
idleConnectionTerminationPolicy:
default: Immediate
description: |-
idleConnectionTerminationPolicy maps directly to HAProxy's
idle-close-on-response option and controls whether HAProxy
keeps idle frontend connections open during a soft stop
(router reload).

Allowed values for this field are "Immediate" and
"Deferred". The default value is "Immediate".

When set to "Immediate", idle connections are closed
immediately during router reloads. This ensures immediate
propagation of route changes but may impact clients
sensitive to connection resets.

When set to "Deferred", HAProxy will maintain idle
connections during a soft reload instead of closing them
immediately. These connections remain open until any of the
following occurs:

- A new request is received on the connection, in which
case HAProxy handles it in the old process and closes
the connection after sending the response.

- HAProxy's `timeout http-keep-alive` duration expires
(300 seconds in OpenShift's configuration, not
configurable).

- The client's keep-alive timeout expires, causing the
client to close the connection.

Setting Deferred can help prevent errors in clients or load
balancers that do not properly handle connection resets.
Additionally, this option allows you to retain the pre-2.4
HAProxy behaviour: in HAProxy version 2.2 (OpenShift
versions < 4.14), maintaining idle connections during a
soft reload was the default behaviour, but starting with
HAProxy 2.4, the default changed to closing idle
connections immediately.

Important Consideration:

- Using Deferred will result in temporary inconsistencies
for the first request on each persistent connection
after a route update and router reload. This request
will be processed by the old HAProxy process using its
old configuration. Subsequent requests will use the
updated configuration.

Operational Considerations:

- Keeping idle connections open during reloads may lead
to an accumulation of old HAProxy processes if
connections remain idle for extended periods,
especially in environments where frequent reloads
occur.

- Consider monitoring the number of HAProxy processes in
the router pods when Deferred is set.

- You may need to enable or adjust the
`ingress.operator.openshift.io/hard-stop-after`
duration (configured via an annotation on the
IngressController resource) in environments with
frequent reloads to prevent resource exhaustion.
enum:
- Immediate
- Deferred
type: string
logging:
description: |-
logging defines parameters for what should be logged where. If this
Expand Down Expand Up @@ -2089,7 +2159,7 @@ spec:
type: string
connectTimeout:
description: |-
ConnectTimeout defines the maximum time to wait for
connectTimeout defines the maximum time to wait for
a connection attempt to a server/backend to succeed.

This field expects an unsigned duration string of decimal numbers, each with optional
Expand Down
10 changes: 10 additions & 0 deletions pkg/operator/controller/ingress/deployment.go
Original file line number Diff line number Diff line change
Expand Up @@ -734,6 +734,8 @@ func desiredRouterDeployment(ci *operatorv1.IngressController, ingressController
}

deployment.Spec.Template.Spec.Containers[0].Image = ingressControllerImage
deployment.Spec.Template.Spec.Containers[0].ImagePullPolicy = corev1.PullAlways

deployment.Spec.Template.Spec.DNSPolicy = corev1.DNSClusterFirst

var (
Expand Down Expand Up @@ -1185,6 +1187,13 @@ func desiredRouterDeployment(ci *operatorv1.IngressController, ingressController
)
}

if ci.Spec.IdleConnectionTerminationPolicy == operatorv1.IngressControllerConnectionTerminationPolicyDeferred {
env = append(env, corev1.EnvVar{
Name: "ROUTER_IDLE_CLOSE_ON_RESPONSE",
Value: "true",
})
}

// TODO: The only connections from the router that may need the cluster-wide proxy are those for downloading CRLs,
// which, as of writing this, will always be http. If https becomes necessary, the router will need to mount the
// trusted CA bundle that cluster-network-operator generates. The process for adding that is described here:
Expand Down Expand Up @@ -1675,6 +1684,7 @@ func deploymentConfigChanged(current, expected *appsv1.Deployment) (bool, *appsv
updated.Spec.Template.Spec.Containers[0].SecurityContext = expected.Spec.Template.Spec.Containers[0].SecurityContext
updated.Spec.Template.Spec.Containers[0].Env = expected.Spec.Template.Spec.Containers[0].Env
updated.Spec.Template.Spec.Containers[0].Image = expected.Spec.Template.Spec.Containers[0].Image
updated.Spec.Template.Spec.Containers[0].ImagePullPolicy = expected.Spec.Template.Spec.Containers[0].ImagePullPolicy
copyProbe(expected.Spec.Template.Spec.Containers[0].LivenessProbe, updated.Spec.Template.Spec.Containers[0].LivenessProbe, true)
copyProbe(expected.Spec.Template.Spec.Containers[0].ReadinessProbe, updated.Spec.Template.Spec.Containers[0].ReadinessProbe, true)
copyProbe(expected.Spec.Template.Spec.Containers[0].StartupProbe, updated.Spec.Template.Spec.Containers[0].StartupProbe, true)
Expand Down
51 changes: 51 additions & 0 deletions pkg/operator/controller/ingress/deployment_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -2564,3 +2564,54 @@ func TestDesiredRouterDeploymentRouterExternalCertificate(t *testing.T) {

checkDeploymentHasEnvSorted(t, deployment)
}

// Test_IdleConnectionTerminationPolicy validates that the ingress
// controller correctly sets the ROUTER_IDLE_CLOSE_ON_RESPONSE
// environment variable based on the IngressController's
// IdleConnectionTerminationPolicy field.
func Test_IdleConnectionTerminationPolicy(t *testing.T) {
ic, ingressConfig, infraConfig, apiConfig, networkConfig, _, clusterProxyConfig := getRouterDeploymentComponents(t)

for _, tc := range []struct {
name string
policy operatorv1.IngressControllerConnectionTerminationPolicy
expectEnvVarPresent bool
expectedEnvVarValue string
}{{
name: "IdleConnectionTerminationPolicy is Deferred",
policy: operatorv1.IngressControllerConnectionTerminationPolicyDeferred,
expectEnvVarPresent: true,
expectedEnvVarValue: "true",
}, {
name: "IdleConnectionTerminationPolicy is not set",
policy: "",
expectEnvVarPresent: false,
expectedEnvVarValue: "",
}, {
name: "IdleConnectionTerminationPolicy is Immediate (default)",
policy: operatorv1.IngressControllerConnectionTerminationPolicyImmediate,
expectEnvVarPresent: false,
expectedEnvVarValue: "",
}} {
t.Run(tc.name, func(t *testing.T) {
ic.Spec.IdleConnectionTerminationPolicy = tc.policy

deployment, err := desiredRouterDeployment(ic, ingressControllerImage, ingressConfig, infraConfig, apiConfig, networkConfig, false, false, nil, clusterProxyConfig, false, false)
if err != nil {
t.Fatalf("failed to generate desired router Deployment: %v", err)
}

expectedEnv := []envData{{
name: "ROUTER_IDLE_CLOSE_ON_RESPONSE",
expectPresent: tc.expectEnvVarPresent,
expectedValue: tc.expectedEnvVarValue,
}}

if err := checkDeploymentEnvironment(t, deployment, expectedEnv); err != nil {
t.Errorf("environment variable check failed: %v", err)
}

checkDeploymentHasEnvSorted(t, deployment)
})
}
}
34 changes: 1 addition & 33 deletions test/e2e/all_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -88,41 +88,9 @@ func TestAll(t *testing.T) {
t.Run("TestUnmanagedAWSLBSubnets", TestUnmanagedAWSLBSubnets)
t.Run("TestAWSEIPAllocationsForNLB", TestAWSEIPAllocationsForNLB)
t.Run("TestUnmanagedAWSEIPAllocations", TestUnmanagedAWSEIPAllocations)
t.Run("Test_IdleConnectionTerminationPolicy", Test_IdleConnectionTerminationPolicy)
})

t.Run("serial", func(t *testing.T) {
t.Run("TestDefaultIngressControllerSteadyConditions", TestDefaultIngressControllerSteadyConditions)
t.Run("TestClusterOperatorStatusRelatedObjects", TestClusterOperatorStatusRelatedObjects)
t.Run("TestConfigurableRouteNoConsumingUserNoRBAC", TestConfigurableRouteNoConsumingUserNoRBAC)
t.Run("TestConfigurableRouteNoSecretNoRBAC", TestConfigurableRouteNoSecretNoRBAC)
t.Run("TestConfigurableRouteRBAC", TestConfigurableRouteRBAC)
t.Run("TestDefaultIngressCertificate", TestDefaultIngressCertificate)
t.Run("TestDefaultIngressClass", TestDefaultIngressClass)
t.Run("TestOperatorRecreatesItsClusterOperator", TestOperatorRecreatesItsClusterOperator)
t.Run("TestAWSLBTypeDefaulting", TestAWSLBTypeDefaulting)
t.Run("TestAWSResourceTagsChanged", TestAWSResourceTagsChanged)
t.Run("TestHstsPolicyWorks", TestHstsPolicyWorks)
t.Run("TestIngressControllerCustomEndpoints", TestIngressControllerCustomEndpoints)
t.Run("TestIngressStatus", TestIngressStatus)
t.Run("TestLocalWithFallbackOverrideForLoadBalancerService", TestLocalWithFallbackOverrideForLoadBalancerService)
t.Run("TestOperatorSteadyConditions", TestOperatorSteadyConditions)
t.Run("TestPodDisruptionBudgetExists", TestPodDisruptionBudgetExists)
t.Run("TestProxyProtocolOnAWS", TestProxyProtocolOnAWS)
t.Run("TestRouteHTTP2EnableAndDisableIngressController", TestRouteHTTP2EnableAndDisableIngressController)
t.Run("TestRouteHardStopAfterEnableOnIngressController", TestRouteHardStopAfterEnableOnIngressController)
t.Run("TestRouteHardStopAfterTestInvalidDuration", TestRouteHardStopAfterTestInvalidDuration)
t.Run("TestRouteHardStopAfterTestOneDayDuration", TestRouteHardStopAfterTestOneDayDuration)
t.Run("TestRouteHardStopAfterTestZeroLengthDuration", TestRouteHardStopAfterTestZeroLengthDuration)
t.Run("TestRouteNbthreadIngressController", TestRouteNbthreadIngressController)
t.Run("TestRouterCompressionOperation", TestRouterCompressionOperation)
t.Run("TestUpdateDefaultIngressControllerSecret", TestUpdateDefaultIngressControllerSecret)
t.Run("TestCanaryRoute", TestCanaryRoute)
t.Run("TestCanaryWithMTLS", TestCanaryWithMTLS)
t.Run("TestCanaryRouteClearsSpecHost", TestCanaryRouteClearsSpecHost)
t.Run("TestRouteHTTP2EnableAndDisableIngressConfig", TestRouteHTTP2EnableAndDisableIngressConfig)
t.Run("TestRouteHardStopAfterEnableOnIngressConfig", TestRouteHardStopAfterEnableOnIngressConfig)
t.Run("TestRouteHardStopAfterEnableOnIngressControllerHasPriorityOverIngressConfig", TestRouteHardStopAfterEnableOnIngressControllerHasPriorityOverIngressConfig)
t.Run("TestHostNetworkPortBinding", TestHostNetworkPortBinding)
t.Run("TestDashboardCreation", TestDashboardCreation)
})
}
Loading