h2: Rapid reset mitigations (7.4) #4009

dridi · 2023-10-18T16:46:08Z

Port of #3997, #3998, #3999 and adjacent commits to the 7.4 branch.

No merge conflicts.

This adds parameters h2_rst_allowance and h2_rst_allowance_period, which govern the rate at which we allow clients to reset h/2 streams. If the limit is exceeded, the connection is closed. Mitigates: varnishcache#3996

Only RST frames received earlier than this duration will be considered rapid.
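As an illustration of the rate limit described above, here is a minimal sketch of one way such a check could work. It assumes a token-bucket scheme, which the description does not confirm, and every name in it (`struct rst_limiter`, `rst_limiter_init`, `rst_limiter_hit`) is hypothetical rather than taken from the patch series. Times are plain seconds stored in doubles.

```c
#include <stdbool.h>

/* Hypothetical per-connection state; names are illustrative only. */
struct rst_limiter {
	double limit;     /* resets allowed per period, 0 disables the check */
	double period;    /* seconds needed to refill the whole budget */
	double rapid;     /* a reset is "rapid" if the stream is younger than this */
	double budget;    /* resets still allowed right now */
	double last_rst;  /* time of the last rapid reset */
};

void
rst_limiter_init(struct rst_limiter *rl, double limit, double period,
    double rapid, double now)
{
	rl->limit = limit;
	rl->period = period;
	rl->rapid = rapid;
	rl->budget = limit;	/* start with a full budget */
	rl->last_rst = now;
}

/*
 * Account for one RST frame on a stream that started at stream_start.
 * Returns true when the limit is exceeded and the connection should be
 * closed.
 */
bool
rst_limiter_hit(struct rst_limiter *rl, double stream_start, double now)
{
	if (rl->limit <= 0.0)
		return (false);		/* rate limit disabled */
	if (now - stream_start > rl->rapid)
		return (false);		/* stream lived long enough, not rapid */

	/* Refill proportionally to the elapsed time, capped at the limit. */
	rl->budget += rl->limit * (now - rl->last_rst) / rl->period;
	if (rl->budget > rl->limit)
		rl->budget = rl->limit;
	rl->last_rst = now;

	if (rl->budget < 1.0)
		return (true);		/* budget exhausted */
	rl->budget -= 1.0;
	return (false);
}
```

With a limit of 2 resets per 10 seconds and a rapid threshold of 1 second, two quick resets are tolerated and a third one shortly after would trigger the close, while a reset on a stream older than 1 second is not counted at all.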

It was particularly hard to follow once we reach client c3.

The goal is for top-level transports to report whether the client is still present or not.

Once a client is reportedly gone, processing its VCL task(s) is just a waste of resources. The execution of client-facing VCL is intercepted and an artificial return(fail) is returned in that scenario.

Thanks to the introduction of the universal return(fail), proper error handling and resource tear down is already in place, which makes this change safe modulo unknown bugs. This adds a circuit breaker anywhere in the client state machine where there is VCL execution.

A new Reset time stamp is logged to convey when a task does not complete because the client is gone. This is a good complement to the walk away feature and its original circuit breaker for the waiting list, but this has not been integrated yet.

While the request is technically failed, it won't increase the vcl_fail counter, and a new req_reset counter is incremented.

This new behavior is guarded by a new vcl_req_reset feature flag, enabled by default.

Refs varnishcache#3835
Refs 61a15cb
Refs e5efc2c
Refs ba54dc9
Refs 6f50a00
Refs b881699
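The circuit breaker described above can be pictured with a small sketch. This is not the code from the patch series: the types, the `run_vcl_step` wrapper and the `demo_*` callbacks are all hypothetical, and the boolean `req_reset_enabled` merely stands in for the vcl_req_reset feature flag. It only shows the idea of polling the transport before each VCL step and failing the task without touching the vcl_fail counter.

```c
#include <stdbool.h>
#include <stddef.h>

enum vcl_ret { VCL_RET_OK, VCL_RET_FAIL };

/* Hypothetical transport with an optional poll callback. */
struct transport {
	/* Returns true while the client is still present. */
	bool (*poll)(void *priv);
};

struct counters {
	unsigned long vcl_fail;
	unsigned long req_reset;
};

struct task {
	const struct transport *transport;
	void *priv;
	bool req_reset_enabled;	/* stands in for the feature flag */
	unsigned steps_run;	/* demo bookkeeping only */
};

typedef enum vcl_ret vcl_step_f(struct task *);

/* Run one VCL step unless the transport reports the client as gone. */
enum vcl_ret
run_vcl_step(struct task *t, struct counters *c, vcl_step_f *step)
{
	enum vcl_ret r;

	if (t->req_reset_enabled && t->transport->poll != NULL &&
	    !t->transport->poll(t->priv)) {
		c->req_reset++;		/* counted apart from vcl_fail */
		return (VCL_RET_FAIL);	/* artificial failure, step skipped */
	}
	r = step(t);
	if (r == VCL_RET_FAIL)
		c->vcl_fail++;
	return (r);
}

/* Demo callbacks used to exercise the wrapper. */
bool
demo_client_gone(void *priv)
{
	(void)priv;
	return (false);
}

bool
demo_client_present(void *priv)
{
	(void)priv;
	return (true);
}

enum vcl_ret
demo_step_ok(struct task *t)
{
	t->steps_run++;
	return (VCL_RET_OK);
}
```

In this sketch a transport without a poll callback is treated as always present, and turning the flag off makes the wrapper run the step regardless of what the transport reports.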

The error check is not performed in a critical section to avoid contention, at the risk of not seeing the error until the next transport poll.

With varnishcache#3998 we need to ensure streams are not going to skip vcl_recv if reset faster than reaching this step for the request task. The alternative to prevent the vcl_req_reset feature from interfering is to simply disable it.

Noticed while porting varnishcache#3998 to the 6.0 branch with a varnishtest more sensitive to timing.

This will allow per-session adjustments and also significantly lower the risk of inconsistent calculations in the rate limit code during parameter changes. Ref varnishcache#3996
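To make the benefit of per-session copies concrete, here is a small sketch under stated assumptions. The struct and function names are hypothetical; only the field names echo the h2_rapid_reset and h2_rapid_reset_limit parameters mentioned in this pull request. The session takes a snapshot of the global values when it is set up, so a later parameter change cannot alter a calculation halfway through, and a single session can be tuned on its own.

```c
/* Hypothetical global parameters, loosely named after the h2 params. */
struct demo_params {
	double rapid_reset;		/* seconds */
	double rapid_reset_limit;	/* resets per period */
};

/* Hypothetical session state holding its own copy of the parameters. */
struct demo_h2_sess {
	struct demo_params params;
};

/* Snapshot the global parameters once, when the session is set up. */
void
demo_sess_init(struct demo_h2_sess *s, const struct demo_params *global)
{
	s->params = *global;
}

/* Per-session adjustment: only this session's copy changes. */
void
demo_sess_set_limit(struct demo_h2_sess *s, double limit)
{
	s->params.rapid_reset_limit = limit;
}
```

Rate limit code that reads only `s->params` sees a consistent set of values for the lifetime of the session, whatever happens to the global parameters in the meantime.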

as agreed on IRC.

(sorry)

We cannot make the parameter const because of the API.

nigoroll

LGTM

dridi · 2023-10-23T17:20:58Z

I accidentally pushed this patch series plus the patch series containing 325faac directly to 7.4 (and free of conflicts) but I suppose it's fine.

nigoroll and others added 18 commits October 18, 2023 18:43

Bump cli_limit to fit param.show -j with more parameters coming

ce3c6b5

h2: Add a rate limit facility for h/2 RST handling

9dda589

This adds parameters h2_rst_allowance and h2_rst_allowance_period, which govern the rate at which we allow clients to reset h/2 streams. If the limit is exceeded, the connection is closed. Mitigates: varnishcache#3996

Introduce RAPID_RESET as a sess_close reason

856e2fd

Add param h2_rapid_reset

5eb4c5c

Only RST frames received earlier than this duration will be considered rapid.

Polish h2_rapid_reset docs

56eded7

Flexelinting

cbadf10

slinkified dridi-polish

0e37f4f

vtc: Avoid cycling the barrier in t02014

727882c

It was particularly hard to follow once we reach client c3.

transport: New poll method

a983f4b

The goal is for top-level transports to report whether the client is still present or not.

http2_session: Implement transport polling

6ecc9ec

The error check is not performed in a critical section to avoid contention, at the risk of not seeing the error until the next transport poll.

vtc: Stabilize r3996 and increase coverage

b27f508

With varnishcache#3998 we need to ensure streams are not going to skip vcl_recv if reset faster than reaching this step for the request task. The alternative to prevent the vcl_req_reset feature from interfering is to simply disable it.

vtc: Missing synchronization in t02025

e3847e1

Noticed while porting varnishcache#3998 to the 6.0 branch with a varnishtest more sensitive to timing.

Copy rapid reset parameters to the h2 session

e12a088

This will allow per-session adjustments and also significantly lower the risk of inconsistent calculations in the rate limit code during parameter changes. Ref varnishcache#3996

Add vmod_h2 to control rapid_reset parameters per session

b0301c5

Start with a reasonable default for h2_rapid_reset_limit

498adbd

as agreed on IRC.

Adjust test case to previous commit

f1c044c

(sorry)

Flexelinting

eb8aed1

We cannot make the parameter const because of the API.

dridi added b=enhancement c=H/2 r=7.4 labels Oct 18, 2023

dridi mentioned this pull request Oct 18, 2023

Handling of CVE-2023-44487 / HTTP2 Rapid Reset #3996

Closed

nigoroll approved these changes Oct 23, 2023

dridi merged commit eb8aed1 into varnishcache:7.4 Oct 23, 2023
1 check passed

dridi deleted the rapid_reset_7.4 branch October 23, 2023 17:19
