
h2: Make requests walk away from the waiting list #3835

Closed
wants to merge 5 commits

Conversation

dridi
Member

@dridi dridi commented Aug 12, 2022

When a session goes to the cleanup step, it will indefinitely try to
kill all the streams until those stuck in a waiting list reembark a
thread. This can lead to clients timing out, closing the h2 session,
and opening a new h2 session for the exact same requests. This can
artificially inflate the number of dead tasks in a waiting list with
generous backend timeouts.

To avoid that, an h2 stream will keep a list of requests currently
in a waiting list and make them walk away quietly during cleanup.
An h2 stream would normally only have one request running at any
time but this may no longer be true with modules like vmod_pesi.


See https://varnish-cache.org/lists/pipermail/varnish-misc/2022-July/027160.html
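The tracking described above can be sketched as a small tail-queue walk. All names here (`h2_stream_walkaway`, `waiting_reqs`, `on_waitinglist`) are illustrative rather than the actual patch, and plain `sys/queue.h` TAILQ macros stand in for Varnish's VTAILQ wrappers:

```c
/* Minimal sketch of the cleanup-time walkaway described above.
 * All names are illustrative; Varnish uses VTAILQ_* wrappers
 * around the same BSD queue macros. */
#include <stddef.h>
#include <sys/queue.h>

struct req {
	TAILQ_ENTRY(req) t_list;	/* membership in the tracking list */
	int on_waitinglist;		/* 1 while parked on a waiting list */
};

struct h2_stream {
	TAILQ_HEAD(, req) waiting_reqs;	/* requests parked on waiting lists */
};

/* During session cleanup, make every parked request leave quietly
 * instead of waiting for a rush to reembark a worker thread. */
static int
h2_stream_walkaway(struct h2_stream *s)
{
	struct req *r;
	int n = 0;

	while ((r = TAILQ_FIRST(&s->waiting_reqs)) != NULL) {
		TAILQ_REMOVE(&s->waiting_reqs, r, t_list);
		r->on_waitinglist = 0;	/* the request walks away */
		n++;
	}
	return (n);
}
```

An h2 stream would usually contribute a single entry to this list, but the list form accommodates multiple requests per stream, as with vmod_pesi.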

This is equivalent to what we have in Varnish Enterprise that I
mentioned in this thread. It does not solve the problem described
in the email thread, because we have no path to walk away from
an HTTP/1 request (pun intended).

I believe we could potentially solve the HTTP/1 case if we involved
a waiter that would only care about closed connections, ignoring
incoming data that I believe triggers a wake-up today.

I am convinced I came up with a better design than the original
attempt, and I'm looking forward to reviews. Unfortunately, I will
miss the next two bugwash meetings.

@nigoroll
Member

nigoroll commented Aug 29, 2022

generally in favor, have not looked closer yet, still :(

bin/varnishd/cache/cache_hash.c
AZ(req->stale_oc);
VSLb_ts_req(req, "Waitinglist", W_TIM_real(wrk));
VSLb(req->vsl, SLT_Error, "The client is going away");
if (req->esi_level > 0) {
Contributor

Why should we only go to VCLFAIL for subrequests? This implies that HSH_WalkAway will only ever be called as part of a broken connection, where we have to close. Could it not also be useful for a possible future waiting list timeout feature, where we would need to produce a 503?

Member Author

First, we can add a dedicated HSH_TIMEOUT. Otherwise, I could use the same argument about the error message preventing us from using this branch for future timeouts. I think we will be fine if we dedicate HSH_WALKAWAY to the case where the client went away while a request was in a waiting list.

For the timeout case, we probably want to have different semantics than plainly ending the transaction. Maybe a waiting list timeout would be a deferred hash_ignore_busy?

I must admit I don't remember precisely why we did that downstream. I think we needed this to interrupt ESI delivery and prevent it from recursing as the client is gone anyway.

Member Author

There's actually a comment a few lines below explaining why we fail sub-requests:

/* NB: Interrupt delivery upwards to avoid engaging
 * new sub-request tasks.
 */

Granted, it doesn't give many details.

@@ -466,11 +466,13 @@ struct req {
void *transport_priv;

VTAILQ_ENTRY(req) w_list;
VTAILQ_ENTRY(req) t_list;
Contributor

This feels like quite a waste of memory, adding 24 bytes for the list entry and the objhead reference flag on struct req. I think I prefer the original way of doing the workspace allocation, perhaps replaced or expanded with a fallback malloc.
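The 24-byte figure checks out on an LP64 platform: a tail-queue entry is two pointers and the objhead reference is one more. A stand-in struct (the real struct req has many other fields) makes the arithmetic concrete:

```c
/* Stand-in for the two fields the review discusses: on LP64 a
 * TAILQ_ENTRY is two pointers (16 bytes) and the objhead pointer
 * adds 8 more, hence the 24 bytes mentioned above. */
#include <sys/queue.h>

struct objhead;		/* opaque, as in the cache code */

struct req_overhead {
	TAILQ_ENTRY(req_overhead) t_list;	/* next + prev pointers */
	struct objhead *transport_objhead;	/* reference to the objhead */
};
```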

Member Author

I had considered this, and decided the current approach would be simpler, especially with the generalization of the feature to include HTTP/1 too.

I can ponder something similar to what we did downstream but more general (not tied to h2 code).

Member Author

I just remembered why I didn't use the workspace: it is reserved for vary processing.

Member Author

I got some results on the generalization side, but didn't manage to do it without allocating space somewhere. Since the workspace is reserved during waiting list operations, it is either a heap allocation or preallocated storage in struct req (the current solution). Another possibility could be to opportunistically allocate an "entry" for transport tracking on the workspace before the vary reservation, if the top transport has waiting list methods.

To be continued.

Member Author

@dridi dridi Sep 19, 2022

Considering that eventually both h2 and http/1 (100% of client top transports) will track requests and sub-requests in waiting lists, it would be much simpler to go with the current design and have the tracking state directly in struct req.


Lck_Lock(&r2->h2sess->sess->mtx);
AZ(req->transport_objhead);
req->transport_objhead = oh;
Contributor

The variable req->transport_objhead should be renamed req->toptransport_objhead to reflect that it is "owned" by the top transport. As it is, it would cause trouble if e.g. esi_deliver wanted to use it. The same argument goes for t_list.

Member Author

@dridi dridi Sep 14, 2022

Or, maybe we could move req::t_list and req::transport_objhead to reqtop::w_list and reqtop::hash_objhead?

edit: obviously, not.

@dridi
Member Author

dridi commented Sep 15, 2022

Rebased against current master to solve the trivial merge conflict. I switched to more explicit transport::top_*() methods as suggested. Not addressing other review items yet.

It comes with new functions hsh_waitlist() and hsh_reembark() that
centralize the logic. The top_waitlist() and top_reembark() methods
are meant for client-facing transport to register when requests and
sub-requests alike enter and leave a waiting list.
Since both vtr_reembark_f and vtr_waitlist_f have the same signature, we
can use vtr_waitlist_f to convey that those callbacks relate to waiting
lists, and the name of the methods to convey their roles.

When there's no rush, a request can simply walk away from the queue.

The concept is simple: when a factor (pun intended) other than the
backend fetch requires the request to leave the queue, it can just
walk away. It short-circuits the usual rush exit.

We can repurpose the dead counter busy_killed to count those events,
and get rid of the dead enum entry HSH_CONTINUE.

But operate on a different objhead reference guarded by the h2_sess lock
instead. This will allow coordination with the h2 session thread.
Better diff with the --ignore-all-space option.
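The shared-signature idea from the commit messages above can be sketched as a C method table. The layout below is illustrative, not varnishd's actual struct transport, and the toy h2 implementation exists only to show the wiring:

```c
/* Sketch: one callback signature (vtr_waitlist_f) for both methods;
 * the field names, not the types, convey the roles. Layout is
 * illustrative, not the actual varnishd struct transport. */
#include <stddef.h>

struct req;
typedef void vtr_waitlist_f(struct req *);

struct transport {
	const char *name;
	vtr_waitlist_f *top_waitlist;	/* request enters a waiting list */
	vtr_waitlist_f *top_reembark;	/* request leaves a waiting list */
};

/* Toy h2 implementation counting events, for illustration only. */
static int h2_entered, h2_left;

static void h2_top_waitlist(struct req *req) { (void)req; h2_entered++; }
static void h2_top_reembark(struct req *req) { (void)req; h2_left++; }

static const struct transport h2_transport = {
	"H2", h2_top_waitlist, h2_top_reembark,
};
```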
@dridi
Member Author

dridi commented Sep 28, 2023

Closing for now because #3992 will pull the objhead rug pretty hard.

@dridi dridi closed this Sep 28, 2023
@dridi dridi deleted the h2_walkaway branch September 28, 2023 06:59
dridi added a commit to dridi/varnish-cache that referenced this pull request Oct 2, 2023
The objcore is officially referenced by a request when it leaves its
waiting list during a rush. This allows migrating a waiting list from
one objcore to the next without the need to touch reference counters.

The hash_oc field will still have to be updated. This field must be
kept to navigate from a request to its waiting list to implement
walking away from a waiting list.

Refs varnishcache#3835
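The reference rule in that commit message can be sketched as follows: the waiting list itself holds no per-request objcore references, so migrating it between objcores only rewrites hash_oc, and a request takes its reference only when it leaves during a rush. All names here are illustrative:

```c
/* Sketch of the objcore reference handoff described above. */
#include <stddef.h>
#include <sys/queue.h>

struct objcore;

struct req {
	TAILQ_ENTRY(req) w_list;
	struct objcore *hash_oc;	/* which waiting list we are on */
};

struct objcore {
	int refcnt;
	TAILQ_HEAD(, req) waitinglist;
};

/* Move every waiter from one objcore to the next without touching
 * reference counters; only hash_oc is rewritten. */
static void
wl_migrate(struct objcore *from, struct objcore *to)
{
	struct req *r;

	while ((r = TAILQ_FIRST(&from->waitinglist)) != NULL) {
		TAILQ_REMOVE(&from->waitinglist, r, w_list);
		r->hash_oc = to;
		TAILQ_INSERT_TAIL(&to->waitinglist, r, w_list);
	}
}

/* A request takes its objcore reference only when it leaves. */
static struct objcore *
wl_leave(struct req *r)
{
	struct objcore *oc = r->hash_oc;

	TAILQ_REMOVE(&oc->waitinglist, r, w_list);
	r->hash_oc = NULL;
	oc->refcnt++;	/* the departing request now holds a reference */
	return (oc);
}
```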
dridi added a commit to dridi/varnish-cache that referenced this pull request Oct 3, 2023
dridi added a commit to dridi/varnish-cache that referenced this pull request Oct 13, 2023
Once a client is reportedly gone, processing its VCL task(s) is just a
waste of resources. The execution of client-facing VCL is intercepted
and an artificial return(fail) is returned in that scenario.

Thanks to the introduction of the universal return(fail) proper error
handling and resource tear down is already in place, which makes this
change safe modulo unknown bugs. This adds a circuit breaker anywhere
in the client state machine where there is VCL execution.

A new Reset time stamp is logged to convey when a task does not complete
because the client is gone. This is a good complement to the walk away
feature and its original circuit breaker for the waiting list, but this
has not been integrated yet.

While the request is technically failed, it won't increase the vcl_fail
counter, and a new req_reset counter is incremented.

Refs varnishcache#3835
Refs 61a15cb
Refs e5efc2c
Refs ba54dc9
Refs 6f50a00
Refs b881699
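The interception described in that commit message can be sketched as a guard in front of VCL method calls. Everything below (names, counter fields, the sample method) is illustrative rather than the actual patch:

```c
/* Sketch of the circuit breaker: before running client-facing VCL,
 * short-circuit to an artificial return(fail) when the client is
 * reported gone. All names are illustrative. */
enum vcl_ret { VCL_RET_OK, VCL_RET_FAIL };

struct req_counters {
	int client_gone;	/* connection reported closed */
	int vcl_fail;		/* genuine VCL failures */
	int req_reset;		/* tasks cut short because the client left */
};

static enum vcl_ret
vcl_call_method(struct req_counters *rc, enum vcl_ret (*method)(void))
{
	enum vcl_ret r;

	if (rc->client_gone) {
		rc->req_reset++;	/* counted separately from vcl_fail */
		return (VCL_RET_FAIL);	/* artificial return(fail) */
	}
	r = method();
	if (r == VCL_RET_FAIL)
		rc->vcl_fail++;
	return (r);
}

/* Sample VCL method for illustration. */
static enum vcl_ret
vcl_recv_ok(void)
{
	return (VCL_RET_OK);
}
```

The key property matches the commit message: a task reset for a gone client fails the request without inflating the vcl_fail counter.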
dridi added a commit to dridi/varnish-cache that referenced this pull request Oct 17, 2023
Once a client is reportedly gone, processing its VCL task(s) is just a
waste of resources. The execution of client-facing VCL is intercepted
and an artificial return(fail) is returned in that scenario.

Thanks to the introduction of the universal return(fail) proper error
handling and resource tear down is already in place, which makes this
change safe modulo unknown bugs. This adds a circuit breaker anywhere
in the client state machine where there is VCL execution.

A new Reset time stamp is logged to convey when a task does not complete
because the client is gone. This is a good complement to the walk away
feature and its original circuit breaker for the waiting list, but this
has not been integrated yet.

While the request is technically failed, it won't increase the vcl_fail
counter, and a new req_reset counter is incremented. This new behavior
is guarded by a new vcl_req_reset feature flag, enabled by default.

Refs varnishcache#3835
Refs 61a15cb
Refs e5efc2c
Refs ba54dc9
Refs 6f50a00
Refs b881699
dridi added a commit that referenced this pull request Oct 18, 2023
dridi added a commit to dridi/varnish-cache that referenced this pull request Oct 18, 2023
dridi added a commit that referenced this pull request Oct 18, 2023
dridi added a commit to dridi/varnish-cache that referenced this pull request Oct 18, 2023
dridi added a commit to dridi/varnish-cache that referenced this pull request Oct 18, 2023
dridi added a commit that referenced this pull request Oct 24, 2023