-
Notifications
You must be signed in to change notification settings - Fork 544
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Address post-merge feedback for #6471 #6505
Conversation
require.NoError(t, querierLoop.CloseSend()) | ||
cancel() | ||
require.NoError(t, util.CloseAndExhaust[*schedulerpb.SchedulerToQuerier](querierLoop)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to reviewers: moving the call to cancel()
before closing the stream is required because otherwise CloseAndExhaust()
will block: the stream has not ended, and the server does not know to close the stream without the cancellation. (Previously, CloseSend()
returned immediately, and the call to cancel()
was what made the test pass.)
6dab7c3
to
2635d15
Compare
…e the stream synchronously after encountering an error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes makes sense. I'm a bit worried that we're adding quite tricky logic which will be difficult to debug in production in case of issues (e.g. if a goroutine blocks, if a context leaks, etc...).
return nil | ||
go func() { | ||
for { | ||
if _, err := stream.Recv(); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After 3 seconds the function will return but this goroutine will continue to consume in the background. What if the consume blocks forever? We're going to leak goroutines. So question is: do we ensure that the stream context is canceled on the client side (this side) so that the Recv()
will unblock and goroutine terminate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For almost all cases where we now use CloseAndExhaust
, the streaming call is made in the context of an incoming HTTP or gRPC request (eg. querier evaluating a query), and that context will be cancelled when the response is returned. If for some reason that context isn't already being cancelled today, then we're potentially leaking the stream, but then CloseAndExhaust
should help mitigate that by exhausting the stream and then terminating.
The only exception to this case is frontendSchedulerWorker
which is used by query-frontends to queue requests to query-schedulers. In this case, the context will always be cancelled after CloseAndExhaust
returns.
* Fix typo in comment. * Use more efficient `SpanLogger.DebugLog()` method in more places. * Add linting rule to forbid the direct use of CloseSend(). * Add documentation to CloseAndExhaust. * Update changelog entry. * Report correct error when closing loop fails * Avoid reading entire store-gateway query stream response when one or more calls fail. * Silence linting warnings. * Ensure CloseAndExhaust never blocks forever. * Fix race in ingester streaming tests: there's no guarantee we'll close the stream synchronously after encountering an error * Address PR feedback: reduce timeout. (cherry picked from commit 0103407)
* Fix typo in comment. * Use more efficient `SpanLogger.DebugLog()` method in more places. * Add linting rule to forbid the direct use of CloseSend(). * Add documentation to CloseAndExhaust. * Update changelog entry. * Report correct error when closing loop fails * Avoid reading entire store-gateway query stream response when one or more calls fail. * Silence linting warnings. * Ensure CloseAndExhaust never blocks forever. * Fix race in ingester streaming tests: there's no guarantee we'll close the stream synchronously after encountering an error * Address PR feedback: reduce timeout. (cherry picked from commit 0103407) Co-authored-by: Charles Korn <[email protected]>
* Fix typo in comment. * Use more efficient `SpanLogger.DebugLog()` method in more places. * Add linting rule to forbid the direct use of CloseSend(). * Add documentation to CloseAndExhaust. * Update changelog entry. * Report correct error when closing loop fails * Avoid reading entire store-gateway query stream response when one or more calls fail. * Silence linting warnings. * Ensure CloseAndExhaust never blocks forever. * Fix race in ingester streaming tests: there's no guarantee we'll close the stream synchronously after encountering an error * Address PR feedback: reduce timeout.
What this PR does
This PR addresses some feedback from #6471:
level.Debug(spanLogger).Log(...)
pattern withspanLogger.DebugLog(...)
: c3c1437CloseSend()
on client streaming gRPC streams: f04bb75CloseAndExhaust
never blocks forever: 2635d15It also fixes an issue where the wrong error value was logged while shutting down the query-scheduler communication loop in query-frontends: df78279
I'd suggest reviewing each commit individually.
Which issue(s) this PR fixes or relates to
#6471
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]