-
Notifications
You must be signed in to change notification settings - Fork 543
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix issue where successful streaming gRPC requests are incorrectly reported as cancelled #6471
Fix issue where successful streaming gRPC requests are incorrectly reported as cancelled #6471
Conversation
af9dd49
to
af430e4
Compare
…return no series are reported as cancelled rather than successful in traces and metrics.
af430e4
to
795f704
Compare
Putting back to draft to investigate flaky integration test introduced in this PR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll trust the test. The "CloseAndExhaust" also looks good.
LGTM.
Co-authored-by: Oleg Zaytsev <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change makes sense to me. I'm just wondering if by draining the stream we may add extra latency, but on the other side the gRPC doc you linked is very clear that if we don't do it resources may be leaked.
Will wait for you to spot the issue causing flaky integration test (whether it's just an issue in the test or a more serious one).
ebd9ee0
to
11ddaa9
Compare
532caff
to
25f55a9
Compare
…ming goroutine closing the gRPC stream.
…re-gateway chunk stream reader
…streaming goroutine closing the gRPC stream.
25f55a9
to
91fd892
Compare
By the time we get to draining the stream, we've either:
Only in the last case would I expect any noticeable latency difference, and this should be fairly quick - worst case scenario, the client code closes the stream, the gRPC library sends a cancellation notification to the server and then the client has to wait for the server to acknowledge the cancellation and close the stream, or times out. |
I've fixed the flaky test, which was due to flaky behaviour while streaming chunks - there was a race between:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very good work. The changes make sense to me. I left a couple of nits that I'm going to self-commit so we merge this PR today and will be included in the next week release.
For a follow up PR: I would add a failling rule to forbid the usage of CloseSend()
but use CloseAndExhaust()
instead.
Merging it to get it included in tomorrow's weekly release. |
I need to correct myself here: when the client closes the stream (ie. calls I've checked all of the places that are now using The main scenario I'm concerned about is store-gateway query requests that are abandoned due to hitting a query limit, which I will address in #6505. (Ingester query requests are not susceptible to the same behaviour as these use |
* Fix typo in comment. * Use more efficient `SpanLogger.DebugLog()` method in more places. * Add linting rule to forbid the direct use of CloseSend(). * Add documentation to CloseAndExhaust. * Update changelog entry. * Report correct error when closing loop fails * Avoid reading entire store-gateway query stream response when one or more calls fail. * Silence linting warnings. * Ensure CloseAndExhaust never blocks forever. * Fix race in ingester streaming tests: there's no guarantee we'll close the stream synchronously after encountering an error * Address PR feedback: reduce timeout.
* Fix typo in comment. * Use more efficient `SpanLogger.DebugLog()` method in more places. * Add linting rule to forbid the direct use of CloseSend(). * Add documentation to CloseAndExhaust. * Update changelog entry. * Report correct error when closing loop fails * Avoid reading entire store-gateway query stream response when one or more calls fail. * Silence linting warnings. * Ensure CloseAndExhaust never blocks forever. * Fix race in ingester streaming tests: there's no guarantee we'll close the stream synchronously after encountering an error * Address PR feedback: reduce timeout. (cherry picked from commit 0103407)
* Fix typo in comment. * Use more efficient `SpanLogger.DebugLog()` method in more places. * Add linting rule to forbid the direct use of CloseSend(). * Add documentation to CloseAndExhaust. * Update changelog entry. * Report correct error when closing loop fails * Avoid reading entire store-gateway query stream response when one or more calls fail. * Silence linting warnings. * Ensure CloseAndExhaust never blocks forever. * Fix race in ingester streaming tests: there's no guarantee we'll close the stream synchronously after encountering an error * Address PR feedback: reduce timeout. (cherry picked from commit 0103407) Co-authored-by: Charles Korn <[email protected]>
…ported as cancelled (#6471) * Add failing test. * Fix issue where query requests that stream chunks from ingesters but return no series are reported as cancelled rather than successful in traces and metrics. * Fix other instances of issue. * Add changelog entry. * Address PR feedback Co-authored-by: Oleg Zaytsev <[email protected]> * Add more details to assertions * Extract method * Fix race between query evaluation finishing and ingester chunks streaming goroutine closing the gRPC stream. * Refactor SeriesChunksStreamReader to follow same structure as for store-gateway chunk stream reader * Update changelog entry. * Fix race between query evaluation finishing and store-gateway chunks streaming goroutine closing the gRPC stream. * Apply suggestions from code review --------- Co-authored-by: Oleg Zaytsev <[email protected]> Co-authored-by: Marco Pracucci <[email protected]>
* Fix typo in comment. * Use more efficient `SpanLogger.DebugLog()` method in more places. * Add linting rule to forbid the direct use of CloseSend(). * Add documentation to CloseAndExhaust. * Update changelog entry. * Report correct error when closing loop fails * Avoid reading entire store-gateway query stream response when one or more calls fail. * Silence linting warnings. * Ensure CloseAndExhaust never blocks forever. * Fix race in ingester streaming tests: there's no guarantee we'll close the stream synchronously after encountering an error * Address PR feedback: reduce timeout.
What this PR does
This PR fixes multiple related issues:
4453e0a separately refactors the ingester stream reader (
SeriesChunksStreamReader
) to follow the same structure as the store-gateway stream reader (storeGatewayStreamReader
).I'd suggest reviewing each commit individually.
Which issue(s) this PR fixes or relates to
(none)
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]