-
Notifications
You must be signed in to change notification settings - Fork 701
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding grpc_code
label to prometheus grpc_server_handling_seconds
metric
#606
Comments
Related to grpc-ecosystem/go-grpc-prometheus#75, but reviving since that repository is now deprecated. |
Sounds like the consensus in the other thread is that adding |
Yea. I think with native histograms it would be easier decision. Wonder if it makes sense to be early adopter of native histograms here for this reason cc @beorn7 Otherwise we could add opt-in grpc_code, that's also fine. |
@johanbrandhorst I'm not sure I agree with that conclusion. I wouldn't say any consensus was yet achieved. There are still community members requesting it and the "consensus" was merely the response of the maintainers. But, in any case, why is it not possible to allow the client to choose whether to include it or not? For me the increase in cardinality is well worth the value I get from it. I should be able to enable this label if I want and understand that there is a tradeoff. For example, I should be able to var serverMetrics *grpc_prometheus.ServerMetrics
serverMetrics.EnableHandlingTimeHistogram(
...,
WithStatusCodeLabel(),
) If the This work ☝️ can be in addition to switching to a native histogram so as to further reduce the impact/burden if you want to enable the If I submit a PR, will you guys consider merging it? If so, I'd be happy to submit one this week. |
It would be great to offer native histogram support, but I would recommend to also support classic histograms in parallel so that scraper that aren't ready for native histograms yet will still get a classic histogram. |
Hey! I'd also find this metrics useful because we're trying to measure our SLO burn rate and would like to set a success criteria based on success/failure and latency. If we use two separate counters, we'd have to double count messages that are both late, and errors. I think it would be best to leave it up to the end users to decide whether they want to keep this label by dropping the labels when they scrape them. |
I'm using the prometheus middleware to measure the RPC endpoint latency for my gRPC service. I've enabled the ServerHandlingTimeHistogram.
I noticed that the
grpc_code
label is not included in the observed_bucket
metrics (see this line of code). Consequently, I cannot effectively measure my p99 request latency for requests served successfully (e.g. wheregrpc_code="OK"
). I'd like to set some SLOs based on percentiles of successful requests, but this middleware doesn't currently allow me to do that as far as I can tell.. Was this design choice intentional, and if so why?Would you be open to me submitting a PR to add it?
The text was updated successfully, but these errors were encountered: