Idea: expose query stats in Grafana #10192

Open
dimitarvdimitrov opened this issue Dec 9, 2024 · 5 comments

@dimitarvdimitrov
Contributor

dimitarvdimitrov commented Dec 9, 2024

What is the problem you are trying to solve?

The query-frontend already keeps track of aggregated query stats. Grafana has a "Stats" tab in the "Query Inspector" view. The graphite datasource already uses that tab to feed some stats back to the user. We should expose (most of) the existing query stats in that pane.

These stats help users fine-tune queries or spot underscaled Mimir clusters by looking at shardability, volume of data touched, time spent in the queue, and time spent encoding the result.

Existing query stats

caller=handler.go:383
component=query-frontend
encode_time_seconds=4.0957e-05
estimated_series_count=31950
fetched_chunk_bytes=23791904
fetched_chunks_count=329964
fetched_index_bytes=28647863
fetched_series_count=29906
header_cache_control=
header_x_forwarded_for="10.201.114.35, 10.63.49.131"
length=24h5m0s
level=info
method=POST
msg="query stats"
param_query="REDACTED"
param_time=2024-12-03T08:45:11Z
path=/prometheus/api/v1/query
query_wall_time_seconds=6.259888235
queue_time_seconds=1.9816e-05
response_size_bytes=118
response_time=6.267380946s
results_cache_hit_bytes=0
results_cache_miss_bytes=0
route_name=prometheus_api_v1_query
sharded_queries=0
split_queries=0
status=success
status_code=200
time_since_max_time=1m0.763386437s
time_since_min_time=24h6m0.763386437s
ts=2024-12-03T08:45:18.030817629Z
user=10428
user_agent=Opencost/1.100.0 

Which solution do you envision (roughly)?

Format

The query-frontend returns each query stat in the Server-Timing header. This header is already used for some statistics, so we can publish all relevant query stats there. For example:

Server-Timing: encode;dur=0.041, series_count;c=31950, chunk_bytes;c=23791904, chunks_count;c=329964, index_bytes;c=28647863, series_fetched;c=29906, wall_time;dur=6259.888, queue;dur=0.020, response_size;c=118, response_time;dur=6267.381, cache_hit;c=0, cache_miss;c=0, sharded;c=0, split;c=0
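A minimal sketch of how such a header value could be assembled, assuming durations are reported in milliseconds through the standard `dur` parameter and counters through the `c` parameter used in the example above. The `queryStats` struct and the helper names are hypothetical and are not taken from the Mimir codebase.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// queryStats is a hypothetical subset of the stats shown in the log line above.
type queryStats struct {
	EncodeTime        time.Duration
	QueueTime         time.Duration
	WallTime          time.Duration
	FetchedSeries     uint64
	FetchedChunkBytes uint64
	ShardedQueries    uint64
}

// durMetric renders a duration as a Server-Timing "dur" parameter in milliseconds.
func durMetric(name string, d time.Duration) string {
	return fmt.Sprintf("%s;dur=%.3f", name, float64(d)/float64(time.Millisecond))
}

// countMetric renders a counter using the "c" parameter from the proposal.
// "c" is not one of the parameters defined by the Server-Timing specification.
func countMetric(name string, v uint64) string {
	return fmt.Sprintf("%s;c=%d", name, v)
}

// serverTimingValue joins all metrics into a single header value.
func serverTimingValue(s queryStats) string {
	return strings.Join([]string{
		durMetric("encode", s.EncodeTime),
		durMetric("queue", s.QueueTime),
		durMetric("wall_time", s.WallTime),
		countMetric("series_fetched", s.FetchedSeries),
		countMetric("chunk_bytes", s.FetchedChunkBytes),
		countMetric("sharded", s.ShardedQueries),
	}, ", ")
}
```

With the values from the log line above (for example an encode time of 40957ns and a queue time of 19816ns), this produces entries such as `encode;dur=0.041` and `queue;dur=0.020`, matching the rounding in the proposed header.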

Considerations

The header above is 293 characters. For many queries the header would make up most of the HTTP response, and we don't want to amplify data transfer at the query-frontend unnecessarily. I therefore propose an optional request header: the query-frontend includes these stats only when the request asks for them.

X-Mimir-Response-Query-Stats: true
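The opt-in behaviour could look roughly like the sketch below. It assumes the request header is parsed as a boolean with `strconv.ParseBool`, which is a choice made here for illustration (the proposal only shows the value `true`), and the function names are hypothetical.

```go
package main

import (
	"net/http"
	"strconv"
)

// Header name taken from the proposal above.
const queryStatsRequestHeader = "X-Mimir-Response-Query-Stats"

// statsRequested reports whether the client opted in to query stats.
// A missing or unparsable value is treated as "not requested".
func statsRequested(reqHeader http.Header) bool {
	ok, err := strconv.ParseBool(reqHeader.Get(queryStatsRequestHeader))
	return err == nil && ok
}

// addQueryStats appends the stats to the response's Server-Timing header,
// but only when the request asked for them.
func addQueryStats(reqHeader, respHeader http.Header, serverTiming string) {
	if !statsRequested(reqHeader) {
		return
	}
	// Add rather than Set, so Server-Timing entries written elsewhere are kept.
	respHeader.Add("Server-Timing", serverTiming)
}
```

Inside an `http.Handler` this would be called as `addQueryStats(r.Header, w.Header(), value)` before the response body is written, since headers cannot be changed after that point.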

Stability

This is intended to be ultimately interpreted by humans. As such we can change the set of stats we expose or rename them without notice.
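Because the set of stats may change, a consumer would probably treat the header as a list of opaque name/parameter pairs instead of expecting specific metrics. The following is a simplified sketch of such a parser, not a full implementation of the Server-Timing grammar: it assumes no commas or semicolons appear inside quoted parameter values, and the type and function names are hypothetical.

```go
package main

import "strings"

// timingMetric is one Server-Timing entry: a metric name plus its raw parameters.
type timingMetric struct {
	Name   string
	Params map[string]string
}

// parseServerTiming splits a header value into metrics without assuming
// which metric names or parameters are present.
func parseServerTiming(value string) []timingMetric {
	var metrics []timingMetric
	for _, entry := range strings.Split(value, ",") {
		parts := strings.Split(entry, ";")
		name := strings.TrimSpace(parts[0])
		if name == "" {
			continue // skip empty entries such as a trailing comma
		}
		m := timingMetric{Name: name, Params: map[string]string{}}
		for _, p := range parts[1:] {
			k, v, _ := strings.Cut(strings.TrimSpace(p), "=")
			if k != "" {
				m.Params[k] = strings.Trim(v, `"`)
			}
		}
		metrics = append(metrics, m)
	}
	return metrics
}
```

Unknown metrics and parameters simply pass through, so renaming or adding a stat on the server side would not break a client written this way.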

Have you considered any alternatives?

Use the "Mimir / Slow queries" dashboard to find query stats.

Any additional context to share?

  • Grafana also needs to implement support for this. It probably makes sense to do this after we have a released version of it in Mimir.
  • This is a feature request. See linked internal issue.

How long do you think this would take to be developed?

Small (<= 1 month dev)

What are the documentation dependencies?

Update the HTTP reference to mention the additional header, the stats it may carry, and the (lack of a) stability guarantee.

Proposer?

No response

@sureshkrishnan-v

Hi, I’d like to work on this issue as a first-time contributor. Could you confirm whether it’s available and share any guidelines? I am new to open source contributions and would appreciate some help.

@dimitarvdimitrov
Contributor Author

thanks for offering help @sureshkrishnan-v!! It's a somewhat fresh issue and I'd like to wait until next week to hear if anyone has any objections or better ideas regarding this. If all goes well, I can share a rough idea for how I think this should be done, plus any steps before/after writing the code, and we can take it from there.

@sureshkrishnan-v

ok thank you i will wait till tomorrow

@sureshkrishnan-v

hello, sir any update ??

@dimitarvdimitrov
Contributor Author

apologies, I was away around the holiday season. Can this wait for another week?
