Choose specific percentiles from histogram #69

Open
podile opened this issue Feb 26, 2020 · 7 comments

Comments

@podile

podile commented Feb 26, 2020

Currently there are a vast number of metrics at the node level as well as the table level. Every metric is important and has its own significance. However, capturing everything leads to storage space issues on large clusters. Though there are options to filter out some metrics, I don't see an option to capture only specific percentiles from histogram metrics. Having this option would save a lot of space on bigger clusters with huge tables. For example, cassandra_table_coordinator_latency_seconds contains many metrics, out of which I may need only the 99th and 75th percentiles.

@zegelin

zegelin commented Feb 26, 2020

Would this be a per-metric option? i.e., for histogram X, export the 75th and 99th percentiles, but for histogram Y, export everything?

Adding a global option to only export select percentiles for all histograms is relatively easy. Having a per-metric option is somewhat more difficult with the current project architecture.
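
To make the global variant concrete, here is a minimal sketch of what such an option could look like. It is not the exporter's actual code: the class name, the comma-separated option format ("75,99"), and the quantile-to-value map are all assumptions made for illustration.

```java
import java.math.BigDecimal;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper; none of these names exist in the exporter.
final class PercentileOptionSketch {
    // Parse a comma-separated percentile list such as "75,99" into quantiles (0.75, 0.99).
    // BigDecimal avoids floating-point surprises when matching values like 99.9.
    static Set<BigDecimal> parsePercentiles(String csv) {
        Set<BigDecimal> quantiles = new TreeSet<>();
        for (String part : csv.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                quantiles.add(new BigDecimal(trimmed).movePointLeft(2));
            }
        }
        return quantiles;
    }

    // Keep only the entries whose quantile was selected by the global option.
    // The selected set is a TreeSet, so membership is decided by compareTo (0.750 matches 0.75).
    static Map<BigDecimal, Double> keepSelected(Map<BigDecimal, Double> quantileValues,
                                                Set<BigDecimal> selected) {
        Map<BigDecimal, Double> kept = new TreeMap<>();
        for (Map.Entry<BigDecimal, Double> entry : quantileValues.entrySet()) {
            if (selected.contains(entry.getKey())) {
                kept.put(entry.getKey(), entry.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<BigDecimal, Double> histogram = new TreeMap<>();
        histogram.put(new BigDecimal("0.5"), 0.002);
        histogram.put(new BigDecimal("0.75"), 0.004);
        histogram.put(new BigDecimal("0.95"), 0.010);
        histogram.put(new BigDecimal("0.99"), 0.025);

        Set<BigDecimal> selected = parsePercentiles("75,99");
        System.out.println(keepSelected(histogram, selected));
    }
}
```

Because the selection is a single set applied to every histogram, nothing per-metric needs to be threaded through the collectors, which is why the global form is the simpler of the two.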

@podile
Author

podile commented Feb 27, 2020

Adding a global option makes sense to me. Limiting percentiles on a per-metric basis doesn't seem to be a common use case.
How difficult would it be to filter metrics by labels? For example, the table-level latency metrics return the latency of every operation, but I want to capture these metrics only for the operation types "read" and "write", and not for any other operation.

@zegelin

zegelin commented Feb 27, 2020

Right now there is no code path or infrastructure in place to filter metrics based on labels.

Currently, filtering is done at "registration time", which is when the exporter is notified that C* has created a new MBean (see the com.zegelin.cassandra.exporter.Harvester#registerMBean method, com/zegelin/cassandra/exporter/Harvester.java:160). This has the advantage that excluded metrics are completely ignored -- no additional work is done because a collector for that metric is never registered. But the disadvantage is that the exclusions are limited to the information available at registration time, which doesn't include labels (which are generated dynamically at collection time).
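
As a rough illustration of the registration-time approach described above, consider the sketch below. It does not reproduce the real Harvester; the class, the exclusion set, and the `Supplier`-based collector are hypothetical stand-ins. The point is only that an excluded metric never gets a collector, and that the decision can use nothing but the name known at registration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical registry; the exporter's real registration logic lives in Harvester#registerMBean.
final class RegistrationFilterSketch {
    private final Set<String> excludedNames;
    private final Map<String, Supplier<Double>> collectors = new HashMap<>();

    RegistrationFilterSketch(Set<String> excludedNames) {
        this.excludedNames = excludedNames;
    }

    // Returns true if a collector was registered, false if the metric was excluded up front.
    // Only the metric name is available here; labels do not exist yet.
    boolean register(String metricName, Supplier<Double> collector) {
        if (excludedNames.contains(metricName)) {
            return false; // no collector is stored, so no work is done at collection time
        }
        collectors.put(metricName, collector);
        return true;
    }

    int registeredCount() {
        return collectors.size();
    }

    public static void main(String[] args) {
        RegistrationFilterSketch registry =
                new RegistrationFilterSketch(Set.of("cassandra_table_coordinator_latency_seconds"));
        System.out.println(registry.register("cassandra_table_coordinator_latency_seconds", () -> 0.0));
        System.out.println(registry.register("some_other_metric", () -> 1.0));
        System.out.println(registry.registeredCount());
    }
}
```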

It probably wouldn't be terribly difficult to filter the stream of metrics as they're being collected, especially since Java Streams do in fact have a filter(...) method!
See com.zegelin.cassandra.exporter.Harvester#collect for where the stream of metrics is generated.
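
A minimal sketch of collection-time filtering with `Stream.filter`, assuming a simplified sample type. The `Sample` class and the `operation` label name are hypothetical and do not match the exporter's real domain classes; only `java.util.stream.Stream#filter` is a real API here.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical stand-in for a collected sample; not the exporter's real type.
final class Sample {
    final String name;
    final Map<String, String> labels;
    final double value;

    Sample(String name, Map<String, String> labels, double value) {
        this.name = name;
        this.labels = labels;
        this.value = value;
    }
}

final class LabelFilterSketch {
    // Drop a sample only if it carries the given label with a value outside the allowed set.
    // Samples that do not have the label at all pass through unchanged.
    static Stream<Sample> filterByLabel(Stream<Sample> samples, String label, Set<String> allowed) {
        return samples.filter(sample -> {
            String value = sample.labels.get(label);
            return value == null || allowed.contains(value);
        });
    }

    public static void main(String[] args) {
        Stream<Sample> collected = Stream.of(
                new Sample("cassandra_table_coordinator_latency_seconds", Map.of("operation", "read"), 0.004),
                new Sample("cassandra_table_coordinator_latency_seconds", Map.of("operation", "write"), 0.002),
                new Sample("cassandra_table_coordinator_latency_seconds", Map.of("operation", "range_read"), 0.030));

        List<Sample> kept = filterByLabel(collected, "operation", Set.of("read", "write"))
                .collect(Collectors.toList());
        kept.forEach(s -> System.out.println(s.name + " " + s.labels + " " + s.value));
    }
}
```

Unlike registration-time exclusion, this predicate runs on every collection, so the filtered metrics are still gathered before being discarded.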

Label-level filtering would also have the side effect of making it possible to filter out certain percentiles for different histograms.

@podile
Author

podile commented Mar 2, 2020

Thank you, I will go through the code and create a PR to provide an option to only export select percentiles for all histograms.

@eperott

eperott commented Aug 27, 2020

I'd be interested in this functionality as well.
@podile, any progress on this? How come the PR was closed?

@podile
Author

podile commented Sep 1, 2020

The PR "Percentile filter #73", which contains the implementation, is still open.

@eperott

eperott commented Sep 3, 2020

Euhm, right! Not sure how I could miss that!

@zegelin, any plans to merge this any time soon?


3 participants