-
-
Notifications
You must be signed in to change notification settings - Fork 216
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Mov quantile #761
Mov quantile #761
Conversation
@gavv |
Thanks for PR! Me or @adrianopol will do review in the upcoming days. Regarding 0..100 vs 0..1: I guess we should either name it MovPercentile and use 0..100 or name it MovQuantile and use 0..1. In both cases, we need to use float or double for this number. |
If we use float or double for taking the percentile or quantile for index calculation at one point we will have to type cast window length from size_t to double or percentile to size_t since the index and window_length is in size_t. Should this approach be okay considering precision loss cases for the aforementioned type casting. Can we take k instead of a percentile or quantile which can be the kth smallest element user want from the window length like in the document. Although not sure if this approach is in sync with the requirement. |
Just to be sure I understand the question correctly. You're asking what pair of parameters to use, among these three alternatives, right?
I think in code that will use MovPercentile/MovQuantile, the original parameters that we have would be window_size and percentile/quantile. E.g. we may need to compute 95th percentile of inter-arrival jitter for last 300 packets. We'll likely have numbers 300 and 95 (or 0.95) in config. We could compute K from those numbers, though it would be convenient if MovPercentile/MovQuantile would compute it for us. As for rounding errors, I believe that if Q is 32-bit float in range [0; 1), then we can compute K precisely for any L in range [0; 2^23), because floats in range [0; 1) can be mapped one-to-one to integers in range [0; 2^23). If that's true, it looks more than enough and we don't have to bother about rounding errors. I suggest to stick with MovQuantile approach with parameters L + Q, just to be sure we use floats or doubles in range [0; 1), since that range has highest precision (24 bits for 32-bit float, 48 bits for 64-bit float). If you think floats are not enough, let's use doubles. |
Sure I understood will make the changes |
Awesome, thanks a lot for PR and sorry for delay! All suggestions I have are cosmetic, so I just committed them by myself: 4e6b413 (copyright year, make methods private, use const, and a few renames). I also added a simple stress test with random window size and values: 0bc19da, seems to work fine. One small question about this snippet: if (win_filled_) {
...
min_heap_index_ = win_len_ - 1;
max_heap_index_ = 0; Are there situations when |
At the very start when there are not enough values to fill the window at that point the values will be different. Both of them will start from heap root and max_heap_index_ will decrement and min_heap_index_ will increment. Once the window is filled then they will take on the mentioned values. This method will still give us quantile even if we don't have all the values in the window. Thanks for merging the PR |
Yeah, I mean, by the time when we enter Or am I missing something, and and it's possible that |
That is correct and the assignments are redundant. Sorry i missed that. |
when win_filled_ that values max_heap_index_ will be 0 and min_heap_index_ is win_len - 1 there can't be another case. |
Got it, thanks. No worries, I just wanted to ensure I'm not missing something important. |
Fix #751
This PR adds the moving quantile algorithm based on the articles mentioned in the issue. It follows the quantile approximation strategy when the rolling window do not have enough data. This PR adds both the code logic under mov_quantile.h as well as the required unit test under test_mov_quantile.cpp