-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add query sampling for checking #7
base: db_obs_m3coord_cache
Are you sure you want to change the base?
Conversation
// Ratio of queries we make a check for | ||
DefaultCheckSampleRate float64 = 0.0 | ||
// Threshold in % to determine if there's difference in results (1 means 1% diff) | ||
DefaultComparePercentThreshold float64 = 1.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1% difference cross all buckets?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it checks the final aggregated result, so it's a 1% difference in the final result
) | ||
|
||
// Compares results a, b to the specified percent threshold | ||
// Results should be vectors | ||
func compareResults(a, b *promql.Result, threshold float64) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What may cause the value different?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there can be some slight floating point precision errors especially with comparison, so I thought a % threshold would be best
@@ -177,6 +251,27 @@ func (h *readHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) { | |||
return | |||
} | |||
|
|||
// Rulemanager results are vector values (list of metric + value) | |||
// Take a random number and check if under rate so we check a proportion of the queries | |||
if rand.Float64() < float64(h.queryCheckConfig.CheckSampleRate) && res.Value.Type() == parser.ValueTypeVector { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not a big deal, but do we need float64, why not float32?
if result.Err != nil { | ||
h.logger.Error("Comparison query failed to execute") | ||
} else { | ||
if result != nil && !compareResults(res, result, h.queryCheckConfig.ComparePercentThreshold) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do you know this res
if queried from cache instead of m3db if cache miss the hit?
} | ||
defer query.Close() | ||
// Set context so we can default to M3DB later on | ||
result := query.Exec(context.WithValue(ctx, "UseM3DB", true)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename name to m3dbQueryResult to avoid confusion between res
and result
@@ -177,6 +251,27 @@ func (h *readHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) { | |||
return | |||
} | |||
|
|||
// Rulemanager results are vector values (list of metric + value) | |||
// Take a random number and check if under rate so we check a proportion of the queries | |||
if rand.Float64() < float64(h.queryCheckConfig.CheckSampleRate) && res.Value.Type() == parser.ValueTypeVector { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we move this check into redis_cache.go
? Put this a this section code into root HTTP method is exposing the storage/cache knowledge to upper layer which is not good.
What changes are proposed in this pull request?
Implementing a method to periodically check results against the expected M3DB result and record metrics (+refactor of config).
How is this tested?
Deployed to
dev-aws-us-west-1
, verified from emitted metrics that checks are being performed and are accurate (also verified that mismatches are registered). Other performance metrics aren't affected.How is this feature monitored?
Code Review
For information about the code review process, e.g. how to find a reviewer or how to ping non-responsive reviewers, check the contents of go/code-review Confluence page.
Approvals
Other than the mandatory approvers enforced by the OWNER file framework (http://go/owners), this PR
requires at least one approval from another engineer.
[NEW] Shiproom
Platform & Compute Fabric:
Changes should be tracked by an approved "material change." Multiple PRs may be tracked by a single material change.
<CHANGE_ID>
See http://go/platshiproomwiki for instructions and use http://go/lightcm-template to evaluate risk. Ask questions in #platform-shiproom.
Runtime changes:
dbr-branch-x.x
branch or has a maintenancedbr-branch-x.x
label)Please refer Runtime Shiproom Wiki: http://go/runtimeshiproomwiki
Security implications
This section is intended for the reviewers of this PR
Please, make sure you consider the content of "What are the responsibilities of code reviewers?" section of go/pr-security-review