-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix returning nulls from /data/views/query if no data in BigQuery #177
Conversation
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #177 +/- ##
===================================================
- Coverage 28.00000% 27.82875% -0.17125%
===================================================
Files 4 4
Lines 325 327 +2
===================================================
Hits 91 91
- Misses 223 225 +2
Partials 11 11
Continue to review full report in Codecov by Sentry.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM in general, but I think we should never return null
for the basic viewership fields (viewcount and playtime)
And thanks for picking this one up :)
ViewCount bigquery.NullInt64 `bigquery:"view_count"` | ||
PlaytimeMins bigquery.NullFloat64 `bigquery:"playtime_mins"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any idea how these can ever be null?
Not that I think we shouldn't make this fix, but these are the most basic fields that should always be there, defaulting to 0
, so it kinda looks like a data pipeline problem.
ViewCount: toInt64Ptr(row.ViewCount, spec.Detailed), | ||
PlaytimeMins: toFloat64Ptr(row.PlaytimeMins, spec.Detailed), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These 2 fields are actually not detailed, they belong to the basic viewership metrics and should always be in the response. So they are always "asked".
ViewCount: toInt64Ptr(row.ViewCount, spec.Detailed), | |
PlaytimeMins: toFloat64Ptr(row.PlaytimeMins, spec.Detailed), | |
ViewCount: toInt64Ptr(row.ViewCount, true), | |
PlaytimeMins: toFloat64Ptr(row.PlaytimeMins, true), |
(otherwise if you call the API without ?detailed=true
you will get NO metrics on the response!)
ViewCount: toInt64Ptr(row.ViewCount, spec.Detailed), | ||
PlaytimeMins: toFloat64Ptr(row.PlaytimeMins, spec.Detailed), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking further about this, we probably don't want to return null
on the API responses for these fields, as we shouldn't leak the datasource bug to the end-user. So IMO these fields should always be present and instead we return 0
if they are null
, WDYT? We could fix this directly on the SQL queries, like adding a COALESCE
on those fields, or we could just have a logic here to check v == null ? 0 : v
.
I think you're right, we should return |
Before this PR, if BigQuery returns null in some of the columns, then a call to
/data/views/query
returns an error:After this change, it will return the following:
Related to https://linear.app/livepeer/issue/PS-191/grafana-engagement-data-issues, though it does no solve the root cause of that issue.