An edge case has come up where two things conspired to produce duplicates in the model:
- Some users' events carry different session_ids at the same time (we think because of a race condition between tabs).
- Latency between device and collector results in exactly the same derived_tstamp for these events.

Together, these two factors mean that a user has two sessions with exactly the same start_tstamp but different domain_sessionid values, and these happen to be the earliest timestamps for the user.
This produces duplicates in the users table when we join on start_tstamp.
The same issue may exist when we join on end_tstamp for aggregates.
This seems very rare, but we should introduce some means of breaking a tie in the case where derived_tstamps happen to evaluate to exactly the same thing.
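
To make the failure mode concrete, here is a minimal sketch that reproduces the duplicate and shows one possible tie-break. It is not the model's actual SQL: the `sessions` table, its three columns, and the sample rows are hypothetical stand-ins, and it runs on SQLite through Python's `sqlite3` module rather than on Redshift. The tie-break ranks each user's sessions by start_tstamp and then by domain_sessionid, which is an arbitrary but deterministic choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical, simplified stand-in for the sessions table.
    CREATE TABLE sessions (
        domain_userid    TEXT,
        domain_sessionid TEXT,
        start_tstamp     TEXT
    );
    INSERT INTO sessions VALUES
        ('u1', 's-a', '2021-01-01 10:00:00'),
        ('u1', 's-b', '2021-01-01 10:00:00'),
        ('u1', 's-c', '2021-01-02 09:00:00'),
        ('u2', 's-d', '2021-01-01 12:00:00');
""")

# Joining on the user's minimum start_tstamp: user u1 has two sessions
# tied on that timestamp, so the join returns two rows for u1.
naive = conn.execute("""
    SELECT s.domain_userid, s.domain_sessionid
    FROM sessions AS s
    JOIN (
        SELECT domain_userid, MIN(start_tstamp) AS start_tstamp
        FROM sessions
        GROUP BY domain_userid
    ) AS f
      ON s.domain_userid = f.domain_userid
     AND s.start_tstamp = f.start_tstamp
    ORDER BY s.domain_userid, s.domain_sessionid
""").fetchall()

# Tie-break: rank sessions per user by start_tstamp, then by
# domain_sessionid, and keep only the first-ranked row.
tie_broken = conn.execute("""
    SELECT domain_userid, domain_sessionid
    FROM (
        SELECT
            domain_userid,
            domain_sessionid,
            ROW_NUMBER() OVER (
                PARTITION BY domain_userid
                ORDER BY start_tstamp, domain_sessionid
            ) AS rn
        FROM sessions
    ) AS ranked
    WHERE rn = 1
    ORDER BY domain_userid
""").fetchall()

print(naive)       # one row per tied session, so u1 appears twice
print(tie_broken)  # exactly one row per user
```

The same ranking with a descending sort on an end timestamp would presumably cover the end_tstamp join as well, though that has not been checked against the model.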
Just to add another data point: I ran the model on 100 days of our data and this happened for two user_ids. Each has two session IDs with exactly the same start_time.
I raised the same issue on the Dataform model and included a proposed solution there, which I think we should also be able to use in the SQL Runner version.
The join in question: data-models/web/v1/redshift/sql-runner/sql/standard/04-users/01-main/06-users.sql, line 86 in a38e76b.
(first reported on ZD ticket 27522)