-
Notifications
You must be signed in to change notification settings - Fork 143
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enabling the Fresh Daily export in GA4 breaks base_ga4__events model #344
Comments
Thanks for reporting this. I don't think any of the regular contributors to this package have access to the fresh export so we're going to have to rely on you for both testing and logic. Looking at the documentation, we have this:
So, based on the documentation, you are correct about duplicate events. I think the ideal behavior here is to detect the presence of the Fresh Daily export, get the max timestamp value and then use streaming rows for values above that timestamp. I think the place to achieve this is in the I think the short-term solution is to exclude the Fresh Daily data so that the package works right away and let people choose to customize the package if they want to use Fresh Daily instead of Streaming. @arifsenolwp, how long do Fresh Daily partitions persist? Do they delete when moved to Daily partitions? |
The Fresh Daily export is not similar like the streaming one but the Daily one. I mean it is kind of Daily one which is updated every 60 minutes and always persists. It never gets delete just like the Daily one. In this case I think having an option that choose whether Daily or Fresh one can be an option. The streaming one's been deleted every day ( once the Daily one becomes completed ) so there is no duplication issue after 2-3 days of that day. So probably enabling just the Fresh one is viable for most of the users if they use this dbt package because Daily one and Streaming one are linked ( intraday/streaming one will be deleted only if you enable the Daily one ) and that is ok with our models here. What I mean is, there may be a yml config on which data table the package uses and which data table it should exclude. Daily and Steaming combination works but there is no Fresh and Streaming combination really. |
Ignoring
@arifsenolwp you mentioned some differences in the event params? What differences are there? |
There are 3 options in GA4 to export data to Bigquery -> Daily, Fresh Daily and Streaming.
Enabling Daily creates sharded tables called events_*
Enabling Fresh Daily creates sharded tables called events_fresh_*
Enabling Streaming creates sharded tables called events_intraday_*
base_ga4__events model tries to parse the table names under the same dataset provided in config files and it does not cover events_fresh_* ones.
This condition fails with Bad int64 value: fresh_xx ;
Since we can not change the destination dataset for the Fresh Daily export, enabling it breaks the model.
Either we can include the fresh_ keyword in parsing logic or use variables to decide what to replace , include or exclude.
Including fresh_ keyword in parsing logic may not be a good idea though, since the event_params seems to be different for some rows with intraday_ and fresh_ tables, this may cause duplication issues.
The text was updated successfully, but these errors were encountered: