Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enabling the Fresh Daily export in GA4 breaks base_ga4__events model #344

Open
arifsenolwp opened this issue Sep 24, 2024 · 3 comments
Open

Comments

@arifsenolwp
Copy link

arifsenolwp commented Sep 24, 2024

There are 3 options in GA4 to export data to Bigquery -> Daily, Fresh Daily and Streaming.

Enabling Daily creates sharded tables called events_*
Enabling Fresh Daily creates sharded tables called events_fresh_*
Enabling Streaming creates sharded tables called events_intraday_*

base_ga4__events model tries to parse the table names under the same dataset provided in config files and it does not cover events_fresh_* ones.

This condition fails with Bad int64 value: fresh_xx ;

where cast(left(replace(_table_suffix, 'intraday_', ''), 8) as int64) >= {{var('start_date')}}
    {% if is_incremental() %}
        and parse_date('%Y%m%d', left(replace(_table_suffix, 'intraday_', ''), 8)) in ({{ partitions_to_replace | join(',') }})
    {% endif %}

Since we can not change the destination dataset for the Fresh Daily export, enabling it breaks the model.

Either we can include the fresh_ keyword in parsing logic or use variables to decide what to replace , include or exclude.

Including fresh_ keyword in parsing logic may not be a good idea though, since the event_params seems to be different for some rows with intraday_ and fresh_ tables, this may cause duplication issues.

@arifsenolwp arifsenolwp changed the title Enabling the Fresh Daily export time in GA4 breaks base_ga4__events model Enabling the Fresh Daily export in GA4 breaks base_ga4__events model Sep 24, 2024
@dgitis
Copy link
Collaborator

dgitis commented Sep 25, 2024

Thanks for reporting this. I don't think any of the regular contributors to this package have access to the fresh export so we're going to have to rely on you for both testing and logic.

Looking at the documentation, we have this:

The Fresh Daily export is updated approximately every 60 minutes throughout the day.

So, based on the documentation, you are correct about duplicate events.

I think the ideal behavior here is to detect the presence of the Fresh Daily export, get the max timestamp value and then use streaming rows for values above that timestamp.

I think the place to achieve this is in the combine_property_data macro. I've got some fairly major changes to this macro in a draft PR that I actually can properly test now so I recommend waiting for me to get that across the finish line before working on implementing new logic in the macro.

I think the short-term solution is to exclude the Fresh Daily data so that the package works right away and let people choose to customize the package if they want to use Fresh Daily instead of Streaming.

@arifsenolwp, how long do Fresh Daily partitions persist? Do they delete when moved to Daily partitions?

@arifsenolwp
Copy link
Author

arifsenolwp commented Sep 26, 2024

The Fresh Daily export is not similar like the streaming one but the Daily one. I mean it is kind of Daily one which is updated every 60 minutes and always persists. It never gets delete just like the Daily one.

In this case I think having an option that choose whether Daily or Fresh one can be an option. The streaming one's been deleted every day ( once the Daily one becomes completed ) so there is no duplication issue after 2-3 days of that day.

So probably enabling just the Fresh one is viable for most of the users if they use this dbt package because Daily one and Streaming one are linked ( intraday/streaming one will be deleted only if you enable the Daily one ) and that is ok with our models here. What I mean is, there may be a yml config on which data table the package uses and which data table it should exclude. Daily and Steaming combination works but there is no Fresh and Streaming combination really.

@adamribaudo-velir
Copy link
Collaborator

Ignoring combine_property_data for a moment, is supporting "Fresh Daily" export as simple as updating base_ga4__events.sql to replace _fresh and _intraday to handle either case? In this line:

where cast(left(replace(_table_suffix, 'intraday_', ''), 8) as int64) >= {{var('start_date')}}

@arifsenolwp you mentioned some differences in the event params? What differences are there?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

3 participants