Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

User Export #317

Open
wants to merge 13 commits into
base: main
Choose a base branch
from
Open

User Export #317

wants to merge 13 commits into from

Conversation

dgitis
Copy link
Collaborator

@dgitis dgitis commented Apr 13, 2024

Description & motivation

Work-in-progress resolving #285.

The goal of this PR is to support the pseudonymous_user_id and user_id tables.

My initial thought is that we should keep both the package's current user tables and new ones derived from the new user export options in GA4. The reason for this is to support both old and new installations.

README modifications are not done, but I expect we will want to add a new section on disabling user models that explains the differences between our various models and how to use them.

What defaults should we have for +enable on these tables?

This PR supports audiences fields from the export. While working with a test site that has implemented audiences, the data doesn't seem to be very useful unless its ID field joins with Google Ads audience exports. Despite this, I think we should leave this in because it needs to be enabled by a variable.

The user_properties fields were done without sample data. Naming and logic will likely need to be updated.

Multi-site is not currently supported.

We also might want to move the logic in base_ga4___* models to stg_ga4__* models.

I also need to review package naming conventions to ensure data, like geo data, is consistent with elsewhere in the package.

To-do:

  • data marts
  • review package naming conventions to ensure consistency
  • user_property logic
  • readme
  • multi-site

Here are the docs for the source tables.

Checklist

  • I have verified that these changes work locally
  • I have updated the README.md (if applicable)
  • I have added tests & descriptions to my models (and macros if applicable)
  • I have run dbt test and python -m pytest . to validate existing tests

@dgitis dgitis marked this pull request as ready for review July 20, 2024 23:13
@dgitis
Copy link
Collaborator Author

dgitis commented Jul 20, 2024

@adamribaudo-velir, while I've verified that these work locally, I've only verified with a minimal amount of data. Multi-site changes were verified by setting the same property ID several times under property_ids. Obviously this is not ideal.

It would be best to verify this using a proper multi-site installation. I don't have any current clients, but I can ask some past clients.

I did not write any tests. dbt recommends testing certain things. The only logic worth testing , the multi-site macro, is not testable by the new framework.

I defaulted these new tables to +enable: false.

I also ran base_ga4__pseudonymous_users in to stg_ga4__client_ids. I renamed at the staging model because that's where the client IDs are generated.

@@ -6,41 +6,69 @@
{% if not should_full_refresh() %}
{# If incremental, then use static_incremental_days variable to find earliest shard to copy #}
{%- set earliest_shard_to_retrieve = (modules.datetime.date.today() - modules.datetime.timedelta(days=var('static_incremental_days')))|string|replace("-", "")|int -%}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dgitis I think you've introduced some duplicate code in this macro?

@adamribaudo-velir
Copy link
Collaborator

I'm on board with the approach, but unable to run it due to this error. Couldn't find an obvious issue. I can run the main branch fine with the same project configuration.

20:08:18  Completed with 1 error and 0 warnings:
20:08:18
20:08:18    Compilation Error in model base_ga4__events (models\staging\base\base_ga4__events.sql)
  can only concatenate str (not "int") to str

  > in macro combine_property_data (macros\combine_property_data.sql)
  > called by macro run_hooks (macros\materializations\hooks.sql)
  > called by macro materialization_incremental_bigquery (macros\materializations\incremental.sql)
  > called by model base_ga4__events (models\staging\base\base_ga4__events.sql)

@dgitis
Copy link
Collaborator Author

dgitis commented Jul 21, 2024

I fixed the duplicate code issues which seems to have fixed the error, not sure why though, and also updated a test that worked on my very small test site but wouldn't work on a larger site.

@adamribaudo-velir
Copy link
Collaborator

I get an error when building the new staging models:

(python312env) C:\GitHub\velir-ga4-test-project>dbt run -m stg_ga4__client_keys
13:20:47  Running with dbt=1.8.7
13:20:48  Registered adapter: bigquery=1.8.3
13:20:48  Found 42 models, 29 data tests, 1 seed, 3 sources, 623 macros
13:20:48
13:20:50  Concurrency: 4 threads (target='dev')
13:20:50
13:20:50  1 of 1 START sql view model dbt_dev_aribaudo.stg_ga4__client_keys .............. [RUN]
13:20:51  BigQuery adapter: https://console.cloud.google.com/bigquery?project=velir-website-analytics&j=bq:US:660c1784-627b-4503-b4e5-a08e57355482&page=queryresults
13:20:51  1 of 1 ERROR creating sql view model dbt_dev_aribaudo.stg_ga4__client_keys ..... [ERROR in 1.17s]
13:20:51
13:20:51  Finished running 1 view model in 0 hours 0 minutes and 2.93 seconds (2.93s).
13:20:51
13:20:51  Completed with 1 error and 0 warnings:
13:20:51
13:20:51    Database Error in model stg_ga4__client_keys (models\staging\stg_ga4__client_keys.sql)
  Syntax error: Expected ")" but got identifier "user_property_name" at [11:82]
  compiled code at target\run\ga4\models\staging\stg_ga4__client_keys.sql
13:20:51
13:20:51  Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1

I have 1 user property defined:

      user_properties:
        - user_property_name: "random_number"
          value_type: "int_value"

I'd be ok approving a PR with JUST the base model data for user export so that package users at least have access to that data. We could sort out the most advanced stuff (pulling in properties, audiences) later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants