Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] hard_deletes: new_record not working as expected #1436

Open
2 tasks done
jeremyyeo opened this issue Dec 13, 2024 · 0 comments
Open
2 tasks done

[Bug] hard_deletes: new_record not working as expected #1436

jeremyyeo opened this issue Dec 13, 2024 · 0 comments
Assignees
Labels
bug Something isn't working triage

Comments

@jeremyyeo
Copy link
Contributor

jeremyyeo commented Dec 13, 2024

Is this a new bug in dbt-bigquery?

  • I believe this is a new bug in dbt-bigquery
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

The new hard_deletes: new_record config does not appear to be working on BigQuery.

Expected Behavior

The new hard_deletes: new_record config works as expected.

Steps To Reproduce

  1. Make sure snapshot doesn't exist yet.
drop table cse-sandbox-319708.dbt_jyeo.snappy
  1. dbt project setup.
# dbt_project.yml
name: my_dbt_project
profile: all
config-version: 2
version: "1.0.0"

models:
  my_dbt_project:
    +materialized: table

# models/snapshots.yml
snapshots:
  - name: snappy
    relation: ref('foo')
    config:
      strategy: check
      unique_key: id
      check_cols: all
      hard_deletes: new_record
-- models/foo.sql
{{ config(materialized='table') }}
select 1 as id, 'alice' as first_name
  1. Build
$ dbt build
18:55:44  Running with dbt=1.9.0-rc2
18:55:45  Registered adapter: bigquery=1.9.0-rc1
18:55:45  Unable to do partial parsing because profile has changed
18:55:45  Unable to do partial parsing because a project dependency has been added
18:55:46  Found 1 model, 1 snapshot, 487 macros
18:55:46  
18:55:46  Concurrency: 4 threads (target='bq')
18:55:46  
18:55:48  1 of 2 START sql table model dbt_jyeo.foo ...................................... [RUN]
18:55:52  1 of 2 OK created sql table model dbt_jyeo.foo ................................. [CREATE TABLE (1.0 rows, 0 processed) in 3.92s]
18:55:52  2 of 2 START snapshot dbt_jyeo.snappy .......................................... [RUN]
18:55:58  2 of 2 OK snapshotted dbt_jyeo.snappy .......................................... [CREATE TABLE (1.0 rows, 15.0 Bytes processed) in 6.39s]
18:55:58  
18:55:58  Finished running 1 snapshot, 1 table model in 0 hours 0 minutes and 12.94 seconds (12.94s).
18:55:58  
18:55:58  Completed successfully
18:55:58  
18:55:58  Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2
  1. Update foo and "hard delete" record with id = 1:
-- models/foo.sql
{{ config(materialized='table') }}
select 2 as id, 'bob' as first_name
  1. Build
$ dbt --debug build
...
18:56:50  2 of 2 START snapshot dbt_jyeo.snappy .......................................... [RUN]
18:56:50  Acquiring new bigquery connection 'snapshot.my_dbt_project.snappy'
18:56:50  Began compiling node snapshot.my_dbt_project.snappy
18:56:50  Began executing node snapshot.my_dbt_project.snappy
18:56:50  On snapshot.my_dbt_project.snappy: /* {"app": "dbt", "dbt_version": "1.9.0rc2", "profile_name": "all", "target_name": "bq", "node_id": "snapshot.my_dbt_project.snappy"} */
select * from (
        select * from `cse-sandbox-319708`.`dbt_jyeo`.`foo`
    ) as __dbt_sbq
    where false and current_timestamp() = current_timestamp()
    limit 0

    
18:56:50  Opening a new connection, currently in state init
18:56:51  BigQuery adapter: https://console.cloud.google.com/bigquery?project=cse-sandbox-319708&j=bq:US:0db88ccd-957f-470c-9475-bf3a280eabdb&page=queryresults
18:56:53  On snapshot.my_dbt_project.snappy: select * from (
        select * from `cse-sandbox-319708`.`dbt_jyeo`.`foo`
    ) as __dbt_sbq
    where false and current_timestamp() = current_timestamp()
    limit 0

18:56:54  BigQuery adapter: https://console.cloud.google.com/bigquery?project=cse-sandbox-319708&j=bq:US:3a08cffe-5243-43e8-a3ec-ccf4b43da0af&page=queryresults
18:56:55  On snapshot.my_dbt_project.snappy: select * from (
        select * from `cse-sandbox-319708`.`dbt_jyeo`.`foo`
    ) as __dbt_sbq
    where false and current_timestamp() = current_timestamp()
    limit 0

18:56:55  BigQuery adapter: https://console.cloud.google.com/bigquery?project=cse-sandbox-319708&j=bq:US:8606c905-f583-4169-b89f-6bce631adce3&page=queryresults
18:56:56  On snapshot.my_dbt_project.snappy: /* {"app": "dbt", "dbt_version": "1.9.0rc2", "profile_name": "all", "target_name": "bq", "node_id": "snapshot.my_dbt_project.snappy"} */

        
  
    

    create or replace table `cse-sandbox-319708`.`dbt_jyeo`.`snappy__dbt_tmp`
      
    
    

    OPTIONS(
      expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 12 hour)
    )
    as (
      

    with snapshot_query as (

        select * from `cse-sandbox-319708`.`dbt_jyeo`.`foo`

    ),

    snapshotted_data as (

        select *, 
    
        id as dbt_unique_key
    

        from `cse-sandbox-319708`.`dbt_jyeo`.`snappy`
        where
            
                dbt_valid_to is null
            

    ),

    insertions_source_data as (

        select *, 
    
        id as dbt_unique_key
    
,
            
    current_timestamp()
 as dbt_updated_at,
            
    current_timestamp()
 as dbt_valid_from,
            
  
  coalesce(nullif(
    current_timestamp()
, 
    current_timestamp()
), null)
  as dbt_valid_to
,
            to_hex(md5(concat(coalesce(cast(id as string), ''), '|',coalesce(cast(
    current_timestamp()
 as string), '')))) as dbt_scd_id

        from snapshot_query
    ),

    updates_source_data as (

        select *, 
    
        id as dbt_unique_key
    
,
            
    current_timestamp()
 as dbt_updated_at,
            
    current_timestamp()
 as dbt_valid_from,
            
    current_timestamp()
 as dbt_valid_to

        from snapshot_query
    ),

    deletes_source_data as (

        select *, 
    
        id as dbt_unique_key
    

        from snapshot_query
    ),
    

    insertions as (

        select
            'insert' as dbt_change_type,
            source_data.*,'False' as dbt_is_deleted

        from insertions_source_data as source_data
        left outer join snapshotted_data
            on 
    
        snapshotted_data.dbt_unique_key = source_data.dbt_unique_key
    

            where 
    
        snapshotted_data.dbt_unique_key is null
    

            or (
    
        snapshotted_data.dbt_unique_key is not null
    
 and ((snapshotted_data.`id` != source_data.`id`
        or
        (
            ((snapshotted_data.`id` is null) and not (source_data.`id` is null))
            or
            ((not snapshotted_data.`id` is null) and (source_data.`id` is null))
        ) or snapshotted_data.`first_name` != source_data.`first_name`
        or
        (
            ((snapshotted_data.`first_name` is null) and not (source_data.`first_name` is null))
            or
            ((not snapshotted_data.`first_name` is null) and (source_data.`first_name` is null))
        )))

        )

    ),

    updates as (

        select
            'update' as dbt_change_type,
            source_data.*,
            snapshotted_data.dbt_scd_id, snapshotted_data.dbt_is_deleted

        from updates_source_data as source_data
        join snapshotted_data
            on 
    
        snapshotted_data.dbt_unique_key = source_data.dbt_unique_key
    

        where (
            (snapshotted_data.`id` != source_data.`id`
        or
        (
            ((snapshotted_data.`id` is null) and not (source_data.`id` is null))
            or
            ((not snapshotted_data.`id` is null) and (source_data.`id` is null))
        ) or snapshotted_data.`first_name` != source_data.`first_name`
        or
        (
            ((snapshotted_data.`first_name` is null) and not (source_data.`first_name` is null))
            or
            ((not snapshotted_data.`first_name` is null) and (source_data.`first_name` is null))
        ))
        )
    )
    ,
    deletes as (

        select
            'delete' as dbt_change_type,
            source_data.*,
            
    current_timestamp()
 as dbt_valid_from,
            
    current_timestamp()
 as dbt_updated_at,
            
    current_timestamp()
 as dbt_valid_to,
            snapshotted_data.dbt_scd_id, snapshotted_data.dbt_is_deleted
        from snapshotted_data
        left join deletes_source_data as source_data
            on 
    
        snapshotted_data.dbt_unique_key = source_data.dbt_unique_key
    

            where 
    
        source_data.dbt_unique_key is null
    

    )
        
    ,
    deletion_records as (

        select
            'insert' as dbt_change_type,snapshotted_data.`id`,
            snapshotted_data.`first_name`,
            snapshotted_data.dbt_unique_key as dbt_unique_key,
            
    current_timestamp()
 as dbt_valid_from,
            
    current_timestamp()
 as dbt_updated_at,
            snapshotted_data.dbt_valid_to as dbt_valid_to,
            snapshotted_data.dbt_scd_id,
            'True' as dbt_is_deleted
        from snapshotted_data
        left join deletes_source_data as source_data
            on 
    
        snapshotted_data.dbt_unique_key = source_data.dbt_unique_key
    

            where 
    
        source_data.dbt_unique_key is null
    

    )

    select * from insertions
    union all
    select * from updates
    union all
    select * from deletes
    union all
    select * from deletion_records

    );
  
    
18:56:56  BigQuery adapter: https://console.cloud.google.com/bigquery?project=cse-sandbox-319708&j=bq:US:e4b4cbd6-dd25-4e2c-93e5-3255fcb9045d&page=queryresults
18:57:00  BigQuery adapter: Adding columns ([]) to table `cse-sandbox-319708`.`dbt_jyeo`.`snappy`".
18:57:01  On snapshot.my_dbt_project.snappy: select * from (
        

    with snapshot_query as (

        select * from `cse-sandbox-319708`.`dbt_jyeo`.`foo`

    ),

    snapshotted_data as (

        select *, 
    
        id as dbt_unique_key
    

        from `cse-sandbox-319708`.`dbt_jyeo`.`snappy`
        where
            
                dbt_valid_to is null
            

    ),

    insertions_source_data as (

        select *, 
    
        id as dbt_unique_key
    
,
            
    current_timestamp()
 as dbt_updated_at,
            
    current_timestamp()
 as dbt_valid_from,
            
  
  coalesce(nullif(
    current_timestamp()
, 
    current_timestamp()
), null)
  as dbt_valid_to
,
            to_hex(md5(concat(coalesce(cast(id as string), ''), '|',coalesce(cast(
    current_timestamp()
 as string), '')))) as dbt_scd_id

        from snapshot_query
    ),

    updates_source_data as (

        select *, 
    
        id as dbt_unique_key
    
,
            
    current_timestamp()
 as dbt_updated_at,
            
    current_timestamp()
 as dbt_valid_from,
            
    current_timestamp()
 as dbt_valid_to

        from snapshot_query
    ),

    deletes_source_data as (

        select *, 
    
        id as dbt_unique_key
    

        from snapshot_query
    ),
    

    insertions as (

        select
            'insert' as dbt_change_type,
            source_data.*,'False' as dbt_is_deleted

        from insertions_source_data as source_data
        left outer join snapshotted_data
            on 
    
        snapshotted_data.dbt_unique_key = source_data.dbt_unique_key
    

            where 
    
        snapshotted_data.dbt_unique_key is null
    

            or (
    
        snapshotted_data.dbt_unique_key is not null
    
 and ((snapshotted_data.`id` != source_data.`id`
        or
        (
            ((snapshotted_data.`id` is null) and not (source_data.`id` is null))
            or
            ((not snapshotted_data.`id` is null) and (source_data.`id` is null))
        ) or snapshotted_data.`first_name` != source_data.`first_name`
        or
        (
            ((snapshotted_data.`first_name` is null) and not (source_data.`first_name` is null))
            or
            ((not snapshotted_data.`first_name` is null) and (source_data.`first_name` is null))
        )))

        )

    ),

    updates as (

        select
            'update' as dbt_change_type,
            source_data.*,
            snapshotted_data.dbt_scd_id, snapshotted_data.dbt_is_deleted

        from updates_source_data as source_data
        join snapshotted_data
            on 
    
        snapshotted_data.dbt_unique_key = source_data.dbt_unique_key
    

        where (
            (snapshotted_data.`id` != source_data.`id`
        or
        (
            ((snapshotted_data.`id` is null) and not (source_data.`id` is null))
            or
            ((not snapshotted_data.`id` is null) and (source_data.`id` is null))
        ) or snapshotted_data.`first_name` != source_data.`first_name`
        or
        (
            ((snapshotted_data.`first_name` is null) and not (source_data.`first_name` is null))
            or
            ((not snapshotted_data.`first_name` is null) and (source_data.`first_name` is null))
        ))
        )
    )
    ,
    deletes as (

        select
            'delete' as dbt_change_type,
            source_data.*,
            
    current_timestamp()
 as dbt_valid_from,
            
    current_timestamp()
 as dbt_updated_at,
            
    current_timestamp()
 as dbt_valid_to,
            snapshotted_data.dbt_scd_id, snapshotted_data.dbt_is_deleted
        from snapshotted_data
        left join deletes_source_data as source_data
            on 
    
        snapshotted_data.dbt_unique_key = source_data.dbt_unique_key
    

            where 
    
        source_data.dbt_unique_key is null
    

    )
        
    ,
    deletion_records as (

        select
            'insert' as dbt_change_type,snapshotted_data.`id`,
            snapshotted_data.`first_name`,
            snapshotted_data.dbt_unique_key as dbt_unique_key,
            
    current_timestamp()
 as dbt_valid_from,
            
    current_timestamp()
 as dbt_updated_at,
            snapshotted_data.dbt_valid_to as dbt_valid_to,
            snapshotted_data.dbt_scd_id,
            'True' as dbt_is_deleted
        from snapshotted_data
        left join deletes_source_data as source_data
            on 
    
        snapshotted_data.dbt_unique_key = source_data.dbt_unique_key
    

            where 
    
        source_data.dbt_unique_key is null
    

    )

    select * from insertions
    union all
    select * from updates
    union all
    select * from deletes
    union all
    select * from deletion_records

    ) as __dbt_sbq
    where false and current_timestamp() = current_timestamp()
    limit 0

18:57:01  BigQuery adapter: https://console.cloud.google.com/bigquery?project=cse-sandbox-319708&j=bq:US:223d4138-3f0e-45eb-9541-a082b9969550&page=queryresults
18:57:02  On snapshot.my_dbt_project.snappy: select * from (
        select 
    current_timestamp()
 as dbt_snapshot_time
    ) as __dbt_sbq
    where false and current_timestamp() = current_timestamp()
    limit 0

18:57:02  BigQuery adapter: https://console.cloud.google.com/bigquery?project=cse-sandbox-319708&j=bq:US:d1854f9b-50ef-4727-a39f-bebf100a197c&page=queryresults
18:57:03  Writing runtime sql for node "snapshot.my_dbt_project.snappy"
18:57:03  On snapshot.my_dbt_project.snappy: /* {"app": "dbt", "dbt_version": "1.9.0rc2", "profile_name": "all", "target_name": "bq", "node_id": "snapshot.my_dbt_project.snappy"} */

      merge into `cse-sandbox-319708`.`dbt_jyeo`.`snappy` as DBT_INTERNAL_DEST
    using `cse-sandbox-319708`.`dbt_jyeo`.`snappy__dbt_tmp` as DBT_INTERNAL_SOURCE
    on DBT_INTERNAL_SOURCE.dbt_scd_id = DBT_INTERNAL_DEST.dbt_scd_id

    when matched
     
       and DBT_INTERNAL_DEST.dbt_valid_to is null
     
     and DBT_INTERNAL_SOURCE.dbt_change_type in ('update', 'delete')
        then update
        set dbt_valid_to = DBT_INTERNAL_SOURCE.dbt_valid_to

    when not matched
     and DBT_INTERNAL_SOURCE.dbt_change_type = 'insert'
        then insert (`id`, `first_name`, `dbt_updated_at`, `dbt_valid_from`, `dbt_valid_to`, `dbt_scd_id`, `dbt_is_deleted`)
        values (`id`, `first_name`, `dbt_updated_at`, `dbt_valid_from`, `dbt_valid_to`, `dbt_scd_id`, `dbt_is_deleted`)


  
18:57:03  BigQuery adapter: https://console.cloud.google.com/bigquery?project=cse-sandbox-319708&j=bq:US:6522fd03-093d-4196-9134-986648dc895d&page=queryresults
18:57:06  Sending event: {'category': 'dbt', 'action': 'run_model', 'label': '7ccca803-fe11-4c90-9883-64b1af09a029', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x3172bd510>]}
18:57:06  2 of 2 OK snapshotted dbt_jyeo.snappy .......................................... [MERGE (2.0 rows, 304.0 Bytes processed) in 16.81s]
...

Check snapshot:

image

^ We can see that there is no new row for id = 1 added with the dbt_is_deleted col set to True - if things worked as expected (as it does on postgres) - then that would be the case.

Relevant log output

No response

Environment

- OS: macOS
- Python: 3.11.9
- dbt-core: 1.9.0-rc2
- dbt-bigquery: 1.9.0rc1

Additional Context

This works as expected on dbt-postgres:

image
@jeremyyeo jeremyyeo added bug Something isn't working triage labels Dec 13, 2024
@peterallenwebb peterallenwebb self-assigned this Dec 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working triage
Projects
None yet
Development

No branches or pull requests

2 participants