[Microbatch] Update default make_temp_relation macro to incorporate a batch specific identifier if available #360

Closed
Tracked by #10624
MichelleArk opened this issue Nov 20, 2024 · 1 comment · Fixed by #361
@MichelleArk
Contributor

MichelleArk commented Nov 20, 2024

Desired Improvement

Microbatch batches have been improved in core so that they can be run concurrently. When a batch is executed, the necessary data is first moved into a temp_relation in the data warehouse. The path for this temp_relation is <resource_identifier>__dbt_tmp. As it currently stands, <resource_identifier>__dbt_tmp is the same for every batch of a given microbatch model. If the batches are run concurrently, they may end up clobbering each other's temp_relation, so one batch can overwrite or drop a temp relation that another batch is still using. As such, we need a way to ensure that each batch of a microbatch model gets a unique temp_relation path.
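
As a rough illustration of the naming problem, here is a minimal Python sketch. It is not the dbt macro itself: `make_temp_identifier`, its `batch_start` parameter, and the timestamp format are all hypothetical, chosen only to show how a batch-specific value could make the temp relation name unique.

```python
from datetime import datetime
from typing import Optional

TMP_SUFFIX = "__dbt_tmp"


def make_temp_identifier(resource_identifier: str,
                         batch_start: Optional[datetime] = None) -> str:
    """Hypothetical helper: build a temp relation name.

    Without a batch start, every caller gets the same shared name.
    With one, the name carries a batch-specific suffix (assumed format).
    """
    if batch_start is None:
        return f"{resource_identifier}{TMP_SUFFIX}"
    batch_id = batch_start.strftime("%Y%m%d%H%M%S")
    return f"{resource_identifier}{TMP_SUFFIX}_{batch_id}"


# Two batches of the same model share one name today...
shared_a = make_temp_identifier("orders")
shared_b = make_temp_identifier("orders")

# ...but get distinct names once a batch identifier is included.
batch_a = make_temp_identifier("orders", datetime(2024, 11, 19))
batch_b = make_temp_identifier("orders", datetime(2024, 11, 20))
```

Falling back to the plain `__dbt_tmp` name when no batch identifier is available keeps the behavior unchanged for non-microbatch models, which matches the "if available" wording in the issue title.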

Helpful Prior Art

Similar to dbt-postgres: https://github.com/dbt-labs/dbt-postgres/blob/ae48e67dae6c1b00cda37ee9bdc61d3330506638/dbt/include/postgres/macros/adapters.sql#L149-L152

QMalcolm changed the title from "Update default make_temp_relation macro to incorporate config.__dbt_internal_event_time_start if available, microbatch temp relations" to "[Microbatch] Update default make_temp_relation macro to incorporate a batch specific identifier if available" on Nov 21, 2024
QMalcolm self-assigned this on Nov 21, 2024
@vanAkim

vanAkim commented Nov 29, 2024

Hello,

What a shame that I didn't find this issue before learning how the source code works and discovering that my problem could be solved by #361. I had just started a similar PR (if I copy/paste the proposed solution, it breaks the macro because of a BQ conflict; dbt-core 1.8.8 / dbt-adapters 1.7.0 / dbt-bigquery 1.8.3).
At least I've learned a lot, and I'm looking forward to this release.

However, let me give you a little more context, because this solves my issue even though it's not related to microbatches. I'm currently implementing dbt with Airflow, and we are used to launching multiple runs to ingest data into different partitions of a BigQuery table.
So, with a dbt model model_A materialized as incremental with the insert_overwrite strategy, launching multiple runs for different partitions, let's say D-2 and D-1, creates a conflict on the model_A__dbt_tmp table. dbt-bigquery doesn't handle this situation itself and relies on make_temp_relation from dbt-adapters. So it's the exact same situation as microbatch, but on a less granular scope. I see that a similar situation has also been shared in #222.
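
The conflict between concurrent partition runs can be sketched in plain Python. This is only an illustration, not dbt or BigQuery code: `temp_table_name`, `find_collisions`, and the partition labels are hypothetical.

```python
from collections import Counter
from typing import Iterable, List


def temp_table_name(model: str, partition: str = "") -> str:
    """Hypothetical naming helper: optionally suffix the temp table with a partition label."""
    base = f"{model}__dbt_tmp"
    return f"{base}_{partition}" if partition else base


def find_collisions(names: Iterable[str]) -> List[str]:
    """Return the temp table names that more than one run would write to."""
    counts = Counter(names)
    return sorted(name for name, n in counts.items() if n > 1)


# Two concurrent runs of model_A for different partitions (assumed labels for D-2 and D-1).
partitions = ["20241127", "20241128"]

shared = [temp_table_name("model_A") for _ in partitions]
per_partition = [temp_table_name("model_A", p) for p in partitions]

shared_collisions = find_collisions(shared)
per_partition_collisions = find_collisions(per_partition)
```

With the shared name, both runs target `model_A__dbt_tmp`. With a per-partition suffix, no name is used twice.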

You'll probably know and understand better than me, but I wanted to share my situation.
