Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parameterize union_relations macro to support deduplication with UNION #816

Open
will-hou opened this issue Jul 27, 2023 · 6 comments
Open
Labels
enhancement New feature or request triage

Comments

@will-hou
Copy link

will-hou commented Jul 27, 2023

Describe the feature

By default, the union_relations macro performs a UNION ALL. However, I think the macro should also support deduplication of rows using UNION

{%- macro union_relations(relations, column_override=none, include=[], exclude=[], source_column_name='_dbt_source_relation', where=none) -%}

Describe alternatives you've considered

Users could perform a SELECT DISTINCT * after the UNION ALL in their model which would achieve the same functionality. However, I think it more ideal to include the option as a parameter since it is supported by SQL.

Additional context

N/A

Who will this benefit?

While it is better practice to dedupe in staging models, teams may want to use a quick UNION for prototyping non-productionized code.

Are you interested in contributing this feature?

Yes! I would be happy to make a PR with the change and associated tests.

Tasks

Preview Give feedback
No tasks being tracked yet.
Copy link

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

@github-actions github-actions bot added the Stale label Jan 24, 2024
Copy link

github-actions bot commented Feb 1, 2024

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Feb 1, 2024
@dbeatty10 dbeatty10 removed the Stale label Apr 18, 2024
@dbeatty10 dbeatty10 reopened this Apr 18, 2024
@katieclaiborne-duet
Copy link
Contributor

I'm interested in this feature as well! Anything I can do to help move it along?

@dbeatty10
Copy link
Contributor

@katieclaiborne-duet

What do you think about the ergonomics of each of the options below?

Option 1: new boolean parameter named distinct with a default of false?

Option 2: a new string parameter named union? And its two possible values were all (default) and distinct)?

Option 3 (like in #819): a new boolean parameter named deduplicate (default false)?

Examples

Option 1:

{{ dbt_utils.union_relations(
    relations=[ref('my_model'), source('my_source', 'my_table')],
    distinct=false
) }}

Option 2:

{{ dbt_utils.union_relations(
    relations=[ref('my_model'), source('my_source', 'my_table')],
    union='distinct'
) }}

Option 3:

{{ dbt_utils.union_relations(
    relations=[ref('my_model'), source('my_source', 'my_table')],
    deduplicate=true
) }}

@katieclaiborne-duet
Copy link
Contributor

Option 2 is closest to my initial thought!

I had imagined a new string parameter named union_type, with values of all (default) and distinct.

@katieclaiborne-duet
Copy link
Contributor

@dbeatty10, mind if I give option 2 a try?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request triage
Projects
None yet
3 participants