Add schema to the default location root #239

JCZuurmond · 2021-10-22T14:44:41Z

Describe the feature

Add the schema to the location root by default.

Note that this logic [the generate_schema_name macro] is designed so that two dbt users won't accidentally overwrite each other's work by writing to the same schema.

I like this behavior. In the case of Spark, if you materialize your tables as files (which is common), two users will not overwrite each other's table definitions, because these live in separate schemas (or databases in Spark). However, if they share a location root, the users do overwrite the underlying data of each other's tables.

I think this is unexpected behavior, which we can avoid by adding the schema to the location root - by default.
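
To make the collision concrete, here is a minimal Python sketch. It assumes the current default location is built from `location_root` and the model alias only, as this issue implies; the function names, schema names, and bucket path are hypothetical and only illustrate the path logic, not the actual macro.

```python
def current_location(location_root: str, schema: str, identifier: str) -> str:
    # Assumed current behavior: the schema is not part of the path.
    return f"{location_root}/{identifier}"


def proposed_location(location_root: str, schema: str, identifier: str) -> str:
    # Proposed behavior: include the schema so each user gets a separate directory.
    return f"{location_root}/{schema}/{identifier}"


root = "s3://bucket/warehouse"    # hypothetical location_root
users = ["dbt_alice", "dbt_bob"]  # hypothetical per-user schemas

current = {current_location(root, s, "orders") for s in users}
proposed = {proposed_location(root, s, "orders") for s in users}

print(len(current))   # 1: both users write to the same directory
print(len(proposed))  # 2: one directory per schema
```

With the schema in the path, the table definitions and the underlying files are separated in the same way.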

Describe alternatives you've considered

Overwriting the macro myself.

Additional context

The location-root macro.

Who will this benefit?

Teams where people develop on dbt-spark concurrently. Unexpected behavior occurs when you unknowingly overwrite the data of each other's tables.

Are you interested in contributing this feature?

This is the macro I use:

{% macro location_clause() %}
  {%- set location_root = config.get('location_root', validator=validation.any[basestring]) -%}
  {%- set schema = model['schema'] -%}
  {%- set identifier = model['alias'] -%}
  {%- set file_format = config.get('file_format', validator=validation.any[basestring]) -%}
  {%- if location_root is not none %}
  location '{{ location_root }}/schema={{ schema }}/model={{ identifier }}/file_format={{ file_format }}/'
  {%- endif %}
{%- endmacro -%}

We could choose something like:

{% macro location_clause() %}
  {%- set location_root = config.get('location_root', validator=validation.any[basestring]) -%}
  {%- set schema = model['schema'] -%}
  {%- set identifier = model['alias'] -%}
  {%- if location_root is not none %}
  location '{{ location_root }}/{{ schema }}/{{ identifier }}'
  {%- endif %}
{%- endmacro -%}


jtcohen6 · 2021-10-22T15:38:08Z

@JCZuurmond I'm super supportive of this change. It might be breaking for some users, so we'll just want to call it out as such in the changelog / release notes, and wait for the next minor version to release — which will be v1.0 :)

I don't have a strong preference between the two options you've proposed:

'{{ location_root }}/schema={{ schema }}/model={{ identifier }}/'
'{{ location_root }}/{{ schema }}/{{ identifier }}'

The former is closer to standard Hive partitioning layout, right? If we're already changing the default, we may as well pick an approach that will serve us well for some time to come.

JCZuurmond · 2021-10-27T06:03:28Z

The <name>=<value> syntax is used for partition columns. As schema and model are not partition columns, it might be confusing to some users. I use that syntax because I like its explicitness: there is no need to guess what the schema or model is.

@guillesd what is your take on this issue?

JCZuurmond · 2021-10-27T06:08:06Z

Forgot to mention: I would choose '{{ location_root }}/{{ schema }}/{{ identifier }}'. I think this is what most users would expect.

guillesd · 2021-10-28T08:23:49Z

Hey chicos! Nice indeed to get this going. I'm not sure which of the two options I'd go for. If we choose your preferred option @JCZuurmond, then it'd look something like this for partitioned tables: '{{ location_root }}/{{ schema }}/{{ identifier }}/partition_column=partition_value'. Is this OK, or is the change of syntax confusing for the user? No strong opinion though!

JCZuurmond · 2021-10-28T09:24:39Z

I think the other option {{ location_root }}/schema={{ schema }}/identifier={{ identifier }}/partition_column=partition_value is more confusing, as it suggests schema and identifier are columns.
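
The difference between the two layouts for a partitioned table can be sketched in Python. This is only an illustration of the argument above: the helper names, the root path, and the `ds` partition column are hypothetical.

```python
def plain_layout(root, schema, identifier, partition):
    # Bare directory names for schema and identifier, then the partition directory.
    return f"{root}/{schema}/{identifier}/{partition}"


def key_value_layout(root, schema, identifier, partition):
    # Alternative: name=value segments for schema and identifier as well.
    return f"{root}/schema={schema}/identifier={identifier}/{partition}"


def key_value_segments(path):
    # Segments that a reader could mistake for partition columns.
    return [seg for seg in path.split("/") if "=" in seg]


root = "/mnt/warehouse"      # hypothetical location_root
partition = "ds=2021-10-28"  # hypothetical partition directory

plain = plain_layout(root, "analytics", "orders", partition)
key_value = key_value_layout(root, "analytics", "orders", partition)

print(key_value_segments(plain))      # only the real partition column
print(key_value_segments(key_value))  # schema and identifier look like columns too
```

In the plain layout, every `name=value` segment is a real partition column; in the key-value layout, two of the three are not.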

jtcohen6 · 2021-10-28T10:15:49Z

@JCZuurmond Ah, that's a really good point. I think you've clinched it for me: '{{ location_root }}/{{ schema }}/{{ identifier }}' feels right.

github-actions · 2022-04-27T02:11:36Z

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

github-actions · 2022-10-31T02:12:29Z

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

JCZuurmond · 2022-11-28T10:41:15Z

PR #339 is still open to solve this issue

JCZuurmond added the enhancement and triage labels Oct 22, 2021

jtcohen6 removed the triage label Oct 22, 2021

jtcohen6 added the good_first_issue label Oct 22, 2021

dan1elt0m mentioned this issue Apr 22, 2022

add schema to the default location root #339

Closed

4 tasks

github-actions bot added the Stale label Apr 27, 2022

jtcohen6 removed the Stale label May 3, 2022

github-actions bot added the Stale label Oct 31, 2022

github-actions bot closed this as completed Nov 8, 2022

Fleid mentioned this issue May 9, 2023

[ADAP-531] [PR Tracking] add schema to the default location root #339 #757

Closed
