-
Notifications
You must be signed in to change notification settings - Fork 236
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add schema to the default location root #339
Conversation
@dan1elt0m LGTM! Could you add the PR to the change log entry too? There are more entries having both the issue and the PR. @jtcohen6 : We should communicate well about this fix. It may cause confusing: new tables created with this fix have the location which includes the schema, while existing tables have the location without the schema. If users do not like this change they keep the old behavior by adding the following macro to their project: {% macro spark__location_clause() %}
{%- set location_root = config.get('location_root', validator=validation.any[basestring]) -%}
{%- set identifier = model['alias'] -%}
{%- if location_root is not none %}
location '{{ location_root }}/{{ identifier }}'
{%- endif %}
{%- endmacro -%} Still, I think it's a great change! It's even more confusing when multiple people work on the same dbt project expecting to create separate tables in their user schema, but actually the tables in different schema point to the same file location, causing unexpected, weird issues. |
Done |
@dan1elt0m Thanks for the PR! @JCZuurmond I was just going to ask the same question. This is technically a breaking change, but it's really how this should have worked all along. I'd like to include this change in v1.2 and include a clear upgrade notice to exactly this effect. Anyone who wants to preserve the old behavior has a clear mechanism for doing so. |
@jtcohen6 : What is the plan for this PR? |
@JCZuurmond Let's do it ✅ it's a good change, and a sensible default. We're in the process of migrating our adapter repos to use the same changelog system (changie) that |
@dan1elt0m @JCZuurmond so sorry for the late update, if either of you are still free to work on this the |
@dan1elt0m Could you do the |
Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Daniel Tom.
|
625796b
to
66e453e
Compare
@McKnight-42 @JCZuurmond: I added the changie log |
@McKnight-42 : Wat is the plan with this PR? |
@McKnight-42 : What is the plan with this PR? I see you opened it again |
{%- set identifier = model['alias'] -%} | ||
{%- if location_root is not none %} | ||
location '{{ location_root }}/{{ identifier }}' | ||
location '{{ location_root }}/{{ schema }}/{{ identifier }}' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An interesting way to make this non-breaking would be to make this configurable:
location '{{ location_root }}/{{ schema }}/{{ identifier }}' | |
{%- set use_schema_in_location = config.get('use_schema_in_location_root') -%} | |
{%- if location_root is not none %} | |
{%- if use_schema_in_location is not none %} | |
location '{{ location_root }}/{{ schema }}/{{ identifier }}' | |
{% else %} | |
location '{{ location_root }}/{{ identifier }}' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't we also update config.get()
to stuff the schema into the location root if it's present?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JCZuurmond do you have an opinion on that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I would like to have the location with the schema enabled by default.
Without the schema, users may overwrite each other's tables even when they have them in separate schemas because they are overwriting the underlying data via the location. This happened to colleagues of mine, which is a hard bug to track down.
I discussed the non-breaking/breaking change with @jtcohen6. I recall we were ok with it, but I do not see this conversation back in the issue maybe we had the conversation over Slack.
In any case, the change is kinda of a breaking change. The location is added when a user creates a new table; either because the data model is new (in the target schema), or when the materialization is table
, or when using --full-refresh
. Existing incremental tables are not affected and for the other options the tables are recreated anyway. Technically speaking, you could argue this is not (really) a breaking change.
My preference would be to enable the location with schema by default AND communicate clearly what is changed and why. It will be confusing if users do not know/understand this change and they see that some of the (older) tables have a location without schema and others have a location with schema. However, it will be even more confusing when multiple developers work together within the same warehouse and they are interfering with each other's tables unknowingly.
This PR has been marked as Stale because it has been open with no activity as of late. If you would like the PR to remain open, please comment on the PR or else it will be closed in 7 days. |
@Fleid or @colin-rogers-dbt : What will we do with this PR? I still think it is an important change, but also tricky due to the change in location |
This PR has been marked as Stale because it has been open with no activity as of late. If you would like the PR to remain open, please comment on the PR or else it will be closed in 7 days. |
Although we are closing this PR as stale, it can still be reopened to continue development. Just add a comment to notify the maintainers. |
resolves #239
Description
Adds schema to the default location root
Checklist
CHANGELOG.md
and added information about my change to the "dbt-spark next" section.