[Bug] Use fully qualified names in rename for tables and views #1060
Conversation
```diff
@@ -1,3 +1,3 @@
 {%- macro snowflake__get_rename_table_sql(relation, new_name) -%}
-    alter table {{ relation }} rename to {{ new_name }}
+    alter table {{ relation }} rename to {{ relation.incorporate(path={"identifier": new_name}) }}
```
As a user pointed out in this comment, it isn't clear right now whether the type signature of `get_rename_x_sql` is:

- `get_rename_table_sql(relation: BaseRelation, new_name: str)`
- `get_rename_table_sql(relation: BaseRelation, new_name: BaseRelation)`

This change will break if someone is currently passing a `Relation` object as the second argument. Do you think we should add conditional handling based on the type of `new_name`?
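The conditional handling floated in that question could look like the following Python sketch. `FakeRelation` is a hypothetical, minimal stand-in for dbt's `BaseRelation` (the real class exposes `incorporate` and `render`, as the diff above relies on, but has more structure):

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class FakeRelation:
    # toy stand-in for dbt's BaseRelation, for illustration only
    database: str
    schema: str
    identifier: str

    def incorporate(self, path):
        # return a copy with the given path components overridden
        return replace(self, **path)

    def render(self):
        return f"{self.database}.{self.schema}.{self.identifier}"

def resolve_new_name(relation, new_name):
    # the conditional handling in question: accept either a bare identifier
    # string or a full relation object as the rename target
    if isinstance(new_name, str):
        return relation.incorporate(path={"identifier": new_name}).render()
    return new_name.render()

rel = FakeRelation("analytics", "marts", "my_table")
print(resolve_new_name(rel, "my_table__dbt_backup"))
# -> analytics.marts.my_table__dbt_backup
print(resolve_new_name(rel, FakeRelation("analytics", "other", "my_table")))
# -> analytics.other.my_table
```

Whether this type check belongs in the macro layer at all is exactly the trade-off discussed below.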
My short answer is that I'd rather not overload the argument. While I specifically chose the suffix `name` to distinguish it from `relation`, that's an implicit assumption. We could adopt the practice of adding the equivalent of a docstring to macros that we expect users to use (or maybe just to all of them, for our own sake). I like that idea better than what we're currently doing. If we instead add a conditional that handles both a relation and a string, we're advertising that both are acceptable when we really don't expect a relation.

I have a stronger concern from the user's second comment: this change takes away the ability to use this macro to move an object to a different schema by supplying a fully qualified name. The Snowflake docs on `ALTER TABLE` suggest that the default for `<new_table_name>` is just the identifier, which aligns with my experience in other databases. The parameters section goes on to mention that you can use an `ALTER ... RENAME` statement to move an object from one database/schema to another. That's possible with the current implementation, but not with this version.

Talking that out, I actually think we should not make this change at all. However, it's certainly worth adding the docstring to make it clearer to users what this macro expects for arguments and how it's expected to be used. I would also keep the tests that were added and update them to whatever we choose to go with. I'm definitely interested in your thoughts before moving forward, as this is ultimately your call.
@mikealfare Thanks for thinking this through!

- I agree we shouldn't overload the type of this macro; it should be either `str` or `Relation` (and `str` to avoid a breaking change with the current expected type).
- If I had my druthers, we would always use fully qualified relation names, to avoid this and related issues. (The alternative is to run `use` statements before every single query.)
- Are we ever actually intending to "rename" an object to change more than just its identifier (i.e. also changing its database/schema)? I don't think this is something we ever(?) do within dbt today. In theory, we could do this in the future for dynamic tables and incremental models, as a form of database migration, if we know the object exists in its old location and only the configured database/schema has changed (a form of "drift detection"), but the current behavior is to presume that the object doesn't already exist. If we're only ever renaming object identifiers, then reusing the database/schema from the old relation name seems reasonable.

I might have been confused about the specific bug that surfaced with dynamic tables, which motivated me to open #1031. I agree the actual underlying issue was the `2024_03` bundle change to `show terse objects`, which led dbt to think a DT was actually a table and try to drop and recreate it accordingly. But it was during that drop + recreate (= create-alter-rename-swap-drop) that the backup DT was supposed to be getting dropped, and dbt wasn't dropping it correctly, because it wasn't actually renaming it correctly.
Basically, something like this:
```sql
-- implicit each time we open a new connection, with the default schema set to {{ target.schema }}
use schema analytics.dbt_jcohen_dev;

-- imagine a table configured with a custom schema
create schema if not exists analytics.dbt_jcohen_other;
create or replace table analytics.dbt_jcohen_other.my_table as (select 1 as id);

-- a new connection
use schema analytics.dbt_jcohen_dev;
alter table analytics.dbt_jcohen_other.my_table rename to my_table__dbt_backup;
drop table analytics.dbt_jcohen_other.my_table__dbt_backup;
```
This fails because the table was renamed to `analytics.dbt_jcohen_dev.my_table__dbt_backup`, NOT `analytics.dbt_jcohen_other.my_table__dbt_backup`. In the meantime, the renamed DT keeps "hanging out" and is never actually dropped.
Making the proposed change actually resolved that surface-level issue for me (with the `2024_03` bundle + the previous `show terse objects` logic), though it left the underlying problem of misclassifying DTs (which we resolved in #1049).
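The difference between the old and proposed rename behavior in that scenario can be sketched in Python. This is a hypothetical model, with `Rel` as a toy stand-in for dbt's `BaseRelation`:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Rel:
    # toy stand-in for dbt's BaseRelation, for illustration only
    database: str
    schema: str
    identifier: str

    def incorporate(self, path):
        return replace(self, **path)

    def render(self):
        return f"{self.database}.{self.schema}.{self.identifier}"

rel = Rel("analytics", "dbt_jcohen_other", "my_table")

# old macro: the rename target is a bare identifier, so Snowflake resolves it
# against the connection's default schema (analytics.dbt_jcohen_dev above)
old_sql = f"alter table {rel.render()} rename to my_table__dbt_backup"

# proposed macro: incorporate() keeps database/schema and swaps only the
# identifier, so the rename target stays in the source schema
new_target = rel.incorporate(path={"identifier": "my_table__dbt_backup"})
new_sql = f"alter table {rel.render()} rename to {new_target.render()}"
print(new_sql)
```

With the fully qualified target, the subsequent `drop table analytics.dbt_jcohen_other.my_table__dbt_backup` now matches the object's actual location.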
Notes from live conversation with @mikealfare.

Two options here:

- The current proposed change. We always rename to a fully qualified identifier name. The intended scope of `get_rename_sql` is only to update the `identifier` of an object (view/table/etc.). It does not support moving objects into different databases/schemas. If we need to do the latter in the future, we'll implement some other macro (`get_rename_relation_sql`).
- Or: `get_rename_sql` accepts `Union[str, Relation]` as the type for `new_name`, and handles either with conditional logic. This feels like too low-level a place to be doing the type checking, and we'll be doing it several times in different places.
Proposal: the top-level `get_rename_sql` macro should accept two arguments: `relation` (which is a `Relation`) and `new_name`, which is either:

- A string, representing just an updated `identifier` for `relation` (the first argument). We expect this is >95% of the expected usage of this macro, and so it should be easy to use it in this way.
- A `Relation` object, which it will convert to a string (via `render()`) representing that relation's fully qualified name.
- It is not supported to pass a string like `"different_database.different_schema.different_identifier"`. If a user wants to leverage `get_rename_sql` for moving objects to a different database/schema, they need to build a `Relation` object and pass that in.
```jinja
{%- macro get_rename_sql(relation, new_name) -%}
    {{- log('Applying RENAME to: ' ~ relation) -}}
    {% set new_name = relation.incorporate(path={"identifier": new_name}).render() if new_name is string else new_name.render() %}
    {{- adapter.dispatch('get_rename_sql', 'dbt')(relation, new_name) -}}
{%- endmacro -%}
```
In this example, whether we make the second argument `backup_relation.identifier` (the "happy path") OR just `backup_relation`, we get the same result. In that way, this is a backward-compatible change and yields what the user would expect.

What feels less good? The lower-level rename macros to which it dispatches should accept `relation` (which is a `Relation`) and `new_name` (which is always a string, representing a fully qualified relation name).
We're going to need to make some choice here that feels less-than-ideal (or has some trade-off), so we're going to spend a little (not too much) time comparing two potential paths forward and then pick one.
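The backward-compatibility claim above can be checked with a small Python model of the macro's `set new_name = ...` line. This is a sketch; `Relation` here is a toy stand-in for dbt's `BaseRelation`, not the real class:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Relation:
    # toy stand-in for dbt's BaseRelation, for illustration only
    database: str
    schema: str
    identifier: str

    def incorporate(self, path):
        return replace(self, **path)

    def render(self):
        return f"{self.database}.{self.schema}.{self.identifier}"

def normalize_new_name(relation, new_name):
    # mirrors the macro: incorporate + render if new_name is a string,
    # otherwise render the Relation that was passed in
    if isinstance(new_name, str):
        return relation.incorporate(path={"identifier": new_name}).render()
    return new_name.render()

relation = Relation("analytics", "dbt_jcohen_dev", "my_model")
backup_relation = relation.incorporate(path={"identifier": "my_model__dbt_backup"})

# "happy path": pass just the identifier string
a = normalize_new_name(relation, backup_relation.identifier)
# alternatively, pass the whole Relation object
b = normalize_new_name(relation, backup_relation)
assert a == b  # both call styles yield the same fully qualified name
```

Either way, the lower-level dispatched macros only ever see a fully qualified string, which is the design point above.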
@jtcohen6 Please see the new files that I pushed to this PR:

- `_rename_approach_0.sql` - current state
- `_rename_approach_1.sql` - this PR's approach
- `_rename_approach_2.sql` - the approach from our discussion
- `_rename_approach_3.sql` - a compromise of 1 and 2
- `_rename_usage.sql` - two impacted use cases of these changes

I combined macros from different files and packages in one spot so that we don't need to flip back and forth to follow the flow. I also highlighted the change with a comment. Most of the code really isn't changing, but it helps to see where the changed code is. My new proposal is option 3.

I have two more outcomes from this exercise:

- It's probably worth documenting this flow (e.g. `_rename_approach_0.sql`) somewhere for future discussions.
- I really like the idea of adding Python-like docstrings to macros to communicate intent and expected inputs; this would be useful for adapter maintainers who need to implement these macros, and for end users who may use these macros in their custom project macros.
I added a fourth approach, which I think I actually prefer over 3. We basically only impact the backup/deploy renames in `dbt-snowflake`, which I think is where we're seeing the issue anyway.
@mikealfare Thanks for the thorough walkthrough! So the idea behind approach 4 is:

- Snowflake's behavior here is unique, in its treatment of renaming objects based on the defaults specified in the connection.
- Therefore, we should make this change within `dbt-snowflake`, to pass the entire rendered relation name (fully qualified) from the backup/intermediate rename macros.

I think I buy your reasoning, that this approach is:

- the least disruptive to existing functionality, other adapters, and the base spec
- justifiable as an isolated change to `dbt-snowflake`, because of this unique(?) behavior on Snowflake
Yup, that's the rationale. And this leaves the rename macro flexible, so that users can use it for other purposes. I'll update this PR with approach 4 then, and remove the temp files. Thanks for the feedback!
resolves #1031
Problem
The rename macro for tables and views uses only the identifier. It's expected to use the fully qualified name.
Solution
Update these macros to incorporate the new name into the source relation, thereby preserving the database and schema.
Checklist