[Bug] Use fully qualified names in rename for tables and views #1060

Merged · 14 commits · Jun 20, 2024
6 changes: 6 additions & 0 deletions .changes/unreleased/Fixes-20240522-160538.yaml
@@ -0,0 +1,6 @@
kind: Fixes
body: 'Rename targets for tables and views use fully qualified names'
time: 2024-05-22T16:05:38.602074-04:00
custom:
Author: mikealfare
Issue: "1031"
2 changes: 1 addition & 1 deletion dbt/include/snowflake/macros/relations/table/rename.sql
@@ -1,3 +1,3 @@
 {%- macro snowflake__get_rename_table_sql(relation, new_name) -%}
-    alter table {{ relation }} rename to {{ new_name }}
+    alter table {{ relation }} rename to {{ relation.incorporate(path={"identifier": new_name}) }}
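To see what `incorporate(path={"identifier": new_name})` produces here, the following is a minimal, illustrative Python stand-in for dbt's `BaseRelation` (the real class lives in dbt-core and is more involved; `FakeRelation` and its fields are assumptions for this sketch):

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class FakeRelation:
    # Illustrative stand-in for dbt's BaseRelation; not the real class.
    database: str
    schema: str
    identifier: str

    def incorporate(self, path):
        # Return a copy with the given path components overridden,
        # mimicking BaseRelation.incorporate(path={...}).
        return replace(self, **path)

    def render(self):
        return f"{self.database}.{self.schema}.{self.identifier}"

relation = FakeRelation("analytics", "dbt_jcohen_other", "my_table")
renamed = relation.incorporate(path={"identifier": "my_table__dbt_backup"})

# The rename target now carries the source relation's database and schema:
print(f"alter table {relation.render()} rename to {renamed.render()}")
# -> alter table analytics.dbt_jcohen_other.my_table rename to analytics.dbt_jcohen_other.my_table__dbt_backup
```

The point of the change is visible in the output: the target inherits `database` and `schema` from the source relation instead of being a bare identifier.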
Contributor:
As a user pointed out in this comment, it isn't clear right now whether the type signature of get_rename_x_sql is:

  • get_rename_table_sql(relation: BaseRelation, new_name: str)
  • get_rename_table_sql(relation: BaseRelation, new_name: BaseRelation)

This change will break if someone is currently passing a Relation object as the second argument. Do you think we should add conditional handling based on the type of new_name?

Contributor Author:

My short answer is that I'd rather not overload the argument.

While I specifically chose the name suffix (new_name rather than new_relation) to distinguish the argument from relation, that's an implicit assumption. We could adopt the practice of adding the equivalent of a docstring to macros that we expect users to use (or maybe to all of them, for our own sake). I like this idea better than what we're currently doing. If we instead add a conditional that handles both a relation and a string, then we're advertising that both are acceptable when we really don't expect a relation.

I have a stronger concern from the user's second comment, that this takes away the ability to use this macro to move an object to a different schema by supplying a fully qualified name. The Snowflake docs on ALTER TABLE suggest that the default for <new_table_name> is just the identifier, which aligns with my experience in other databases. They go on further in the parameters section to mention that you could use an ALTER/RENAME statement to move an object from one database/schema to another. That's possible with the current implementation, but not so with this version.
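The resolution behavior being described can be sketched as a toy Python model (this is not Snowflake's actual resolver, and it ignores two-part qualification for brevity): an unqualified <new_table_name> resolves against the session's current database/schema, while a fully qualified name can move the object.

```python
def resolve_rename_target(new_name: str, source: tuple, session_default: tuple) -> str:
    """Toy model of how Snowflake resolves the target of
    ALTER TABLE ... RENAME TO <new_name>.

    source and session_default are (database, schema) pairs. An
    unqualified name resolves against the *session* defaults, not the
    source object's location. Partial (schema.table) qualification is
    omitted here for brevity.
    """
    parts = new_name.split(".")
    if len(parts) == 3:
        return new_name  # fully qualified: can move across database/schema
    db, schema = session_default
    return f"{db}.{schema}.{parts[-1]}"

# An unqualified rename lands in the session's default schema,
# not the source object's schema:
print(resolve_rename_target(
    "my_table__dbt_backup",
    source=("analytics", "dbt_jcohen_other"),
    session_default=("analytics", "dbt_jcohen_dev"),
))
# -> analytics.dbt_jcohen_dev.my_table__dbt_backup
```

This is exactly why always forcing the fully qualified form removes the ability to use the macro for cross-schema moves: the qualified branch is the only one that can relocate an object.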

So talking that out, I actually think we should not make this change at all. However, it's certainly worth adding the docstring to make it clearer to users what this macro expects for arguments and how it's expected to be used. I would also keep the tests that were added and update them to whatever we choose to go with. I'm definitely interested in your thoughts on this before moving forward as this is ultimately your call.

Contributor @jtcohen6, May 24, 2024:
@mikealfare Thanks for thinking this through!

  • I agree we shouldn't overload the type of this macro's second argument; it should be either str or Relation (and str, to avoid a breaking change to the currently expected type)
  • If I had my druthers, we would always use fully-qualified relation names, to avoid this and related issues. (The alternative is to run use statements before every single query)
  • Are we ever actually intending to "rename" an object to change more than just its identifier (i.e. also changing its database/schema)? I don't think this is something we ever(?) do within dbt today. In theory, we could do this in the future, for dynamic tables and incremental models, as a form of database migration, if we know the object exists in its old location and only the configured database/schema has changed — a form of "drift detection" — but the current behavior is to presume that the object doesn't already exist. If we're only ever renaming object identifiers, then reusing the database/schema from the old relation name seems reasonable.

I might have been confused about the specific bug that surfaced with dynamic tables, which motivated me to open #1031. I agree the actual underlying issue was the 2024_03 bundle change to show terse objects, which led dbt to think a DT was actually a table and to try to drop + recreate it accordingly. But during that drop + recreate (= create-alter-rename-swap-drop), the backup DT was supposed to be dropped, and dbt wasn't dropping it correctly because it wasn't actually renaming it correctly.

Basically, something like this:

-- implicit each time we open a new connection, with the default schema set to {{ target.schema }}
use schema analytics.dbt_jcohen_dev;

-- imagine a table configured with a custom schema
create schema if not exists analytics.dbt_jcohen_other;
create or replace table analytics.dbt_jcohen_other.my_table as (select 1 as id);

-- a new connection
use schema analytics.dbt_jcohen_dev;
alter table analytics.dbt_jcohen_other.my_table rename to my_table__dbt_backup;
drop table analytics.dbt_jcohen_other.my_table__dbt_backup;

This fails because the table was renamed to analytics.dbt_jcohen_dev.my_table__dbt_backup, NOT analytics.dbt_jcohen_other.my_table__dbt_backup. In the meantime, the renamed DT keeps "hanging out" and is never actually dropped.

Making the proposed change actually resolved that surface-level issue for me (with 2024_03 bundle + previous show terse objects logic), though it left the underlying problem about misclassifying DTs (which we resolved in #1049).

Contributor:

notes from live conversation with @mikealfare

Two options here:

  • Current proposed change. We always rename to a fully qualified identifier name. The intended scope of get_rename_sql is only to update the identifier of an object (view/table/etc). It does not support moving objects into different databases/schemas. If we need to do the latter in the future, we'll implement some other macro (get_rename_relation_sql).
  • Or: get_rename_sql accepts Union[str, Relation] as the type for new_name, and handles either with conditional logic. This feels like too low-level a place to be doing the type checking, and we'll be doing it several times in different places.

Proposal: The top-level get_rename_sql macro should accept two arguments: relation (which is a Relation) and new_name, which is either:

  • A string, representing just an updated identifier for relation (the first argument). We expect this to cover >95% of the usage of this macro, so it should be easy to use it this way.
  • A Relation object, which it will convert to a string (via render()) representing that relation's fully qualified name.
  • It is not supported to pass a string like "different_database.different_schema.different_identifier". If a user wants to leverage get_rename_sql for moving objects to different database/schema, they need to build a Relation object and pass that in.
{%- macro get_rename_sql(relation, new_name) -%}
    {{- log('Applying RENAME to: ' ~ relation) -}}
    {% set new_name = relation.incorporate(path={"identifier": new_name}).render() if new_name is string else new_name.render() %}
    {{- adapter.dispatch('get_rename_sql', 'dbt')(relation, new_name) -}}
{%- endmacro -%}

In this example, whether we make the second argument backup_relation.identifier ("happy path") OR just backup_relation, we get the same result. In that way, this is a backward-compatible change and yields what the user would expect.
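That equivalence can be exercised outside of Jinja with a self-contained Python sketch of the same normalization (the `Relation` class below is an illustrative stand-in for dbt's `BaseRelation`, not the real implementation):

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Relation:
    # Illustrative stand-in for dbt's BaseRelation.
    database: str
    schema: str
    identifier: str

    def incorporate(self, path):
        return replace(self, **path)

    def render(self):
        return f"{self.database}.{self.schema}.{self.identifier}"

def normalize_new_name(relation: Relation, new_name) -> str:
    # Mirrors the Jinja line:
    #   {% set new_name = relation.incorporate(path={"identifier": new_name}).render()
    #      if new_name is string else new_name.render() %}
    if isinstance(new_name, str):
        return relation.incorporate(path={"identifier": new_name}).render()
    return new_name.render()

rel = Relation("analytics", "dbt_jcohen_other", "my_table")
backup = Relation("analytics", "dbt_jcohen_other", "my_table__dbt_backup")

# Passing a bare identifier or the backup Relation yields the same result:
assert normalize_new_name(rel, "my_table__dbt_backup") == normalize_new_name(rel, backup)
print(normalize_new_name(rel, "my_table__dbt_backup"))
# -> analytics.dbt_jcohen_other.my_table__dbt_backup
```

Both input shapes collapse to the same fully qualified string, which is why the change is backward-compatible for callers already passing a Relation.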

What feels less good? The lower-level rename macros to which it dispatches should accept relation (which is a Relation) and new_name (which is always a string, representing a fully qualified relation name).

We're going to need to make some choice here that feels less-than-ideal (or has some trade-off), so we're going to spend a little (not too much) time comparing two potential paths forward and then pick one.

Contributor Author:

@jtcohen6 Please see the new files that I pushed to this PR:

  • _rename_approach_0.sql - current state
  • _rename_approach_1.sql - this PR's approach
  • _rename_approach_2.sql - the approach from our discussion
  • _rename_approach_3.sql - a compromise of 1 and 2
  • _rename_usage.sql - two impacted use cases of these changes

I combined macros from different files and packages in one spot so that we don't need to flip back and forth to follow the flow. I also highlighted the change with a comment. Most of the code really isn't changing, but it helps to see where the changed code is. My new proposal is option 3.

I have two more outcomes from this exercise:

  • it's probably worth documenting this flow (e.g. _rename_approach_0.sql) somewhere for future discussions
  • I really like the idea of adding python-like docstrings to macros to communicate intent and expected inputs; this would be useful for adapter maintainers who need to implement these macros, and end users who may use these macros in their custom project macros

Contributor Author:

I added a fourth approach which I think I actually prefer more than 3. We basically only impact the backup/deploy renames in dbt-snowflake, which I think is where we're seeing the issue anyway.

Contributor:

@mikealfare Thanks for the thorough walkthrough! So the idea behind approach 4 is:

  • Snowflake's behavior here is unique, in its treatment of renaming objects based on the defaults specified in the connection
  • Therefore, we should make this change within dbt-snowflake, to pass the entire rendered relation name (fully qualified) from the backup/intermediate rename macros

I think I buy your reasoning, that this approach is:

  • the least disruptive to existing functionality, other adapters, and the base spec
  • justifiable as an isolated change to dbt-snowflake because of this unique(?) behavior on Snowflake

Contributor Author:

Yup, that's the rationale. And this leaves the rename macro flexible, so that users can use it for other purposes. I'll update this PR with approach 4 then, and remove the temp files. Thanks for the feedback!

{%- endmacro -%}
2 changes: 1 addition & 1 deletion dbt/include/snowflake/macros/relations/view/rename.sql
@@ -1,3 +1,3 @@
 {%- macro snowflake__get_rename_view_sql(relation, new_name) -%}
-    alter view {{ relation }} rename to {{ new_name }}
+    alter view {{ relation }} rename to {{ relation.incorporate(path={"identifier": new_name}) }}
{%- endmacro -%}