-
Notifications
You must be signed in to change notification settings - Fork 236
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
persist view column comments #866
Conversation
Thanks for your pull request, and welcome to our community! We require contributors to sign our Contributor License Agreement and we don't seem to have your signature on file. Check out this article for more information on why we have a CLA. In order for us to review and merge your code, please submit the Individual Contributor License Agreement form attached above above. If you have questions about the CLA, or if you believe you've received this message in error, please reach out through a comment on this PR. CLA has not been signed by users: @jurasan |
@cla-bot check: |
The cla-bot has been summoned, and re-checked this pull request! |
@jurasan can you add a changelog summarizing this change? |
@jurasan The failing code check is for whitespace in Also, I don't understand why the test in your linked PR wouldn't be able to run in this PR. We regularly call |
@JCZuurmond for information only |
fixed this
Yes, we do call dbt-core macros, but If you look at the |
Set this to auto-merge, looks like all tests are passing. @jurasan can you update your branch with the latest from main? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jurasan : Thank you for your contribution.
If I understand correctly, the code does roughly the following:
If the persist_docs.columns
flag is set to true, then comments are added to the views by matching the columns outputted by the view query with the columns in the matching model definition (yml file).
I have some doubts about executing the SQL when creating a view. A benefit of views is that they are computational cheap to create. It would be an unexpected side effect to me of setting persist_docs.columns
to true
that the view SQL is executed when creating a view.
|
||
{% macro spark__create_view_as(relation, sql) -%} | ||
create or replace view {{ relation }} | ||
{% if config.persist_column_docs() -%} | ||
{% set model_columns = model.columns %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where are the model columns coming from?
|
||
{% macro spark__create_view_as(relation, sql) -%} | ||
create or replace view {{ relation }} | ||
{% if config.persist_column_docs() -%} | ||
{% set model_columns = model.columns %} | ||
{% set query_columns = get_columns_in_query(sql) %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean that the SQL of the view executed when creating the view with persist_docs.columns = true
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Issuing the get_columns_in_query query against one of the largest table internally at Databricks returned in 148 ms. It is sufficiently performant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regardless of the time, does it execute the SQL to get the columns?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It executes a limit 0.
@@ -229,9 +229,43 @@ | |||
{% endfor %} | |||
{% endmacro %} | |||
|
|||
{% macro get_matched_column(column_name, column_dict) %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spark columns are case sensitive, I would not do the upper/lower matching
@@ -229,9 +229,29 @@ | |||
{% endfor %} | |||
{% endmacro %} | |||
|
|||
{% macro get_column_comment_sql(column_name, column_dict) -%} | |||
{% if column_name in column_dict and column_dict[column_name]["description"] -%} | |||
{% set column_comment_clause = "comment '" ~ column_dict[column_name]["description"] ~ "'" %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Filter with
| replace("'", "\\'")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this what you mean? could TLDR's the filter's purpose for me?
{% set column_comment_clause = "comment '" ~ column_dict[column_name]["description"] ~ "'" %} | |
{% set column_comment_clause = "comment '" ~ column_dict[column_name]["description"] ~ "'" | replace("'", "\\'")%} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its to escape single-quotes to allow them to be used in the comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you need it to operate only on the description, I think here you have it operating either on the whole string or on the final ' (I'm not the best at Jinja). In my local copy I moved the extraction and filtering to separate 'set' statement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{% set escaped_description = column_dict[column_name]["description"] | replace("'", "\\'") %}
{% set column_comment_clause = "comment '" ~ escaped_description ~ "'" %}
resolved in #893 |
resolves #372
Problem
View column description is not persisted in view schema.
Solution
View column description is persisted as a column level comment.
Used this dbt-snowflake commit as an example.
Checklist
I was not able to run tests on it. Here is the PR with the test, but in tests it cannot call dbt-core macros.