-
Notifications
You must be signed in to change notification settings - Fork 124
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Persist source table and column descriptions #248
Comments
I was looking for a way in dbt to add table and column descriptions to external tables in BigQuery as well. The BigQuery DDL supports descriptions for external tables: CREATE EXTERNAL TABLE `project`.`dataset`.`table_name` (
order_id INTEGER OPTIONS(description = "Unique identifier of the order (primary key).")
)
OPTIONS (
format = 'PARQUET',
uris = ['gs://gcs-name/orders.parquet'],
description = "Contains all order line items."
); In a naive approach, the The following mock code changes might illustrate this approach: Add column descriptionsdbt-external-tables/macros/plugins/bigquery/create_external_table.sql Lines 19 to 25 in 21428bc
create or replace external table {{source(source_node.source_name, source_node.name)}}
{%- if columns -%}(
{% for column in columns %}
{%- set column_quoted = adapter.quote(column.name) if column.quote else column.name %}
+ {%- set column_options = 'options(description = "' + column.description + '")' if 'description' in column else '' %}
- {{column_quoted}} {{column.data_type}} {{- ',' if not loop.last -}}
+ {{column_quoted}} {{column.data_type}} {{column_options}} {{- ',' if not loop.last -}}
{%- endfor -%}
) Add table descriptiondbt-external-tables/macros/plugins/bigquery/create_external_table.sql Lines 37 to 48 in 21428bc
options (
uris = [{%- for uri in uris -%} '{{uri}}' {{- "," if not loop.last}} {%- endfor -%}]
+ {%- if 'description' in external %}
+ , description = "{{external.description}}"
+ {%- endif -%}
{%- if options is mapping -%} Another approach would be to add the Do you have any thoughts on it? Just add descriptions always, or consider (currently not existing) config options? |
I've started on a fix setting descriptions on columns in this PR: |
@LoHertel, I agree with your instinct that persisting source table descriptions is specific to the dbt-external-tables package, and would lean toward building the capability here rather than in dbt-core. External source tables seem like an exception to the rule, in that they can be configured within and produced by dbt. All other source tables are simply taken as inputs. |
My PR fixes it in the same way as it works on the Provider without additional configuration. |
resolved by: #252 @katieclaiborne do you want to get the new beta release a try? |
@dataders, new handle, same me! Apologies for the delay, but I've confirmed that external table column descriptions are persisted in BigQuery on version 0.9.0. Thanks again for this work! |
@dataders, awesome, that column descriptions for BigQuery are available now 🚀 This issue addresses table descriptions as well, which have not been introduced by #252. By table description I mean the string, which describes a source table in a yaml property file: version: 2
sources:
- name: olist
tables:
- name: orders
description: |
Contains the order head information.
Each order could contain multiple line items, which are stored in the order_items table.
external:
location: 'gs://gcs-name/orders.parquet/*'
options:
format: 'PARQUET'
hive_partition_uri_prefix: 'gs://gcs-name/orders.parquet'
require_hive_partition_filter: false
partitions:
- name: year
data_type: INTEGER
columns:
- name: order_id
description: Unique identifier of the order (primary key).
data_type: STRING It would be great, if the table description is added to the CREATE EXTERNAL TABLE `project`.`dataset`.`orders` (
order_id STRING OPTIONS(description = "Unique identifier of the order (primary key).")
)
WITH PARTITION COLUMNS (
year INTEGER
)
OPTIONS (
uris = ['gs://gcs-name/orders.parquet/*'],
format = 'PARQUET',
hive_partition_uri_prefix = 'gs://gcs-name/orders.parquet',
require_hive_partition_filter = false,
description = 'Contains the order head information.\nEach order could contain multiple line items, which are stored in a seperate table.'
); The Would this be a sensible way of implementing it? |
Describe the feature
If source table and column descriptions are configured in dbt, I'd like for them to be persisted in the database when the
stage_external_sources
operation is run.Describe alternatives you've considered
Configuring source table and column descriptions does make them visible in dbt Docs and dbt Explorer, which is helpful.
Additional context
We're working in BigQuery, but I could imagine similar use cases in other databases.
I'm not sure if this is a known limitation, given that the persist_docs config is not implemented for sources. It could also be that we've just missed a step in configuring and staging our external sources!
Who will this benefit?
This feature will be helpful for users discovering sources through the database, rather than dbt Docs or dbt Explorer.
The text was updated successfully, but these errors were encountered: