Add support to infer schemas on BigQuery #249
Comments
Willing to create a PR for this, assuming the fix is as straightforward as it seems.
I second this feature!
For me this feels very conflicting: you want to infer the schema automatically, but you also want to manually add the column names for documentation. I see two problems with this setup: the inferred schema in BigQuery might be different from the columns you put in manually, so the documentation no longer reflects reality.
@jbergeskans I agree that manually specifying column names and data types can be a pain! I looked at #259, but the implementation is different from Snowflake's. For dbt-snowflake's version, if
I agree with @thomas-vl and am conflicted about your implementation. However, I'm not in love with the precedent that is Snowflake's implementation either. @jbergeskans, am I right in guessing that what you are not looking for is something that would allow for schema evolution of an external table, as in: if the actual external files change, then dbt shouldn't flinch and should just make the table accordingly?
My gut tells me that we DON'T want auto-schema evolution -- data changing silently should not be a feature, right? I've also opened dbt-labs/dbt-codegen#173; this feels like the "right solution"™️ to me.
@dataders fully agree that silently changing the schema is not something that should ever be allowed. And yes, I would be happy enough if this were generated for me so that I could easily go in and add tests, apply policy tags, and add constraints where needed. In this case, what we had was a spreadsheet with column names and their descriptions, so I wrote a script to generate the YAML file based on the spreadsheet, and we ran into this issue. But if I could have gotten the field names and data types generated for me, I'd 100% take that.
Describe the feature
Allow the BigQuery plugin to infer schemas in the case that the user has chosen to document the external table for dbt docs (descriptions, constraints, etc.) but does not wish to also add data types for all columns.
Describe alternatives you've considered
Additional context
Currently, if a user adds column descriptions to a sources file, the package will generate a table creation query using the column information available in the sources file. If no data_type is set, the creation of the table will fail, because the generated query contains column names with no data types after them.
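As a rough illustration (the project, dataset, table, column, and bucket names below are invented for the example), the generated DDL ends up looking something like this, which BigQuery rejects as invalid syntax:

```sql
-- hypothetical rendering when columns are documented but have no data_type:
create or replace external table `my-project`.`my_dataset`.`orders` (
    order_id ,      -- no type follows the column name
    customer_name   -- no type follows the column name
)
options (
    format = 'CSV',
    uris = ['gs://my-bucket/orders/*.csv']
);
```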
This happens because of the code block in the BigQuery plugin that renders the column list.
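That logic works roughly like the sketch below (a paraphrase, not a verbatim copy of the BigQuery create-external-table macro; variable names may differ from the real code). Each documented column is rendered as its name followed by its data_type, so an undefined data_type leaves a hole in the DDL:

```sql
-- Rough paraphrase of the column-rendering logic in the BigQuery plugin:
create or replace external table {{ source(source_node.source_name, source_node.name) }}
{%- if columns %} (
    {%- for column in columns %}
    {{ column.name }} {{ column.data_type }}{%- if not loop.last %},{% endif %}
    {%- endfor %}
)
{%- endif %}
-- ... options (uris, format, etc.) clause follows here
```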
This solution is already implemented in the Snowflake plugin, and I don't see that the BigQuery fix needs to be more complicated than allowing an infer_schema property to be passed in and checking its truthiness, as sketched below.
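Purely as an illustration of that idea (the external.infer_schema lookup is an assumption here, mirroring the Snowflake plugin, not an existing option in the BigQuery plugin), the column-rendering block above could be guarded roughly like this so that documented columns are skipped when the user asks for inference:

```sql
-- Hypothetical sketch: omit the explicit column list when infer_schema is truthy,
-- letting BigQuery detect the schema from the external files instead.
{%- if columns and not external.infer_schema %} (
    {%- for column in columns %}
    {{ column.name }} {{ column.data_type }}{%- if not loop.last %},{% endif %}
    {%- endfor %}
)
{%- endif %}
```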
Who will this benefit?
This will allow users to infer the schema while still keeping descriptions and constraints in the dbt docs.