
[Bug] dbt enforced data_type checks don't check complex data types in Databricks #1155

Closed
2 tasks done
zeljkostojkovic opened this issue Dec 11, 2024 · 3 comments
Labels
bug Something isn't working triage

Comments

@zeljkostojkovic

Is this a new bug in dbt-spark?

  • I believe this is a new bug in dbt-spark
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

When I define a model like the following and execute the dbt run command, the command succeeds as long as the metadata column really is of type STRUCT, but the keys inside the struct and their data types are not checked.

models:
  - name: test_model
    config:
      contract:
        enforced: true
        alias_types: false
    columns:
      - name: id
        data_type: STRING

      - name: metadata
        data_type: STRUCT<property_1 STRING, property_2 STRING>

SQL model:

  select
    id,
    CAST(metadata AS STRUCT<property_3 STRING>) as metadata
  from {{ source('source', 'source_table') }}

I noticed this when I tested changing my data_type from STRUCT to MAP. In that case, the dbt run command returned this error:

[Screenshot of the error returned by dbt run]

The definition_type and contract_type values in that error do not take into account the types nested inside complex structures.
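One way to model the behavior described above is a comparison that keeps only the type name in front of any "<". This is an illustration under that assumption, not dbt's actual implementation; the helper top_level_type is hypothetical, and the actual-type string format (struct<property_3:string>) is assumed. Under this model the declared and actual STRUCT types compare equal even though their fields differ, while STRUCT vs. MAP does not.

```python
import re


def top_level_type(type_str: str) -> str:
    """Hypothetical helper: keep only the type name before any '<' or '(', upper-cased."""
    return re.split(r"[<(]", type_str, maxsplit=1)[0].strip().upper()


declared = "STRUCT<property_1 STRING, property_2 STRING>"  # data_type from the YAML above
actual = "struct<property_3:string>"  # type produced by the SQL model (assumed format)

# Nested fields are ignored, so these compare equal.
print(top_level_type(declared) == top_level_type(actual))
# Different top-level types (MAP vs. STRUCT) are still caught.
print(top_level_type("MAP<STRING, STRING>") == top_level_type(actual))
```

This also explains why an empty STRUCT<> declaration passes: it reduces to the same top-level name.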

Expected Behavior

I would expect dbt to fail when the contract is enforced and the nested contents of complex data types such as ARRAY, MAP, and STRUCT do not match the declared data_type. Even better, the contract configuration could offer a check_complex_types boolean field that controls whether nested structures are checked.
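A rough sketch of how the proposed check_complex_types option could behave follows. It is hypothetical throughout: parse_type and types_match are illustrative names, not dbt or dbt-databricks APIs. The parser assumes unquoted field names, accepts both "name TYPE" and "name: TYPE", lower-cases names, and does not handle backtick-quoted names or NOT NULL/COMMENT clauses. With the flag on, the whole parsed structure must match; with it off, only the top-level type names are compared.

```python
import re

# Hypothetical sketch, not dbt or dbt-databricks code. Tokens are identifiers,
# integers, and the punctuation that appears in Databricks type strings.
_TOKEN = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_]*|\d+|[<>,:()])")


def _tokenize(type_str: str) -> list[str]:
    text = type_str.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = _TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"cannot tokenize {text[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse_type(type_str: str) -> tuple:
    """Parse a type string into a nested (name, children) tuple with lower-cased names."""
    tokens = _tokenize(type_str)

    def expect(i: int, tok: str) -> int:
        if i >= len(tokens) or tokens[i] != tok:
            raise ValueError(f"expected {tok!r} at token {i} of {type_str!r}")
        return i + 1

    def parse(i: int) -> tuple[tuple, int]:
        name = tokens[i].lower()
        i += 1
        if name == "struct":
            i = expect(i, "<")
            fields = []
            while tokens[i] != ">":
                field_name = tokens[i].lower()
                i += 1
                if tokens[i] == ":":  # "name: TYPE" and "name TYPE" are both accepted
                    i += 1
                field_type, i = parse(i)
                fields.append((field_name, field_type))
                if tokens[i] == ",":
                    i += 1
                elif tokens[i] != ">":
                    raise ValueError(f"expected ',' or '>' in {type_str!r}")
            return ("struct", tuple(fields)), i + 1
        if name in ("array", "map"):
            i = expect(i, "<")
            args = []
            while True:
                arg, i = parse(i)
                args.append(arg)
                if i < len(tokens) and tokens[i] == ",":
                    i += 1
                else:
                    break
            return (name, tuple(args)), expect(i, ">")
        if i < len(tokens) and tokens[i] == "(":  # parameterized type, e.g. DECIMAL(10, 2)
            close = tokens.index(")", i)
            return (name, tuple(tokens[i + 1 : close : 2])), close + 1
        return (name, ()), i

    node, end = parse(0)
    if end != len(tokens):
        raise ValueError(f"unexpected trailing tokens in {type_str!r}")
    return node


def types_match(contract_type: str, actual_type: str, check_complex_types: bool = True) -> bool:
    """Compare a declared contract type with the actual column type."""
    contract, actual = parse_type(contract_type), parse_type(actual_type)
    if check_complex_types:
        return contract == actual  # the whole nested structure must agree
    return contract[0] == actual[0]  # top-level names only, e.g. "struct" == "struct"
```

Comparing parsed trees rather than raw strings avoids false mismatches caused only by formatting differences (spacing, optional colons, letter case) between the YAML data_type and the type string reported by the warehouse.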

Steps To Reproduce

Using the model configuration and a similar SQL query (one that contains a STRUCT column) from above, execute the dbt run command. The command succeeds no matter which nested fields are declared in the data_type field (it even succeeds with just STRUCT<>).

Relevant log output

No response

Environment

- OS:
- Python: 3.11.9
- dbt-core: dbt Cloud CLI - 0.38.15
- dbt-spark:

Additional Context

No response

zeljkostojkovic added the bug and triage labels on Dec 11, 2024
@amychen1776
Contributor

Hello @zeljkostojkovic - are you using the dbt-databricks adapter or dbt-spark to connect to Databricks?

@zeljkostojkovic
Author

Hi @amychen1776, I've installed the dbt Cloud CLI locally, but based on "adapter_type": "databricks" in manifest.json, it's using the dbt-databricks adapter.

@amychen1776
Contributor

amychen1776 commented Dec 16, 2024

Thanks @zeljkostojkovic! Could you open this issue on dbt-databricks, since this is specific to that adapter? I'm going to close this issue for now, since it should be triaged and addressed on dbt-databricks. Unfortunately, I can't transfer your issue because that repository is in the Databricks org.

amychen1776 closed this as not planned (won't fix, can't repro, duplicate, stale) on Dec 16, 2024