Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-642] Schema change detected for bigint columns (expected long) #357

Closed
francescomucio opened this issue May 13, 2022 · 5 comments · May be fixed by #358
Closed

[CT-642] Schema change detected for bigint columns (expected long) #357

francescomucio opened this issue May 13, 2022 · 5 comments · May be fixed by #358
Labels
bug Something isn't working good_first_issue Good for newcomers Stale

Comments

@francescomucio
Copy link

Describe the bug

Incremental model doesn't run with bigint column.

Steps To Reproduce

Create an incremental model on a table with a bigint column, run it.

Expected behavior

It should fail with a schema change error.

Screenshots and log output

        Source columns: [<SparkColumn device_id (string)>, <SparkColumn app (string)>, <SparkColumn event_date (string)>, <SparkColumn event_timestamp (bigint)>, <SparkColumn session_id (string)>, <SparkColumn merge_key (string)>]
        Target columns: [<SparkColumn device_id (string)>, <SparkColumn app (string)>, <SparkColumn event_date (string)>, <SparkColumn event_timestamp (long)>, <SparkColumn session_id (string)>, <SparkColumn merge_key (string)>]
 

System information

dbt-core 1.1.0
dbt-databricks 1.1.0
The operating system you're using:

The output of python --version:

Additional context

It looks like the relations cache is using show table extended while for the temp view dbt uses describe table they return different data types for the Biging columns.

@francescomucio francescomucio added bug Something isn't working triage labels May 13, 2022
@github-actions github-actions bot changed the title Schema change detected for bigint columns (expected long) [CT-642] Schema change detected for bigint columns (expected long) May 13, 2022
@jtcohen6 jtcohen6 removed the triage label May 13, 2022
@jtcohen6
Copy link
Contributor

jtcohen6 commented May 13, 2022

@francescomucio Thanks for opening! Context from Slack thread here.

It looks like bigint and long are aliases for one another in Databricks SQL: https://docs.databricks.com/sql/language-manual/data-types/bigint-type.html

There's a specific fix we can make here: Add a mapping between those types within SparkColumn, for use in the translate_type method here:

@classmethod
def translate_type(cls, dtype: str) -> str:
return dtype

On other databases, we use that translate_type method to handle type aliasing. Check out the default implementation in dbt-core here.

the fix

I think this might be as simple as:

  • removing the translate_type override in dbt-spark, to use the default implementation instead
  • reimplement the type label mapping, to support just longbigint:
class SparkColumn(dbtClassMixin, Column):
    TYPE_LABELS: ClassVar[Dict[str, str]] = {
        "LONG": "BIGINT",
    }

Plus a test case, for exactly the experience you're having. We have existing integration tests for on_schema_change functionality in tests/integration/incremental_on_schema_change.

Is that a fix you'd be interested in contributing?

larger context

In v1.2, we are looking to move away from using show table extended for top-of-run caching, and use show tables + show views instead (much much faster: #342). That would then mean we always use describe table to power adapter.get_columns_in_relation, with consistent results.

@francescomucio
Copy link
Author

I can contribute with a quick fix, let me try :)

@github-actions
Copy link
Contributor

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

@francescomucio
Copy link
Author

Please reopen @jtcohen6 @leahwicz

@leahwicz leahwicz reopened this Nov 28, 2022
@github-actions github-actions bot removed the Stale label Nov 29, 2022
@github-actions
Copy link
Contributor

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working good_first_issue Good for newcomers Stale
Projects
None yet
3 participants