Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: column comparison logic for check-strategy snapshots #5223

Merged

Conversation

jtcohen6
Copy link
Contributor

@jtcohen6 jtcohen6 commented May 9, 2022

resolves #5222
resolves dbt-labs/dbt-snowflake#154

Description

The logic in snapshot_check_all_get_existing_columns added by #4893 doesn't account for cases where the user-configured check_cols doesn't match the database's column quoting/casing.

This is almost always true on Snowflake, which is where we've been seeing test failures. The snapshot doesn't fail explicitly, but it always considers the column schema to have changed, so it adds new records every single time it runs.

I added a new test case, on top of the one added in #4893. It fails without the changes to the macro in this PR, and passes with them.

For a longer-term resolution, and to ensure this doesn't happen again, let's:

That's out of scope for the current PR, which is just trying to fix the nightly tests in dbt-snowflake, and preclude an unintended regression in v1.2.

Checklist

@jtcohen6 jtcohen6 requested a review from a team May 9, 2022 11:06
@jtcohen6 jtcohen6 requested review from a team as code owners May 9, 2022 11:06
@jtcohen6 jtcohen6 requested review from ChenyuLInx and VersusFacit May 9, 2022 11:06
@cla-bot cla-bot bot added the cla:yes label May 9, 2022
@leahwicz
Copy link
Contributor

leahwicz commented May 9, 2022

@jtcohen6 this is what is causing the failed scheduled test runs for us?

@jtcohen6
Copy link
Contributor Author

jtcohen6 commented May 9, 2022

@leahwicz I believe so. I'll kick off a dbt-snowflake PR to run against this branch to confirm.

Update: All checks passing in dbt-labs/dbt-snowflake#159 (CI)

@jtcohen6 jtcohen6 merged commit 3996a69 into main May 10, 2022
@jtcohen6 jtcohen6 deleted the fix/column-comparison-snapshot_check_all_get_existing_columns branch May 10, 2022 09:05
@ueshin
Copy link

ueshin commented May 11, 2022

Hi @jtcohen6, seems like this commit broke dbt-databrick's integration tests and I guess dbt-spark's, too.

[gw0] [100%] FAILED tests/functional/adapter/test_basic.py::TestSnapshotCheckColsDatabricks::test_snapshot_check_cols

==================================================================================================== FAILURES =====================================================================================================
____________________________________________________________________________ TestSnapshotCheckColsDatabricks.test_snapshot_check_cols _____________________________________________________________________________
[gw0] darwin -- Python 3.9.11 /.../bin/python

self = <test_basic.TestSnapshotCheckColsDatabricks object at 0x7fbb684f9e20>, project = <dbt.tests.fixtures.project.TestProjInfo object at 0x7fbbb8a081f0>

    def test_snapshot_check_cols(self, project):
        # seed command
        results = run_dbt(["seed"])
        assert len(results) == 2

        # snapshot command
        results = run_dbt(["snapshot"])
        for result in results:
            assert result.status == "success"

        # check rowcounts for all snapshots
        check_relation_rows(project, "cc_all_snapshot", 10)
        check_relation_rows(project, "cc_name_snapshot", 10)
        check_relation_rows(project, "cc_date_snapshot", 10)

        relation = relation_from_name(project.adapter, "cc_all_snapshot")
        result = project.run_sql(f"select * from {relation}", fetch="all")

        # point at the "added" seed so the snapshot sees 10 new rows
        results = run_dbt(["--no-partial-parse", "snapshot", "--vars", "seed_name: added"])
        for result in results:
            assert result.status == "success"

        # check rowcounts for all snapshots
>       check_relation_rows(project, "cc_all_snapshot", 20)

/.../dbt-core/tests/adapter/dbt/tests/adapter/basic/test_snapshot_check_cols.py:62:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

project = <dbt.tests.fixtures.project.TestProjInfo object at 0x7fbbb8a081f0>, snapshot_name = 'cc_all_snapshot', count = 20

    def check_relation_rows(project, snapshot_name, count):
        relation = relation_from_name(project.adapter, snapshot_name)
        result = project.run_sql(f"select count(*) as num_rows from {relation}", fetch="one")
>       assert result[0] == count
E       assert 30 == 20

/.../dbt-core/tests/adapter/dbt/tests/adapter/basic/test_snapshot_check_cols.py:15: AssertionError

@jtcohen6
Copy link
Contributor Author

@ueshin You're totally right. I'm digging in now.

agoblet pushed a commit to BigDataRepublic/dbt-core that referenced this pull request May 20, 2022
)

* Add test case

* Update comparison in snapshot_check_all_get_existing_columns

* Add changelog entry
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
4 participants