Improve Snowflake Dedupe. #500

Merged
Tang8330 merged 15 commits into master from snowflake-dedupe
Apr 24, 2024
Tang8330 commented Apr 24, 2024

This improves our Snowflake dedupe process by minimizing data movement from the target table, so that it executes faster and more reliably.

Tang8330 marked this pull request as ready for review April 24, 2024 17:04
Tang8330 requested a review from nathan-artie April 24, 2024 17:04
return err
var parts []string
parts = append(parts, fmt.Sprintf("CREATE OR REPLACE TRANSIENT TABLE %s AS (SELECT * FROM %s QUALIFY ROW_NUMBER() OVER (PARTITION BY %s ORDER BY %s) = 2)",
	stagingTableID.Table(),
Contributor

Use stagingTableID.FullyQualifiedName() or escape stagingTableID.Table() with sql.EscapeName

Contributor Author

This is a transient table; it doesn't belong to a schema or database.
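
For context on the escaping suggestion above, here is a minimal sketch of quoting a table name before interpolating it into SQL. `quoteIdentifier` is a hypothetical helper written for illustration; it is not the project's `sql.EscapeName`, whose behavior this thread does not show. It follows the standard SQL convention for delimited identifiers: wrap the name in double quotes and double any embedded double quote.

```go
package main

import (
	"fmt"
	"strings"
)

// quoteIdentifier is a hypothetical helper (an assumption, not the
// project's sql.EscapeName). It wraps name in double quotes and doubles
// any embedded double quote so the name cannot terminate the identifier.
func quoteIdentifier(name string) string {
	return `"` + strings.ReplaceAll(name, `"`, `""`) + `"`
}

func main() {
	// The table names below are made up for the example.
	fmt.Println(quoteIdentifier("customers__artie_abcde"))
	fmt.Println(quoteIdentifier(`odd"name`))
}
```

Note that quoted identifiers in Snowflake are case-sensitive, so a name that is quoted when the table is created needs to be quoted the same way everywhere else it is referenced.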

for _, pk := range orderColsToIterate {
	orderByCols = append(orderByCols, fmt.Sprintf("%s ASC", pk))
}

fqTableName := tableID.FullyQualifiedName()
Contributor

Now that tableID.FullyQualifiedName() no longer takes arguments, it is short enough that we could inline it instead of assigning it to fqTableName.


parts = append(parts, fmt.Sprintf("DELETE FROM %s t1 USING %s t2 WHERE %s",
	fqTableName,
	stagingTableID.Table(),
Contributor

Ditto: escape it or use FullyQualifiedName().
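
To show how the excerpts in this thread fit together, here is a rough sketch of building the dedupe statements. It is not the merged implementation: `buildDedupeStatements` and its parameters are hypothetical, the table names are assumed to be escaped already, and the final `INSERT` step is an assumption, since the excerpts only show the `CREATE` and `DELETE` statements. The idea is that only keys with duplicates are copied into the staging table (`ROW_NUMBER() = 2` exists only when a key has at least two rows), so most of the target table is never moved.

```go
package main

import (
	"fmt"
	"strings"
)

// buildDedupeStatements is an illustrative sketch, not the project's code.
// target and staging are assumed to be already-escaped table names.
func buildDedupeStatements(target, staging string, primaryKeys, orderBy []string) []string {
	// Match target rows (t1) to staging rows (t2) on every primary key column.
	var conds []string
	for _, pk := range primaryKeys {
		conds = append(conds, fmt.Sprintf("t1.%s = t2.%s", pk, pk))
	}

	return []string{
		// 1. Copy the second row of each key into staging; keys without
		//    duplicates have no second row and are left out.
		fmt.Sprintf("CREATE OR REPLACE TRANSIENT TABLE %s AS (SELECT * FROM %s QUALIFY ROW_NUMBER() OVER (PARTITION BY %s ORDER BY %s) = 2)",
			staging, target, strings.Join(primaryKeys, ", "), strings.Join(orderBy, ", ")),
		// 2. Delete every target row whose key appears in staging.
		fmt.Sprintf("DELETE FROM %s t1 USING %s t2 WHERE %s",
			target, staging, strings.Join(conds, " AND ")),
		// 3. Assumed final step (not shown in the excerpts): put one copy back.
		fmt.Sprintf("INSERT INTO %s SELECT * FROM %s", target, staging),
	}
}

func main() {
	stmts := buildDedupeStatements("db.public.customers", "customers__tmp",
		[]string{"id"}, []string{"id ASC"})
	for _, stmt := range stmts {
		fmt.Println(stmt + ";")
	}
}
```

If a key could have three or more copies, a single staging row per key still suffices here, because step 2 removes all copies of that key before step 3 reinserts one.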

{
	// Dedupe with one primary key + no `__artie_updated_at` flag.
	tableID := NewTableIdentifier("db", "public", "customers")
	stagingTableID := shared.TempTableID(tableID, strings.ToLower(stringutil.Random(5)))
Contributor

It might be nice to have two methods: TempTableID and TempTableIDWithSuffix.
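
One possible shape for that split, sketched with plain strings rather than the project's table identifier types. Everything here is hypothetical: the function bodies, the `__tmp_` naming scheme, and the `randomSuffix` helper are assumptions for illustration and do not come from the codebase. The variant that takes a suffix is deterministic, which makes expected SQL easy to assert in tests, while the other generates a random suffix for production use.

```go
package main

import (
	"fmt"
	"math/rand"
	"strings"
)

// tempTableIDWithSuffix builds a temporary table name from an explicit
// suffix. The "__tmp_" naming scheme is an assumption for this sketch.
func tempTableIDWithSuffix(table, suffix string) string {
	return fmt.Sprintf("%s__tmp_%s", table, suffix)
}

// tempTableID generates a random lowercase suffix and delegates to
// tempTableIDWithSuffix.
func tempTableID(table string) string {
	return tempTableIDWithSuffix(table, randomSuffix(5))
}

// randomSuffix returns n random lowercase ASCII letters.
func randomSuffix(n int) string {
	const letters = "abcdefghijklmnopqrstuvwxyz"
	var sb strings.Builder
	for i := 0; i < n; i++ {
		sb.WriteByte(letters[rand.Intn(len(letters))])
	}
	return sb.String()
}

func main() {
	fmt.Println(tempTableIDWithSuffix("customers", "abcde")) // fixed name for tests
	fmt.Println(tempTableID("customers"))                    // random suffix each call
}
```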


Tang8330 commented Apr 24, 2024

@nathan-artie Take another look? Should be good.

nathan-artie left a comment

LGTM!

Tang8330 merged commit 799e4c0 into master Apr 24, 2024
1 check passed
Tang8330 deleted the snowflake-dedupe branch April 24, 2024 21:09
2 participants