refactor: Refactor snowflake to use spmc abstractions #26900

Open: 5 commits to be merged into master


tomasfarias
Contributor

@tomasfarias tomasfarias commented Dec 13, 2024

Problem

Refactor the Snowflake batch export to use the fully async SPMC abstractions. Most of the work was done in previous PRs, so there isn't much left to do here.

Changes

  1. Implement new SnowflakeConsumer class
  2. Move flush function to flush method of SnowflakeConsumer.
  3. Call SnowflakeConsumer with run_consumer_loop in the Snowflake batch export.
  4. Remove file_no from SnowflakeHeartbeatDetails (and the associated tests). Since every consumer writes exactly one file, there is no need to attach a file_no to each one. Every file name is already unique, as NamedTemporaryFile generates a new name for each file.
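
For reviewers less familiar with the pattern, here is a minimal sketch of the idea behind changes 1, 2, and 4: one producer feeds a queue, several consumers drain it, and each consumer flushes to its own NamedTemporaryFile. It is not the code in this PR. `SketchConsumer`, `run_sketch_consumer`, and `main` are hypothetical names, and the shared `asyncio.Queue` with `None` sentinels is an assumption made for illustration, not how `run_consumer_loop` necessarily works.

```python
import asyncio
import tempfile


class SketchConsumer:
    """Hypothetical stand-in for a consumer; not the PR's SnowflakeConsumer."""

    async def flush(self, records: list[str]) -> str:
        # Each consumer writes exactly one file. NamedTemporaryFile picks a
        # fresh name on every call, so no per-file counter is needed.
        # (Plain blocking writes are used here only to keep the sketch short.)
        with tempfile.NamedTemporaryFile(
            "w", suffix=".jsonl", delete=False
        ) as temp_file:
            for record in records:
                temp_file.write(record + "\n")
        return temp_file.name


async def run_sketch_consumer(queue: asyncio.Queue, consumer: SketchConsumer) -> str:
    """Drain the shared queue until a None sentinel arrives, then flush once."""
    records = []
    while True:
        item = await queue.get()
        if item is None:
            break
        records.append(item)
    return await consumer.flush(records)


async def main(n_consumers: int = 2, n_records: int = 4) -> list[str]:
    queue: asyncio.Queue = asyncio.Queue()
    tasks = [
        asyncio.create_task(run_sketch_consumer(queue, SketchConsumer()))
        for _ in range(n_consumers)
    ]
    # Single producer: enqueue all records, then one sentinel per consumer.
    for i in range(n_records):
        await queue.put(f'{{"id": {i}}}')
    for _ in range(n_consumers):
        await queue.put(None)
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Running `asyncio.run(main())` returns one file path per consumer. The paths differ on every call, which is the property that makes a separate file_no redundant.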

TODO:

  • Could we use Parquet here too?


Does this work well for both Cloud and self-hosted?

How did you test this code?

Manually ran the Snowflake unit tests, and everything passed (with the exception of the two tests that had to be fixed).

@tomasfarias tomasfarias force-pushed the refactor/snowflake-batch-export-with-spmc-abstractions-2 branch from beb597e to f920a38 Compare December 13, 2024 15:28
@tomasfarias tomasfarias requested a review from rossgray December 13, 2024 16:41