Add support for new materialization to enable real-time modeling #136

Open
leonard-henriquez opened this issue Aug 26, 2023 · 1 comment
Labels
category:models Related to the models in the package. priority:low Not on the roadmap. type:enhancement New features or improvements to existing features.

Comments

@leonard-henriquez

Is your feature request related to a problem? Please describe.

In the current state of DBT Snowplow, recent events only reach the models after you execute dbt run to process the new data.
This package offers the "incremental" materialization option, which processes only new events on each run instead of every event.
However, even with this approach it is hard to get fresh data with low latency (under 1 minute).

For example:

  • 08:40 am: an event is triggered in a browser
  • 08:40 am: the Snowplow collector validates and enriches the event, then sends it to a stream
  • 08:41 am: the event is stored in the data warehouse
  • 08:45 am: a DBT job that runs every 5 minutes starts
  • 08:47 am: the DBT job finishes running my custom model (that depends on snowplow_web_base_events_this_run)

So my data is only available at 08:47 am, 7 minutes after the event was triggered.
These delays are very hard to compress because we can't realistically run DBT jobs every second, and each DBT job takes a few minutes to complete.
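To make the arithmetic explicit, here is a small sketch that breaks the timeline above into its three delays. The timestamps come straight from the example; the worst-case figure at the end rests on an assumption that is not stated in the timeline, namely that an event landing in the warehouse just after a job has started must wait a full 5-minute interval for the next job.

```python
from datetime import datetime

FMT = "%H:%M"

# Timestamps copied from the example timeline.
event_triggered = datetime.strptime("08:40", FMT)
stored_in_warehouse = datetime.strptime("08:41", FMT)
job_started = datetime.strptime("08:45", FMT)
job_finished = datetime.strptime("08:47", FMT)


def minutes_between(start, end):
    """Return the gap between two datetimes in minutes."""
    return (end - start).total_seconds() / 60


load_delay = minutes_between(event_triggered, stored_in_warehouse)
schedule_wait = minutes_between(stored_in_warehouse, job_started)
job_runtime = minutes_between(job_started, job_finished)
total_latency = load_delay + schedule_wait + job_runtime

# Assumption (not from the timeline): an event stored right after a job
# starts is only picked up by the next job, one full interval later.
schedule_interval = 5
worst_case = load_delay + schedule_interval + job_runtime

print(f"load delay:    {load_delay:.0f} min")
print(f"schedule wait: {schedule_wait:.0f} min")
print(f"job runtime:   {job_runtime:.0f} min")
print(f"total latency: {total_latency:.0f} min")
print(f"worst case:    {worst_case:.0f} min (under the stated assumption)")
```

Only the schedule wait shrinks when the job runs more often; the load delay and the job runtime stay, which is why a sub-minute target is out of reach with scheduled runs alone.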

Describe the solution you'd like

We could take advantage of the "lambda view" pattern and introduce a new materialization option that builds on materialized views and, for Snowflake, dynamic tables.
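For readers unfamiliar with the pattern, here is a minimal sketch of a lambda view, using SQLite from Python so that it runs anywhere. It is not the package's implementation and not a dbt materialization: the table and view names (raw_events, modeled_events, lambda_events) are hypothetical, and a real version would target the warehouse's own view, materialized view, or dynamic table features. The idea is that a view unions the rows already processed by the last batch run with the raw rows that arrived after it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical tables: raw events as loaded, and the output of the batch job.
CREATE TABLE raw_events (event_id TEXT PRIMARY KEY, collector_tstamp TEXT);
CREATE TABLE modeled_events (event_id TEXT PRIMARY KEY, collector_tstamp TEXT);

INSERT INTO raw_events VALUES
  ('e1', '2023-08-26 08:30:00'),
  ('e2', '2023-08-26 08:38:00'),
  ('e3', '2023-08-26 08:40:00');

-- Simulate a batch run that processed everything up to 08:38.
INSERT INTO modeled_events
SELECT event_id, collector_tstamp
FROM raw_events
WHERE collector_tstamp <= '2023-08-26 08:38:00';

-- The lambda view: batch output plus any raw rows newer than the batch.
CREATE VIEW lambda_events AS
SELECT event_id, collector_tstamp, 'batch' AS origin
FROM modeled_events
UNION ALL
SELECT event_id, collector_tstamp, 'live' AS origin
FROM raw_events
WHERE collector_tstamp >
      (SELECT COALESCE(MAX(collector_tstamp), '') FROM modeled_events);
""")

query = "SELECT event_id, origin FROM lambda_events ORDER BY collector_tstamp"
before = conn.execute(query).fetchall()

# A new event lands in the warehouse; no batch run happens.
conn.execute("INSERT INTO raw_events VALUES ('e4', '2023-08-26 08:41:00')")
after = conn.execute(query).fetchall()

print(before)
print(after)
```

In this sketch the new event shows up in the view as soon as it is stored, without waiting for the next scheduled run. The trade-off is that the "live" branch is computed at query time, so whatever transformation the batch job applies would have to be repeated there for the unprocessed rows.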

Describe alternatives you've considered

Running DBT more frequently, but it's costly.

Are you interested in contributing towards this feature?

I am willing to help, but I am new to DBT. I tried to modify the materialization but couldn't get it to work.
However, I've found interesting resources that can help:

@leonard-henriquez leonard-henriquez added the type:enhancement New features or improvements to existing features. label Aug 26, 2023
@github-actions github-actions bot added the status:needs_triage Needs maintainer triage. label Aug 26, 2023
@miike miike added category:models Related to the models in the package. and removed status:needs_triage Needs maintainer triage. labels Aug 27, 2023
@miike

miike commented Aug 27, 2023

For anyone landing here, there is a thread on this topic: https://discourse.snowplow.io/t/data-modeling-in-real-time/8978

This is something we will likely look into, but it's not currently on the immediate roadmap.
