[Feature] dbt microbatch, max event_time alternative to now()-lookback #11129
Labels
enhancement
New feature or request
microbatch
Issues related to the microbatch incremental strategy
triage
Is this your first time submitting a feature request?
Describe the feature
description
Currently, for the `microbatch` incremental strategy, the only way to handle latency between `now()` and the time the data is actually loaded is the `lookback` config attribute.
Assume there is a model with the `microbatch` strategy and `batch_size=day` & `lookback=3`. We run `dbt run --full-refresh` once on 2024-10-07 (loading all batches up to `now()`), and on subsequent days we load only the batches for `now()-lookback`. That will not cover cases where:
I. Latency for some batch is unexpectedly greater than usual (ref. D on the attached pic).
II. There is a gap between `dbt run` invocations (ref. G, H on the attached pic).
A problem also arises when:
III. A model gets data from several `refs`, each of which could have a different latency.
proposed solution
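Before the proposal itself, here is a minimal Python sketch of case II, to make the gap concrete. It is not dbt code: the helper names `batches_for_run` and `missed_batches` are hypothetical, and it assumes that a daily run processes the run day plus the `lookback` days before it.

```python
from datetime import date, timedelta


def batches_for_run(run_day: date, lookback: int) -> list[date]:
    """Daily batches one incremental run is assumed to process:
    the run day itself plus `lookback` days before it, oldest first."""
    return [run_day - timedelta(days=offset) for offset in range(lookback, -1, -1)]


def missed_batches(previous_run: date, next_run: date, lookback: int) -> list[date]:
    """Days after `previous_run` that the lookback window of `next_run` never reaches."""
    covered = set(batches_for_run(next_run, lookback))
    gap_days = (next_run - previous_run).days
    candidates = [previous_run + timedelta(days=i) for i in range(1, gap_days + 1)]
    return [day for day in candidates if day not in covered]
```

Under these assumptions, a full refresh on 2024-10-07 followed by the next run on 2024-10-14 with `lookback=3` leaves 2024-10-08, 2024-10-09 and 2024-10-10 unprocessed, while a run on 2024-10-08 leaves nothing behind.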
Introduce a new `config` attribute, `max_event_time: True/False`, with a default value of `False`.
Before running batches, dbt would get the max event time from each source table and take the `min` of them, which gives a `min_of_max_event_time` parameter.
It would then run the batches between `min_of_max_event_time - lookback` and `now()`.
The `lookback` attribute could therefore be used in both cases, whether `max_event_time` is set to `True` or `False`.
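The window calculation described above could look roughly like the following Python sketch. It is an illustration only, not dbt's implementation: the function name `proposed_window` and the input mapping are hypothetical, the per-table maxima are assumed to have been queried already, and truncating the start to midnight is an assumption made here to match `batch_size=day`.

```python
from datetime import datetime, timedelta


def proposed_window(
    source_max_event_times: dict[str, datetime],
    lookback_days: int,
    now: datetime,
) -> tuple[datetime, datetime]:
    """Return (start, end) of the range to process under the proposal."""
    # Take the min of the per-table max event times.
    min_of_max_event_time = min(source_max_event_times.values())
    # Assumption: align the start with the daily batch boundary.
    start_day = min_of_max_event_time.replace(hour=0, minute=0, second=0, microsecond=0)
    # Run batches between min_of_max_event_time - lookback and now().
    return start_day - timedelta(days=lookback_days), now
```

For example, if one table has data up to 2024-10-10 08:00 and another only up to 2024-10-06 23:30, then with `lookback_days=3` and `now` at 2024-10-12 09:00 the window starts at 2024-10-03 00:00, whereas `now()-lookback` alone would start at 2024-10-09.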
Note: #10702 is different because it concerns only the first run (`begin`).
Describe alternatives you've considered
One workaround is to increase `lookback`, but most of the time that wastes time and resources, and it still cannot be guaranteed to be 100% accurate.
A second is to create a custom test and, if a missed `event_time` is detected, rerun it using the `--event-time-start` & `--event-time-end` flags. This, however, adds troublesome maintenance work.
Who will this benefit?
Teams that use big tables with unpredictable data-loading latency and want to minimize maintenance time.
Are you interested in contributing this feature?
yes
Anything else?
No response