[Feature] Add time_zone configuration for source freshness checks #11066
Comments
I propose the following configuration, in dbt/include/global_project/macros/adapters/freshness.sql, as a solution.
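For illustration, here is a rough sketch (not actual dbt code) of how a time_zone option could be threaded into the default freshness macro so the loaded-at column is normalized to UTC before aggregation. The `time_zone` config key and its lookup are assumptions; only the overall shape follows dbt's default `collect_freshness` macro:

```sql
{# Hypothetical sketch of default__collect_freshness with a proposed
   time_zone option. The config key does not exist in dbt today. #}
{% macro default__collect_freshness(source, loaded_at_field, filter) %}
  {% set time_zone = config.get('time_zone') %}  {# assumed new config key #}
  {% call statement('collect_freshness', fetch_result=True, auto_begin=False) -%}
    select
      {% if time_zone %}
      {# normalize the source's local time to UTC before comparison #}
      convert_timezone('{{ time_zone }}', 'UTC', max({{ loaded_at_field }})) as max_loaded_at,
      {% else %}
      max({{ loaded_at_field }}) as max_loaded_at,
      {% endif %}
      {{ current_timestamp() }} as snapshotted_at
    from {{ source }}
    {% if filter %}
    where {{ filter }}
    {% endif %}
  {%- endcall %}
  {{ return(load_result('collect_freshness')) }}
{% endmacro %}
```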
Thanks for reaching out @Benjamin0313! Could you provide the SQL for some example source data so we can try to replicate what you are seeing?
Thank you for your reply, @dbeatty10! Below are the SQL script to create the data source (_create_sample_data.sql) and the YAML file for running the freshness test (_source.yaml).
Due to time zone differences, it's not possible to accurately verify the freshness of the data. Even when replacing the data type with TIMESTAMP_NTZ or TIMESTAMP_TZ, TEST_DATE is still retrieved in Japan's time zone, while the current time is retrieved in UTC. The output queries in Snowflake behave the same for both TIMESTAMP_LTZ and TIMESTAMP_TZ.
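The rendering difference can be reproduced directly in Snowflake. This is a sketch; the session parameter values are chosen for illustration:

```sql
-- TIMESTAMP_LTZ stores an instant and renders it in the session time zone;
-- the underlying moment does not change between sessions.
alter session set timezone = 'Asia/Tokyo';
select current_timestamp()::timestamp_ltz;  -- rendered with a +0900 offset

alter session set timezone = 'UTC';
select current_timestamp()::timestamp_ltz;  -- same instant, +0000 offset
```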
@Benjamin0313 It looks to me like the time zone can be automatically converted. Could you try the following steps? Create these files:
{{ config(materialized="view") }}
select
right(current_timestamp(), 5) as session_tz_offset,
-- JST: Asia/Tokyo
convert_timezone('Asia/Tokyo', {{ dbt.current_timestamp() }}) as now_offset_Tokyo,
{{ dbt.dateadd(datepart="minute", interval=-60, from_date_or_timestamp="now_offset_Tokyo") }} as hour_ago_tz_Tokyo,
hour_ago_tz_Tokyo::timestamp_ntz as hour_ago_ntz_Tokyo,
hour_ago_tz_Tokyo::timestamp_ltz as hour_ago_ltz_Tokyo,
-- PST: America/Los_Angeles
convert_timezone('America/Los_Angeles', {{ dbt.current_timestamp() }}) as now_offset_Los_Angeles,
{{ dbt.dateadd(datepart="minute", interval=-60, from_date_or_timestamp="now_offset_Los_Angeles") }} as hour_ago_tz_Los_Angeles,
hour_ago_tz_Los_Angeles::timestamp_ntz as hour_ago_ntz_Los_Angeles,
hour_ago_tz_Los_Angeles::timestamp_ltz as hour_ago_ltz_Los_Angeles,
1 as id
sources:
  - name: test_project
    database: "{{ target.database }}"
    schema: "{{ target.schema }}"
    tables:
      - name: sample_table
        freshness:
          warn_after: {count: 59, period: minute}
          error_after: {count: 60, period: minute}
        loaded_at_field: "hour_ago_tz_tokyo"

Run these commands:

dbt build --select sample_table
dbt show --inline "select * from {{ source('test_project', 'sample_table') }}" --output json
dbt source freshness

Get this output:
Brief explanation

The source is using the same fully-qualified database name as the model. By materializing that model as a view, it is always up-to-date with the current time. I set the warning level to be one minute below the error level, which lets us confirm that the freshness thresholds are evaluated against the expected timestamp.

More detail

It works for me if I use either of these:
loaded_at_field: "hour_ago_tz_tokyo"

or

loaded_at_field: "hour_ago_ltz_tokyo"

But it doesn't work for me if I use a naive datetime holding the local time:

loaded_at_field: "hour_ago_ntz_tokyo"
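A minimal sketch of why the naive column fails: the freshness comparison effectively measures the distance between max(loaded_at_field) and the current UTC time, so a TIMESTAMP_NTZ value holding Tokyo wall-clock time is off by the +09:00 offset. The literal values below are illustrative:

```sql
-- 09:00 Tokyo wall-clock time is 00:00 UTC, but as TIMESTAMP_NTZ the
-- 09:00 value carries no offset and is compared as if it were UTC.
select
  '2024-12-02 09:00:00'::timestamp_ntz        as loaded_at_naive_tokyo,
  '2024-12-02 09:00:00 +09:00'::timestamp_tz  as loaded_at_aware_tokyo,
  -- the aware value converts cleanly to 00:00 UTC; the naive one cannot
  convert_timezone('UTC', '2024-12-02 09:00:00 +09:00'::timestamp_tz) as loaded_at_utc;
```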
Thank you for your response, @dbeatty10. As you mentioned, the above method indeed allows for data freshness testing. Based on your explanation, it seems there is a step that converts the timezone to UTC when performing the freshness test. Therefore, I think it would be necessary to use a view temporarily or add a column for timezone conversion, such as:
However, converting the timezone to UTC every time for the purpose of data freshness testing can be a bit cumbersome. Hence, I believe a better approach would be to specify the timezone in the model.yaml.
By specifying it this way, there is no need to perform a timezone conversion every time during freshness testing. Handling the conversion in the YAML keeps redundant logic out of the model itself, making it a clean solution. What do you think of this approach?
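A hypothetical sketch of that YAML (the `time_zone` key does not exist in dbt today; its name and placement are assumptions based on the proposal above):

```yaml
sources:
  - name: test_project
    tables:
      - name: sample_table
        freshness:
          warn_after: {count: 59, period: minute}
          error_after: {count: 60, period: minute}
        loaded_at_field: "test_date"
        time_zone: "Asia/Tokyo"  # proposed: convert loaded_at_field from this zone to UTC
```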
@Benjamin0313 I converted this to a feature request since it appears that dbt is behaving as we'd expect, and you'd like to introduce a new time_zone configuration. Just to make sure I understand, is the crux of your problem that you have datetimes representing the local time in Japan, stored without an offset? As shown in #11066 (comment), source freshness checks using timezone-aware columns work for me.
@dbeatty10 Even with TIMESTAMP_TZ or TIMESTAMP_LTZ, the timestamps remain in local time, and no conversion to UTC is performed. Below are the observations from my environment. When creating a source table using TIMESTAMP_TZ:

create or replace table D_HARATO_DB.DBT_TEST.DBT_FRESHNESS_TEST as
-- timestamp_tz
select '2024-12-02'::timestamp_tz as loaded_date, 'id' as id
;

describe table D_HARATO_DB.DBT_TEST.DBT_FRESHNESS_TEST;

The describe output confirms that the column type is TIMESTAMP_TZ. When running dbt source freshness, the following query is executed on Snowflake. This behavior is the same for TIMESTAMP_LTZ as well. Therefore, even when specifying a timezone-aware type, there is no implicit conversion to UTC.
@Benjamin0313 Based on my experiments in #11066 (comment), I don't think that adding a time_zone configuration is strictly necessary. But I'm not sure why you are seeing something different 🤔 After following the example in #11066 (comment), could you do the following and share the output you get? Create this file:

Run this command:

dbt show -s analyses/freshness_query.sql --output json

I get the following output, which shows the timezone-aware UTC offset.
Is this a new bug in dbt-core?
Current Behavior
Currently, when running dbt source freshness tests in a timezone other than UTC, a query like the following is compiled:
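For reference, the compiled freshness query has roughly this shape. This is a sketch based on dbt's default `collect_freshness` macro with a `convert_timezone` expression supplied as `loaded_at_field`; the column and table names are placeholders:

```sql
-- Sketch of the compiled query: convert_timezone runs once per row,
-- and max() aggregates over the converted values.
select
  max(convert_timezone('Asia/Tokyo', 'UTC', created_at_local)) as max_loaded_at,
  convert_timezone('UTC', current_timestamp()) as snapshotted_at
from my_db.my_schema.my_source_table;
```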
In this case, the execution order to obtain max_loaded_at is as follows:
• Convert the timezone.
• Retrieve the maximum value of created_at_local in the converted timezone.
At this stage, the timezone conversion is performed for every record, and the maximum date is then calculated from the converted timestamps, so the table is fully scanned once and aggregated. When the number of records is large, the freshness check therefore takes a significant amount of time.
Expected Behavior
The ideal steps would be as follows:
• First, retrieve the latest date from the table.
• Then, perform the timezone conversion.
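The two orderings differ only in where max() sits relative to convert_timezone (a sketch with placeholder names):

```sql
-- Current: convert every row, then aggregate over the converted values.
select max(convert_timezone('Asia/Tokyo', 'UTC', created_at_local)) as max_loaded_at
from my_source_table;

-- Proposed: aggregate first, then convert the single resulting value.
select convert_timezone('Asia/Tokyo', 'UTC', max(created_at_local)) as max_loaded_at
from my_source_table;
```

Since a timezone conversion is order-preserving, both forms return the same instant, but the second avoids applying the function to every row and allows max() over a plain column to benefit from any metadata-based optimizations.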
Steps To Reproduce
dbt source freshness
Relevant log output
No response
Environment
Which database adapter are you using with dbt?
snowflake
Additional Context
This is the profile after running the freshness test. It scans the two records created, applies the timezone conversion, and retrieves the latest date from the scan results.