Add percentage support for error_if, warn_if #5172
Conversation
Thanks for taking a first crack at this @jaypeedevlin @ehmartens!

I'm going to tag in the Language team to help here, since I think the real question is actually around the metadata that generic tests can make available within their materializations. Ideally, we'd have access to the rendered form of `{{ get_where_subquery(<ref_or_source>) }}`, i.e. that which gets passed in as the `{{ model }}` argument to generic test definitions.
@@ -3,12 +3,27 @@
{%- endmacro %}

{% macro default__get_test_sql(main_sql, fail_calc, warn_if, error_if, limit) -%}
{% set model_name = model.file_key_name.split('.')[1] %}
Two thoughts:

- For tests defined on models, this should work. For tests defined on sources, `file_key_name` is `sources.source_name` (and missing the name of the specific source table), so I don't think it would work as currently constituted. We could update `file_key_name` to include `sources.source_name.source_table`.
- If users have defined a `where` config, and we're calculating `%`, I think we would want to filter both the numerator and denominator by that `where` config. You could reproduce that logic here (`config.get("where")`), but all in all it seems preferable if there were some way to access the rendered version of `{{ model }}` (from the generic test query) or `{{ test_metadata.kwargs.model }}` (from the test node), since that's really the thing we want.
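To make the sources problem concrete, here is a small Python sketch (illustrative only; the `file_key_name` values mirror the shapes described above, not dbt internals):

```python
def model_name_from_file_key(file_key_name: str) -> str:
    # Mirrors the macro's model.file_key_name.split('.')[1]
    return file_key_name.split('.')[1]

# For a model, the second segment is the model name:
print(model_name_from_file_key("models.my_model"))    # my_model

# For a source, the second segment is the *source* name; the
# specific source table is missing from the key entirely:
print(model_name_from_file_key("sources.my_source"))  # my_source
```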
Great point on the `where` config, @jtcohen6, we totally overlooked that! We've added it in, but haven't pushed the change yet while we wait to hear from the Language team on how we might approach resolving the non-model resources appropriately.
{% macro get_pct_division(threshold, model_name) %}
{% if threshold[-1] == '%' %}
Could you imagine someone wanting to define a `%` failure threshold for a singular test (as opposed to generic tests)? I can't think of a case myself.
We took a look at that this morning. I think I probably could think of someone wanting to do this (in a rare case), but it wasn't clear to me why the current approach wouldn't work (given that thresholds should work in the same way). Could you help me understand how that might work?
In the case where a singular test depends on multiple nodes, I'm not sure how we can tell which node should represent the denominator.

I also think that, in a singular test, you're writing all the SQL yourself, so it's pretty trivial to add a `/ (select count(*) from total) * 100` and define `warn_if`/`error_if` with percents-as-integers.
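As a sketch of that percents-as-integers arithmetic (plain Python for illustration, not dbt code; the function name is made up):

```python
def pct_failures(failures: int, total: int) -> float:
    # Equivalent of appending `/ (select count(*) from total) * 100`
    # to a singular test's fail count.
    return failures / total * 100

# With warn_if defined as a percent-as-integer threshold of 5,
# 6 failures out of 100 rows would trigger a warning:
print(pct_failures(6, 100) > 5)  # True
```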
I played with some code in #5214. It doesn't handle the rendering bit yet, but I thought I'd get feedback on whether this is what we're thinking of.
I took another look at this, and we're creating the same strings in the generic test builder, and rendering them in `add_rendered_test_kwargs`. Then at compilation time they are added to the context with the top-level key `"_dbt_generic_test_kwargs"` and the key `model`. So I'm wondering if we can use that instead of making a new function. That rendered `model` string is used by `build_raw_sql` in the generic test builder, which builds the test SQL.
@gshank Really good point! This "just works" for me if I start storing … Granted, I'm taking a pretty simple approach here, and just overwriting the raw version:
models:
  - name: my_model
    columns:
      - name: id
        tests:
          - unique:
              warn_if: "=5%"
          - not_null:
              config:
                where: 1=1
                error_if: ">5%"

At runtime (using BigQuery):
{% macro get_pct_division(threshold, model_name) %}
{% if threshold[-1] == '%' %}
{% set threshold = threshold[:-1] %}
{% set pct_division = '/ (select count(*) from ' ~ ref(model_name) ~ ') * 100' %}
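The macro's threshold handling can be sketched in Python (an illustration of the same string logic, not the actual implementation):

```python
def split_pct_threshold(threshold: str):
    # Like the macro: a trailing '%' means the fail count should be
    # divided by the model's row count before comparison.
    if threshold.endswith('%'):
        return threshold[:-1], True   # e.g. ">5%" -> (">5", needs division)
    return threshold, False

print(split_pct_threshold(">5%"))  # ('>5', True)
print(split_pct_threshold("5"))    # ('5', False)
```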
Instead of `count(*)` always, should this be `{{ fail_calc }}` for the denominator as well? That won't work in cases where the generic test query synthesizes a new column for use in the `fail_calc`, which isn't present in the underlying `{{ model }}`, but it feels better than assuming it's always a simple count.
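A hedged sketch of what reusing `{{ fail_calc }}` in the denominator might look like (string-building in Python for illustration; the function name is invented and mirrors the `pct_division` string in the macro above):

```python
def pct_division(fail_calc: str, relation: str) -> str:
    # Build the division suffix using the test's fail_calc expression
    # instead of hard-coding count(*).
    return f"/ (select {fail_calc} from {relation}) * 100"

print(pct_division("count(*)", "my_model"))
print(pct_division("sum(n_records)", "my_model"))
```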
I'm going to close this PR for now, since we're not actively working on it.
Hi all, I found a workaround for this by writing a generic test in dbt and then referencing it in the yml files like any other ordinary relationships test. This one has a 1% error threshold and works for BigQuery. Feel free to change it as you see fit.
resolves #4723, in collaboration with @ehmartens

Description

Adds support for using percentages in the `error_if` and `warn_if` configurations.

We'd love some feedback on the following:
Checklist

- I have run `changie new` to create a changelog entry