Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add percentage support for error_if, warn_if #5172

Closed
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 7 additions & 0 deletions .changes/unreleased/Features-20220427-095609.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
kind: Features
body: Add support for percentages in error_if, warn_if
time: 2022-04-27T09:56:09.707636+10:00
custom:
Author: jaypeedevlin ehmartens
Issue: "4723"
PR: "5172"
Original file line number Diff line number Diff line change
Expand Up @@ -3,12 +3,27 @@
{%- endmacro %}

{% macro default__get_test_sql(main_sql, fail_calc, warn_if, error_if, limit) -%}
{% set model_name = model.file_key_name.split('.')[1] %}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two thoughts:

  • For tests defined on models, this should work. For tests defined on sources, file_key_name is sources.source_name (and missing the name of the specific source table), so I don't think it would work as currently constituted. We could update file_key_name to include sources.source_name.source_table.
  • If users have defined a where config, and we're calculating %, I think we would want to filter both the numerator and denominator by that where config. You could reproduce that logic here (config.get("where")), but all it all it seems preferable if there were some where to access the rendered version of {{ model }} (from the generic test query) or {{ test_metadata.kwargs.model }} (from the test node), since that's really the thing we want.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great point on the 'where' @jtcohen6, we totally overlooked that! We've added that in, but not pushed the change yet, while we wait to hear from the language team on how we might approach getting the non-model resources appropriately.

{% set warn_if, warn_pct_division = get_pct_division(warn_if, model_name) %}
{% set error_if, error_pct_division = get_pct_division(error_if, model_name) %}

select
{{ fail_calc }} as failures,
{{ fail_calc }} {{ warn_if }} as should_warn,
{{ fail_calc }} {{ error_if }} as should_error
{{ fail_calc }} {{ warn_pct_division }} {{ warn_if }} as should_warn,
{{ fail_calc }} {{ error_pct_division }} {{ error_if }} as should_error
from (
{{ main_sql }}
{{ "limit " ~ limit if limit != none }}
) dbt_internal_test
{%- endmacro %}


{% macro get_pct_division(threshold, model_name) %}
{% if threshold[-1] == '%' %}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you imagine someone wanting to define a % failure threshold for a singular test (as opposed to generic tests)? I can't think of one myself

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We took a list at that this morning, I think I probably could think of someone wanting to do this (in a rare case), but it wasn't clear to me why the current approach wouldn't work (given that thresholds should work in the same way). Could you help me understand how that might work?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the case where a singular test depends on multiple nodes, I'm not sure how we can tell which node should represent the denominator?

I also think, in a singular test, you're writing all the SQL yourself, it's pretty trivial to add a / (select count(*) from total * 100 and define warn_if/error_if with percents-as-integers.

{% set threshold = threshold[:-1] %}
{% set pct_division = '/ (select count(*) from ' ~ ref(model_name) ~ ') * 100' %}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of count(*) always, should this be {{ fail_calc }} for the denominator as well? That won't work in cases where the generic test query synthesizes a new column for use in the fail_calc, which isn't present in the underlying {{ model }}, but it feels better than assuming it's always a simple count.

{% else %}
{% set pct_division = '' %}
{% endif %}
{% do return((threshold, pct_division)) %}
{% endmacro %}