
[CT-701] Store test pass and failures using --store-failures flag #5313

Closed
brittianwarner opened this issue May 31, 2022 · 6 comments
Labels: dbt tests, enhancement, stale

Comments

@brittianwarner

Describe the Feature

As a dbt user, I would like to store all test results, including passes and failures. As part of a DQ reporting initiative, we want to be able to see the percentage of successful tests compared to failures. We know we could possibly customize the output to do this, but we are wondering if this can be built into the --store-failures flag, possibly with a field in the created table for status (pass vs. failure).

Describe alternatives you've considered

Create a service that reads dbt output to store test results.

Who will this benefit?

Anyone reporting on data quality in a regulated environment.

Are you interested in contributing this feature?

Yes, the best I can.

Anything else?

This is more for individuals who have a requirement to report on DQ metrics. It seems like most of the functionality is in place; however, it is not clear whether we can store successful tests along with failures. Please advise if I am missing something.

@brittianwarner brittianwarner added the enhancement and triage labels May 31, 2022
@github-actions github-actions bot changed the title Store test pass and failures using —store-failures flag [CT-701] Store test pass and failures using —store-failures flag May 31, 2022
@brittianwarner

Ref: dbt-labs/dbt-snowflake#164

@jtcohen6

jtcohen6 commented Jun 1, 2022

@brittianwarner Thanks for reopening here! Agree that this is the right place to continue the conversation.

For the specific use case you've outlined:

we want to be able to see percentage of successful tests compared to failures.

I think leveraging dbt's metadata (run_results.json + structured logs) is the surest way to go here. That's the most concise, reliable, immutable way to track up/down test success over time, and calculate percentages. It won't give you access to the underlying data (e.g. which actual rows in the table are most frequently failing) — if that's the goal, it would be interesting to think about how we might extend existing metadata collection to support it.
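
For the percentage piece specifically, one pattern along these lines is to land each run's run_results.json in the warehouse and aggregate over test statuses. A minimal sketch, assuming the artifact entries have been loaded into a table dbt_artifacts.test_executions with status and generated_at columns (these names are assumptions for illustration, not a dbt-defined schema):

```sql
-- Sketch: daily pass rate across test executions, assuming run_results.json
-- rows have already been loaded into dbt_artifacts.test_executions
-- (hypothetical table; run_results.json statuses include 'pass' and 'fail').
select
    date_trunc('day', generated_at)                                     as run_date,
    count(*)                                                            as total_tests,
    sum(case when status = 'pass' then 1 else 0 end)                    as passed_tests,
    sum(case when status = 'pass' then 1 else 0 end) * 100.0 / count(*) as pass_pct
from dbt_artifacts.test_executions
group by 1
order by 1
```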

We've talked about the idea before of a centralized table, containing all that test metadata (linked in #4624). My preference remains to see this information made available via metadata, possibly synced from a dbt metadata service into the analytical database. In the meantime, I know that some folks have taken matters into their own hands.

It is the current behavior of dbt to create tables for all tests with --store-failures enabled. If the tests return zero rows, those tables will be empty, but they will remain in the *_dbt_test__audit schema. #4973 proposes that they should be dropped, instead of left in place.
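
For illustration, with --store-failures enabled a generic test such as not_null leaves behind a queryable table named after the test. A sketch, assuming a target schema named analytics and a not_null test on orders.order_id (both hypothetical):

```sql
-- Rows that failed the test; zero rows back means the test passed,
-- but (today) the table itself is still left in the audit schema.
select *
from analytics_dbt_test__audit.not_null_orders_order_id;
```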

possibly a field in the created table for status (pass vs failure)

In the general case, an empty table (zero rows) means pass, a table with nonzero rows means failure. But this doesn't account for tests with alternative fail_calc, or with custom warn_if/error_if thresholds.
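
To illustrate that last point, a singular test configured along these lines (a hypothetical sketch; the thresholds are arbitrary) can store a table with rows and still pass, because status comes from evaluating fail_calc against warn_if/error_if rather than from the row count alone:

```sql
-- tests/assert_few_returned_orders.sql (hypothetical singular test)
-- With this config, a stored table with up to 10 total returned orders
-- still counts as a "pass", so rows-in-table does not equal "failure".
{{ config(
    fail_calc = 'sum(n_records)',
    warn_if = '> 10',
    error_if = '> 100',
    store_failures = true
) }}

select
    order_status,
    count(*) as n_records
from {{ ref('orders') }}
where order_status = 'returned'
group by 1
```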

My inclination is to:

  • Encourage you to weigh in on the conversation in [CT-93] [Feature] Enrich data from --store-failures #4624, as to whether "pass" tests should actually drop the tables created by --store-failures
  • Close this issue, as the targeted use case is better resolved by metadata, with custom-code workarounds in the meantime

Am I missing any important pieces of the puzzle?

@hiattp

hiattp commented Jun 22, 2022

@dlawrences ^

@dlawrences

dlawrences commented Jun 24, 2022

Hey @jtcohen6

The approach we've taken for now is the following:

  • reimplement every test we use (both dbt's generic tests and tests from other packages) to return all records in a model that pass the test as well as all records that fail it, at the same time (alongside the model data, we also stick on some metadata columns that we use for further handling, e.g. __dbt_passed_test, __dbt_test_name, __dbt_test_model, __dbt_test_facet, the latter being the aspect of the model being tested)
  • override the default dbt macro

```jinja
{% macro get_test_sql(main_sql, fail_calc, warn_if, error_if, limit) -%}
  {{ adapter.dispatch('get_test_sql', 'dbt')(main_sql, fail_calc, warn_if, error_if, limit) }}
{%- endmacro %}
```

    so that it calls a custom macro instead of directly wrapping the compiled_test_sql into a subquery for reporting purposes; overall, the custom macro does the following:
```jinja
{% macro default__get_test_sql(main_sql, fail_calc, warn_if, error_if, limit) -%}

    -- materialize the compiled test query exactly once
    create or replace temporary table {{ a_target_table_for_the_results_of_the_current_test_in_the_current_execution }} as
    {{ main_sql }};

    -- persist every record, passing and failing, into the central table
    insert into {{ central_table_for_persisting_tests }}
    select {{ columns_transformations_to_conform_test_results }}
    from {{ a_target_table_for_the_results_of_the_current_test_in_the_current_execution }};

    -- report only the failing records back to dbt
    select
        {{ fail_calc }} as failures,
        {{ fail_calc }} {{ warn_if }} as should_warn,
        {{ fail_calc }} {{ error_if }} as should_error
    from (
        select *
        from {{ a_target_table_for_the_results_of_the_current_test_in_the_current_execution }}
        where __dbt_passed_test = false
        {{ "limit " ~ limit if limit != none }}
    ) dbt_internal_test

{%- endmacro %}
```
  • the custom macro executes the compiled_test_sql of the test (referenced above as main_sql) only once, which is something we explicitly wanted

While working on this, we felt that a potential override in https://github.com/dbt-labs/dbt-core/blob/main/core/dbt/include/global_project/macros/materializations/tests/test.sql would have been slightly better since we would want to synchronise the work that dbt is doing in

```jinja
{% if should_store_failures() %}

    {% set identifier = model['alias'] %}
    {% set old_relation = adapter.get_relation(database=database, schema=schema, identifier=identifier) %}
    {% set target_relation = api.Relation.create(
        identifier=identifier, schema=schema, database=database, type='table') -%}

    {% if old_relation %}
        {% do adapter.drop_relation(old_relation) %}
    {% endif %}

    {% call statement(auto_begin=True) %}
        {{ create_table_as(False, target_relation, sql) }}
    {% endcall %}

    {% do relations.append(target_relation) %}

    {% set main_sql %}
        select *
        from {{ target_relation }}
    {% endset %}

    {{ adapter.commit() }}

{% else %}

    {% set main_sql = sql %}

{% endif %}
```
with our overall changes. However, the issue we've faced is that get_test_sql is too strict in expecting only failures to be reported out of main_sql, so we need to work around that.

Is there any possibility to expose some configuration that can be passed down to get_test_sql, and expected out of the general test reporting, to:

  • allow get_test_sql, via configuration, to accept tests that report both passing records and failing records
  • request that a test reports the status of each record using some {{ metadata_column }} that is configured ahead of time, or defaulted to some standard one

If this were the case, then the override might be limited to only ensuring that tests report all that data back, with the persistence of failing/passing/both kinds of records handled in the test materialization (see the sketch below).
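
For concreteness, here is a hypothetical sketch of what such a configurable get_test_sql might look like; the config keys report_all_records and test_status_column do not exist in dbt-core and are invented purely to illustrate the request:

```jinja
{% macro default__get_test_sql(main_sql, fail_calc, warn_if, error_if, limit) -%}
    {# Both config keys below are hypothetical, not real dbt-core configs. #}
    {% set report_all = config.get('report_all_records', default=false) %}
    {% set status_col = config.get('test_status_column', default='__dbt_passed_test') %}
    select
        {{ fail_calc }} as failures,
        {{ fail_calc }} {{ warn_if }} as should_warn,
        {{ fail_calc }} {{ error_if }} as should_error
    from (
        select * from ({{ main_sql }}) all_records
        {# when the test emits passing rows too, count only the failures #}
        {% if report_all %} where {{ status_col }} = false {% endif %}
        {{ "limit " ~ limit if limit != none }}
    ) dbt_internal_test
{%- endmacro %}
```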

Some additional notes

The way that dbt is currently set up, no pre-handling/post-handling of the test results can be done in the test block to ensure persistence and single test execution at the same time, since the immediate output of the test block is expected to be a SQL query that dbt will enrich with CTEs related to ephemeral models etc.

Hence any SQL statement that is prepended in the test block, for example, will get lumped together awkwardly once dbt has done CTE injection and other operations. Also, any prepended SQL statement has to be compatible with what get_test_sql is going to do with main_sql, which is to stick it in a subquery, so no create table or anything similar can be prepended at test-block time.

@github-actions

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

@github-actions github-actions bot added the stale label Dec 22, 2022
@github-actions

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.

@github-actions github-actions bot closed this as not planned Dec 29, 2022
@dbeatty10 dbeatty10 changed the title [CT-701] Store test pass and failures using —store-failures flag [CT-701] Store test pass and failures using --store-failures flag Oct 12, 2023