
[CT-3489] [Feature] Update run_results.json as each node finishes #9276

Closed
2 tasks done
sandeepmullangi2 opened this issue Dec 13, 2023 · 3 comments
Labels
enhancement New feature or request wontfix Not a bug or out of scope for dbt-core

Comments

@sandeepmullangi2

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

When I run dbt build --target dev, run_results.json only becomes available after the entire run has completed or when I exit the process. The file is empty while dbt build is running. We use the dbt-athena adapter to build our models in AWS Athena.

Expected Behavior

run_results.json should be available during the run and should be updated after each node completes, because this feature was already merged in #7539.

Steps To Reproduce

  1. create 5 different models calendar, calendar1, calendar2, calendar3, calendar4
  2. Paste the following code into all five files, so that the run takes long enough to check whether run_results.json is being populated
  3. The code is:
{{ config(
    materialized='table',
    table_type='hive',
    ha = true,
    s3_data_naming = "schema_table_unique",
    partitioned_by=['year'],
    format='parquet'
) }}


WITH base AS (
    SELECT DATE (t.date_seq) AS dt,
           DATE(t.date_seq) - interval '3' month AS fiscal_dt
    FROM (
            SELECT SEQUENCE (
                current_date - interval '50' year,
                current_date + interval '50' year,
                interval '1' day
            ) dates
        ),
        UNNEST(dates) AS t(date_seq)
) --CREATE A COLUMN WHERE EACH ROW IS A DATE

SELECT
    dt,
    CAST(DATE_FORMAT(dt, '%Y%m%d') AS bigint) AS date_key,
    QUARTER(dt) AS quarter,
    MONTH(dt) AS month,
    DAY(dt) AS day,
    YEAR(fiscal_dt) AS fiscal_year,
    QUARTER(fiscal_dt) AS fiscal_quarter,
    DAY_OF_WEEK(dt + interval '1' day) AS day_of_week, {# Athena treats Monday as the first day of the week, while in GBQ we use Sunday as the first day #}
    DATE_FORMAT(dt, '%M %d, %Y') AS full_date,
    DATE_FORMAT(dt, '%b') AS month_name,
    DATE_FORMAT(dt, '%a') AS day_name,
    WEEK_OF_YEAR(dt) AS week_of_year,
    WEEK_OF_YEAR(fiscal_dt) AS fiscal_week_of_year,
    DATE(DATE_TRUNC('month', dt)) AS first_day_of_month,
    LAST_DAY_OF_MONTH(dt) AS last_day_of_month,
    DATE(DATE_TRUNC('week', dt)) AS first_day_of_week,
    DATE_TRUNC('week', dt) + interval '6' day AS last_day_of_week,
    DATE(DATE_TRUNC('quarter', dt)) AS first_day_of_quarter,
    DATE(DATE_TRUNC('quarter', dt)) + interval '3' month - interval '1' day AS last_day_of_quarter,
    DATE_DIFF('day', DATE_TRUNC('quarter', dt), dt + interval '1' day) AS day_of_quarter,
    DAY_OF_YEAR(dt) AS day_of_year,
    DAY_OF_YEAR(fiscal_dt) AS fiscal_day_of_year,
    DAY_OF_WEEK(dt) IN (6, 7) AS is_weekend,
    YEAR(dt) AS year
FROM base
  4. Run dbt with the command dbt build
  5. I have an infinite while loop that copies run_results.json from the dbt folder to another folder to check whether it contains any data. Only the last copy of run_results.json has data; every copy created before it is 0 bytes.
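The check in step 5 can be reproduced with a small watcher. This is a minimal sketch, not the reporter's actual script: it assumes Python 3.9 (as in the environment below) and dbt's default target/ output path, and the function name watch_run_results and the snapshot folder are hypothetical. It copies run_results.json at a fixed interval and records the size of each copy, so snapshots taken before dbt writes the file show up as 0.

```python
import shutil
import time
from pathlib import Path
from typing import List


def watch_run_results(
    path: Path,
    snapshot_dir: Path,
    interval_s: float = 1.0,
    max_polls: int = 60,
) -> List[int]:
    """Copy `path` into `snapshot_dir` every `interval_s` seconds.

    Returns the byte size of each snapshot, or 0 for polls where the file
    did not exist yet. A bounded `max_polls` stands in for an infinite loop.
    """
    snapshot_dir.mkdir(parents=True, exist_ok=True)
    sizes: List[int] = []
    for i in range(max_polls):
        if path.exists():
            # Numbered copies keep the history of what the file looked like.
            dest = snapshot_dir / f"run_results_{i:04d}.json"
            shutil.copyfile(path, dest)
            sizes.append(dest.stat().st_size)
        else:
            sizes.append(0)
        time.sleep(interval_s)
    return sizes
```

Run it from a second terminal while dbt build is running, for example with watch_run_results(Path("target/run_results.json"), Path("snapshots"), interval_s=2, max_polls=300). With the current behavior, we would expect only the polls taken after the invocation ends to report a non-zero size.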

Relevant log output

No response

Environment

- OS: macos Ventura 13.6
- Python: python3.9
- dbt: 
Core:
  - installed: 1.7.3
  - latest:    1.7.3 - Up to date!

Plugins:
  - athena: 1.7.0 - Ahead of latest version!

Which database adapter are you using with dbt?

other (mention it in "Additional Context")

Additional Context

dbt-athena

sandeepmullangi2 added the bug (Something isn't working) and triage labels Dec 13, 2023
github-actions bot changed the title from "run_results.json is not available until entire run is completed" to "[CT-3489] run_results.json is not available until entire run is completed" Dec 13, 2023
@jaypeedevlin
Contributor

See #8413

@jmriego
Contributor

jmriego commented Dec 13, 2023

Wouldn't it be possible to at least update the file every few minutes? Our use case is a write-audit-publish ETL, so we would like to copy tables after they have been tested. There's no post-hook for tests, which would have been our other option.

dbeatty10 added the enhancement (New feature or request) label and removed the bug (Something isn't working) label Dec 13, 2023
dbeatty10 changed the title from "[CT-3489] run_results.json is not available until entire run is completed" to "[CT-3489] [Feature] Update run_results.json as each node finishes" Dec 13, 2023
dbeatty10 self-assigned this
@dbeatty10
Contributor

Thanks for reaching out about this @jmriego!

(And thanks for linking to #8413 @jaypeedevlin)

> Wouldn't it be possible to at least update the file every few minutes? Our use case is a write-audit-publish ETL, so we would like to copy tables after they have been tested. There's no post-hook for tests, which would have been our other option.

At this time, we're not planning to write run_results.json any earlier than the end of the invocation. One reason is that we saw performance degradations when we were writing it upon completion of every node. Another reason is that it would be non-trivial to write the file at less frequent intervals, and it's not something we feel strongly about supporting in the foreseeable future.

Workarounds

To enable the workflow you were imagining with run_results.json, there are many other ways you could do it instead.

Here are just a few:

  1. Use the structured logging events to see when nodes are finished
  2. Find an example of someone else doing write-audit-publish using dbt (like this) and copy their pattern
  3. Create additional nodes in your DAG to represent the "publish" steps

Here's a very simplified example of that final one:

models/write.sql

{{ config(materialized="table") }}

select null as id

models/audit.yml

models:
  - name: write
    columns:
      - name: id
        tests:
          - not_null

models/publish.sql

{{ config(materialized="table") }}

select * from {{ ref("write") }}

Example output when it fails the test (the final model is skipped and not (re-)built):

$ dbt build
22:09:09  Running with dbt=1.7.3
22:09:09  Registered adapter: postgres=1.7.3
22:09:09  Found 2 models, 1 test, 0 sources, 0 exposures, 0 metrics, 401 macros, 0 groups, 0 semantic models
22:09:09  
22:09:09  Concurrency: 5 threads (target='blue')
22:09:09  
22:09:09  1 of 3 START sql table model dbt_blue_1702505349.write ......................... [RUN]
22:09:10  1 of 3 OK created sql table model dbt_blue_1702505349.write .................... [SELECT 1 in 0.12s]
22:09:10  2 of 3 START test not_null_write_id ............................................ [RUN]
22:09:10  2 of 3 FAIL 1 not_null_write_id ................................................ [FAIL 1 in 0.06s]
22:09:10  3 of 3 SKIP relation dbt_blue_1702505339.publish ............................... [SKIP]
22:09:10  
22:09:10  Finished running 2 table models, 1 test in 0 hours 0 minutes and 0.41 seconds (0.41s).
22:09:10  
22:09:10  Completed with 1 error and 0 warnings:
22:09:10  
22:09:10  Failure in test not_null_write_id (models/audit.yml)
22:09:10    Got 1 result, configured to fail if != 0
22:09:10  
22:09:10    compiled Code at target/compiled/my_project/models/audit.yml/not_null_write_id.sql
22:09:10  
22:09:10  Done. PASS=1 WARN=0 ERROR=1 SKIP=1 TOTAL=3
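Even for this failing build, run_results.json is written once the invocation finishes, so the outcome can still be read back afterwards. A minimal sketch, assuming the results[].unique_id and results[].status fields of run_results.json; the BUCKETS mapping of statuses onto the PASS/WARN/ERROR/SKIP counts of the "Done." line is our own approximation rather than dbt's code, and the function names are hypothetical:

```python
import json
from collections import Counter
from pathlib import Path
from typing import List

# Hypothetical grouping of run_results.json statuses into the buckets that
# dbt prints on its final "Done." line. Unknown statuses land in "OTHER".
BUCKETS = {
    "success": "PASS",
    "pass": "PASS",
    "warn": "WARN",
    "error": "ERROR",
    "fail": "ERROR",
    "skipped": "SKIP",
}


def summarize_run_results(path: Path) -> Counter:
    """Tally results[].status from a finished run_results.json."""
    results = json.loads(path.read_text())["results"]
    counts = Counter(BUCKETS.get(r["status"], "OTHER") for r in results)
    counts["TOTAL"] = len(results)
    return counts


def skipped_nodes(path: Path) -> List[str]:
    """Return unique_ids that dbt skipped, e.g. a publish model whose audit failed."""
    results = json.loads(path.read_text())["results"]
    return [r["unique_id"] for r in results if r["status"] == "skipped"]
```

For the run above, we would expect write to be recorded as success, the test as fail, and publish as skipped, which this mapping turns into PASS=1, ERROR=1, SKIP=1, TOTAL=3, the same counts as the "Done." line. skipped_nodes would then return only the publish model.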

But if you change models/write.sql to the following and re-run dbt build, it will pass the test and publish the final version of the model:

select 1 as id
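For the first workaround in the list above (structured logging events), a minimal sketch is shown here. It is not an official dbt utility: it assumes dbt is run with JSON log output (for example dbt --log-format json build) and that each log line is a JSON object shaped like {"info": {"name": ...}, "data": {"node_info": {...}}}, where a NodeFinished event carries node_info.unique_id and node_info.node_status. That layout reflects our understanding of dbt 1.5+ structured logs, so check it against your version. The names finished_nodes and main are hypothetical.

```python
import json
import sys
from typing import Iterable, Iterator, TextIO, Tuple


def finished_nodes(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (unique_id, node_status) for each NodeFinished event found in
    dbt JSON log lines. The event layout is an assumption (see above)."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("{"):
            continue  # ignore blank or non-JSON lines
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore partial or malformed lines
        if event.get("info", {}).get("name") != "NodeFinished":
            continue
        node_info = event.get("data", {}).get("node_info", {})
        yield node_info.get("unique_id", "?"), node_info.get("node_status", "?")


def main(stream: TextIO = sys.stdin) -> None:
    """Print one line per finished node as soon as dbt logs it."""
    for unique_id, status in finished_nodes(stream):
        print(f"{status:>10}  {unique_id}", flush=True)
```

Saved as a script with a main() call at the bottom, it could be used as dbt --log-format json build | python finished_nodes.py. Because it consumes the log line by line, each node is reported as soon as dbt emits its event rather than when run_results.json is written at the end; in a write-audit-publish setup, the print would be replaced by whatever copy step you run once an audit passes.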

dbeatty10 closed this as not planned Dec 13, 2023
dbeatty10 added the wontfix (Not a bug or out of scope for dbt-core) label and removed the triage label Dec 13, 2023
dbeatty10 removed their assignment Dec 13, 2023
Projects
None yet
Development

No branches or pull requests

4 participants