This package builds a mart of tables from dbt artifacts that have been loaded into a table. It is compatible with Snowflake only. The models are based on the v1 artifact schema introduced in dbt 0.19.0.
Models included:

- `dim_dbt__models`
- `fct_dbt__model_executions`
- `fct_dbt__latest_full_model_executions`
- `fct_dbt__critical_path`
- `fct_dbt__run_results`
The critical path model determines the slowest route through your DAG, giving you the information you need to make a targeted effort to reduce dbt run times.
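For example, once the package has been built you can query the critical path model directly in your warehouse. This is a minimal sketch only: it assumes the models are built into `your_db.your_destination_schema` (as configured in the installation steps below), so adjust the fully qualified name to match your project.

```sql
-- Illustrative only: inspect the slowest route through the DAG.
-- Replace the database and schema with wherever you build the dbt_artifacts models.
select *
from your_db.your_destination_schema.fct_dbt__critical_path;
```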
- Add this package to your `packages.yml`, following these instructions (a minimal `packages.yml` sketch is shown after this list).
- Configure the following variables in your `dbt_project.yml`:
```yml
vars:
  dbt_artifacts:
    dbt_artifacts_database: your_db # optional, default is your target database
    dbt_artifacts_schema: your_schema # optional, default is 'dbt_artifacts'
    dbt_artifacts_table: your_table # optional, default is 'artifacts'

models:
  ...
  dbt_artifacts:
    +schema: your_destination_schema
    staging:
      +schema: your_destination_schema
```
Note that the model materializations are defined in this package's `dbt_project.yml`, so do not set them in your project.
- Run `dbt deps`.
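As referenced in the first step above, here is a minimal `packages.yml` sketch. The git URL reflects the Brooklyn Data Co. repository mentioned at the end of this document; the revision is a placeholder, so pin it to whichever release or commit you actually want.

```yml
packages:
  - git: "https://github.com/brooklyn-data/dbt_artifacts.git"
    revision: <release tag or commit>  # placeholder: replace with the version you want to pin
```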
This package requires that the source data exists in a table in Snowflake.
Snowflake makes it possible to load local files into your warehouse, and we've included a number of macros to assist with this. This method can be used both by dbt Cloud users and by users of other orchestration tools.
- To initially create these tables, execute `dbt run-operation create_artifact_resources` (source). This will create a stage and a table named `{{ target.database }}.dbt_artifacts.artifacts`; you can override this name using the variables listed in the Installation section, above. A sketch of the resulting table is included after the example commands below.
- Add operations to your production run to load files into your table, via the `upload_dbt_artifacts` macro (source). You'll need to specify which files to upload through the `--args` flag. Here's an example setup:
```bash
$ dbt seed
$ dbt run-operation upload_dbt_artifacts --args '{filenames: [manifest, run_results]}'
$ dbt run
$ dbt run-operation upload_dbt_artifacts --args '{filenames: [manifest, run_results]}'
$ dbt test
$ dbt run-operation upload_dbt_artifacts --args '{filenames: [run_results]}'
$ dbt source snapshot-freshness
$ dbt run-operation upload_dbt_artifacts --args '{filenames: [sources]}'
$ dbt docs generate
$ dbt run-operation upload_dbt_artifacts --args '{filenames: [catalog]}'
```
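For reference, `create_artifact_resources` creates the stage and table for you, so you should not normally need to write this DDL by hand. If you do want to manage the objects yourself (for example, via Terraform), a minimal sketch consistent with the `copy into` statement below might look like the following; the column types and the stage name are assumptions inferred from that statement, not the package's exact DDL.

```sql
-- Illustrative sketch only: the create_artifact_resources operation
-- creates the real stage and table for you.
create table if not exists your_db.dbt_artifacts.artifacts (
    data variant,              -- the raw artifact JSON
    generated_at timestamp_tz, -- metadata.generated_at from the artifact
    path string,               -- file path the artifact was loaded from
    artifact_type string       -- e.g. manifest.json, run_results.json
);

create stage if not exists your_db.dbt_artifacts.artifacts_stage
    file_format = (type = 'JSON');
```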
If you are using an orchestrator, you might instead upload these files to cloud storage; the method for doing so will depend on your orchestrator. Then, link the cloud storage destination to a Snowflake external stage, and use a Snowpipe to copy these files into the source table:
```sql
copy into ${snowflake_table.dbt_artifacts.database}.${snowflake_table.dbt_artifacts.schema}.${snowflake_table.dbt_artifacts.name}
from (
    select
        $1 as data,
        $1:metadata:generated_at::timestamp_tz as generated_at,
        metadata$filename as path,
        regexp_substr(metadata$filename, '([a-z_]+.json)$') as artifact_type
    from @${snowflake_stage.dbt_artifacts.database}.${snowflake_stage.dbt_artifacts.schema}.${snowflake_stage.dbt_artifacts.name}
)
file_format = (type = 'JSON')
```
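How you wire up the stage and pipe will depend on where the files land. The sketch below assumes an S3 bucket and uses illustrative names (the database/schema, `dbt_artifacts_stage`, `dbt_artifacts_pipe`, the bucket URL, and the storage integration are all placeholders), with the `copy into` statement above as the pipe definition.

```sql
-- Illustrative only: all names, the URL, and the integration are placeholders.
create stage if not exists analytics.dbt_artifacts.dbt_artifacts_stage
    url = 's3://your-bucket/dbt-artifacts/'
    storage_integration = your_storage_integration
    file_format = (type = 'JSON');

create pipe if not exists analytics.dbt_artifacts.dbt_artifacts_pipe
    auto_ingest = true
as
    copy into analytics.dbt_artifacts.artifacts
    from (
        select
            $1 as data,
            $1:metadata:generated_at::timestamp_tz as generated_at,
            metadata$filename as path,
            regexp_substr(metadata$filename, '([a-z_]+.json)$') as artifact_type
        from @analytics.dbt_artifacts.dbt_artifacts_stage
    )
    file_format = (type = 'JSON');
```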
The models will be picked up on your next `dbt run` command. You can also run the package specifically with `dbt run -m dbt_artifacts`.
Thank you to Tails.com for initial development and maintenance of this package. On 2021/12/20, the repository was transferred from the Tails.com GitHub organization to Brooklyn Data Co.
The macros in this package have been adapted from code shared by Kevin Chan and Jonathan Talmi of Snaptravel.
Thank you for sharing your work with the community!