From e0697d7b7c18d64b6841e321c675cac95cec38bd Mon Sep 17 00:00:00 2001 From: Matt Shaver <60105315+matthewshaver@users.noreply.github.com> Date: Thu, 12 Dec 2024 13:42:49 -0500 Subject: [PATCH] Enhancing node selection and state selector --- .../blog/2022-04-14-add-ci-cd-to-bitbucket.md | 2 +- website/docs/docs/build/exposures.md | 2 +- website/docs/docs/build/groups.md | 2 +- .../docs/collaborate/govern/model-versions.md | 2 +- .../core-upgrade/07-upgrading-to-v1.8.md | 2 +- .../core-upgrade/09-upgrading-to-v1.6.md | 2 +- .../11-Older versions/13-upgrading-to-v1.3.md | 2 +- .../11-Older versions/14-upgrading-to-v1.2.md | 2 +- .../11-Older versions/15-upgrading-to-v1.1.md | 2 +- .../11-Older versions/upgrading-to-0-17-0.md | 2 +- website/docs/docs/deploy/ci-jobs.md | 2 +- website/docs/guides/set-up-ci.md | 4 +- .../docs/reference/node-selection/methods.md | 316 +++++++++--------- .../state-comparison-caveats.md | 2 +- .../docs/reference/node-selection/syntax.md | 8 +- .../resource-properties/latest_version.md | 2 +- 16 files changed, 175 insertions(+), 179 deletions(-) diff --git a/website/blog/2022-04-14-add-ci-cd-to-bitbucket.md b/website/blog/2022-04-14-add-ci-cd-to-bitbucket.md index e871687d8cd..381a457d855 100644 --- a/website/blog/2022-04-14-add-ci-cd-to-bitbucket.md +++ b/website/blog/2022-04-14-add-ci-cd-to-bitbucket.md @@ -197,7 +197,7 @@ Reading the file over, you can see that we: In summary, anytime anything is pushed to main, we’ll ensure our production database reflects the dbt transformation, and we’ve saved the resulting artifacts to defer to. -> ❓ **What are artifacts and why should I defer to them?** dbt artifacts are metadata of the last run - what models and tests were defined, which ones ran successfully, and which failed. If a future dbt run is set to ***defer*** to this metadata, it means that it can select models and tests to run based on their state, including and especially their difference from the reference metadata. 
See [Artifacts](https://docs.getdbt.com/reference/artifacts/dbt-artifacts), [Selection methods: “state”](https://docs.getdbt.com/reference/node-selection/methods#the-state-method), and [Caveats to state comparison](https://docs.getdbt.com/reference/node-selection/state-comparison-caveats) for details. +> ❓ **What are artifacts and why should I defer to them?** dbt artifacts are metadata of the last run - what models and tests were defined, which ones ran successfully, and which failed. If a future dbt run is set to ***defer*** to this metadata, it means that it can select models and tests to run based on their state, including and especially their difference from the reference metadata. See [Artifacts](https://docs.getdbt.com/reference/artifacts/dbt-artifacts), [Selection methods: “state”](https://docs.getdbt.com/reference/node-selection/methods#state), and [Caveats to state comparison](https://docs.getdbt.com/reference/node-selection/state-comparison-caveats) for details. ### Slim Continuous Integration: Retrieve the artifacts and do a state-based run diff --git a/website/docs/docs/build/exposures.md b/website/docs/docs/build/exposures.md index 16dfd0e5f73..a3ac7bcb3ce 100644 --- a/website/docs/docs/build/exposures.md +++ b/website/docs/docs/build/exposures.md @@ -77,5 +77,5 @@ When we generate the [dbt Explorer site](/docs/collaborate/explore-projects), yo ## Related docs * [Exposure properties](/reference/exposure-properties) -* [`exposure:` selection method](/reference/node-selection/methods#the-exposure-method) +* [`exposure:` selection method](/reference/node-selection/methods#exposure) * [Data health tiles](/docs/collaborate/data-tile) diff --git a/website/docs/docs/build/groups.md b/website/docs/docs/build/groups.md index 890ee96901a..1be4388c246 100644 --- a/website/docs/docs/build/groups.md +++ b/website/docs/docs/build/groups.md @@ -119,4 +119,4 @@ dbt.exceptions.DbtReferenceError: Parsing Error * [Model 
Access](/docs/collaborate/govern/model-access#groups) * [Group configuration](/reference/resource-configs/group) -* [Group selection](/reference/node-selection/methods#the-group-method) +* [Group selection](/reference/node-selection/methods#group) diff --git a/website/docs/docs/collaborate/govern/model-versions.md b/website/docs/docs/collaborate/govern/model-versions.md index eefcf76e824..0bd16a03b3a 100644 --- a/website/docs/docs/collaborate/govern/model-versions.md +++ b/website/docs/docs/collaborate/govern/model-versions.md @@ -99,7 +99,7 @@ Let's say that `dim_customers` has three versions defined: `v2` is the "latest", As you'll see in the implementation section below, a versioned model can reuse the majority of its YAML properties and configuration. Each version needs to only say how it _differs_ from the shared set of attributes. This gives you, as the producer of a versioned model, the opportunity to highlight the differences across versions—which is otherwise difficult to detect in models with dozens or hundreds of columns—and to clearly track, in one place, all versions of the model which are currently live. -dbt also supports [`version`-based selection](/reference/node-selection/methods#the-version-method). For example, you could define a [default YAML selector](/reference/node-selection/yaml-selectors#default) that avoids running any old model versions in development, even while you continue to run them in production through a sunset and migration period. (You could accomplish something similar by applying `tags` to these models, and cycling through those tags over time.) +dbt also supports [`version`-based selection](/reference/node-selection/methods#version). For example, you could define a [default YAML selector](/reference/node-selection/yaml-selectors#default) that avoids running any old model versions in development, even while you continue to run them in production through a sunset and migration period. 
(You could accomplish something similar by applying `tags` to these models, and cycling through those tags over time.) diff --git a/website/docs/docs/dbt-versions/core-upgrade/07-upgrading-to-v1.8.md b/website/docs/docs/dbt-versions/core-upgrade/07-upgrading-to-v1.8.md index 026fb1a2a11..2c4370f929c 100644 --- a/website/docs/docs/dbt-versions/core-upgrade/07-upgrading-to-v1.8.md +++ b/website/docs/docs/dbt-versions/core-upgrade/07-upgrading-to-v1.8.md @@ -46,7 +46,7 @@ Historically, dbt's test coverage was confined to [“data” tests](/docs/build In v1.8, we're introducing native support for [unit testing](/docs/build/unit-tests). Unit tests validate your SQL modeling logic on a small set of static inputs __before__ you materialize your full model in production. They support a test-driven development approach, improving both the efficiency of developers and the reliability of code. -Starting from v1.8, when you execute the `dbt test` command, it will run both unit and data tests. Use the [`test_type`](/reference/node-selection/methods#the-test_type-method) method to run only unit or data tests: +Starting from v1.8, when you execute the `dbt test` command, it will run both unit and data tests. 
Use the [`test_type`](/reference/node-selection/methods#test_type) method to run only unit or data tests: ```shell diff --git a/website/docs/docs/dbt-versions/core-upgrade/09-upgrading-to-v1.6.md b/website/docs/docs/dbt-versions/core-upgrade/09-upgrading-to-v1.6.md index bbb2535a74c..4a210e23fc0 100644 --- a/website/docs/docs/dbt-versions/core-upgrade/09-upgrading-to-v1.6.md +++ b/website/docs/docs/dbt-versions/core-upgrade/09-upgrading-to-v1.6.md @@ -101,7 +101,7 @@ The ability for installed packages to override built-in materializations without ### Quick hits -- [`state:unmodified` and `state:old`](/reference/node-selection/methods#the-state-method) for [MECE](https://en.wikipedia.org/wiki/MECE_principle) stateful selection +- [`state:unmodified` and `state:old`](/reference/node-selection/methods#state) for [MECE](https://en.wikipedia.org/wiki/MECE_principle) stateful selection - [`invocation_args_dict`](/reference/dbt-jinja-functions/flags#invocation_args_dict) includes full `invocation_command` as string - [`dbt debug --connection`](/reference/commands/debug) to test just the data platform connection specified in a profile - [`dbt docs generate --empty-catalog`](/reference/commands/cmd-docs) to skip catalog population while generating docs diff --git a/website/docs/docs/dbt-versions/core-upgrade/11-Older versions/13-upgrading-to-v1.3.md b/website/docs/docs/dbt-versions/core-upgrade/11-Older versions/13-upgrading-to-v1.3.md index 250aa76ab26..2dd78727c65 100644 --- a/website/docs/docs/dbt-versions/core-upgrade/11-Older versions/13-upgrading-to-v1.3.md +++ b/website/docs/docs/dbt-versions/core-upgrade/11-Older versions/13-upgrading-to-v1.3.md @@ -54,5 +54,5 @@ GitHub discussion with details: [dbt-labs/dbt-core#6011](https://github.com/dbt- ### Quick hits - **["Full refresh"](/reference/resource-configs/full_refresh)** flag supports a short name, `-f`. 
-- **[The "config" selection method](/reference/node-selection/methods#the-config-method)** supports boolean and list config values, in addition to strings. +- **[The "config" selection method](/reference/node-selection/methods#config)** supports boolean and list config values, in addition to strings. - Two new dbt-Jinja context variables for accessing invocation metadata: [`invocation_args_dict`](/reference/dbt-jinja-functions/flags#invocation_args_dict) and [`dbt_metadata_envs`](/reference/dbt-jinja-functions/env_var#custom-metadata). diff --git a/website/docs/docs/dbt-versions/core-upgrade/11-Older versions/14-upgrading-to-v1.2.md b/website/docs/docs/dbt-versions/core-upgrade/11-Older versions/14-upgrading-to-v1.2.md index f2102560dfa..1b393df2f01 100644 --- a/website/docs/docs/dbt-versions/core-upgrade/11-Older versions/14-upgrading-to-v1.2.md +++ b/website/docs/docs/dbt-versions/core-upgrade/11-Older versions/14-upgrading-to-v1.2.md @@ -32,7 +32,7 @@ See GitHub discussion [dbt-labs/dbt-core#5468](https://github.com/dbt-labs/dbt-c - **[Grants](/reference/resource-configs/grants)** are natively supported in `dbt-core` for the first time. That support extends to all standard materializations, and the most popular adapters. If you already use hooks to apply simple grants, we encourage you to use built-in `grants` to configure your models, seeds, and snapshots instead. This will enable you to [DRY](https://en.wikipedia.org/wiki/Don%27t_repeat_yourself) up your duplicated or boilerplate code. - **[Metrics](/docs/build/build-metrics-intro)** now support an `expression` type (metrics-on-metrics), as well as a `metric()` function to use when referencing metrics from within models, macros, or `expression`-type metrics. 
For more information on how to use expression metrics, check out the [**`dbt_metrics` package**](https://github.com/dbt-labs/dbt_metrics) - **[dbt-Jinja functions](/reference/dbt-jinja-functions)** now include the [`itertools` Python module](/reference/dbt-jinja-functions/modules#itertools), as well as the [set](/reference/dbt-jinja-functions/set) and [zip](/reference/dbt-jinja-functions/zip) functions. -- **[Node selection](/reference/node-selection/syntax)** includes a [file selection method](/reference/node-selection/methods#the-file-method) (`-s model.sql`), and [yaml selector](/reference/node-selection/yaml-selectors) inheritance. +- **[Node selection](/reference/node-selection/syntax)** includes a [file selection method](/reference/node-selection/methods#file) (`-s model.sql`), and [yaml selector](/reference/node-selection/yaml-selectors) inheritance. - **[Global configs](/reference/global-configs/about-global-configs)** now include CLI flag and environment variable settings for [`target-path`](/reference/global-configs/json-artifacts) and [`log-path`](/reference/global-configs/logs), which can be used to override the values set in `dbt_project.yml` ### Specific adapters diff --git a/website/docs/docs/dbt-versions/core-upgrade/11-Older versions/15-upgrading-to-v1.1.md b/website/docs/docs/dbt-versions/core-upgrade/11-Older versions/15-upgrading-to-v1.1.md index 0dc3d279b87..01bbabf9d16 100644 --- a/website/docs/docs/dbt-versions/core-upgrade/11-Older versions/15-upgrading-to-v1.1.md +++ b/website/docs/docs/dbt-versions/core-upgrade/11-Older versions/15-upgrading-to-v1.1.md @@ -45,7 +45,7 @@ Expected a schema version of "https://schemas.getdbt.com/dbt/manifest/v5.json" i ### Advanced and experimental functionality -**Fresh Rebuilds.** There's a new _experimental_ selection method in town: [`source_status:fresher`](/reference/node-selection/methods#the-source_status-method). 
Much like the `state:` and `result` methods, the goal is to use dbt metadata to run your DAG more efficiently. If dbt has access to previous and current results of `dbt source freshness` (the `sources.json` artifact), dbt can compare them to determine which sources have loaded new data, and select only resources downstream of "fresher" sources. Read more in [Understanding State](/reference/node-selection/syntax#about-node-selection) and [CI/CD in dbt Cloud](/docs/deploy/continuous-integration). +**Fresh Rebuilds.** There's a new _experimental_ selection method in town: [`source_status:fresher`](/reference/node-selection/methods#source_status). Much like the `state:` and `result` methods, the goal is to use dbt metadata to run your DAG more efficiently. If dbt has access to previous and current results of `dbt source freshness` (the `sources.json` artifact), dbt can compare them to determine which sources have loaded new data, and select only resources downstream of "fresher" sources. Read more in [Understanding State](/reference/node-selection/syntax#about-node-selection) and [CI/CD in dbt Cloud](/docs/deploy/continuous-integration). 
[**dbt-Jinja functions**](/reference/dbt-jinja-functions) have a new landing page, and two new members: diff --git a/website/docs/docs/dbt-versions/core-upgrade/11-Older versions/upgrading-to-0-17-0.md b/website/docs/docs/dbt-versions/core-upgrade/11-Older versions/upgrading-to-0-17-0.md index e26a69fd1c7..6a19bdcf808 100644 --- a/website/docs/docs/dbt-versions/core-upgrade/11-Older versions/upgrading-to-0-17-0.md +++ b/website/docs/docs/dbt-versions/core-upgrade/11-Older versions/upgrading-to-0-17-0.md @@ -247,7 +247,7 @@ BigQuery: ## New and changed documentation **Core** -- [`path:` selectors](/reference/node-selection/methods#the-path-method) +- [`path:` selectors](/reference/node-selection/methods#path) - [`--fail-fast` command](/reference/commands/run#failing-fast) - `as_text` Jinja filter: removed this defunct filter - [accessing nodes in the `graph` object](/reference/dbt-jinja-functions/graph) diff --git a/website/docs/docs/deploy/ci-jobs.md b/website/docs/docs/deploy/ci-jobs.md index 0f9b6ba377a..38bfb56a728 100644 --- a/website/docs/docs/deploy/ci-jobs.md +++ b/website/docs/docs/deploy/ci-jobs.md @@ -146,7 +146,7 @@ For semantic nodes and models that aren't downstream of modified models, dbt Clo -To only validate modified semantic nodes, use the following command (with [state selection](/reference/node-selection/syntax#stateful-selection)): +To only validate modified semantic nodes, use the following command (with [state selection](/reference/node-selection/syntax#state-selection)): ```bash dbt sl validate --select state:modified+ diff --git a/website/docs/guides/set-up-ci.md b/website/docs/guides/set-up-ci.md index 3c1ece9451d..79761e88e57 100644 --- a/website/docs/guides/set-up-ci.md +++ b/website/docs/guides/set-up-ci.md @@ -50,7 +50,7 @@ Use the **Continuous Integration Job** template, and call the job **CI Check**. In the Execution Settings, your command will be preset to `dbt build --select state:modified+`. 
Let's break this down: - [`dbt build`](/reference/commands/build) runs all nodes (seeds, models, snapshots, tests) at once in DAG order. If something fails, nodes that depend on it will be skipped. -- The [`state:modified+` selector](/reference/node-selection/methods#the-state-method) means that only modified nodes and their children will be run ("Slim CI"). In addition to [not wasting time](https://discourse.getdbt.com/t/how-we-sped-up-our-ci-runs-by-10x-using-slim-ci/2603) building and testing nodes that weren't changed in the first place, this significantly reduces compute costs. +- The [`state:modified+` selector](/reference/node-selection/methods#state) means that only modified nodes and their children will be run ("Slim CI"). In addition to [not wasting time](https://discourse.getdbt.com/t/how-we-sped-up-our-ci-runs-by-10x-using-slim-ci/2603) building and testing nodes that weren't changed in the first place, this significantly reduces compute costs. To be able to find modified nodes, dbt needs to have something to compare against. dbt Cloud uses the last successful run of any job in your Production environment as its [comparison state](/reference/node-selection/syntax#about-node-selection). As long as you identified your Production environment in Step 2, you won't need to touch this. If you didn't, pick the right environment from the dropdown. @@ -344,7 +344,7 @@ Use the **Continuous Integration Job** template, and call the job **QA Check**. In the Execution Settings, your command will be preset to `dbt build --select state:modified+`. Let's break this down: - [`dbt build`](/reference/commands/build) runs all nodes (seeds, models, snapshots, tests) at once in DAG order. If something fails, nodes that depend on it will be skipped. -- The [`state:modified+` selector](/reference/node-selection/methods#the-state-method) means that only modified nodes and their children will be run ("Slim CI"). 
In addition to [not wasting time](https://discourse.getdbt.com/t/how-we-sped-up-our-ci-runs-by-10x-using-slim-ci/2603) building and testing nodes that weren't changed in the first place, this significantly reduces compute costs. +- The [`state:modified+` selector](/reference/node-selection/methods#state) means that only modified nodes and their children will be run ("Slim CI"). In addition to [not wasting time](https://discourse.getdbt.com/t/how-we-sped-up-our-ci-runs-by-10x-using-slim-ci/2603) building and testing nodes that weren't changed in the first place, this significantly reduces compute costs. To be able to find modified nodes, dbt needs to have something to compare against. Normally, we use the Production environment as the source of truth, but in this case there will be new code merged into `qa` long before it hits the `main` branch and Production environment. Because of this, we'll want to defer the Release environment to itself. diff --git a/website/docs/reference/node-selection/methods.md b/website/docs/reference/node-selection/methods.md index 7587a9fd2b1..600a578ef8e 100644 --- a/website/docs/reference/node-selection/methods.md +++ b/website/docs/reference/node-selection/methods.md @@ -1,5 +1,6 @@ --- -title: "Methods" +title: "Node selector methods" +sidebar: "Node selector methods" --- Selector methods return all resources that share a common property, using the @@ -22,51 +23,67 @@ dbt list --select "*.folder_name.*" dbt list --select "package:*_source" ``` -### The "tag" method -The `tag:` method is used to select models that match a specified [tag](/reference/resource-configs/tags). +### access +The `access` method selects models based on their [access](/reference/resource-configs/access) property. 
- ```bash -dbt run --select "tag:nightly" # run all models with the `nightly` tag +```bash +dbt list --select "access:public" # list all public models +dbt list --select "access:private" # list all private models +dbt list --select "access:protected" # list all protected models ``` +### config + +The `config` method is used to select models that match a specified [node config](/reference/configs-and-properties). -### The "source" method -The `source` method is used to select models that select from a specified [source](/docs/build/sources#using-sources). Use in conjunction with the `+` operator. ```bash -dbt run --select "source:snowplow+" # run all models that select from Snowplow sources +dbt run --select "config.materialized:incremental" # run all models that are materialized incrementally +dbt run --select "config.schema:audit" # run all models that are created in the `audit` schema +dbt run --select "config.cluster_by:geo_country" # run all models clustered by `geo_country` ``` -### The "resource_type" method -Use the `resource_type` method to select nodes of a particular type (`model`, `test`, `exposure`, and so on). This is similar to the `--resource-type` flag used by the [`dbt ls` command](/reference/commands/list). +While most config values are strings, you can also use the `config` method to match boolean configs, dictionary keys, and values in lists. - ```bash -dbt build --select "resource_type:exposure" # build all resources upstream of exposures -dbt list --select "resource_type:test" # list all tests in your project -dbt list --select "resource_type:source" # list all sources in your project +For example, given a model with the following configurations: + +```bash +{{ config( + materialized = 'incremental', + unique_key = ['column_a', 'column_b'], + grants = {'select': ['reporter', 'analysts']}, + meta = {"contains_pii": true}, + transient = true +) }} + +select ... 
``` -### The "path" method -The `path` method is used to select models/sources defined at or under a specific path. -Model definitions are in SQL/Python files (not YAML), and source definitions are in YAML files. -While the `path` prefix is not explicitly required, it may be used to make -selectors unambiguous. + You can select using any of the following: +```bash +dbt ls -s config.materialized:incremental +dbt ls -s config.unique_key:column_a +dbt ls -s config.grants.select:reporter +dbt ls -s config.meta.contains_pii:true +dbt ls -s config.transient:true +``` - ```bash - # These two selectors are equivalent - dbt run --select "path:models/staging/github" - dbt run --select "models/staging/github" +### exposure - # These two selectors are equivalent - dbt run --select "path:models/staging/github/stg_issues.sql" - dbt run --select "models/staging/github/stg_issues.sql" - ``` +The `exposure` method is used to select parent resources of a specified [exposure](/docs/build/exposures). Use in conjunction with the `+` operator. -### The "file" method + ```bash +dbt run --select "+exposure:weekly_kpis" # run all models that feed into the weekly_kpis exposure +dbt test --select "+exposure:*" # test all resources upstream of all exposures +dbt ls --select "+exposure:*" --resource-type source # list all source tables upstream of all exposures +``` + +### file + The `file` method can be used to select a model by its filename, including the file extension (`.sql`). ```bash @@ -76,7 +93,7 @@ dbt run --select "some_model.sql" dbt run --select "some_model" ``` -### The "fqn" method +### fqn The `fqn` method is used to select nodes based off their "fully qualified names" (FQN) within the dbt graph. The default output of [`dbt list`](/reference/commands/list) is a listing of FQN. The default FQN format is composed of the project name, subdirectories within the path, and the file name (without extension) separated by periods. 
@@ -88,7 +105,26 @@ dbt run --select "fqn:some_path.some_model" dbt run --select "fqn:your_project.some_path.some_model" ``` -### The "package" method + +### group + +The `group` method is used to select models defined within a [group](/reference/resource-configs/group). + + +```bash +dbt run --select "group:finance" # run all models that belong to the finance group. +``` + +### metric + +The `metric` method is used to select parent resources of a specified [metric](/docs/build/build-metrics-intro). Use in conjunction with the `+` operator. + +```bash +dbt build --select "+metric:weekly_active_users" # build all resources upstream of weekly_active_users metric +dbt ls --select "+metric:*" --resource-type source # list all source tables upstream of all metrics +``` + +### package The `package` method is used to select models defined within the root project or an installed dbt package. While the `package:` prefix is not explicitly required, it may be used to make @@ -102,91 +138,86 @@ selectors unambiguous. dbt run --select "snowplow.*" ``` +### path +The `path` method is used to select models/sources defined at or under a specific path. +Model definitions are in SQL/Python files (not YAML), and source definitions are in YAML files. +While the `path` prefix is not explicitly required, it may be used to make +selectors unambiguous. -### The "config" method -The `config` method is used to select models that match a specified [node config](/reference/configs-and-properties). + ```bash + # These two selectors are equivalent + dbt run --select "path:models/staging/github" + dbt run --select "models/staging/github" + # These two selectors are equivalent + dbt run --select "path:models/staging/github/stg_issues.sql" + dbt run --select "models/staging/github/stg_issues.sql" + ``` +### resource_type +Use the `resource_type` method to select nodes of a particular type (`model`, `test`, `exposure`, and so on). 
This is similar to the `--resource-type` flag used by the [`dbt ls` command](/reference/commands/list). ```bash -dbt run --select "config.materialized:incremental" # run all models that are materialized incrementally -dbt run --select "config.schema:audit" # run all models that are created in the `audit` schema -dbt run --select "config.cluster_by:geo_country" # run all models clustered by `geo_country` +dbt build --select "resource_type:exposure" # build all resources upstream of exposures +dbt list --select "resource_type:test" # list all tests in your project +dbt list --select "resource_type:source" # list all sources in your project ``` -While most config values are strings, you can also use the `config` method to match boolean configs, dictionary keys, and values in lists. - -For example, given a model with the following configurations: +### result -```bash -{{ config( - materialized = 'incremental', - unique_key = ['column_a', 'column_b'], - grants = {'select': ['reporter', 'analysts']}, - meta = {"contains_pii": true}, - transient = true -) }} - -select ... -``` +The `result` method is related to the `state` method described below and can be used to select resources based on their result status from a prior run. Note that one of the dbt commands [`run`, `test`, `build`, `seed`] must have been performed in order to create the result on which a result selector operates. You can use `result` selectors in conjunction with the `+` operator. 
- You can select using any of the following: ```bash -dbt ls -s config.materialized:incremental -dbt ls -s config.unique_key:column_a -dbt ls -s config.grants.select:reporter -dbt ls -s config.meta.contains_pii:true -dbt ls -s config.transient:true +dbt run --select "result:error" --state path/to/artifacts # run all models that generated errors on the prior invocation of dbt run +dbt test --select "result:fail" --state path/to/artifacts # run all tests that failed on the prior invocation of dbt test +dbt build --select "1+result:fail" --state path/to/artifacts # run all the models associated with failed tests from the prior invocation of dbt build +dbt seed --select "result:error" --state path/to/artifacts # run all seeds that generated errors on the prior invocation of dbt seed. ``` -### The "test_type" method +### saved_query - - -The `test_type` method is used to select tests based on their type, `singular` or `generic`: +The `saved_query` method selects [saved queries](/docs/build/saved-queries). ```bash -dbt test --select "test_type:generic" # run all generic tests -dbt test --select "test_type:singular" # run all singular tests +dbt list --select "saved_query:*" # list all saved queries +dbt list --select "+saved_query:orders_saved_query" # list your saved query named "orders_saved_query" and all upstream resources ``` - +### semantic_model - +The `semantic_model` method selects [semantic models](/docs/build/semantic-models). 
-The `test_type` method is used to select tests based on their type: +```bash +dbt list --select "semantic_model:*" # list all semantic models +dbt list --select "+semantic_model:orders" # list your semantic model named "orders" and all upstream resources +``` -- [Unit tests](/docs/build/unit-tests) -- [Data tests](/docs/build/data-tests): - - [Singular](/docs/build/data-tests#singular-data-tests) - - [Generic](/docs/build/data-tests#generic-data-tests) +### source +The `source` method is used to select models that select from a specified [source](/docs/build/sources#using-sources). Use in conjunction with the `+` operator. -```bash -dbt test --select "test_type:unit" # run all unit tests -dbt test --select "test_type:data" # run all data tests -dbt test --select "test_type:generic" # run all generic data tests -dbt test --select "test_type:singular" # run all singular data tests + ```bash +dbt run --select "source:snowplow+" # run all models that select from Snowplow sources ``` - +### source_status + +Another element of job state is the `source_status` of a prior dbt invocation. After executing `dbt source freshness`, for example, dbt creates the `sources.json` artifact which contains execution times and `max_loaded_at` dates for dbt sources. You can read more about `sources.json` on the ['sources'](/reference/artifacts/sources-json) page. -### The "test_name" method +The following dbt commands produce `sources.json` artifacts whose results can be referenced in subsequent dbt invocations: +- `dbt source freshness` -The `test_name` method is used to select tests based on the name of the generic test -that defines it. For more information about how generic tests are defined, read about -[tests](/docs/build/data-tests). 
+After issuing one of the above commands, you can reference the source freshness results by adding a selector to a subsequent command as follows: - ```bash -dbt test --select "test_name:unique" # run all instances of the `unique` test -dbt test --select "test_name:equality" # run all instances of the `dbt_utils.equality` test -dbt test --select "test_name:range_min_max" # run all instances of a custom schema test defined in the local project, `range_min_max` +```bash +# You can also set the DBT_STATE environment variable instead of the --state flag. +dbt source freshness # must be run again to compare current to previous state +dbt build --select "source_status:fresher+" --state path/to/prod/artifacts ``` - -### The "state" method +### state **N.B.** State-based selection is a powerful, complex feature. Read about [known caveats and limitations](/reference/node-selection/state-comparison-caveats) to state comparison. @@ -196,14 +227,12 @@ The `state` method is used to select nodes by comparing them against a previous `state:modified`: All new nodes, plus any changes to existing nodes. - ```bash dbt test --select "state:new" --state path/to/artifacts # run all tests on new models + and new tests on old models dbt run --select "state:modified" --state path/to/artifacts # run all models that have been modified dbt ls --select "state:modified" --state path/to/artifacts # list all modified nodes (not just models) ``` - Because state comparison is complex, and everyone's project is different, dbt supports subselectors that include a subset of the full `modified` criteria: - `state:modified.body`: Changes to node body (e.g. model SQL, seed values) - `state:modified.configs`: Changes to any node configs, excluding `database`/`schema`/`alias` @@ -220,105 +249,60 @@ There are two additional `state` selectors that complement `state:new` and `stat These selectors can help you shorten run times by excluding unchanged nodes. 
Currently, no subselectors are available at this time, but that might change as use cases evolve. -### The "exposure" method - -The `exposure` method is used to select parent resources of a specified [exposure](/docs/build/exposures). Use in conjunction with the `+` operator. +### tag +The `tag:` method is used to select models that match a specified [tag](/reference/resource-configs/tags). ```bash -dbt run --select "+exposure:weekly_kpis" # run all models that feed into the weekly_kpis exposure -dbt test --select "+exposure:*" # test all resources upstream of all exposures -dbt ls --select "+exposure:*" --resource-type source # list all source tables upstream of all exposures -``` - -### The "metric" method - -The `metric` method is used to select parent resources of a specified [metric](/docs/build/build-metrics-intro). Use in conjunction with the `+` operator. - -```bash -dbt build --select "+metric:weekly_active_users" # build all resources upstream of weekly_active_users metric -dbt ls --select "+metric:*" --resource-type source # list all source tables upstream of all metrics -``` - -### The "result" method - -The `result` method is related to the `state` method described above and can be used to select resources based on their result status from a prior run. Note that one of the dbt commands [`run`, `test`, `build`, `seed`] must have been performed in order to create the result on which a result selector operates. You can use `result` selectors in conjunction with the `+` operator. 
- -```bash -dbt run --select "result:error" --state path/to/artifacts # run all models that generated errors on the prior invocation of dbt run -dbt test --select "result:fail" --state path/to/artifacts # run all tests that failed on the prior invocation of dbt test -dbt build --select "1+result:fail" --state path/to/artifacts # run all the models associated with failed tests from the prior invocation of dbt build -dbt seed --select "result:error" --state path/to/artifacts # run all seeds that generated errors on the prior invocation of dbt seed. +dbt run --select "tag:nightly" # run all models with the `nightly` tag ``` -### The "source_status" method - -Supported in v1.1 or higher. - -Another element of job state is the `source_status` of a prior dbt invocation. After executing `dbt source freshness`, for example, dbt creates the `sources.json` artifact which contains execution times and `max_loaded_at` dates for dbt sources. You can read more about `sources.json` on the ['sources'](/reference/artifacts/sources-json) page. - -The following dbt commands produce `sources.json` artifacts whose results can be referenced in subsequent dbt invocations: -- `dbt source freshness` +### test_name -After issuing one of the above commands, you can reference the source freshness results by adding a selector to a subsequent command as follows: +The `test_name` method is used to select tests based on the name of the generic test +that defines it. For more information about how generic tests are defined, read about +[tests](/docs/build/data-tests). -```bash -# You can also set the DBT_STATE environment variable instead of the --state flag. 
-dbt source freshness # must be run again to compare current to previous state -dbt build --select "source_status:fresher+" --state path/to/prod/artifacts + ```bash +dbt test --select "test_name:unique" # run all instances of the `unique` test +dbt test --select "test_name:equality" # run all instances of the `dbt_utils.equality` test +dbt test --select "test_name:range_min_max" # run all instances of a custom schema test defined in the local project, `range_min_max` ``` -### The "group" method - -The `group` method is used to select models defined within a [group](/reference/resource-configs/group). - - -```bash -dbt run --select "group:finance" # run all models that belong to the finance group. -``` +### test_type -### The "access" method + -The `access` method selects models based on their [access](/reference/resource-configs/access) property. +The `test_type` method is used to select tests based on their type, `singular` or `generic`: ```bash -dbt list --select "access:public" # list all public models -dbt list --select "access:private" # list all private models -dbt list --select "access:protected" # list all protected models +dbt test --select "test_type:generic" # run all generic tests +dbt test --select "test_type:singular" # run all singular tests ``` -### The "version" method - -The `version` method selects [versioned models](/docs/collaborate/govern/model-versions) based on their [version identifier](/reference/resource-properties/versions) and [latest version](/reference/resource-properties/latest_version).
+ -```bash -dbt list --select "version:latest" # only 'latest' versions -dbt list --select "version:prerelease" # versions newer than the 'latest' version -dbt list --select "version:old" # versions older than the 'latest' version + -dbt list --select "version:none" # models that are *not* versioned -``` +The `test_type` method is used to select tests based on their type: -### The "semantic_model" method +- [Unit tests](/docs/build/unit-tests) +- [Data tests](/docs/build/data-tests): + - [Singular](/docs/build/data-tests#singular-data-tests) + - [Generic](/docs/build/data-tests#generic-data-tests) -The `semantic_model` method selects [semantic models](/docs/build/semantic-models). ```bash -dbt list --select "semantic_model:*" # list all semantic models -dbt list --select "+semantic_model:orders" # list your semantic model named "orders" and all upstream resources +dbt test --select "test_type:unit" # run all unit tests +dbt test --select "test_type:data" # run all data tests +dbt test --select "test_type:generic" # run all generic data tests +dbt test --select "test_type:singular" # run all singular data tests ``` -### The "saved_query" method - -The `saved_query` method selects [saved queries](/docs/build/saved-queries). - -```bash -dbt list --select "saved_query:*" # list all saved queries -dbt list --select "+saved_query:orders_saved_query" # list your saved query named "orders_saved_query" and all upstream resources -``` + -### The "unit_test" method +### unit_test Supported in v1.8 or newer. @@ -333,3 +317,15 @@ dbt list --select "+unit_test:orders_with_zero_items" # list your unit test nam ``` + +### version + +The `version` method selects [versioned models](/docs/collaborate/govern/model-versions) based on their [version identifier](/reference/resource-properties/versions) and [latest version](/reference/resource-properties/latest_version). 
+ +```bash +dbt list --select "version:latest" # only 'latest' versions +dbt list --select "version:prerelease" # versions newer than the 'latest' version +dbt list --select "version:old" # versions older than the 'latest' version + +dbt list --select "version:none" # models that are *not* versioned +``` diff --git a/website/docs/reference/node-selection/state-comparison-caveats.md b/website/docs/reference/node-selection/state-comparison-caveats.md index adaf35bd710..f83a4f37c89 100644 --- a/website/docs/reference/node-selection/state-comparison-caveats.md +++ b/website/docs/reference/node-selection/state-comparison-caveats.md @@ -4,7 +4,7 @@ title: "Caveats to state comparison" import StateModified from '/snippets/_state-modified-compare.md'; -The [`state:` selection method](/reference/node-selection/methods#the-state-method) is a powerful feature, with a lot of underlying complexity. Below are a handful of considerations when setting up automated jobs that leverage state comparison. +The [`state:` selection method](/reference/node-selection/methods#state) is a powerful feature, with a lot of underlying complexity. Below are a handful of considerations when setting up automated jobs that leverage state comparison. ### Seeds diff --git a/website/docs/reference/node-selection/syntax.md b/website/docs/reference/node-selection/syntax.md index c61ab598a88..2e53eff72df 100644 --- a/website/docs/reference/node-selection/syntax.md +++ b/website/docs/reference/node-selection/syntax.md @@ -118,19 +118,19 @@ dbt ls --select "result:+" state:modified+ --state ./ -## Stateful selection +## State selection One of the greatest underlying assumptions about dbt is that its operations should be **stateless** and **idempotent**. That is, it doesn't matter how many times a model has been run before, or if it has ever been run before. It doesn't matter if you run it once or a thousand times. Given the same raw data, you can expect the same transformed result.
A given run of dbt doesn't need to "know" about _any other_ run; it just needs to know about the code in the project and the objects in your database as they exist _right now_. -That said, dbt does store "state"—a detailed, point-in-time view of project resources (also referred to as nodes), database objects, and invocation results—in the form of its [artifacts](/docs/deploy/artifacts). If you choose, dbt can use these artifacts to inform certain operations. Crucially, the operations themselves are still stateless and idempotent: given the same manifest and the same raw data, dbt will produce the same transformed result. +That said, dbt does store "state" — a detailed, point-in-time view of project resources (also referred to as nodes), database objects, and invocation results — in the form of its [artifacts](/docs/deploy/artifacts). If you choose, dbt can use these artifacts to inform certain operations. Crucially, the operations themselves are still stateless and idempotent: given the same manifest and the same raw data, dbt will produce the same transformed result. dbt can leverage artifacts from a prior invocation as long as their file path is passed to the `--state` flag. This is a prerequisite for: - [The `state` selector](/reference/node-selection/methods#state), whereby dbt can identify resources that are new or modified by comparing code in the current project against the state manifest. - [Deferring](/reference/node-selection/defer) to another environment, whereby dbt can identify upstream, unselected resources that don't exist in your current environment and instead "defer" their references to the environment provided by the state manifest. - The [`dbt clone` command](/reference/commands/clone), whereby dbt can clone nodes based on their location in the manifest provided to the `--state` flag.
-Together, the `state:` selector and deferral enable ["slim CI"](/best-practices/best-practice-workflows#run-only-modified-models-to-test-changes-slim-ci). We expect to add more features in future releases that can leverage artifacts passed to the `--state` flag. +Together, the [`state`](/reference/node-selection/methods#state) selector and deferral enable ["slim CI"](/best-practices/best-practice-workflows#run-only-modified-models-to-test-changes-slim-ci). We expect to add more features in future releases that can leverage artifacts passed to the `--state` flag. ### Establishing state diff --git a/website/docs/reference/resource-properties/latest_version.md b/website/docs/reference/resource-properties/latest_version.md index 567ea5e7e1f..6635bd3fecb 100644 --- a/website/docs/reference/resource-properties/latest_version.md +++ b/website/docs/reference/resource-properties/latest_version.md @@ -21,7 +21,7 @@ models: The latest version of this model. The "latest" version is relevant for: 1. Resolving `ref()` calls to this model that are "unpinned" (a version is not explicitly specified) -2. Selecting model versions using the [`version:` selection method](/reference/node-selection/methods#the-version-method), based on whether a given model version is `latest`, `prerelease`, or `old` +2. Selecting model versions using the [`version:` selection method](/reference/node-selection/methods#version), based on whether a given model version is `latest`, `prerelease`, or `old` This value can be a string or a numeric (integer or float) value. It must be one of the [version identifiers](/reference/resource-properties/versions#v) specified in this model's list of `versions`.
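
As a minimal sketch of how `latest_version` relates to a model's `versions` list, a properties file might look like the following. The model name `dim_customers` and the version numbers are hypothetical, chosen only for illustration.

```yaml
models:
  - name: dim_customers   # hypothetical model name
    latest_version: 2     # must be one of the `v` identifiers listed below
    versions:
      - v: 1              # older than latest, so matched by "version:old"
      - v: 2              # matched by "version:latest"
      - v: 3              # newer than latest, so matched by "version:prerelease"
```

Under these assumptions, an unpinned `ref('dim_customers')` would resolve to version 2, and `dbt list --select "version:prerelease"` would list only version 3.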