Beta docs for compare changes (#5903)
## What are you changing in this pull request and why?

Beta docs for Advanced CI's compare changes 

- Update [Set up CI jobs
page](https://docs.getdbt.com/docs/deploy/ci-jobs) with new **Run
compare changes** option
- Add new "Compare changes" section to [CI overview
page](https://docs.getdbt.com/docs/deploy/continuous-integration)
- On the [Run visibility
page](https://docs.getdbt.com/docs/deploy/run-visibility), add a new
section for the "Compare" tab.
- On the run visibility page, add missing tabs (Lineage and Artifacts)
for completeness
- On [dbt Cloud environments
page](https://docs.getdbt.com/docs/dbt-cloud-environments), add a new
section for the Advanced CI account access option (beneath Partial
parsing content)

## Checklist
- [x] Review the [Content style
guide](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/content-style-guide.md)
so my content adheres to these guidelines.
- [x] For [docs
versioning](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#about-versioning),
review how to [version a whole
page](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#adding-a-new-version)
and [version a block of
content](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#versioning-blocks-of-content).
- [ ] Needs PM review

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Matt Shaver <[email protected]>
3 people authored Aug 23, 2024
1 parent 55088cc commit 1be2036
Showing 9 changed files with 96 additions and 35 deletions.
41 changes: 25 additions & 16 deletions website/docs/docs/deploy/ci-jobs.md
@@ -12,7 +12,8 @@ dbt Labs recommends that you create your CI job in a dedicated dbt Cloud [deploy

### Prerequisites
- You have a dbt Cloud account.
- For the [Concurrent CI checks](/docs/deploy/continuous-integration#concurrent-ci-checks) and [Smart cancellation of stale builds](/docs/deploy/continuous-integration#smart-cancellation) features, your dbt Cloud account must be on the [Team or Enterprise plan](https://www.getdbt.com/pricing/).
- For the [concurrent CI checks](/docs/deploy/continuous-integration#concurrent-ci-checks) and [smart cancellation of stale builds](/docs/deploy/continuous-integration#smart-cancellation) features, your dbt Cloud account must be on the [Team or Enterprise plan](https://www.getdbt.com/pricing/).
- For the [compare changes](/docs/deploy/continuous-integration#compare-changes) feature, your dbt Cloud account must have access to Advanced CI. Please ask your [dbt Cloud administrator to enable](/docs/dbt-cloud-environments#account-access-to-advanced-ci-features) this for you.
- Set up a [connection with your Git provider](/docs/cloud/git/git-configuration-in-dbt-cloud). This integration lets dbt Cloud run jobs on your behalf for job triggering.
- If you're using a native [GitLab](/docs/cloud/git/connect-gitlab) integration, you need a paid or self-hosted account that includes support for GitLab webhooks and [project access tokens](https://docs.gitlab.com/ee/user/project/settings/project_access_tokens.html). If you're using GitLab Free, merge requests will trigger CI jobs but CI job status updates (success or failure of the job) will not be reported back to GitLab.

@@ -21,40 +22,48 @@ To make CI job creation easier, many options on the **CI job** page are set to d

1. On your deployment environment page, click **Create job** > **Continuous integration job** to create a new CI job.

2. Options in the **Job settings** section:
1. Options in the **Job settings** section:
- **Job name** &mdash; Specify the name for this CI job.
- **Description** &mdash; Provide a description about the CI job.
- **Environment** &mdash; By default, it’s set to the environment you created the CI job from.
- **Environment** &mdash; By default, it’s set to the environment you created the CI job from. Use the dropdown to change the default setting.

1. Options in the **Git trigger** section:
- **Triggered by pull requests** &mdash; By default, it’s enabled. Every time a developer opens up a pull request or pushes a commit to an existing pull request, this job will get triggered to run.
- **Run on Draft Pull Request** &mdash; Enable this option if you want to also trigger the job to run every time a developer opens up a draft pull request or pushes a commit to that draft pull request.
- **Run on draft pull request** &mdash; Enable this option if you want to also trigger the job to run every time a developer opens up a draft pull request or pushes a commit to that draft pull request.

3. Options in the **Execution settings** section:
1. Options in the **Execution settings** section:
- **Commands** &mdash; By default, it includes the `dbt build --select state:modified+` command. This informs dbt Cloud to build only new or changed models and their downstream dependents. Importantly, state comparison can only happen when there is a deferred environment selected to compare state to. Click **Add command** to add more [commands](/docs/deploy/job-commands) that you want to be invoked when this job runs.
- **Run compare changes**<Lifecycle status="beta" /> &mdash; Enable this option to compare the last applied state of the production environment (if one exists) with the latest changes from the pull request, and identify what those differences are. To enable record-level comparison and primary key analysis, you must add a [primary key constraint](/reference/resource-properties/constraints) or [uniqueness test](/reference/resource-properties/data-tests#unique). Otherwise, you'll receive a "Primary key missing" error message in dbt Cloud.

To review the comparison report, navigate to the [Compare tab](/docs/deploy/run-visibility#compare-tab) in the job run's details. A summary of the report is also available from the pull request in your Git provider (see the [CI report example](#example-ci-report)).
- **Compare changes against an environment (Deferral)** &mdash; By default, it’s set to the **Production** environment if you created one. This option allows dbt Cloud to check the state of the code in the PR against the code running in the deferred environment, so as to only check the modified code, instead of building the full table or the entire DAG.

:::info
Older versions of dbt Cloud only allow you to defer to a specific job instead of an environment. Deferral to a job compares state against the project code that was run in the deferred job's last successful run. While deferral to an environment is more efficient as dbt Cloud will compare against the project representation (which is stored in the `manifest.json`) of the last successful deploy job run that executed in the deferred environment. By considering _all_ [deploy jobs](/docs/deploy/deploy-jobs) that run in the deferred environment, dbt Cloud will get a more accurate, latest project representation state.
:::
:::info
Older versions of dbt Cloud only allow you to defer to a specific job instead of an environment. Deferral to a job compares state against the project code that was run in the deferred job's last successful run. Deferral to an environment is more efficient as dbt Cloud will compare against the project representation (which is stored in the `manifest.json`) of the last successful deploy job run that executed in the deferred environment. By considering _all_ [deploy jobs](/docs/deploy/deploy-jobs) that run in the deferred environment, dbt Cloud will get a more accurate, latest project representation state.
:::

- **Run timeout** &mdash; Cancel the CI job if the run time exceeds the timeout value. You can use this option to help ensure that a CI check doesn't consume too much of your warehouse resources. If you enable the **Run compare changes** option, the timeout value defaults to `3600` (one hour) to prevent long-running comparisons.

- **Generate docs on run** &mdash; Enable this option if you want to [generate project docs](/docs/collaborate/build-and-view-your-docs) when this job runs. This option is disabled by default since most teams do not want to test doc generation on every CI check.

4. (optional) Options in the **Advanced settings** section:
1. (optional) Options in the **Advanced settings** section:
- **Environment variables** &mdash; Define [environment variables](/docs/build/environment-variables) to customize the behavior of your project when this CI job runs. You can specify that a CI job is running in a _Staging_ or _CI_ environment by setting an environment variable and modifying your project code to behave differently, depending on the context. It's common for teams to process only a subset of data for CI runs, using environment variables to branch logic in their dbt project code.
- **Target name** &mdash; Define the [target name](/docs/build/custom-target-names). Similar to **Environment Variables**, this option lets you customize the behavior of the project. You can use this option to specify that a CI job is running in a _Staging_ or _CI_ environment by setting the target name and modifying your project code to behave differently, depending on the context.
- **Run timeout** &mdash; Cancel this CI job if the run time exceeds the timeout value. You can use this option to help ensure that a CI check doesn't consume too much of your warehouse resources.
- **dbt version** &mdash; By default, it’s set to inherit the [dbt version](/docs/dbt-versions/core) from the environment. dbt Labs strongly recommends that you don't change the default setting. This option to change the version at the job level is useful only when you upgrade a project to the next dbt version; otherwise, mismatched versions between the environment and job can lead to confusing behavior.
- **Threads** &mdash; By default, it’s set to 4 [threads](/docs/core/connect-data-platform/connection-profiles#understanding-threads). Increase the thread count to increase model execution concurrency.
- **Generate docs on run** &mdash; Enable this if you want to [generate project docs](/docs/collaborate/build-and-view-your-docs) when this job runs. This is disabled by default since testing doc generation on every CI check is not a recommended practice.
- **Run source freshness** &mdash; Enable this option to invoke the `dbt source freshness` command before running this CI job. Refer to [Source freshness](/docs/deploy/source-freshness) for more details.

### Examples
<Lightbox src="/img/docs/dbt-cloud/using-dbt-cloud/create-ci-job.png" width="90%" title="Example of CI Job page in the dbt Cloud UI"/>

- Example of creating a CI job:
<Lightbox src="/img/docs/dbt-cloud/using-dbt-cloud/create-ci-job.png" title="Example of CI Job page in dbt Cloud UI"/>
### Example of CI check in pull request {#example-ci-check}
The following is an example of a CI check in a GitHub pull request. The green checkmark means the dbt build and tests were successful. Clicking on the dbt Cloud section takes you to the relevant CI run in dbt Cloud.

- Example of GitHub pull request. The green checkmark means the dbt build and tests were successful. Clicking on the dbt Cloud section navigates you to the relevant CI run in dbt Cloud.
<Lightbox src="/img/docs/dbt-cloud/using-dbt-cloud/example-github-pr.png" width="60%" title="Example of CI check in GitHub pull request"/>

<Lightbox src="/img/docs/dbt-cloud/using-dbt-cloud/example-github-pr.png" title="GitHub pull request example"/>
### Example of CI report in pull request <Lifecycle status="beta" /> {#example-ci-report}
The following is an example of a CI report in a GitHub pull request, which is shown when the **Run compare changes** option is enabled for the CI job. It displays a high-level summary of the models that changed from the pull request.

<Lightbox src="/img/docs/dbt-cloud/using-dbt-cloud/example-github-ci-report.png" width="75%" title="Example of CI report comment in GitHub pull request"/>

## Trigger a CI job with the API

33 changes: 26 additions & 7 deletions website/docs/docs/deploy/continuous-integration.md
@@ -30,11 +30,7 @@ dbt Cloud deletes the temporary schema from your <Term id="data-warehouse" /> w

The [dbt Cloud scheduler](/docs/deploy/job-scheduler) executes CI jobs differently from other deployment jobs in these important ways:

- **Concurrent CI checks** &mdash; CI runs triggered by the same dbt Cloud CI job execute concurrently (in parallel), when appropriate
- **Smart cancellation of stale builds** &mdash; Automatically cancels stale, in-flight CI runs when there are new commits to the PR
- **Run slot treatment** &mdash; CI runs don't consume a run slot

### Concurrent CI checks
<Expandable alt_header="Concurrent CI checks">

When you have teammates collaborating on the same dbt project creating pull requests on the same dbt repository, the same CI job will get triggered. Since each run builds into a dedicated, temporary schema that’s tied to the pull request, dbt Cloud can safely execute CI runs _concurrently_ instead of _sequentially_ (differing from what is done with deployment dbt Cloud jobs). Because no one needs to wait for one CI run to finish before another one can start, with concurrent CI checks, your whole team can test and integrate dbt code faster.

@@ -44,12 +40,35 @@ Below describes the conditions when CI checks are run concurrently and when they
- CI runs with the _same_ PR number and _different_ commit SHAs execute serially because they’re building into the same schema. dbt Cloud will run the latest commit and cancel any older, stale commits. For details, refer to [Smart cancellation of stale builds](#smart-cancellation).
- CI runs with the same PR number and same commit SHA, originating from different dbt Cloud projects will execute jobs concurrently. This can happen when two CI jobs are set up in different dbt Cloud projects that share the same dbt repository.
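
The conditions above can be read as a single rule: two CI runs only need to wait for each other when they would build into the same temporary schema. The following is a minimal Python sketch of that rule, not dbt Cloud's scheduler code. The `CIRun` class, its fields, and `can_run_concurrently` are hypothetical names, and the sketch assumes the schema is keyed by the dbt Cloud project and the PR number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CIRun:
    """Hypothetical, simplified view of a CI run (not a dbt Cloud API object)."""
    project_id: int
    pr_number: int
    commit_sha: str


def can_run_concurrently(a: CIRun, b: CIRun) -> bool:
    """Return True when the two runs would not build into the same schema.

    Assumption: the temporary schema is tied to the project and the pull
    request, so only runs that share both have to execute serially.
    """
    same_schema = a.project_id == b.project_id and a.pr_number == b.pr_number
    return not same_schema
```

Under this assumption, two commits on the same PR in the same project are serialized, while the same commit picked up by CI jobs in two different projects runs in parallel.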

### Smart cancellation of stale builds {#smart-cancellation}
</Expandable>

<Expandable alt_header="Smart cancellation of stale builds">

When you push a new commit to a PR, dbt Cloud enqueues a new CI run for the latest commit and cancels any CI run that is (now) stale and still in flight. This can happen when you’re pushing new commits while a CI build is still in process and not yet done. By cancelling runs in a safe and deliberate way, dbt Cloud helps improve productivity and reduce data platform spend on wasteful CI runs.
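
To make the cancellation behavior concrete, here is a small Python sketch of the idea: when a new commit arrives for a PR, any run for that PR that is still queued or running is marked as cancelled before the new run is enqueued. This is an illustration only. `PendingRun`, `enqueue_ci_run`, and the status strings are hypothetical and do not reflect how dbt Cloud implements the feature.

```python
from dataclasses import dataclass


@dataclass
class PendingRun:
    """Hypothetical record of a CI run; status values are illustrative."""
    pr_number: int
    commit_sha: str
    status: str = "queued"  # "queued", "running", "cancelled", or "success"


def enqueue_ci_run(runs: list[PendingRun], pr_number: int, commit_sha: str) -> PendingRun:
    """Cancel stale in-flight runs for the PR, then enqueue a run for the new commit."""
    for run in runs:
        # Only runs for the same PR that have not finished yet are stale.
        if run.pr_number == pr_number and run.status in ("queued", "running"):
            run.status = "cancelled"
    new_run = PendingRun(pr_number, commit_sha)
    runs.append(new_run)
    return new_run
```

Runs for other PRs and runs that already finished are left untouched, which is what keeps the cancellation safe.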

<Lightbox src="/img/docs/dbt-cloud/using-dbt-cloud/example-smart-cancel-job.png" width="70%" title="Example of an automatically canceled run"/>

### Run slot treatment <Lifecycle status="team,enterprise" />
</Expandable>

<Expandable alt_header="Run slot treatment" lifecycle="team,enterprise">

CI runs don't consume run slots. This guarantees a CI check will never block a production run.

</Expandable>

<Expandable alt_header="Compare changes" lifecycle="beta" >

When a pull request is opened or new commits are pushed, dbt Cloud compares the last applied state of the production environment (defaulting to deferral for lower computation costs) with the latest changes from the pull request. It does this for CI jobs that have the **Run compare changes** option enabled. These comparisons show how your code changes affect the data, which helps ensure you always ship the correct changes to production and create trusted data products.

:::info Beta feature

The compare changes feature is currently in limited beta for select accounts. If you're interested in gaining access or learning more, please stay tuned for updates.

:::

dbt reports the comparison differences:

- **In dbt Cloud** &mdash; Shows the changes (if any) to the data's primary keys, rows, and columns. To learn more, refer to the [Compare tab](/docs/deploy/run-visibility#compare-tab) in the [Job run details](/docs/deploy/run-visibility#job-run-details).
- **In the pull request from your Git provider** &mdash; Shows a summary of the changes as a comment on the pull request.

</Expandable>
38 changes: 33 additions & 5 deletions website/docs/docs/deploy/run-visibility.md
@@ -9,13 +9,13 @@ You can view the history of your runs and the model timing dashboard to help ide

## Run history

The **Run history** dashboard in dbt Cloud helps you monitor the health of your dbt project. It provides a detailed overview of all of your project's job runs and empowers you with a variety of filters to help you focus on specific aspects. You can also use it to review recent runs, find errored runs, and track the progress of runs in progress. You can access it on the top navigation menu by clicking **Deploy** and then **Run history**.
The **Run history** dashboard in dbt Cloud helps you monitor the health of your dbt project. It provides a detailed overview of all your project's job runs and empowers you with a variety of filters that enable you to focus on specific aspects. You can also use it to review recent runs, find errored runs, and track the progress of runs in progress. You can access it from the top navigation menu by clicking **Deploy** and then **Run history**.

The dashboard displays your full run history, including job name, status, associated environment, job trigger, commit SHA, schema, and timing info.

dbt Cloud developers can access their run history for the last 365 days through the dbt Cloud user interface (UI) and API.

We limit self-service retrieval of run history metadata to 365 days to improve dbt Cloud's performance.
dbt Labs limits self-service retrieval of run history metadata to 365 days to improve dbt Cloud's performance.

<Lightbox src="/img/docs/dbt-cloud/deployment/run-history.png" width="85%" title="Run history dashboard allows you to monitor the health of your dbt project and displays jobs, job status, environment, timing, and more."/>

@@ -29,16 +29,44 @@ An example of a completed run with a configuration for a [job completion trigger

<Lightbox src="/img/docs/dbt-cloud/deployment/example-job-details.png" width="65%" title="Example of run details" />

### Access logs
### Run summary tab

You can view or download in-progress and historical logs for your dbt runs. This makes it easier for the team to debug errors more efficiently.

<Lightbox src="/img/docs/dbt-cloud/deployment/access-logs.gif" width="85%" title="Access logs for run steps" />

### Model timing <Lifecycle status="team,enterprise" />
### Lineage tab

The **Model timing** dashboard displays the composition, order, and time taken by each model in a job run. The visualization appears for successful jobs and highlights the top 1% of model durations. This helps you identify bottlenecks in your runs, so you can investigate them and potentially make changes to improve their performance.
View the lineage graph associated with the job run so you can better understand the dependencies and relationships of the resources in your project. To view a node's metadata directly in [dbt Explorer](/docs/collaborate/explore-projects), select it (double-click) from the graph.

<Lightbox src="/img/docs/collaborate/dbt-explorer/explorer-from-lineage.gif" width="85%" title="Example of accessing dbt Explorer from the Lineage tab" />

### Model timing tab <Lifecycle status="team,enterprise" />

The **Model timing** tab displays the composition, order, and time each model takes in a job run. The visualization appears for successful jobs and highlights the top 1% of model durations. This helps you identify bottlenecks in your runs so you can investigate them and potentially make changes to improve their performance.

You can find the dashboard on the [job's run details](#job-run-details).

<Lightbox src="/img/docs/dbt-cloud/model-timing.png" width="85%" title="The Model timing tab displays the top 1% of model durations and visualizes model bottlenecks" />

### Artifacts tab

The **Artifacts** tab lists the artifacts generated by the job run. The files are saved and available for download.

<Lightbox src="/img/docs/dbt-cloud/example-artifacts-tab.png" width="85%" title="Example of the Artifacts tab" />

### Compare tab <Lifecycle status="beta"/>

The **Compare** tab is shown for [CI job runs](/docs/deploy/ci-jobs) with the **Run compare changes** setting enabled. It displays details about [the changes from the comparison dbt performed](/docs/deploy/continuous-integration#compare-changes) between what's in your production environment and the pull request. To help you better visualize the differences, dbt Cloud highlights changes to your models in red (deletions) and green (inserts).

From the **Modified** section, you can view the following:

- **Overview** &mdash; High-level summary about the changes to the models such as the number of primary keys that were added or removed.
- **Primary keys** &mdash; Details about the changes to the records.
- **Modified rows** &mdash; Details about the modified rows. Click **Show full preview** to display all columns.
- **Columns** &mdash; Details about the changes to the columns.
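
The sections above map onto a keyed comparison of two result sets, which is also why record-level comparison depends on a primary key. The Python sketch below shows one way such a comparison could work on small in-memory rows. It is an assumption-laden illustration, not the logic dbt Cloud runs: `compare_by_primary_key` and the shape of its result are hypothetical.

```python
def compare_by_primary_key(prod_rows, pr_rows, primary_key):
    """Compare two lists of row dicts, matching rows on `primary_key`."""
    prod = {row[primary_key]: row for row in prod_rows}
    pr = {row[primary_key]: row for row in pr_rows}

    # Column names seen on each side (empty input yields an empty set).
    prod_columns = set().union(*(row.keys() for row in prod_rows))
    pr_columns = set().union(*(row.keys() for row in pr_rows))

    return {
        # Keys present on only one side.
        "added_keys": sorted(pr.keys() - prod.keys()),
        "removed_keys": sorted(prod.keys() - pr.keys()),
        # Keys present on both sides whose row contents differ.
        "modified_keys": sorted(
            key for key in prod.keys() & pr.keys() if prod[key] != pr[key]
        ),
        "added_columns": sorted(pr_columns - prod_columns),
        "removed_columns": sorted(prod_columns - pr_columns),
    }
```

Without a column that uniquely identifies each record, there is no reliable way to pair a production row with its counterpart from the pull request, so the added, removed, and modified groups cannot be computed.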

To view the dependencies and relationships of the resources in your project more closely, click **View in Explorer** to launch [dbt Explorer](/docs/collaborate/explore-projects).

<Lightbox src="/img/docs/dbt-cloud/example-ci-compare-changes-tab.png" width="85%" title="Example of the Compare tab" />
