diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS index 8aaf0375007..1983b0201d9 100644 --- a/.github/CODEOWNERS +++ b/.github/CODEOWNERS @@ -4,14 +4,14 @@ * @dbt-labs/product-docs # Adapter & Package Development Docs -/website/docs/docs/supported-data-platforms.md @dbt-labs/product-docs @dataders -/website/docs/reference/warehouse-setups @dbt-labs/product-docs @dataders +/website/docs/docs/supported-data-platforms.md @dbt-labs/product-docs @amychen1776 +/website/docs/reference/warehouse-setups @dbt-labs/product-docs @amychen1776 # `resource-configs` contains more than just warehouse setups -/website/docs/reference/resource-configs/*-configs.md @dbt-labs/product-docs @dataders -/website/docs/guides/advanced/adapter-development @dbt-labs/product-docs @dataders @dbeatty10 +/website/docs/reference/resource-configs/*-configs.md @dbt-labs/product-docs @amychen1776 +/website/docs/guides/advanced/adapter-development @dbt-labs/product-docs @amychen1776 -/website/docs/guides/building-packages @dbt-labs/product-docs @amychen1776 @dataders @dbeatty10 -/website/docs/guides/creating-new-materializations @dbt-labs/product-docs @dataders @dbeatty10 +/website/docs/guides/building-packages @dbt-labs/product-docs @amychen1776 +/website/docs/guides/creating-new-materializations @dbt-labs/product-docs # Require approval from the Multicell team when making # changes to the public facing migration documentation. diff --git a/.github/ISSUE_TEMPLATE/internal-orch-team.yml b/.github/ISSUE_TEMPLATE/internal-orch-team.yml deleted file mode 100644 index 8c4d61df10c..00000000000 --- a/.github/ISSUE_TEMPLATE/internal-orch-team.yml +++ /dev/null @@ -1,49 +0,0 @@ -name: Orchestration team - Request changes to docs -description: File a docs update request that is not already tracked in Orch team's Release Plans (Notion database). -labels: ["content","internal-orch-team"] -body: - - type: markdown - attributes: - value: | - * You can ask questions or submit ideas for the dbt docs in [Issues](https://github.com/dbt-labs/docs-internal/issues/new/choose) - * Before you file an issue read the [Contributing guide](https://github.com/dbt-labs/docs-internal#contributing). - * Check to make sure someone hasn't already opened a similar [issue](https://github.com/dbt-labs/docs-internal/issues). - - - type: checkboxes - id: contributions - attributes: - label: Contributions - description: Please read the contribution docs before opening an issue or pull request. - options: - - label: I have read the contribution docs, and understand what's expected of me. - - - type: textarea - attributes: - label: Link to the page on docs.getdbt.com requiring updates - description: Please link to the page or pages you'd like to see improved. - validations: - required: true - - - type: textarea - attributes: - label: What part(s) of the page would you like to see updated? - description: | - - Give as much detail as you can to help us understand the change you want to see. - - Why should the docs be changed? What use cases does it support? - - What is the expected outcome? - validations: - required: true - - - type: textarea - attributes: - label: Reviewers/Stakeholders/SMEs - description: List the reviewers, stakeholders, and subject matter experts (SMEs) to collaborate with for the docs update. - validations: - required: true - - - type: textarea - attributes: - label: Related Jira tickets - description: Add any other context or screenshots about the feature request here. - validations: - required: false diff --git a/.github/pull_request_template.md b/.github/pull_request_template.md index 870dadcd183..d2bb72552bd 100644 --- a/.github/pull_request_template.md +++ b/.github/pull_request_template.md @@ -9,6 +9,7 @@ To learn more about the writing conventions used in the dbt Labs docs, see the [ - [ ] I have reviewed the [Content style guide](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/content-style-guide.md) so my content adheres to these guidelines. - [ ] The topic I'm writing about is for specific dbt version(s) and I have versioned it according to the [version a whole page](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#adding-a-new-version) and/or [version a block of content](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#versioning-blocks-of-content) guidelines. - [ ] I have added checklist item(s) to this list for anything anything that needs to happen before this PR is merged, such as "needs technical review" or "change base branch." +- [ ] The content in this PR requires a dbt release note, so I added one to the [release notes page](https://docs.getdbt.com/docs/dbt-versions/dbt-cloud-release-notes). '; + const endMarker = ''; + + // Get the deployment URL and links from environment variables + const deploymentUrl = process.env.DEPLOYMENT_URL; + const links = process.env.LINKS; + + // Build the deployment content without leading whitespace + const deploymentContent = [ + `${startMarker}`, + '---', + '🚀 Deployment available! Here are the direct links to the updated files:', + '', + `${links}`, + '', + `${endMarker}` + ].join('\n'); + + // Remove existing deployment content between markers + const regex = new RegExp(`${startMarker}[\\s\\S]*?${endMarker}`, 'g'); + body = body.replace(regex, '').trim(); + + // Append the new deployment content + body = `${body}\n\n${deploymentContent}`; + + // Update the PR description + await github.rest.pulls.update({ + owner: context.repo.owner, + repo: context.repo.repo, + pull_number: prNumber, + body: body, + }); + env: + DEPLOYMENT_URL: ${{ steps.vercel_url.outputs.deployment_url }} + LINKS: ${{ steps.links.outputs.links }} diff --git a/.github/workflows/vale.yml b/.github/workflows/vale.yml new file mode 100644 index 00000000000..8abc5e2f50b --- /dev/null +++ b/.github/workflows/vale.yml @@ -0,0 +1,79 @@ +name: Vale linting + +on: + pull_request: + types: [opened, synchronize, reopened] + paths: + - 'website/docs/**/*' + - 'website/blog/**/*' + - 'website/**/*' + +jobs: + vale: + name: Vale linting + runs-on: ubuntu-latest + steps: + - name: Checkout code + uses: actions/checkout@v3 + with: + fetch-depth: 1 + + - name: List repository contents + run: | + pwd + ls -R + + - name: Set up Python + uses: actions/setup-python@v5 + with: + python-version: '3.x' + + - name: Install Vale + run: pip install vale==3.9.1.0 # Install a stable version of Vale + + - name: Get changed files + id: changed-files + uses: tj-actions/changed-files@v45 + with: + files: | + website/**/*.md + separator: ' ' + + - name: Debugging - Print changed files + if: ${{ steps.changed-files.outputs.any_changed == 'true' }} + run: | + echo "Changed files:" + echo "${{ steps.changed-files.outputs.all_changed_and_modified_files }}" + + - name: Confirm files exist + if: ${{ steps.changed-files.outputs.any_changed == 'true' }} + run: | + echo "Checking if files exist..." + for file in ${{ steps.changed-files.outputs.all_changed_and_modified_files }}; do + if [ -f "$file" ]; then + echo "Found: $file" + else + echo "File not found: $file" + exit 1 + fi + done + + - name: Run vale + if: ${{ steps.changed-files.outputs.any_changed == 'true' }} + uses: errata-ai/vale-action@reviewdog + with: + token: ${{ secrets.GITHUB_TOKEN }} + reporter: github-pr-review + files: ${{ steps.changed-files.outputs.all_changed_and_modified_files }} + separator: ' ' + +# - name: Post summary comment +# if: ${{ steps.changed-files.outputs.any_changed == 'true' }} +# run: | +# COMMENT="❗️Oh no, some Vale linting found issues! Please check the **Files change** tab for detailed results and make the necessary updates." +# COMMENT+=$'\n' +# COMMENT+=$'\n\n' +# COMMENT+="➡️ Link to detailed report: [Files changed](${{ github.event.pull_request.html_url }}/files)" +# gh pr comment ${{ github.event.pull_request.number }} --body "$COMMENT" +# env: +# GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} diff --git a/.hyperlint/config.yaml b/.hyperlint/config.yaml new file mode 100644 index 00000000000..03082114ae1 --- /dev/null +++ b/.hyperlint/config.yaml @@ -0,0 +1,10 @@ +content_dir: /docs +authorized_users: + - mirnawong1 + - matthewshaver + - nghi-ly + - runleonarun + - nataliefiann + +vale: + enabled: true diff --git a/.vale.ini b/.vale.ini new file mode 100644 index 00000000000..58aff923afe --- /dev/null +++ b/.vale.ini @@ -0,0 +1,7 @@ +StylesPath = styles +MinAlertLevel = warning + +Vocab = EN + +[*.md] +BasedOnStyles = custom diff --git a/README.md b/README.md index c749fedf95a..d306651f545 100644 --- a/README.md +++ b/README.md @@ -62,18 +62,3 @@ You can click a link available in a Vercel bot PR comment to see and review your Advisory: - If you run into an `fatal error: 'vips/vips8' file not found` error when you run `npm install`, you may need to run `brew install vips`. Warning: this one will take a while -- go ahead and grab some coffee! - -## Running the Cypress tests locally - -Method 1: Utilizing the Cypress GUI -1. `cd` into the repo: `cd docs.getdbt.com` -2. `cd` into the `website` subdirectory: `cd website` -3. Install the required node packages: `npm install` -4. Run `npx cypress open` to open the Cypress GUI, and choose `E2E Testing` as the Testing Type, before finally selecting your browser and clicking `Start E2E testing in {broswer}` -5. Click on a test and watch it run! - -Method 2: Running the Cypress E2E tests headlessly -1. `cd` into the repo: `cd docs.getdbt.com` -2. `cd` into the `website` subdirectory: `cd website` -3. Install the required node packages: `npm install` -4. Run `npx cypress run` diff --git a/contributing/adding-page-components.md b/contributing/adding-page-components.md index 68294e7d149..7a92d627995 100644 --- a/contributing/adding-page-components.md +++ b/contributing/adding-page-components.md @@ -4,7 +4,7 @@ You can use the following components to provide code snippets for each supported Identify code by labeling with the warehouse names: -```code +```sql
@@ -32,7 +32,7 @@ You can use the following components to provide code snippets in a tabbed view. Identify code and code files by labeling with the component they are describing: -```code +```sql ` tag. This allows you to share a link to a page with a pre-selected tab so that clicking on a tab creates a unique hyperlink for that tab. However, this feature doesn't provide an anchor link, which means the browser won't scroll to the tab. Additionally, you can define the search parameter name to use. If the tabs content is under a header, you can alternatively link to the header itself, instaed of the `queryString` prop. +You can use the [queryString](https://docusaurus.io/docs/next/markdown-features/tabs?current-os=ios#query-string) prop in the `` tag. This allows you to share a link to a page with a pre-selected tab so that clicking on a tab creates a unique hyperlink for that tab. However, this feature doesn't provide an anchor link, which means the browser won't scroll to the tab. Additionally, you can define the search parameter name to use. If the tabs content is under a header, you can alternatively link to the header itself, instead of the `queryString` prop. In the following example, clicking a tab adds a search parameter to the end of the URL: `?current-os=android or ?current-os=ios`. -``` +```sql Android @@ -105,3 +105,48 @@ In the following example, clicking a tab adds a search parameter to the end of t ``` + +## Markdown Links + +Refer to the Links section of the Content Style Guide to read about how you can use links in the dbt product documentation. + +## Collapsible header + + +
+

Shows and hides children elements

+
+
+ +```markdown + +
+

Shows and hides children elements

+
+
+
+``` + +## File component + +```yml + + +```yaml +password: hunter2 +``` + +``` + +## LoomVideo component + +
{``}
+ + + +## YoutubeVideo component + +
{``}
+ + + diff --git a/contributing/content-style-guide.md b/contributing/content-style-guide.md index 022efa127a5..a8520bc0e0d 100644 --- a/contributing/content-style-guide.md +++ b/contributing/content-style-guide.md @@ -624,6 +624,12 @@ When describing icons that appear on-screen, use the [_Google Material Icons_](h :white_check_mark:Click on the menu icon +#### Upload icons +If you're using icons to document things like [third-party vendors](https://docs.getdbt.com/docs/cloud-integrations/avail-sl-integrations), etc. — you need to add the icon file in the following locations to ensure the icons render correctly in light and dark mode: + +- website/static/img/icons +- website/static/img/icons/white + ### Image names Two words that are either adjectives or nouns describing the name of a file separated by an underscore `_` (known as `snake_case`). The two words can also be separated by a hyphen (`kebab-case`). diff --git a/contributing/developer-blog.md b/contributing/developer-blog.md deleted file mode 100644 index 0d9b3becba2..00000000000 --- a/contributing/developer-blog.md +++ /dev/null @@ -1,67 +0,0 @@ - -* [Contributing](#contributing) -* [Core Principles](#core-principles) - -## Contributing - -The dbt Developer Blog is a place where analytics practitioners can go to share their knowledge with the community. Analytics Engineering is a discipline we’re all building together. The developer blog exists to cultivate the collective knowledge that exists on how to build and scale effective data teams. - -We currently have editorial capacity for a few Community contributed developer blogs per quarter - if we are oversubscribed we suggest you post on another platform or hold off until the editorial team is ready to take on more posts. - -### What makes a good developer blog post? - -- The short answer: Practical, hands on analytics engineering tutorials and stories - - [Slim CI/CD with Bitbucket](https://docs.getdbt.com/blog/slim-ci-cd-with-bitbucket-pipelines) - - [So You Want to Build a dbt Package](https://docs.getdbt.com/blog/so-you-want-to-build-a-package) - - [Founding an Analytics Engineering Team](https://docs.getdbt.com/blog/founding-an-analytics-engineering-team-smartsheet) -- See the [Developer Blog Core Principles](#core-principles) - -### How do I submit a proposed post? - -To submit a proposed post, open a `Contribute to the dbt Developer Blog` issue on the [Developer Hub repo](https://github.com/dbt-labs/docs.getdbt.com/issues/new/choose). You will be asked for: - -- A short (one paragraph) summary of the post you’d like to publish -- An outline of the post - -You’ll hear back from a member of the dbt Labs teams within 7 days with one of three responses: - -- The post looks good to go as is! We’ll ask you to start creating a draft based off of the initial outline you submitted -- Proposed changes to the outline. This could be additional focus on a topic you mention that’s of high community interest or a tweak to the structure to help with narrative flow -- Not a fit for the developer blog right now. We hugely appreciate *any* interest in submitting to the Developer Blog - right now our biggest backlog is capacity to help folks get these published. See below on how we are thinking about and evaluating potential posts. - -### What is the process once my blog is accepted? - -Once a blog is accepted, we’ll ask you for a date when we can expect the draft by. Typically we’ll ask that you can commit to having this ready within a month of submitting the issue. - -Once you submit a draft, we’ll return a first set of edits within 5 business days. - -The typical turnaround time from issue creation to going live on the developer blog is ~4 to 6 weeks. - -### What happens after my blog is published? - -We’ll share the blog on the dbt Labs social media channels! We also encourage you to share on the dbt Slack in #i-made-this. - -### What if my post doesn’t get approved? - -We want to publish as many community contributors as possible, but not every post will be a fit for the Developer Blog. That’s ok! There are many different reasons why we might not be able to publish a post right now and none of them reflect on the quality of the proposed post. - -- **dbt Labs capacity**: We’re committed to providing hands-on feedback and coaching throughout the process. Our goal is not just to generate great developer blogs - it’s to help build a community of great writers / practitioners who can share their knowledge with the community for years to come. This necessarily means we will be able to take on a lower absolute number of posts in the short term, but will hopefully be helpful for the community long term. -- **Focus on narrative / problem solving - not industry trends**: The developer blog exists, primarily, to tell the stories of analytics engineering practitioners and how they solve problems. The idea is that reading the developer blog gives a feel for what it is like to be a data practitioner on the ground today. This is not a hard and fast rule, but a good way to approach this is “How I/we solved X problem” rather than “How everyone should solve X problem”. - -We are very interested in stacks, new tools and integrations and will happily publish posts about this - with the caveat that the *focus* of the post should be solving real world problems. Hopefully if you are writing about these, this is something that you have used yourself in a hands on, production implementation. - -- **Right sized scope**: We want to be able to cover a topic in-depth and dig into the nuances. Big topics like “How should you structure your data team” or “How to ensure data quality in your organization” will be tough to cover in the scope of a single post. If you have a big idea - try subdividing it! “How should you structure your data team” could become “How we successfully partnered with our RevOps team on improving lead tracking” and “How to ensure data quality in your organization” might be “How we cleaned up our utm tracking”. - -### What if I need help / have questions: - -- Feel free to post any questions in #community-writers on the dbt Slack. - -## Core Principles - -- 🧑🏻‍🤝‍🧑🏾 The dbt Developer blog is written by humans **- individual analytics professionals sharing their insight with the world. To the extent feasible, a community member posting on the developer blog is not staking an official organizational stance, but something that *they* have learned or believe based on their work. This is true for dbt Labs employees as well. -- 💍 Developer blog content is knowledge rich - these are posts that readers share, bookmark and come back to time and time again. -- ⛹🏼‍♂️ Developer blog content is written by and for *practitioners* - end users of analytics tools (and sometimes people that work with practitioners). -- ⭐ Developer blog content is best when it is *the story which the author is uniquely positioned to tell.* Authors are encouraged to consider what insight they have that is specific to them and the work they have done. -- 🏎️ Developer blog content is actionable - readers walk away with a clear sense of how they can use this information to be a more effective practitioner. Posts include code snippets, Loom walkthroughs and hands-on, practical information that can be integrated into daily workflows. -- 🤏 Nothing is too small to share - what you think is simple has the potential to change someone's week. -- 🔮 Developer blog content is present focused —posts tell a story of a thing that you've already done or are actively doing, not something that you may do in the future. diff --git a/contributing/lightbox.md b/contributing/lightbox.md index 5f35b4d9639..95feccbe779 100644 --- a/contributing/lightbox.md +++ b/contributing/lightbox.md @@ -25,4 +25,9 @@ You can use the Lightbox component to add an image or screenshot to your page. I /> ``` +Note that if you're using icons to document things like third party vendors, etc, — you need to add the icon file in the following locations to ensure the icons render correctly in light and dark mode: + +- `website/static/img/icons` +- `website/static/img/icons/white` + diff --git a/contributing/single-sourcing-content.md b/contributing/single-sourcing-content.md index 537980ebdfb..6dc14d760b1 100644 --- a/contributing/single-sourcing-content.md +++ b/contributing/single-sourcing-content.md @@ -90,7 +90,7 @@ This component can be added directly to a markdown file in a similar way as othe Both properties can be used together to set a range where the content should show. In the example below, this content will only show if the selected version is between **0.21** and **1.0**: ```markdown - + Versioned content here diff --git a/styles/config/vocabularies/EN/accept.txt b/styles/config/vocabularies/EN/accept.txt new file mode 100644 index 00000000000..083b42f7bed --- /dev/null +++ b/styles/config/vocabularies/EN/accept.txt @@ -0,0 +1,69 @@ +dbt Cloud +dbt Core +dbt Semantic Layer +dbt Explorer +dbt +dbt-tonic +dbtonic +IDE +CLI +Config +info +docs +yaml +YAML +SQL +bash +shell +MetricFlow +jinja +jinja2 +sqlmesh +Snowflake +Databricks +Fabric +Redshift +Azure +DevOps +Athena +Amazon +UI +CSV +S3 +SCD +repo +dbt_project.yml +boolean +defaultValue= +DWH +DWUs +shoutout +ADF +BQ +gcloud +MSFT +DDL +APIs +API +SSIS +PBI +PowerBI +datetime +PySpark +:::caution +:::note +:::info +:::tip +:::warning +\<[^>]+\> +\b[A-Z]{2,}(?:/[A-Z]{2,})?\b +\w+-\w+ +\w+/\w+ +n/a +N/A +\ diff --git a/styles/custom/LatinAbbreviations.yml b/styles/custom/LatinAbbreviations.yml new file mode 100644 index 00000000000..44a3c9d6e8c --- /dev/null +++ b/styles/custom/LatinAbbreviations.yml @@ -0,0 +1,15 @@ +# LatinAbbreviations.yml +extends: substitution +message: "Avoid Latin abbreviations: '%s'. Consider using '%s' instead." +level: warning + +swap: + 'e.g.': 'for example' + 'e.g': 'for example' + 'eg': 'for example' + 'i.e.': 'that is' + 'i.e': 'that is' + 'etc.': 'and so on' + 'etc': 'and so on' + 'N.B.': 'Note' + 'NB': 'Note' diff --git a/styles/custom/Repitition.yml b/styles/custom/Repitition.yml new file mode 100644 index 00000000000..4cd620146cf --- /dev/null +++ b/styles/custom/Repitition.yml @@ -0,0 +1,6 @@ +extends: repetition +message: "'%s' is repeated!" +level: warning +alpha: true +tokens: + - '[^\s]+' diff --git a/styles/custom/SentenceCaseHeaders.yml b/styles/custom/SentenceCaseHeaders.yml new file mode 100644 index 00000000000..d1d6cd97c67 --- /dev/null +++ b/styles/custom/SentenceCaseHeaders.yml @@ -0,0 +1,34 @@ +extends: capitalization +message: "'%s' should use sentence-style capitalization. Try '%s' instead." +level: warning +scope: heading +match: $sentence # Enforces sentence-style capitalization +indicators: + - ":" +exceptions: + - '\bdbt\b' + - '\bdbt\s+Cloud\b' + - '\bdbt\s+Core\b' + - '\bdbt\s+Cloud\s+CLI\b' + - Snowflake + - Databricks + - Azure + - GCP + - AWS + - SQL + - CLI + - API + - YAML + - JSON + - HTML + - Redshift + - Google + - BigQuery + - SnowSQL + - Snowsight + - Snowpark + - Fabric + - Microsoft + - Postgres + - Explorer + - IDE diff --git a/styles/custom/Typos.yml b/styles/custom/Typos.yml new file mode 100644 index 00000000000..c1ad5cbe95b --- /dev/null +++ b/styles/custom/Typos.yml @@ -0,0 +1,41 @@ +extends: spelling + +message: "Oops there's a typo -- did you really mean '%s'? " +level: warning + +action: + name: suggest + params: + - spellings + +custom: true +filters: + - '\bdbt\b' + - '\bdbt\s+Cloud\b' + - '\bdbt\s+Core\b' + - '\bdbt\s+Cloud\s+CLI\b' + - '\bdbt\s+.*?\b' + - '<[^>]+>' # Ignore all HTML-like components starting with < and ending with > + - '<[^>]+>.*<\/[^>]+>' + +--- + +extends: existence + +message: "Ignore specific patterns" +level: skip +tokens: + - '\bdbt\b' + - '\bdbt\s+Cloud\b' + - '\bdbt\s+Core\b' + - '\bdbt\s+Cloud\s+CLI\b' + - '\bdbt\s+.*?\b' + - '<[^>]+>' # Ignore all HTML-like components starting with < and ending with > + - '<[^>]+>.*<\/[^>]+>' + - '\w+-\w+' + - '\w+/\w+' + - '\w+/\w+|\w+-\w+|n/a' + - 'n/a' + - 'N/A' + - 'dbt v\\d+\\.\\d+' + - 'v\\d+\\.\\d+ ' diff --git a/styles/custom/UIElements.yml b/styles/custom/UIElements.yml new file mode 100644 index 00000000000..ff9c5f86187 --- /dev/null +++ b/styles/custom/UIElements.yml @@ -0,0 +1,19 @@ +extends: existence +message: "UI elements like '%s' should be bold." +level: warning +tokens: + # Match UI elements that are not bolded (i.e., not within **) + - '(? ❓ **What are artifacts and why should I defer to them?** dbt artifacts are metadata of the last run - what models and tests were defined, which ones ran successfully, and which failed. If a future dbt run is set to ***defer*** to this metadata, it means that it can select models and tests to run based on their state, including and especially their difference from the reference metadata. See [Artifacts](https://docs.getdbt.com/reference/artifacts/dbt-artifacts), [Selection methods: “state”](https://docs.getdbt.com/reference/node-selection/methods#the-state-method), and [Caveats to state comparison](https://docs.getdbt.com/reference/node-selection/state-comparison-caveats) for details. +> ❓ **What are artifacts and why should I defer to them?** dbt artifacts are metadata of the last run - what models and tests were defined, which ones ran successfully, and which failed. If a future dbt run is set to ***defer*** to this metadata, it means that it can select models and tests to run based on their state, including and especially their difference from the reference metadata. See [Artifacts](https://docs.getdbt.com/reference/artifacts/dbt-artifacts), [Selection methods: “state”](https://docs.getdbt.com/reference/node-selection/methods#state), and [Caveats to state comparison](https://docs.getdbt.com/reference/node-selection/state-comparison-caveats) for details. ### Slim Continuous Integration: Retrieve the artifacts and do a state-based run diff --git a/website/blog/2022-05-03-making-dbt-cloud-api-calls-using-dbt-cloud-cli.md b/website/blog/2022-05-03-making-dbt-cloud-api-calls-using-dbt-cloud-cli.md index 6758a28638c..f75675ba7ee 100644 --- a/website/blog/2022-05-03-making-dbt-cloud-api-calls-using-dbt-cloud-cli.md +++ b/website/blog/2022-05-03-making-dbt-cloud-api-calls-using-dbt-cloud-cli.md @@ -113,7 +113,7 @@ After the initial release I started to expand to cover the rest of the dbt Cloud In this example we’ll download a `catalog.json` artifact from the latest run of a dbt Cloud job using `dbt-cloud run list` and `dbt-cloud get-artifact` and then write a simple Data Catalog CLI application using the same tools that are used in `dbt-cloud-cli` (i.e., `click` and `pydantic`). Let’s dive right in! -The first command we need is the `dbt-cloud run list` which uses an [API endpoint](https://docs.getdbt.com/dbt-cloud/api-v2-legacy#/operations/List%20Runs) that returns runs sorted by creation date, with the most recent run appearing first. The command returns a JSON response that has one top-level attribute `data` that contains a list of runs. We’ll need to extract the `id` attribute of the first one and to do that we use [jq](https://stedolan.github.io/jq/): +The first command we need is the `dbt-cloud run list` which uses an [API endpoint](https://docs.getdbt.com/dbt-cloud/api-v2#/operations/List%20Runs) that returns runs sorted by creation date, with the most recent run appearing first. The command returns a JSON response that has one top-level attribute `data` that contains a list of runs. We’ll need to extract the `id` attribute of the first one and to do that we use [jq](https://stedolan.github.io/jq/): ``` latest_run_id=$(dbt-cloud run list --job-id $DBT_CLOUD_JOB_ID | jq .data[0].id -r) diff --git a/website/blog/2022-07-13-star-sql-love-letter.md b/website/blog/2022-07-13-star-sql-love-letter.md index a84750198de..0d5aec181a2 100644 --- a/website/blog/2022-07-13-star-sql-love-letter.md +++ b/website/blog/2022-07-13-star-sql-love-letter.md @@ -44,7 +44,7 @@ So what does this mean for the example from above? Instead of writing out all 55 ```sql select - {{ dbt_utils.star(from=ref('table_a'), except=['column_56'] }} + {{ dbt_utils.star(from=ref('table_a'), except=['column_56']) }} from {{ ref('table_a') }} ``` diff --git a/website/blog/2024-04-22-extended-attributes.md b/website/blog/2024-04-22-extended-attributes.md index 18d4ff0b64c..57636cc8f6b 100644 --- a/website/blog/2024-04-22-extended-attributes.md +++ b/website/blog/2024-04-22-extended-attributes.md @@ -80,7 +80,7 @@ All you need to do is configure an environment as staging and enable the **Defer ## Upgrading on a curve -Lastly, let’s consider a more specialized use case. Imagine we have a "tiger team" (consisting of a lone analytics engineer named Dave) tasked with upgrading from dbt version 1.6 to the new **Versionless** setting, to take advantage of added stability and feature access. We want to keep the rest of the data team being productive in dbt 1.6 for the time being, while enabling Dave to upgrade and do his work in the new versionless mode. +Lastly, let’s consider a more specialized use case. Imagine we have a "tiger team" (consisting of a lone analytics engineer named Dave) tasked with upgrading from dbt version 1.6 to the new **[Latest release track](/docs/dbt-versions/cloud-release-tracks)**, to take advantage of new features and performance improvements. We want to keep the rest of the data team being productive in dbt 1.6 for the time being, while enabling Dave to upgrade and do his work with Latest (and greatest) dbt. ### Development environment diff --git a/website/blog/2024-05-22-latest-dbt-stability-improvement-innovation.md b/website/blog/2024-05-22-latest-dbt-stability-improvement-innovation.md index 078dab198fa..f2c25f3da8c 100644 --- a/website/blog/2024-05-22-latest-dbt-stability-improvement-innovation.md +++ b/website/blog/2024-05-22-latest-dbt-stability-improvement-innovation.md @@ -1,5 +1,5 @@ --- -title: "How we're making sure you can confidently go \"Versionless\" in dbt Cloud" +title: "How we're making sure you can confidently switch to the \"Latest\" release track in dbt Cloud" description: "Over the past 6 months, we've laid a stable foundation for continuously improving dbt." slug: latest-dbt-stability @@ -12,23 +12,27 @@ date: 2024-05-02 is_featured: true --- +import Latest from '/snippets/_release-stages-from-versionless.md' + + + As long as dbt Cloud has existed, it has required users to select a version of dbt Core to use under the hood in their jobs and environments. This made sense in the earliest days, when dbt Core minor versions often included breaking changes. It provided a clear way for everyone to know which version of the underlying runtime they were getting. However, this came at a cost. While bumping a project's dbt version *appeared* as simple as selecting from a dropdown, there was real effort required to test the compatibility of the new version against existing projects, package dependencies, and adapters. On the other hand, putting this off meant foregoing access to new features and bug fixes in dbt. -But no more. Today, we're ready to announce the general availability of a new option in dbt Cloud: [**"Versionless."**](https://docs.getdbt.com/docs/dbt-versions/upgrade-dbt-version-in-cloud#versionless) +But no more. Today, we're ready to announce the general availability of a new option in dbt Cloud: [**the "Latest" release track.**](/docs/dbt-versions/cloud-release-tracks) For customers, this means less maintenance overhead, faster access to bug fixes and features, and more time to focus on what matters most: building trusted data products. This will be our stable foundation for improvement and innovation in dbt Cloud. -But we wanted to go a step beyond just making this option available to you. In this blog post, we aim to shed a little light on the extensive work we've done to ensure that using "Versionless" is a stable, reliable experience for the thousands of customers who rely daily on dbt Cloud. +But we wanted to go a step beyond just making this option available to you. In this blog post, we aim to shed a little light on the extensive work we've done to ensure that using the "Latest" release track is a stable and reliable experience for the thousands of customers who rely daily on dbt Cloud. ## How we safely deploy dbt upgrades to Cloud We've put in place a rigorous, best-in-class suite of tests and control mechanisms to ensure that all changes to dbt under the hood are fully vetted before they're deployed to customers of dbt Cloud. -This pipeline has in fact been in place since January! It's how we've already been shipping continuous changes to the hundreds of customers who've selected "Versionless" while it's been in Beta and Preview. In that time, this process has enabled us to prevent multiple regressions before they were rolled out to any customers. +This pipeline has in fact been in place since January! It's how we've already been shipping continuous changes to the hundreds of customers who've selected the "Latest" release track while it's been in Beta and Preview. In that time, this process has enabled us to prevent multiple regressions before they were rolled out to any customers. We're very confident in the robustness of this process**. We also know that we'll need to continue building trust with time.** We're sharing details about this work in the spirit of transparency and to build that trust. @@ -82,9 +86,9 @@ All incidents are retrospected to make sure we not only identify and fix the roo ::: -The outcome of this process is that, when you select "Versionless" in dbt Cloud, the time between an improvement being made to dbt Core and you *safely* getting access to it in your projects is a matter of days — rather than months of waiting for the next dbt Core release, on top of any additional time it may have taken to actually carry out the upgrade. +The outcome of this process is that, when you select the "Latest" release track in dbt Cloud, the time between an improvement being made to dbt Core and you *safely* getting access to it in your projects is a matter of days — rather than months of waiting for the next dbt Core release, on top of any additional time it may have taken to actually carry out the upgrade. -We’re pleased to say that since the beta launch of “Versionless” in dbt Cloud in March, **we have not had any functional regressions reach customers**, while we’ve also been shipping multiple improvements to dbt functionality every day. This is a foundation that we aim to build on for the foreseeable future. +We’re pleased to say that, at the time of writing (May 2, 2024), since the beta launch of the "Latest" release track in dbt Cloud in March, **we have not had any functional regressions reach customers**, while we’ve also been shipping multiple improvements to dbt functionality every day. This is a foundation that we aim to build on for the foreseeable future. ## Stability as a feature @@ -98,7 +102,7 @@ The adapter interface — i.e. how dbt Core actually connects to a third-party d To solve that, we've released a new set of interfaces that are entirely independent of the `dbt-core` library: [`dbt-adapters==1.0.0`](https://github.com/dbt-labs/dbt-adapters). From now on, any changes to `dbt-adapters` will be backward and forward-compatible. This also decouples adapter maintenance from the regular release cadence of dbt Core — meaning maintainers get full control over when they ship implementations of new adapter-powered features. -Note that adapters running in dbt Cloud **must** be [migrated to the new decoupled architecture](https://github.com/dbt-labs/dbt-adapters/discussions/87) as a baseline in order to support the new "Versionless" option. +Note that adapters running in dbt Cloud **must** be [migrated to the new decoupled architecture](https://github.com/dbt-labs/dbt-adapters/discussions/87) as a baseline in order to support the new "Latest" release track. ### Managing behavior changes: stability as a feature @@ -118,7 +122,7 @@ We’ve now [formalized our development best practices](https://github.com/dbt-l In conclusion, we’re putting a lot of new muscle behind our commitments to dbt Cloud customers, the dbt Community, and the broader ecosystem: -- **Continuous updates**: "Versionless" dbt Cloud simplifies the update process, ensuring you always have the latest features and bug fixes without the maintenance overhead. +- **Continuous updates**: The "Latest" release track in dbt Cloud simplifies the update process, ensuring you always have the latest features and bug fixes without the maintenance overhead. - **A rigorous new testing and deployment process**: Our new testing pipeline ensures that every update is carefully vetted against documented interfaces, Cloud-supported adapters, and popular packages before it reaches you. This process minimizes the risk of regressions — and has now been successful at entirely preventing them for hundreds of customers over multiple months. - **A commitment to stability**: We’ve reworked our approaches to adapter interfaces, behaviour change management, and metadata artifacts to give you more stability and control. diff --git a/website/blog/2024-06-12-putting-your-dag-on-the-internet.md b/website/blog/2024-06-12-putting-your-dag-on-the-internet.md index 535cfc34d6e..54864916d0e 100644 --- a/website/blog/2024-06-12-putting-your-dag-on-the-internet.md +++ b/website/blog/2024-06-12-putting-your-dag-on-the-internet.md @@ -12,7 +12,7 @@ date: 2024-06-14 is_featured: true --- -**New in dbt: allow Snowflake Python models to access the internet** +## New in dbt: allow Snowflake Python models to access the internet With dbt 1.8, dbt released support for Snowflake’s [external access integrations](https://docs.snowflake.com/en/developer-guide/external-network-access/external-network-access-overview) further enabling the use of dbt + AI to enrich your data. This allows querying of external APIs within dbt Python models, a functionality that was required for dbt Cloud customer, [EQT AB](https://eqtgroup.com/). Learn about why they needed it and how they helped build the feature and get it shipped! @@ -45,7 +45,7 @@ This API is open and if it requires an API key, handle it similarly to managing For simplicity’s sake, we will show how to create them using [pre-hooks](/reference/resource-configs/pre-hook-post-hook) in a model configuration yml file: -``` +```yml models: - name: external_access_sample config: @@ -57,7 +57,7 @@ models: Then we can simply use the new external_access_integrations configuration parameter to use our network rule within a Python model (called external_access_sample.py): -``` +```python import snowflake.snowpark as snowpark def model(dbt, session: snowpark.Session): dbt.config( @@ -75,7 +75,7 @@ def model(dbt, session: snowpark.Session): The result is a model with some json I can parse, for example, in a SQL model to extract some information: -``` +```sql {{ config( materialized='incremental', @@ -108,12 +108,12 @@ The result is a model that will keep track of dbt invocations, and the current U This is a very new area to Snowflake and dbt -- something special about SQL and dbt is that it’s very resistant to external entropy. The second we rely on API calls, Python packages and other external dependencies, we open up to a lot more external entropy. APIs will change, break, and your models could fail. -Traditionally dbt is the T in ELT (dbt overview [here](https://docs.getdbt.com/terms/elt)), and this functionality unlocks brand new EL capabilities for which best practices do not yet exist. What’s clear is that EL workloads should be separated from T workloads, perhaps in a different modeling layer. Note also that unless using incremental models, your historical data can easily be deleted. dbt has seen a lot of use cases for this, including this AI example as outlined in this external [engineering blog post](https://klimmy.hashnode.dev/enhancing-your-dbt-project-with-large-language-models). +Traditionally dbt is the T in ELT (dbt overview [here](https://docs.getdbt.com/terms/elt)), and this functionality unlocks brand new EL capabilities for which best practices do not yet exist. What’s clear is that EL workloads should be separated from T workloads, perhaps in a different modeling layer. Note also that unless using incremental models, your historical data can easily be deleted. dbt has seen a lot of use cases for this, including this AI example as outlined in this external [engineering blog post](https://klimmy.hashnode.dev/enhancing-your-dbt-project-with-large-language-models). -**A few words about the power of Commercial Open Source Software** +## A few words about the power of Commercial Open Source Software In order to get this functionality shipped quickly, EQT opened a pull request, Snowflake helped with some problems we had with CI and a member of dbt Labs helped write the tests and merge the code in! -dbt now features this functionality in dbt 1.8+ or the “Versionless” option of dbt Cloud (dbt overview [here](/docs/dbt-versions/upgrade-dbt-version-in-cloud#versionless)). +dbt now features this functionality in dbt 1.8+ and all [Release tracks](/docs/dbt-versions/cloud-release-tracks) in dbt Cloud. dbt Labs staff and community members would love to chat more about it in the [#db-snowflake](https://getdbt.slack.com/archives/CJN7XRF1B) slack channel. diff --git a/website/blog/2024-10-04-hybrid-mesh.md b/website/blog/2024-10-04-hybrid-mesh.md new file mode 100644 index 00000000000..05a45599318 --- /dev/null +++ b/website/blog/2024-10-04-hybrid-mesh.md @@ -0,0 +1,89 @@ +--- +title: "How Hybrid Mesh unlocks dbt collaboration at scale" +description: A deep-dive into the Hybrid Mesh pattern for enabling collaboration between domain teams using dbt Core and dbt Cloud. +slug: hybrid-mesh +authors: [jason_ganz] +tags: [analytics craft] +hide_table_of_contents: false +date: 2024-09-30 +is_featured: true +--- + +One of the most important things that dbt does is unlock the ability for teams to collaborate on creating and disseminating organizational knowledge. + +In the past, this primarily looked like a team working in one dbt Project to create a set of transformed objects in their data platform. + +As dbt was adopted by larger organizations and began to drive workloads at a global scale, it became clear that we needed mechanisms to allow teams to operate independently from each other, creating and sharing data models across teams — [dbt Mesh](/best-practices/how-we-mesh/mesh-1-intro). + + + +dbt Mesh is powerful because it allows teams to operate _independently_ and _collaboratively_, each team free to build on their own but contributing to a larger, shared set of data outputs. + +The flexibility of dbt Mesh means that it can support [a wide variety of patterns and designs](/best-practices/how-we-mesh/mesh-3-structures). Today, let’s dive into one pattern that is showing promise as a way to enable teams working on very different dbt deployments to work together. + +## How Hybrid Mesh enables collaboration between dbt Core and dbt Cloud teams + +**_Scenario_** — A company with a central data team uses dbt Core. The setup is working well for that team. They want to scale their impact to enable faster decision-making, organization-wide. The current dbt Core setup isn't well suited for onboarding a larger number of less-technical, nontechnical, or less-frequent contributors. + +**_The goal_** — Enable three domain teams of less-technical users to leverage and extend the central data models, with full ownership over their domain-specific dbt models. + + - **Central data team:** Data engineers comfortable using dbt Core and the command line interface (CLI), building and maintaining foundational data models for the entire organization. + + - **Domain teams:** Data analysts comfortable working in SQL but not using the CLI and prefer to start working right away without managing local dbt Core installations or updates. The team needs to build transformations specific to their business context. Some of these users may have tried dbt in the past, but they were not able to successfully onboard to the central team's setup. + +**_Solution: Hybrid Mesh_** — Data teams can use dbt Mesh to connect projects *across* dbt Core and dbt Cloud, creating a workflow where everyone gets to work in their preferred environment while creating a shared lineage that allows for visibility, validation, and ownership across the data pipeline. + +Each team will fully own its dbt code, from development through deployment, using the product that is appropriate to their needs and capabilities _while sharing data products across teams using both dbt Core and dbt Cloud._ + + + +Creating a Hybrid Mesh is mostly the same as creating any other [dbt Mesh](/guides/mesh-qs?step=1) workflow — there are a few considerations but mostly _it just works_. We anticipate it will continue to see adoption as more central data teams look to onboard their downstream domain teams. + +A Hybrid Mesh can be adopted as a stable long-term pattern, or as an intermediary while you perform a [migration from dbt Core to dbt Cloud](/guides/core-cloud-2?step=1). + +## How to build a Hybrid Mesh +Enabling a Hybrid Mesh is as simple as a few additional steps to import the metadata from your Core project into dbt Cloud. Once you’ve done this, you should be able to operate your dbt Mesh like normal and all of our [standard recommendations](/best-practices/how-we-mesh/mesh-1-intro) still apply. + +### Step 1: Prepare your Core project for access through dbt Mesh + +Configure public models to serve as stable interfaces for downstream dbt Projects. + +- Decide which models from your Core project will be accessible in your Mesh. For more information on how to configure public access for those models, refer to the [model access page.](/docs/collaborate/govern/model-access) +- Optionally set up a [model contract](/docs/collaborate/govern/model-contracts) for all public models for better governance. +- Keep dbt Core and dbt Cloud projects in separate repositories to allow for a clear separation between upstream models managed by the dbt Core team and the downstream models handled by the dbt Cloud team. + +### Step 2: Mirror each "producer" Core project in dbt Cloud +This allows dbt Cloud to know about the contents and metadata of your project, which in turn allows for other projects to access its models. + +- [Create a dbt Cloud account](https://www.getdbt.com/signup/) and a dbt project for each upstream Core project. + - Note: If you have [environment variables](/docs/build/environment-variables) in your project, dbt Cloud environment variables must be prefixed with `DBT_ `(including `DBT_ENV_CUSTOM_ENV_` or `DBT_ENV_SECRET`). Follow the instructions in [this guide](https://docs.getdbt.com/guides/core-to-cloud-1?step=8#environment-variables) to convert them for dbt Cloud. +- Each upstream Core project has to have a production [environment](/docs/dbt-cloud-environments) in dbt Cloud. You need to configure credentials and environment variables in dbt Cloud just so that it will resolve relation names to the same places where your dbt Core workflows are deploying those models. +- Set up a [merge job](/docs/deploy/merge-jobs) in a production environment to run `dbt parse`. This will enable connecting downstream projects in dbt Mesh by producing the necessary [artifacts](/reference/artifacts/dbt-artifacts) for cross-project referencing. + - Optional: Set up a regular job to run `dbt build` instead of using a merge job for `dbt parse`, and centralize your dbt orchestration by moving production runs to dbt Cloud. Check out [this guide](/guides/core-to-cloud-1?step=9) for more details on converting your production runs to dbt Cloud. +- Optional: Set up a regular job (for example, daily) to run `source freshness` and `docs generate`. This will hydrate dbt Cloud with additional metadata and enable features in [dbt Explorer](/docs/collaborate/explore-projects) that will benefit both teams, including [Column-level lineage](/docs/collaborate/column-level-lineage). + +### Step 3: Create and connect your downstream projects to your Core project using dbt Mesh +Now that dbt Cloud has the necessary information about your Core project, you can begin setting up your downstream projects, building on top of the public models from the project you brought into Cloud in [Step 2](#step-2-mirror-each-producer-core-project-in-dbt-cloud). To do this: +- Initialize each new downstream dbt Cloud project and create a [`dependencies.yml` file](/docs/collaborate/govern/project-dependencies#use-cases). +- In that `dependencies.yml` file, add the dbt project name from the `dbt_project.yml` of the upstream project(s). This sets up cross-project references between different dbt projects: + + ```yaml + # dependencies.yml file in dbt Cloud downstream project + projects: + - name: upstream_project_name + ``` +- Use [cross-project references](/reference/dbt-jinja-functions/ref#ref-project-specific-models) for public models in upstream project. Add [version](/reference/dbt-jinja-functions/ref#versioned-ref) to references of versioned models: + ```yaml + select * from {{ ref('upstream_project_name', 'monthly_revenue') }} + ``` + +And that’s all it takes! From here, the domain teams that own each dbt Project can build out their models to fit their own use cases. You can now build out your Hybrid Mesh however you want, accessing the full suite of dbt Cloud features. +- Orchestrate your Mesh to ensure timely delivery of data products and make them available to downstream consumers. +- Use [dbt Explorer](/docs/collaborate/explore-projects) to trace the lineage of your data back to its source. +- Onboard more teams and connect them to your Mesh. +- Build [semantic models](/docs/build/semantic-models) and [metrics](/docs/build/metrics-overview) into your projects to query them with the [dbt Semantic Layer](https://www.getdbt.com/product/semantic-layer). + + +## Conclusion + +In a world where organizations have complex and ever-changing data needs, there is no one-size fits all solution. Instead, data practitioners need flexible tooling that meets them where they are. The Hybrid Mesh presents a model for this approach, where teams that are comfortable and getting value out of dbt Core can collaborate frictionlessly with domain teams on dbt Cloud. diff --git a/website/blog/2024-10-04-iceberg-is-an-implementation-detail.md b/website/blog/2024-10-04-iceberg-is-an-implementation-detail.md new file mode 100644 index 00000000000..dc9b78bba8d --- /dev/null +++ b/website/blog/2024-10-04-iceberg-is-an-implementation-detail.md @@ -0,0 +1,84 @@ +--- +title: "Iceberg Is An Implementation Detail" +description: "This blog will talk about iceberg table support and why it both matters and doesn't" +slug: icebeg-is-an-implementation-detail + +authors: [amy_chen] + +tags: [table formats, iceberg] +hide_table_of_contents: false + +date: 2024-10-04 +is_featured: false +--- + +If you haven’t paid attention to the data industry news cycle, you might have missed the recent excitement centered around an open table format called Apache Iceberg™. It’s one of many open table formats like Delta Lake, Hudi, and Hive. These formats are changing the way data is stored and metadata accessed. They are groundbreaking in many ways. + +But I have to be honest: **I don’t care**. But not for the reasons you think. + + + +## What is Iceberg? + +To have this conversation, we need to start with the same foundational understanding of Iceberg. Apache Iceberg is a high-performance open table format developed for modern data lakes. It was designed for large-scale datasets, and within the project, there are many ways to interact with it. When people talk about Iceberg, it often means multiple components including but not limited to: + +1. Iceberg Table Format - an open-source table format with large-scale data. Tables materialized in iceberg table format are stored on a user’s infrastructure, such as S3 Bucket. +2. Iceberg Data Catalog - an open-source metadata management system that tracks the schema, partition, and versions of Iceberg tables. +3. Iceberg REST Protocol (also called Iceberg REST API) is how engines can support and speak to other Iceberg-compatible catalogs. + +If you have been in the industry, you also know that everything I just wrote above about Iceberg could easily be replaced by `Hive,` `Hudi,` or `Delta.` This is because they were all designed to solve essentially the same problem. Ryan Blue (creator of Iceberg) and Michael Armbrust (creator of Delta Lake) recently sat down for this [fantastic chat](https://vimeo.com/1012543474) and said two points that resonated with me: + +- “We never intended for people to pay attention to this area. It’s something we wanted to fix, but people should be able to not pay attention and just work with their data. Storage systems should just work.” +- “We solve the same challenges with different approaches.” + +At the same time, the industry is converging on Apache Iceberg. [Iceberg has the highest availability of read and write support](https://medium.com/sundeck/2024-lakehouse-format-rundown-7edd75015428). + + + + +Snowflake launched Iceberg support in 2022. Databricks launched Iceberg support via Uniform last year. Microsoft announced Fabric support for Iceberg in September 2024 at Fabric Con. **Customers are demanding interoperability, and vendors are listening**. + +Why does this matter? Standardization of the industry benefits customers. When the industry standardizes - customers have the gift of flexibility. Everyone has a preferred way of working, and with standardization — they can always bring their preferred tools to their organization’s data. + +## Just another implementation detail + +I’m not saying open table formats aren't important. The metadata management and performance make them very meaningful and should be paid attention to. Our users are already excited to use it to create data lakes to save on storage costs, create more abstraction from their computing, etc. + +But when building data models or focusing on delivering business value through analytics, my primary concern is not *how* the data is stored—it's *how* I can leverage it to generate insights and drive decisions. The analytics development lifecycle is hard enough without having to take into every detail. dbt abstracts the underlying platform and lets me focus on writing SQL and orchestrating my transformations. It’s a feature that I don’t need to think about how tables are stored or optimized—I just need to know that when I reference dim_customers or fct_sales, the correct data is there and ready to use. **It should just work.** + +## Sometimes the details do matter + +While table formats are an implementation detail for data transformation — Iceberg can impact dbt developers when the implementation details aren’t seamless. Currently, using Iceberg requires a significant amount of upfront configuration and integration work beyond just creating tables to get started. + +One of the biggest hurdles is managing Iceberg’s metadata layer. This metadata often needs to be synced with external catalogs, which requires careful setup and ongoing maintenance to prevent inconsistencies. Permissions and access controls add another layer of complexity—because multiple engines can access Iceberg tables, you have to ensure that all systems have the correct access to both the data files and the metadata catalog. Currently, setting up integrations between these engines is also far from seamless; while some engines natively support Iceberg, others require brittle workarounds to ensure the metadata is synced correctly. This fragmented landscape means you could land with a web of interconnected components. + +## Fixing it + +**Today, we announced official support for the Iceberg table format in dbt.** By supporting the Iceberg table format, it’s one less thing you have to worry about on your journey to adopting Iceberg. + +With support for Iceberg Table Format, it is now easier to convert your dbt models using proprietary table formats to Iceberg by updating your configuration. After you have set up your external storage for Iceberg and connected it to your platforms, you will be able to jump into your dbt model and update the configuration to look something like this: + + + +It is available on these adapters: + +- Athena +- Databricks +- Snowflake +- Spark +- Starburst/Trino +- Dremio + +As with the beauty of any open-source project, Iceberg support grew organically, so the implementations vary. However, this will change in the coming months as we converge onto one dbt standard. This way, no matter which adapter you jump into, the configuration will always be the same. + +## dbt the Abstraction Layer + +dbt is more than about abstracting away the DDL to create and manage objects. It’s also about ensuring an opinionated approach to managing and optimizing your data. That remains true for our strategy around Iceberg Support. + +In our dbt-snowflake implementation, we have already started to [enforce best practices centered around how to manage the base location](https://docs.getdbt.com/reference/resource-configs/snowflake-configs#base-location) to ensure you don’t create technical debt accidentally, ensuring your Iceberg implementation scales over time. And we aren’t done yet. + +That said, while we can create the models, there is a *lot* of initial work to get to that stage. dbt developers must still consider the implementation, like how their external volume has been set up or where dbt can access the metadata. We have to make this better. + +Given the friction of getting launched on Iceberg, over the coming months, we will enable more capabilities to empower users to adopt Iceberg. It should be easier to read from foreign Iceberg catalogs. It should be easier to mount your volume. It should be easier to manage refreshes. And you should also trust that permissions and governance are consistently enforced. + +And this work doesn’t stop at Iceberg. The framework we are building is also compatible with other table formats, ensuring that whatever table format works for you is supported on dbt. This way — dbt users can also stop caring about table formats. **It’s just another implementation detail.** diff --git a/website/blog/2024-10-05-snowflake-feature-store.md b/website/blog/2024-10-05-snowflake-feature-store.md new file mode 100644 index 00000000000..cf5c55be1b5 --- /dev/null +++ b/website/blog/2024-10-05-snowflake-feature-store.md @@ -0,0 +1,273 @@ +--- +title: "Snowflake feature store and dbt: A bridge between data pipelines and ML" +description: A deep-dive into the workflow steps you can take to build and deploy ML models within a single platform. +slug: snowflake-feature-store +authors: [randy_pettus, luis_leon] +tags: [snowflake ML] +hide_table_of_contents: false +date: 2024-10-08 +is_featured: true +--- + +Flying home into Detroit this past week working on this blog post on a plane and saw for the first time, the newly connected deck of the Gordie Howe International [bridge](https://www.freep.com/story/news/local/michigan/detroit/2024/07/24/gordie-howe-bridge-deck-complete-work-moves-to-next-phase/74528258007/) spanning the Detroit River and connecting the U.S. and Canada. The image stuck out because, in one sense, a feature store is a bridge between the clean, consistent datasets and the machine learning models that rely upon this data. But, more interesting than the bridge itself is the massive process of coordination needed to build it. This construction effort — I think — can teach us more about processes and the need for feature stores in machine learning (ML). + +Think of the manufacturing materials needed as our data and the building of the bridge as the building of our ML models. There are thousands of engineers and construction workers taking materials from all over the world, pulling only the specific pieces needed for each part of the project. However, to make this project truly work at this scale, we need the warehousing and logistics to ensure that each load of concrete rebar and steel meets the standards for quality and safety needed and is available to the right people at the right time — as even a single fault can have catastrophic consequences or cause serious delays in project success. This warehouse and the associated logistics play the role of the feature store, ensuring that data is delivered consistently where and when it is needed to train and run ML models. + + + +## What is a feature? + +A feature is a transformed or enriched data that serves as an input into a machine learning model to make predictions. In machine learning, a data scientist derives features from various data sources to build a model that makes predictions based on historical data. To capture the value from this model, the enterprise must operationalize the data pipeline, ensuring that the features being used in production at inference time match those being used in training and development. + +## What role does dbt play in getting data ready for ML models? + +dbt is the standard for data transformation in the enterprise. Organizations leverage dbt at scale to deliver clean and well-governed datasets wherever and whenever they are needed. Using dbt to manage the data transformation processes to cleanse and prepare datasets used in feature development will ensure consistent datasets of guaranteed data quality — meaning that feature development will be consistent and reliable. + + +## Who is going to use this and what benefits will they see? + +Snowflake and dbt are already a well-established and trusted combination for delivering data excellence across the enterprise. The ability to register dbt pipelines in the Snowflake Feature Store further extends this combination for ML and AI workloads, while fitting naturally into the data engineering and feature pipelines already present in dbt. + + +Some of the key benefits are: + +- **Feature collaboration** — Data scientists, data analysts, data engineers, and machine learning engineers collaborate on features used in machine learning models in both Python and SQL, enabling teams to share and reuse features. As a result, teams can improve the time to value of models while improving the understanding of their components. This is all backed by Snowflake’s role-based access control (RBAC) and governance. +- **Feature consistency** — Teams are assured that features generated for training sets and those served for model inference are consistent. This can especially be a concern for large organizations where multiple versions of the truth might persist. Much like how dbt and Snowflake help enterprises have a single source of data truth, now they can have a single source of truth for features. +- **Feature visibility and use** — The Snowflake Feature Store provides an intuitive SDK to work with ML features and their associated metadata. In addition, users can browse and search for features in the Snowflake UI, providing an easy way to identify features +- **Point-in-time correctness** — Snowflake retrieves point-in-time correct features using ASOF Joins, removing the significant complexity in generating the right feature value for a given time period whether for training or batch prediction retrieval. +- **Integration with data pipelines** — Teams that have already built data pipelines in dbt can continue to use these with the Snowflake Feature Store. No additional migration or feature re-creation is necessary as teams plug into the same pipelines. + +## Why did we integrate/build this with Snowflake? + +How does dbt help with ML workloads today? dbt plays a pivotal role in preparing data for ML models by transforming raw data into a format suitable for feature engineering. It helps orchestrate and automate these transformations, ensuring that data is clean, consistent, and ready for ML applications. The combination of Snowflake’s powerful AI Data Cloud and dbt’s transformation prowess makes it an unbeatable pair for organizations aiming to scale their ML operations efficiently. + +## Making it easier for ML/Data Engineers to both build & deploy ML data & models + +dbt is a perfect tool to promote collaboration between data engineers, ML engineers, and data scientists. dbt is designed to support collaboration and quality of data pipelines through features including version control, environments and development life cycles, as well as built-in data and pipeline testing. Leveraging dbt means that data engineers and data scientists can collaborate and develop new models and features while maintaining the rigorous governance and high quality that's needed. + +Additionally, dbt Mesh makes maintaining domain ownership extremely easy by breaking up portions of our data projects and pipelines into connected projects where critical models can be published for consumption by others with strict data contracts enforcing quality and governance. This paradigm supports rapid development as each project can be kept to a maintainable size for its contributors and developers. Contracting on published models used between these projects ensures the consistency of the integration points between them. + +Finally, dbt Cloud also provides [dbt Explorer](/docs/collaborate/explore-projects) — a perfect tool to catalog and share knowledge about organizational data across disparate teams. dbt Explorer provides a central place for information on data pipelines, including lineage information, data freshness, and quality. Best of all, dbt Explorer updates every time dbt jobs run, ensuring this information is always up-to-date and relevant. + +## What tech is at play? + +Here’s what you need from dbt. dbt should be used to manage data transformation pipelines and generate the datasets needed by ML engineers and data scientists maintaining the Snowflake Feature Store. dbt Cloud Enterprise users should leverage dbt Mesh to create different projects with clear owners for these different domains of data pipelines. This Mesh design will promote easier collaboration by keeping each dbt project smaller and more manageable for the people building and maintaining it. dbt also supports both SQL and Python-based transformations making it an ideal fit for AI/ML workflows, which commonly leverage both languages. + +Using dbt for the data transformation pipelines will also ensure the quality and consistency of data products, which is critical for ensuring successful AI/ML efforts. + +## Snowflake ML overview + +The Feature Store is one component of [Snowflake ML’s](https://www.snowflake.com/en/data-cloud/snowflake-ml/) integrated suite of machine learning features that powers end-to-end machine learning within a single platform. Data scientists and ML engineers leverage ready-to-use ML functions or build custom ML workflows all without any data movement or without sacrificing governance. Snowflake ML includes scalable feature engineering and model training capabilities. Meanwhile, the Feature Store and Model Registry allow teams to store and use features and models in production, providing an end-to-end suite for operating ML workloads at scale. + + +## What do you need to do to make it all work? + +dbt Cloud offers the fastest and easiest way to run dbt. It offers a Cloud-based IDE, Cloud-attached CLI, and even a low-code visual editor option (currently in beta), meaning it’s perfect for connecting users across different teams with different workflows and tooling preferences, which is very common in AI/ML workflows. This is the tool you will use to prepare and manage data for AI/ML, promote collaboration across the different teams needed for a successful AI/ML workflow, and ensure the quality and consistency of the underlying data that will be used to create features and train models. + +Organizations interested in AI/ML workflows through Snowflake should also look at the new dbt Snowflake Native App — a Snowflake Native Application that extends the functionality of dbt Cloud into Snowflake. Of particular interest is Ask dbt — a chatbot that integrates directly with Snowflake Cortex and the dbt Semantic Layer to allow natural language questions of Snowflake data. + + +## How to power ML pipelines with dbt and Snowflake’s Feature Store + +Let’s provide a brief example of what this workflow looks like in dbt and Snowflake to build and use the powerful capabilities of a Feature Store. For this example, consider that we have a data pipeline in dbt to process customer transaction data. Various data science teams in the organization need to derive features from these transactions to use in various models, including to predict fraud and perform customer segmentation and personalization. These different use cases all benefit from having related features, such as the count of transactions or purchased amounts over different periods of time (for example, the last day, 7 days, or 30 days) for a given customer. + +Instead of the data scientists building out their own workflows to derive these features, let’s look at the flow of using dbt to manage the feature pipeline and Snowflake’s Feature Store to solve this problem. The following subsections describe the workflow step by step. + +### Create feature tables as dbt models + +The first step consists of building out a feature table as a dbt model. Data scientists and data engineers plug in to existing dbt pipelines and derive a table that includes the underlying entity (for example, customer id, timestamp and feature values). The feature table aggregates the needed features at the appropriate timestamp for a given entity. Note that Snowflake provides various common feature and query patterns available [here](https://docs.snowflake.com/en/developer-guide/snowflake-ml/feature-store/examples). So, in our example, we would see a given customer, timestamp, and features representing transaction counts and sums over various periods. Data scientists can use SQL or Python directly in dbt to build this table, which will push down the logic into Snowflake, allowing data scientists to use their existing skill set. + +Window aggregations play an important role in the creation of features. Because the logic for these aggregations is often complex, let’s see how Snowflake and dbt make this process easier by leveraging Don’t Repeat Yourself (DRY) principles. We’ll create a macro that will allow us to use Snowflake’s `range between` syntax in a repeatable way: + +```sql +{% macro rolling_agg(column, partition_by, order_by, interval='30 days', agg_function='sum') %} + {{ agg_function }}({{ column }}) over ( + partition by {{ partition_by }} + order by {{ order_by }} + range between interval '{{ interval }}' preceding and current row + ) +{% endmacro %} + +``` + +Now, we use this macro in our feature table to build out various aggregations of customer transactions over the last day, 7 days, and 30 days. Snowflake has just taken significant complexity away in generating appropriate feature values and dbt has just made the code even more readable and repeatable. While the following example is built in SQL, teams can also build these pipelines using Python directly. + +```sql + +select + tx_datetime, + customer_id, + tx_amount, + {{ rolling_agg("TX_AMOUNT", "CUSTOMER_ID", "TX_DATETIME", "1 days", "sum") }} + as tx_amount_1d, + {{ rolling_agg("TX_AMOUNT", "CUSTOMER_ID", "TX_DATETIME", "7 days", "sum") }} + as tx_amount_7d, + {{ rolling_agg("TX_AMOUNT", "CUSTOMER_ID", "TX_DATETIME", "30 days", "sum") }} + as tx_amount_30d, + {{ rolling_agg("TX_AMOUNT", "CUSTOMER_ID", "TX_DATETIME", "1 days", "avg") }} + as tx_amount_avg_1d, + {{ rolling_agg("TX_AMOUNT", "CUSTOMER_ID", "TX_DATETIME", "7 days", "avg") }} + as tx_amount_avg_7d, + {{ rolling_agg("TX_AMOUNT", "CUSTOMER_ID", "TX_DATETIME", "30 days", "avg") }} + as tx_amount_avg_30d, + {{ rolling_agg("*", "CUSTOMER_ID", "TX_DATETIME", "1 days", "count") }} + as tx_cnt_1d, + {{ rolling_agg("*", "CUSTOMER_ID", "TX_DATETIME", "7 days", "count") }} + as tx_cnt_7d, + {{ rolling_agg("*", "CUSTOMER_ID", "TX_DATETIME", "30 days", "count") }} + as tx_cnt_30d +from {{ ref("stg_transactions") }} + +``` + +### Create or connect to a Snowflake Feature Store + +Once a feature table is built in dbt, data scientists use Snowflake’s [snowflake-ml-python](https://docs.snowflake.com/en/developer-guide/snowflake-ml/snowpark-ml) package to create or connect to an existing Feature Store in Snowflake. Data scientists can do this all in Python, including in Jupyter Notebooks or directly in Snowflake using [Snowflake Notebooks](https://docs.snowflake.com/en/user-guide/ui-snowsight/notebooks). + +Let’s go ahead and create the Feature Store in Snowflake: + + +```sql +from snowflake.ml.feature_store import ( + FeatureStore, + FeatureView, + Entity, + CreationMode +) + +fs = FeatureStore( + session=session, + database=fs_db, + name=fs_schema, + default_warehouse='WH_DBT', + creation_mode=CreationMode.CREATE_IF_NOT_EXIST, +) + +``` + +### Create and register feature entities + +The next step consists of creating and registering [entities](https://docs.snowflake.com/en/developer-guide/snowflake-ml/feature-store/entities). These represent the underlying objects that features are associated with, forming the join keys used for feature lookups. In our example, the data scientist can register various entities, including for the customer, a transaction id, or other necessary attributes. + +Let’s create some example entities. + +```python +customer = Entity(name="CUSTOMER", join_keys=["CUSTOMER_ID"]) +transaction = Entity(name="TRANSACTION", join_keys=["TRANSACTION_ID"]) +fs.register_entity(customer) +fs.register_entity(transaction) + +``` + +### Register feature tables as feature views + +After registering entities, the next step is to register a [feature view](https://docs.snowflake.com/en/developer-guide/snowflake-ml/feature-store/feature-views). This represents a group of related features that stem from the features tables created in the dbt model. In this case, note that the feature logic, refresh, and consistency is managed by the dbt pipeline. The feature view in Snowflake enables versioning of the features while providing discoverability among teams. + +```python +# Create a dataframe from our feature table produced in dbt +customers_transactions_df = session.sql(f""" + SELECT + CUSTOMER_ID, + TX_DATETIME, + TX_AMOUNT_1D, + TX_AMOUNT_7D, + TX_AMOUNT_30D, + TX_AMOUNT_AVG_1D, + TX_AMOUNT_AVG_7D, + TX_AMOUNT_AVG_30D, + TX_CNT_1D, + TX_CNT_7D, + TX_CNT_30D + FROM {fs_db}.{fs_data_schema}.ft_customer_transactions + """) + +# Create a feature view on top of these features +customer_transactions_fv = FeatureView( + name="customer_transactions_fv", + entities=[customer], + feature_df=customers_transactions_df, + timestamp_col="TX_DATETIME", + refresh_freq=None, + desc="Customer transaction features with window aggregates") + +# Register the feature view for use beyond the session +customer_transactions_fv = fs.register_feature_view( + feature_view=customer_transactions_fv, + version="1", + #overwrite=True, + block=True) + +``` + +### Search and discover features in the Snowflake UI + +Now, with features created, teams can view their features directly in the Snowflake UI, as shown below. This enables teams to easily search and browse features, all governed through Snowflake’s role-based access control (RBAC). + + + +### Generate training dataset + +Now that the feature view is created, data scientists produce a [training dataset](https://docs.snowflake.com/en/developer-guide/snowflake-ml/feature-store/modeling#generating-tables-for-training) that uses the feature view. In our example, whether the data scientist is building a fraud or segmentation model, they will retrieve point-in-time correct features for a customer at a specific point in time using the Feature Store’s `generate_training_set` method. + +To generate the training set, we need to supply a spine dataframe, representing the entities and timestamp values that we will need to retrieve features for. The following example shows this using a few records, although teams can leverage other tables to produce this spine. + +```python +spine_df = session.create_dataframe( + [ + ('1', '3937', "2019-05-01 00:00"), + ('2', '2', "2019-05-01 00:00"), + ('3', '927', "2019-05-01 00:00"), + ], + schema=["INSTANCE_ID", "CUSTOMER_ID", "EVENT_TIMESTAMP"]) + +train_dataset = fs.generate_dataset( + name= "customers_fv", + version= "1_0", + spine_df=spine_df, + features=[customer_transactions_fv], + spine_timestamp_col= "EVENT_TIMESTAMP", + spine_label_cols = [] +) + +``` + +Now that we have produced the training dataset, let’s see what it looks like. + + + +### Train and deploy a model + +Now with this training set, data scientists can use [Snowflake Snowpark](https://docs.snowflake.com/en/developer-guide/snowpark/index) and [Snowpark ML Modeling](https://docs.snowflake.com/en/developer-guide/snowflake-ml/modeling) to use familiar Python frameworks for additional preprocessing, feature engineering, and model training all within Snowflake. The model can be registered in the Snowflake [Model Registry](https://docs.snowflake.com/en/developer-guide/snowflake-ml/model-registry/overview) for secure model management. Note that we will leave the model training for you as part of this exercise. + +### Retrieve features for predictions + +For inference, data pipelines retrieve feature values using the [retrieve_feature_values](https://docs.snowflake.com/en/developer-guide/snowflake-ml/feature-store/modeling#retrieving-features-and-making-predictions) method. These retrieved values can be fed directly to a model’s predict capability in your Python session using a developed model or by invoking a model’s predict method from Snowflake’s Model Registry. For batch scoring purposes, teams can build this entire pipeline using [Snowflake ML](https://docs.snowflake.com/en/developer-guide/snowflake-ml/overview). The following code demonstrates how the features are retrieved using this method. + +```python +infernce_spine = session.create_dataframe( + [ + ('1', '3937', "2019-07-01 00:00"), + ('2', '2', "2019-07-01 00:00"), + ('3', '927', "2019-07-01 00:00"), + ], + schema=["INSTANCE_ID", "CUSTOMER_ID", "EVENT_TIMESTAMP"]) + +inference_dataset = fs.retrieve_feature_values( + spine_df=infernce_spine, + features=[customer_transactions_fv], + spine_timestamp_col="EVENT_TIMESTAMP", +) + +inference_dataset.to_pandas() + +``` + +Here’s an example view of our features produced for model inferencing. + + + +## Conclusion + +We’ve just seen how quickly and easily you can begin to develop features through dbt and leverage the Snowflake Feature Store to deliver predictive modeling as part of your data pipelines. The ability to build and deploy ML models, including integrating feature storage, data transformation, and ML logic within a single platform, simplifies the entire ML life cycle. Combining this new power with the well-established partnership of dbt and Snowflake unlocks even more potential for organizations to safely build and explore new AI/ML use cases and drive further collaboration in the organization. + +The code used in the examples above is publicly available on [GitHub](https://github.com/sfc-gh-rpettus/dbt-feature-store). Also, you can run a full example yourself in this [quickstart guide](https://quickstarts.snowflake.com/guide/getting-started-with-feature-store-and-dbt/index.html?index=..%2F..index#0) from the Snowflake docs. diff --git a/website/blog/2024-11-04-test-smarter-not-harder.md b/website/blog/2024-11-04-test-smarter-not-harder.md new file mode 100644 index 00000000000..58adfb38cb9 --- /dev/null +++ b/website/blog/2024-11-04-test-smarter-not-harder.md @@ -0,0 +1,163 @@ +--- +title: "Test smarter not harder: add the right tests to your dbt project" +description: "Testing your data should drive action, not accumulate alerts. We synthesized countless customer experiences to build a repeatable testing framework." +slug: test-smarter-not-harder + +authors: [faith_mckenna, jerrie_kumalah_kenney] + +tags: [analytics craft] +hide_table_of_contents: false + +date: 2024-11-11 +is_featured: true +--- + + + +The [Analytics Development Lifecycle (ADLC)](https://www.getdbt.com/resources/guides/the-analytics-development-lifecycle) is a workflow for improving data maturity and velocity. Testing is a key phase here. Many dbt developers tend to focus on [primary keys and source freshness.](https://www.getdbt.com/blog/building-a-data-quality-framework-with-dbt-and-dbt-cloud) We think there is a more holistic and in-depth path to tread. Testing is a key piece of the ADLC, and it should drive data quality. + +In this blog, we’ll walk through a plan to define data quality. This will look like: + +- identifying *data hygiene* issues +- identifying *business-focused anomaly* issues +- identifying *stats-focused anomaly* issues + +Once we have *defined* data quality, we’ll move on to *prioritize* those concerns. We will: + +- think through each concern in terms of the breadth of impact +- decide if each concern should be at error or warning severity + + + +### Who are we? + +Let’s start with introductions - we’re Faith and Jerrie, and we work on dbt Labs’s training and services teams, respectively. By working closely with countless companies using dbt, we’ve gained unique perspectives of the landscape. + +The training team collates problems organizations think about today and gauge how our solutions fit. These are shorter engagements, which means we see the data world shift and change in real time. Resident Architects spend much more time with teams to craft much more in-depth solutions, figure out where those solutions are helping, and where problems still need to be addressed. Trainers help identify patterns in the problems data teams face, and Resident Architects dive deep on solutions. + +Today, we’ll guide you through a particularly thorny problem: testing. + +## Why testing? + +Mariah Rogers broke early ground on data quality and testing in her [Coalesce 2022 talk](https://www.youtube.com/watch?v=hxvVhmhWRJA). We’ve seen similar talks again at Coalesce 2024, like [this one](https://www.youtube.com/watch?v=iCG-5vqMRAo) from the data team at Aiven and [this one](https://www.youtube.com/watch?v=5bRG3y9IM4Q&list=PL0QYlrC86xQnWJ72sJlzDqPS0peE7j9Ed&index=71) from the co-founder at Omni Analytics. These talks share a common theme: testing your dbt project too much can get out of control quickly, leading to alert fatigue. + +In our customer engagements, we see *wildly different approaches* to testing data. We’ve definitely seen what Mariah, the Aiven team, and the Omni team have described, which is so many tests that errors and alerts just become noise. We’ve also seen the opposite end of the spectrum—only primary keys being tested. From our field experiences, we believe there’s room for a middle path. +A desire for a better approach to data quality and testing isn’t just anecdotal to Coalesce, or to dbt’s training and services. The dbt community has long called for a more intentional approach to data quality and testing - data quality is on the industry’s mind! In fact, [57% of respondents](https://www.getdbt.com/resources/reports/state-of-analytics-engineering-2024) to dbt’s 2024 State of Analytics Engineering survey said that data quality is a predominant issue facing their day-to-day work. + +### What does d@tA qUaL1Ty even mean?! + +High-quality data is *trusted* and *used frequently.* It doesn’t get argued over or endlessly scrutinized for matching to other data. Data *testing* should lead to higher data *quality* and insights, period. + +Best practices in data quality are still nascent. That said, a lot of important baseline work has been done here. There are [case](https://medium.com/@AtheonAnalytics/mastering-data-testing-with-dbt-part-1-689b2a025675) [studies](https://medium.com/@AtheonAnalytics/mastering-data-testing-with-dbt-part-2-c4031af3df18) on implementing dbt testing well. dbt Labs also has an [Advanced Testing](https://learn.getdbt.com/courses/advanced-testing) course, emphasizing that testing should spur action and be focused and informative enough to help address failures. You can even enforce testing best practices and dbt Labs’s own best practices using the [dbt_meta_testing](https://hub.getdbt.com/tnightengale/dbt_meta_testing/latest/) or [dbt_project_evaluator](https://github.com/dbt-labs/dbt-project-evaluator) packages and dbt Explorer’s [Recommendations](https://docs.getdbt.com/docs/collaborate/project-recommendations) page. + +The missing piece is still cohesion and guidance for everyday practitioners to help develop their testing framework. + +To recap, we’re going to start with: + +- identifying *data hygiene* issues +- identifying *business-focused anomaly* issues +- identifying *stats-focused anomaly* issues + +Next, we’ll prioritize. We will: + +- think through each concern in terms of the breadth of impact +- decide if each concern should be at error or warning severity + +Get a pen and paper (or a google doc) and join us in constructing your own testing framework. + +## Identifying data quality issues in your pipeline + +Let’s start our framework by *identifying* types of data quality issues. + +In our daily work with customers, we find that data quality issues tend to fall into one of three broad buckets: *data hygiene, business-focused anomalies,* and *stats-focused anomalies.* Read the bucket descriptions below, and list 2-3 data quality concerns in your own business context that fall into each bucket. + +### Bucket 1: Data hygiene + +*Data hygiene* issues are concerns you address in your [staging layer.](https://docs.getdbt.com/best-practices/how-we-structure/2-staging) Hygienic data meets your expectations around formatting, completeness, and granularity requirements. Here are a few examples. + +- *Granularity:* primary keys are unique and not null. Duplicates throw off calculations. +- *Completeness:* columns that should always contain text, *do.* Incomplete data often has to get excluded, reducing your overall analytical power. +- *Formatting:* email addresses always have a valid domain. Incorrect emails may affect things like marketing outreach. + +### Bucket 2: Business-focused anomalies + +*Business-focused anomalies* catch unexpected behavior. You can flag unexpected behavior by clearly defining *expected* behavior. *Business-focused anomalies* are when aspects of the data differ from what you know to be typical in your business. You’ll know what’s typical either through your own analyses, your colleagues’ analyses, or things your stakeholder homies point out to you. + +Since business-focused anomaly testing is set by a human, it will be fluid and need to be adjusted periodically. Here’s an example. + +Imagine you’re a sales analyst. Generally, you know that if your daily sales amount goes up or down by more than 20% daily, that’s bad. Specifically, it’s usually a warning sign for fraud or the order management system (OMS) dropping orders. You set a test in dbt to fail if any given day’s sales amount is a delta of 20% from the previous day. This works for a while. + +Then, you have a stretch of 3 months where your test fails 5 times a week! Every time you investigate, it turns out to be valid consumer behavior. You’re suddenly in hypergrowth, and sales are legitimately increasing that much. + +Your 20%-change fraud and OMS failure detector is no longer valid. You need to investigate anew which sales spikes or drops indicate fraud or OMS problems. Once you figure out a new threshold, you’ll go back and adjust your testing criteria. + +Although your data’s expected behavior will shift over time, you should still commit to defining business-focused anomalies to grow your understanding of what is normal for your data. + +Here’s how to identify potential anomalies. + +Start at your business intelligence (BI) layer. Pick 1-3 dashboards or tables that you *know* are used frequently. List these 1-3 dashboards or tables. For each dashboard or table you have, identify 1-3 “expected” behaviors that your end-users rely on. Here are a few examples to get you thinking: + +- Revenue numbers should not change by more than X% in Y amount of time. This could indicate fraud or OMS problems. +- Monthly active users should not decline more than X% after the initial onboarding period. This might indicate user dissatisfaction, usability issues, or that users not finding a feature valuable. +- Exam passing rates should stay above Y%. A decline below that threshold may indicate recent content changes or technical issues are affecting understanding or accessibility. + +You should also consider what data issues you have had in the past! Look through recent data incidents and pick out 3 or 4 to guard against next time. These might be in a #data-questions channel or perhaps a DM from a stakeholder. + +### Bucket 3: Stats-focused anomalies + +*Stats-focused anomalies* are fluctuations that go against your expected volumes or metrics. Some examples include: + +- Volume anomalies. This could be site traffic amounts that may indicate illicit behavior, or perhaps site traffic dropping one day then doubling the next, indicating that a chunk of data were not loaded properly. +- Dimensional anomalies, like too many product types underneath a particular product line that may indicate incorrect barcodes. +- Column anomalies, like sale values more than a certain number of standard deviations from a mean, that may indicate improper discounting. + +Overall, stats-focused anomalies can indicate system flaws, illicit site behavior, or fraud, depending on your industry. They also tend to require more advanced testing practices than we are covering in this blog. We feel stats-based anomalies are worth exploring once you have a good handle on your data hygiene and business-focused anomalies. We won’t give recommendations on stats-focused anomalies in this post. + +## How to prioritize data quality concerns in your pipeline + +Now, you have a written and categorized list of data hygiene concerns and business-focused anomalies to guard against. It’s time to *prioritize* which quality issues deserve to fail your pipelines. + +To prioritize your data quality concerns, think about real-life impact. A couple of guiding questions to consider are: + +- Are your numbers *customer-facing?* For example, maybe you work with temperature-tracking devices. Your customers rely on these devices to show them average temperatures on perishable goods like strawberries in-transit. What happens if the temperature of the strawberries reads as 300C when they know their refrigerated truck was working just fine? How is your brand perception impacted when the numbers are wrong? +- Are your numbers *used to make financial decisions?* For example, is the marketing team relying on your numbers to choose how to spend campaign funds? +- Are your numbers *executive-facing?* Will executives use these numbers to reallocate funds or shift priorities? + +We think these 3 categories above constitute high-impact, pipeline-failing events, and should be your top priorities. Of course, adjust priority order if your business context calls for it. + +Consult your list of data quality issues in the categories we mention above. Decide and mark if any are customer facing, used for financial decisions, or are executive-facing. Mark any data quality issues in those categories as “error”. These are your pipeline-failing events. + +If any data quality concerns fall outside of these 3 categories, we classify them as **nice-to-knows**. **Nice-to-know** data quality testing *can* be helpful. But if you don’t have a *specific action you can immediately take* when a nice-to-know quality test fails, the test *should be a warning, not an error.* + +You could also remove nice-to-know tests altogether. Data testing should drive action. The more alerts you have in your pipeline, the less action you will take. Configure alerts with care! + +However, we do think nice-to-know tests are worth keeping *if and only if* you are gathering evidence for action you plan to take within the next 6 months, like product feature research. In a scenario like that, those tests should still be set to warning. + +### Start your action plan + +Now, your data quality concerns are listed and prioritized. Next, add 1 or 2 initial debugging steps you will take if/when the issues surface. These steps should get added to your framework document. Additionally, consider adding them to a [test’s description.](https://discourse.getdbt.com/t/is-it-possible-to-add-a-description-to-singular-tests/5472/4) + +This step is *important.* Data quality testing should spur action, not accumulate alerts. Listing initial debugging steps for each concern will refine your list to the most critical elements. + +If you can't identify an action step for any quality issue, *remove it*. Put it on a backlog and research what you can do when it surfaces later. + +Here’s a few examples from our list of unexpected behaviors above. + +- For calculated field X, a value above Y or below Z is not possible. + - *Debugging initial steps* + - Use dbt test SQL or recent test results in dbt Explorer to find problematic rows + - Check these rows in staging and first transformed model + - Pinpoint where unusual values first appear +- Revenue shouldn’t change by more than X% in Y amount of time. + - *Debugging initial steps:* + - Check recent revenue values in staging model + - Identify transactions near min/max values + - Discuss outliers with sales ops team + +You now have written out a prioritized list of data quality concerns, as well as action steps to take when each concern surfaces. Next, consult [hub.getdbt.com](http://hub.getdbt.com) and find tests that address each of your highest priority concerns. [dbt-expectations](https://hub.getdbt.com/calogica/dbt_expectations/latest/) and [dbt_utils](https://hub.getdbt.com/dbt-labs/dbt_utils/latest/) are great places to start. + +The data tests you’ve marked as “errors” above should get error-level severity. Any concerns falling into that nice-to-know category should either *not get tested* or have their tests *set to warning.* + +Your data quality priorities list is a living reference document. We recommend linking it in your project’s README so that you can go back and edit it as your testing needs evolve. Additionally, developers in your project should have easy access to this document. Maintaining good data quality is everyone’s responsibility! + +As you try these ideas out, come to the dbt Community Slack and let us know what works and what doesn’t. Data is a community of practice, and we are eager to hear what comes out of yours. diff --git a/website/blog/2024-11-27-test-smarter-part-2.md b/website/blog/2024-11-27-test-smarter-part-2.md new file mode 100644 index 00000000000..4fabe066011 --- /dev/null +++ b/website/blog/2024-11-27-test-smarter-part-2.md @@ -0,0 +1,125 @@ +--- +title: "Test smarter not harder: Where should tests go in your pipeline?" +description: "Testing your data should drive action, not accumulate alerts. We take our testing framework developed in our last post and make recommendations for where tests ought to go at each transformation stage." +slug: test-smarter-where-tests-should-go + +authors: [faith_mckenna, jerrie_kumalah_kenney] + +tags: [analytics craft] +hide_table_of_contents: false + +date: 2024-12-09 +is_featured: true +--- + +👋 Greetings, dbt’ers! It’s Faith & Jerrie, back again to offer tactical advice on *where* to put tests in your pipeline. + +In [our first post](/blog/test-smarter-not-harder) on refining testing best practices, we developed a prioritized list of data quality concerns. We also documented first steps for debugging each concern. This post will guide you on where specific tests should go in your data pipeline. + +*Note that we are constructing this guidance based on how we [structure data at dbt Labs.](/best-practices/how-we-structure/1-guide-overview#guide-structure-overview)* You may use a different modeling approach—that’s okay! Translate our guidance to your data’s shape, and let us know in the comments section what modifications you made. + +First, here’s our opinions on where specific tests should go: + +- Source tests should be fixable data quality concerns. See the [callout box below](#sources) for what we mean by “fixable”. +- Staging tests should be business-focused anomalies specific to individual tables, such as accepted ranges or ensuring sequential values. In addition to these tests, your staging layer should clean up any nulls, duplicates, or outliers that you can’t fix in your source system. You generally don’t need to test your cleanup efforts. +- Intermediate and marts layer tests should be business-focused anomalies resulting specifically from joins or calculations. You also may consider adding additional primary key and not null tests on columns where it’s especially important to protect the grain. + + + +## Where should tests go in your pipeline? + +![A horizontal, multicolored diagram that shows examples of where tests ought to be placed in a data pipeline.](/img/blog/2024-11-27-test-smarter-part-2/testing_pipeline.png) + +This diagram above outlines where you might put specific data tests in your pipeline. Let’s expand on it and discuss where each type of data quality issue should be tested. + +### Sources + +Tests applied to your sources should indicate *fixable-at-the-source-system* issues. If your source tests flag source system issues that aren’t fixable, remove the test and mitigate the problem in your staging layer instead. + +:::tip[What does fixable mean?] +We consider a "fixable-at-the-source-system" issue to be something that: + +- You yourself can fix in the source system. +- You know the right person to fix it and have a good enough relationship with them that you know you can *get it fixed.* + +You may have issues that can *technically* get fixed at the source, but it won't happen till the next planning cycle, or you need to develop better relationships to get the issue fixed, or something similar. This demands a more nuanced approach than we'll cover in this post. If you have thoughts on this type of situation, let us know! + +::: + +Here’s our recommendation for what tests belong on your sources. + +- Source freshness: testing data freshness for sources that are critical to your pipelines. + - If any sources feed into any of the “top 3” [priority categories](https://docs.getdbt.com/blog/test-smarter-not-harder#how-to-prioritize-data-quality-concerns-in-your-pipeline) in our last post, use [`dbt source freshness`](https://docs.getdbt.com/docs/deploy/source-freshness) in your job execution commands and set the severity to `error`. That way, if source freshness fails, so does your job. + - If none of your sources feed into high priority categories, set your source freshness severity to `warn` and add source freshness to your job execution commands. That way, you still get source freshness information but stale data won't fail your pipeline. +- Data hygiene: tests that are *fixable* in the source system (see our note above on “fixability”). + - Examples: + - Duplicate customer records that can be deleted in the source system + - Null records, such as a customer name or email address, that can be entered into the source system + - Primary key testing where duplicates are removable in the source system + +### Staging + +In the staging layer, your models should be cleaning up or mitigating data issues that can't be fixed at the source. Your tests should be focused on business anomaly detection. + +- Data cleanup and issue mitigation: Use our [best practices around staging layers](https://docs.getdbt.com/best-practices/how-we-structure/2-staging) to clean things up. Don’t add tests to your cleanup efforts. If you’re filtering out nulls in a column, adding a not_null test is repetitive! 🌶️ +- Business-focused anomaly examples: these are data quality issues you *should* test for in your staging layer, because they fall outside of your business’s defined norms. These might be: + - Values inside a single column that fall outside of an acceptable range. For example, a store selling a greater quantity of limited-edition items than they received in their stock delivery. + - Values that should always be positive, are positive. This might look like a negative transaction amount that isn’t classified as a return. This failing test would then spur further investigation into the offending transaction. + - An unexpected uptick in volume of a quantity column beyond a pre-defined percentage. This might look like a store’s customer volume spiking unexpectedly and outside of expected seasonal norms. This is an anomaly that could indicate a bug or modeling issue. + +### Intermediate (if applicable) + +In your intermediate layer, focus on data hygiene and anomaly tests for new columns. Don’t re-test passthrough columns from sources or staging. Here are some examples of tests you might put in your intermediate layer based on the use cases of intermediate models we [outline in this guide](/best-practices/how-we-structure/3-intermediate#intermediate-models). + +- Intermediate models often re-grain models to prepare them for marts. + - Add a primary key test to any re-grained models. + - Additionally, consider adding a primary key test to models where the grain *has remained the same* but has been *enriched.* This helps future-proof your enriched models against future developers who may not be able to glean your intention from SQL alone. +- Intermediate models may perform a first set of joins or aggregations to reduce complexity in a final mart. + - Add simple anomaly tests to verify the behavior of your sets of joins and aggregations. This may look like: + - An [accepted_values](/reference/resource-properties/data-tests#accepted_values) test on a newly calculated categorical column. + - A [mutually_exclusive_ranges](https://github.com/dbt-labs/dbt-utils#mutually_exclusive_ranges-source) test on two columns whose values behave in relation to one another (ex: asserting age ranges do not overlap). + - A [not_constant](https://github.com/dbt-labs/dbt-utils#not_constant-source) test on a column whose value should be continually changing (ex: page view counts on website analytics). +- Intermediate models may isolate complex operations. + - The anomaly tests we list above may suffice here. + - You might also consider [unit testing](/docs/build/unit-tests) any particularly complex pieces of SQL logic. + +### Marts + +Marts layer testing will follow the same hygiene-or-anomaly pattern as staging and intermediate. Similar to your intermediate layer, you should focus your testing on net-new columns in your marts layer. This might look like: + +- Unit tests: validate especially complex transformation logic. For example: + - Calculating dates in a way that feeds into forecasting. + - Customer segmentation logic, especially logic that has a lot of CASE-WHEN statements. +- Primary key tests: focus on where where your mart's granularity has changed from its staging/intermediate inputs. + - Similar to the intermediate models above, you may also want to add primary key tests to models whose grain hasn’t changed, but have been enriched with other data. Primary key tests here communicate your intent. +- Business focused anomaly tests: focus on *new* calculated fields, such as: + - Singular tests on high-priority, high-impact tables where you have a specific problem you want forewarning about. + - This might be something like fuzzy matching logic to detect when the same person is making multiple emails to extend a free trial beyond its acceptable end date. + - A test for calculated numerical fields that shouldn’t vary by more than certain percentage in a week. + - A calculated ledger table that follows certain business rules, i.e. today’s running total of spend must always be greater than yesterday’s. + +### CI/CD + +All of the testing you’ve applied in your different layers is the manual work of constructing your framework. CI/CD is where it gets automated. + +You should run a [slim CI](/best-practices/best-practice-workflows#run-only-modified-models-to-test-changes-slim-ci) to optimize your resource consumption. + +With CI/CD and your regular production runs, your testing framework can be on autopilot. 😎 + +If and when you encounter failures, consult your trusty testing framework doc you built in our [earlier post](/blog/test-smarter-not-harder). + +### Advanced CI + +In the early stages of your smarter testing journey, start with dbt Cloud’s built-in flags for [advanced CI](/docs/deploy/advanced-ci). In PRs with advanced CI enabled, dbt Cloud will flag what has been modified, added, or removed in the “compare changes” section. These three flags offer confidence and evidence that your changes are what you expect. Then, hand them off for peer review. Advanced CI helps jump start your colleague’s review of your work by bringing all of the implications of the change into one place. + +We consider usage of Advanced CI beyond the modified, added, or changed gut checks to be an advanced (heh) testing strategy, and look forward to hearing how you use it. + +## Wrapping it all up + +Judicious data testing is like training for a marathon. It’s not productive to go run 20 miles a day and hope that you’ll be marathon-ready and uninjured. Similarly, throwing data tests randomly at your data pipeline without careful thought is not going to tell you much about your data quality. + +Runners go into marathons with training plans. Analytics engineers who care about data quality approach the issue with a plan, too. + +As you try out some of the guidance above here, remember that your testing needs are going to evolve over time. Don’t be afraid to revise your original testing strategy. + +Let us know your thoughts on these strategies in the comments section. Try them out, and share your thoughts to help us refine them. diff --git a/website/blog/authors.yml b/website/blog/authors.yml index 85f05a545f9..3070ec806b5 100644 --- a/website/blog/authors.yml +++ b/website/blog/authors.yml @@ -1,7 +1,7 @@ --- amy_chen: image_url: /img/blog/authors/achen.png - job_title: Product Ecosystem Manager + job_title: Product Manager links: - icon: fa-linkedin url: https://www.linkedin.com/in/yuanamychen/ @@ -214,6 +214,14 @@ euan_johnston: - icon: fa-github url: https://github.com/euanjohnston-dev name: Euan Johnston +faith_mckenna: + image_url: /img/blog/authors/faith_pic.png + job_title: Senior Technical Instructor + links: + - icon: fa-linkedin + url: https://www.linkedin.com/in/faithlierheimer/ + name: Faith McKenna + organization: dbt Labs filip_byrĂŠn: image_url: /img/blog/authors/filip-eqt.png job_title: VP and Software Architect @@ -275,6 +283,14 @@ jeremy_cohen: job_title: Product Manager name: Jeremy Cohen organization: dbt Labs +jerrie_kumalah_kenney: + image_url: /img/blog/authors/jerrie.jpg + job_title: Resident Architect + links: + - icon: fa-linkedin + url: https://www.linkedin.com/in/jerriekumalah/ + name: Jerrie Kumalah Kenney + organization: dbt Labs jess_williams: image_url: /img/blog/authors/jess.png job_title: Head of Professional Services @@ -386,6 +402,14 @@ lucas_bergodias: job_title: Analytics Engineer name: Lucas Bergo Dias organization: Indicium Tech +luis_leon: + image_url: /img/blog/authors/luis-leon.png + job_title: Partner Solutions Architect + links: + - icon: fa-linkedin + url: https://www.linkedin.com/in/luis-leon-03965463/ + name: Luis Leon + organization: dbt Labs matt_winkler: description: Matt is an ex-data scientist who chose to embrace the simplicity of using SQL to manage and testing data pipelines with dbt. He previously worked as a hands-on ML practitioner, and consulted with Fortune 500 clients to build and maintain ML Ops pipelines using (mostly) AWS Sagemaker. He lives in the Denver area, and you can say hello on dbt Slack or on LinkedIn. image_url: /img/blog/authors/matt-winkler.jpeg @@ -449,6 +473,14 @@ pedro_brito_de_sa: url: https://www.linkedin.com/in/pbritosa/ name: Pedro Brito de Sa organization: Sage +randy_pettus: + image_url: /img/blog/authors/randy-pettus.png + job_title: Senior Partner Sales Engineer + links: + - icon: fa-linkedin + url: https://www.linkedin.com/in/randypettus/ + name: Randy Pettus + organization: Snowflake rastislav_zdechovan: image_url: /img/blog/authors/rastislav-zdechovan.png job_title: Analytics Engineer @@ -590,4 +622,4 @@ yu_ishikawa: - icon: fa-linkedin url: https://www.linkedin.com/in/yuishikawa0301 name: Yu Ishikawa - organization: Ubie \ No newline at end of file + organization: Ubie diff --git a/website/blog/ctas.yml b/website/blog/ctas.yml index ac56d4cc749..1f9b13afa7b 100644 --- a/website/blog/ctas.yml +++ b/website/blog/ctas.yml @@ -25,3 +25,8 @@ subheader: Coalesce is the premiere analytics engineering conference! Sign up now for innovation, collaboration, and inspiration. Don't miss out! button_text: Register now url: https://coalesce.getdbt.com/register +- name: coalesce_2024_catchup + header: Missed Coalesce 2024? + subheader: Catch up on Coalesce 2024 and register to access a select number of on-demand sessions. + button_text: Register and watch + url: https://coalesce.getdbt.com/register/online diff --git a/website/blog/metadata.yml b/website/blog/metadata.yml index d0009fd62c4..8b53a7a2a04 100644 --- a/website/blog/metadata.yml +++ b/website/blog/metadata.yml @@ -2,7 +2,7 @@ featured_image: "" # This CTA lives in right sidebar on blog index -featured_cta: "coalesce_2024_signup" +featured_cta: "coalesce_2024_catchup" # Show or hide hero title, description, cta from blog index show_title: true diff --git a/website/dbt-versions.js b/website/dbt-versions.js index 60efef64f75..bee90e3b9ed 100644 --- a/website/dbt-versions.js +++ b/website/dbt-versions.js @@ -15,8 +15,13 @@ */ exports.versions = [ { - version: "1.9.1", - customDisplay: "Cloud (Versionless)", + version: "1.10", + customDisplay: "Cloud (Latest)", + }, + { + version: "1.9", + customDisplay: "1.9 (Compatible)", + EOLDate: "2025-12-08", }, { version: "1.8", @@ -24,11 +29,7 @@ exports.versions = [ }, { version: "1.7", - EOLDate: "2024-10-30", - }, - { - version: "1.6", - EOLDate: "2024-07-31", + EOLDate: "2024-11-01", }, ]; @@ -42,6 +43,14 @@ exports.versions = [ * @property {string} lastVersion The last version the page is visible in the sidebar */ exports.versionedPages = [ + { + page: "docs/build/incremental-microbatch", + firstVersion: "1.9", + }, + { + page: "reference/resource-configs/snapshot_meta_column_names", + firstVersion: "1.9", + }, { page: "reference/resource-configs/target_database", lastVersion: "1.8", @@ -54,134 +63,6 @@ exports.versionedPages = [ page: "reference/global-configs/indirect-selection", firstVersion: "1.8", }, - { - page: "reference/resource-configs/store_failures_as", - firstVersion: "1.7", - }, - { - page: "docs/build/build-metrics-intro", - firstVersion: "1.6", - }, - { - page: "docs/build/sl-getting-started", - firstVersion: "1.6", - }, - { - page: "docs/build/about-metricflow", - firstVersion: "1.6", - }, - { - page: "docs/build/join-logic", - firstVersion: "1.6", - }, - { - page: "docs/build/validation", - firstVersion: "1.6", - }, - { - page: "docs/build/semantic-models", - firstVersion: "1.6", - }, - { - page: "docs/build/group-by", - firstVersion: "1.6", - }, - { - page: "docs/build/entities", - firstVersion: "1.6", - }, - { - page: "docs/build/metrics-overview", - firstVersion: "1.6", - }, - { - page: "docs/build/cumulative", - firstVersion: "1.6", - }, - { - page: "docs/build/derived", - firstVersion: "1.6", - }, - { - page: "docs/build/measure-proxy", - firstVersion: "1.6", - }, - { - page: "docs/build/ratio", - firstVersion: "1.6", - }, - { - page: "reference/commands/clone", - firstVersion: "1.6", - }, - { - page: "docs/collaborate/govern/project-dependencies", - firstVersion: "1.6", - }, - { - page: "reference/dbt-jinja-functions/thread_id", - firstVersion: "1.6", - }, - { - page: "reference/resource-properties/deprecation_date", - firstVersion: "1.6", - }, - { - page: "reference/commands/retry", - firstVersion: "1.6", - }, - { - page: "docs/build/groups", - firstVersion: "1.5", - }, - { - page: "docs/collaborate/govern/model-contracts", - firstVersion: "1.5", - }, - { - page: "reference/commands/show", - firstVersion: "1.5", - }, - { - page: "docs/collaborate/govern/model-access", - firstVersion: "1.5", - }, - { - page: "docs/collaborate/govern/model-versions", - firstVersion: "1.5", - }, - { - page: "reference/programmatic-invocations", - firstVersion: "1.5", - }, - { - page: "reference/resource-configs/contract", - firstVersion: "1.5", - }, - { - page: "reference/resource-configs/group", - firstVersion: "1.5", - }, - { - page: "reference/resource-properties/access", - firstVersion: "1.5", - }, - { - page: "reference/resource-properties/constraints", - firstVersion: "1.5", - }, - { - page: "reference/resource-properties/latest_version", - firstVersion: "1.5", - }, - { - page: "reference/resource-properties/versions", - firstVersion: "1.5", - }, - { - page: "reference/resource-configs/on_configuration_change", - firstVersion: "1.6", - }, ]; /** @@ -194,12 +75,5 @@ exports.versionedPages = [ * @property {string} firstVersion The first version the category is visible in the sidebar */ exports.versionedCategories = [ - { - category: "Model governance", - firstVersion: "1.5", - }, - { - category: "Build your metrics", - firstVersion: "1.6", - }, + ]; diff --git a/website/docs/best-practices/how-we-build-our-metrics/semantic-layer-3-build-semantic-models.md b/website/docs/best-practices/how-we-build-our-metrics/semantic-layer-3-build-semantic-models.md index 7990cf6752f..da882dba6c5 100644 --- a/website/docs/best-practices/how-we-build-our-metrics/semantic-layer-3-build-semantic-models.md +++ b/website/docs/best-practices/how-we-build-our-metrics/semantic-layer-3-build-semantic-models.md @@ -241,7 +241,9 @@ measures: ## Reviewing our work -Our completed code will look like this, our first semantic model! +Our completed code will look like this, our first semantic model! Here are two examples showing different organizational approaches: + + @@ -288,6 +290,68 @@ semantic_models: description: The total tax paid on each order. agg: sum ``` + + + + + + +```yml +semantic_models: + - name: orders + defaults: + agg_time_dimension: ordered_at + description: | + Order fact table. This table is at the order grain with one row per order. + + model: ref('stg_orders') + + entities: + - name: order_id + type: primary + - name: location + type: foreign + expr: location_id + - name: customer + type: foreign + expr: customer_id + + dimensions: + - name: ordered_at + expr: date_trunc('day', ordered_at) + # use date_trunc(ordered_at, DAY) if using BigQuery + type: time + type_params: + time_granularity: day + - name: is_large_order + type: categorical + expr: case when order_total > 50 then true else false end + + measures: + - name: order_total + description: The total revenue for each order. + agg: sum + - name: order_count + description: The count of individual orders. + expr: 1 + agg: sum + - name: tax_paid + description: The total tax paid on each order. + agg: sum +``` + + +As you can see, the content of the semantic model is identical in both approaches. The key differences are: + +1. **File location** + - Co-located approach: `models/marts/orders.yml` + - Parallel sub-folder approach: `models/semantic_models/sem_orders.yml` + +2. **File naming** + - Co-located approach: Uses the same name as the corresponding mart (`orders.yml`) + - Parallel sub-folder approach: Prefixes the file with `sem_` (`sem_orders.yml`) + +Choose the approach that best fits your project structure and team preferences. The co-located approach is often simpler for new projects, while the parallel sub-folder approach can be clearer for migrating large existing projects to the Semantic Layer. ## Next steps diff --git a/website/docs/best-practices/how-we-mesh/mesh-2-who-is-dbt-mesh-for.md b/website/docs/best-practices/how-we-mesh/mesh-2-who-is-dbt-mesh-for.md index b6fadc2d7a6..4c8adfa86a1 100644 --- a/website/docs/best-practices/how-we-mesh/mesh-2-who-is-dbt-mesh-for.md +++ b/website/docs/best-practices/how-we-mesh/mesh-2-who-is-dbt-mesh-for.md @@ -23,9 +23,6 @@ Is dbt Mesh a good fit in this scenario? Absolutely! There is no other way to sh - Onboarding hundreds of people and dozens of projects is full of friction! The challenges of a scaled, global organization are not to be underestimated. To start the migration, prioritize teams that have strong dbt familiarity and fundamentals. dbt Mesh is an advancement of core dbt deployments, so these teams are likely to have a smoother transition. Additionally, prioritize teams that manage strategic data assets that need to be shared widely. This ensures that dbt Mesh will help your teams deliver concrete value quickly. -- Bi-directional project dependencies -- currently, projects in dbt Mesh are treated like dbt resources in that they cannot depend on each other. However, many teams may want to be able to share data assets back and forth between teams. - - We've added support for [enabling bidirectional dependencies](/best-practices/how-we-mesh/mesh-3-structures#cycle-detection) across projects. If this sounds like your organization, dbt Mesh is the architecture you should pursue. ✅ diff --git a/website/docs/best-practices/how-we-mesh/mesh-3-structures.md b/website/docs/best-practices/how-we-mesh/mesh-3-structures.md index c75c566610b..38066811d8a 100644 --- a/website/docs/best-practices/how-we-mesh/mesh-3-structures.md +++ b/website/docs/best-practices/how-we-mesh/mesh-3-structures.md @@ -66,7 +66,7 @@ Since the launch of dbt Mesh, the most common pattern we've seen is one where pr Users may need to contribute models across multiple projects and this is fine. There will be some friction doing this, versus a single repo, but this is _useful_ friction, especially if upstreaming a change from a “spoke” to a “hub.” This should be treated like making an API change, one that the other team will be living with for some time to come. You should be concerned if your teammates find they need to make a coordinated change across multiple projects very frequently (every week), or as a key prerequisite for ~20%+ of their work. -### Cycle detection +### Cycle detection import CycleDetection from '/snippets/_mesh-cycle-detection.md'; diff --git a/website/docs/best-practices/how-we-mesh/mesh-4-implementation.md b/website/docs/best-practices/how-we-mesh/mesh-4-implementation.md index f1fb7422acf..a884de90c49 100644 --- a/website/docs/best-practices/how-we-mesh/mesh-4-implementation.md +++ b/website/docs/best-practices/how-we-mesh/mesh-4-implementation.md @@ -80,7 +80,7 @@ models: ## Split your projects 1. **Move your grouped models into a subfolder**. This will include any model in the selected group, it's associated YAML entry, as well as its parent or child resources as appropriate depending on where this group sits in your DAG. - 1. Note that just like in your dbt project, circular refereneces are not allowed! Project B cannot have parents and children in Project A, for example. + 1. Note that just like in your dbt project, circular references are not allowed! Project B cannot have parents and children in Project A, for example. 2. **Create a new `dbt_project.yml` file** in the subdirectory. 3. **Copy any macros** used by the resources you moved. 4. **Create a new `packages.yml` file** in your subdirectory with the packages that are used by the resources you moved. diff --git a/website/docs/best-practices/how-we-mesh/mesh-5-faqs.md b/website/docs/best-practices/how-we-mesh/mesh-5-faqs.md index 1ae49928ae5..9f12f7d2c20 100644 --- a/website/docs/best-practices/how-we-mesh/mesh-5-faqs.md +++ b/website/docs/best-practices/how-we-mesh/mesh-5-faqs.md @@ -215,7 +215,7 @@ There’s model-level access within dbt, role-based access for users and groups First things first: access to underlying data is always defined and enforced by the underlying data platform (for example, BigQuery, Databricks, Redshift, Snowflake, Starburst, etc.) This access is managed by executing “DCL statements” (namely `grant`). dbt makes it easy to [configure `grants` on models](/reference/resource-configs/grants), which provision data access for other roles/users/groups in the data warehouse. However, dbt does _not_ automatically define or coordinate those grants unless they are configured explicitly. Refer to your organization's system for managing data warehouse permissions. -[dbt Cloud Enterprise plans](https://www.getdbt.com/pricing) support [role-based access control (RBAC)](/docs/cloud/manage-access/enterprise-permissions#how-to-set-up-rbac-groups-in-dbt-cloud) that manages granular permissions for users and user groups. You can control which users can see or edit all aspects of a dbt Cloud project. A user’s access to dbt Cloud projects also determines whether they can “explore” that project in detail. Roles, users, and groups are defined within the dbt Cloud application via the UI or by integrating with an identity provider. +[dbt Cloud Enterprise plans](https://www.getdbt.com/pricing) support [role-based access control (RBAC)](/docs/cloud/manage-access/about-user-access#role-based-access-control-) that manages granular permissions for users and user groups. You can control which users can see or edit all aspects of a dbt Cloud project. A user’s access to dbt Cloud projects also determines whether they can “explore” that project in detail. Roles, users, and groups are defined within the dbt Cloud application via the UI or by integrating with an identity provider. [Model access](/docs/collaborate/govern/model-access) defines where models can be referenced. It also informs the discoverability of those projects within dbt Explorer. Model `access` is defined in code, just like any other model configuration (`materialized`, `tags`, etc). diff --git a/website/docs/best-practices/how-we-structure/2-staging.md b/website/docs/best-practices/how-we-structure/2-staging.md index 8eb91ff5b7b..1f52a4a9a00 100644 --- a/website/docs/best-practices/how-we-structure/2-staging.md +++ b/website/docs/best-practices/how-we-structure/2-staging.md @@ -223,4 +223,4 @@ This is a welcome change for many of us who have become used to applying the sam :::info Development flow versus DAG order. This guide follows the order of the DAG, so we can get a holistic picture of how these three primary layers build on each other towards fueling impactful data products. It’s important to note though that developing models does not typically move linearly through the DAG. Most commonly, we should start by mocking out a design in a spreadsheet so we know we’re aligned with our stakeholders on output goals. Then, we’ll want to write the SQL to generate that output, and identify what tables are involved. Once we have our logic and dependencies, we’ll make sure we’ve staged all the necessary atomic pieces into the project, then bring them together based on the logic we wrote to generate our mart. Finally, with a functioning model flowing in dbt, we can start refactoring and optimizing that mart. By splitting the logic up and moving parts back upstream into intermediate models, we ensure all of our models are clean and readable, the story of our DAG is clear, and we have more surface area to apply thorough testing. -:::info +::: diff --git a/website/docs/best-practices/how-we-structure/4-marts.md b/website/docs/best-practices/how-we-structure/4-marts.md index 21de31a9e0d..995dea7e96f 100644 --- a/website/docs/best-practices/how-we-structure/4-marts.md +++ b/website/docs/best-practices/how-we-structure/4-marts.md @@ -26,7 +26,8 @@ models/marts ✅ **Group by department or area of concern.** If you have fewer than 10 or so marts you may not have much need for subfolders, so as with the intermediate layer, don’t over-optimize too early. If you do find yourself needing to insert more structure and grouping though, use useful business concepts here. In our marts layer, we’re no longer worried about source-conformed data, so grouping by departments (marketing, finance, etc.) is the most common structure at this stage. -✅ **Name by entity.** Use plain English to name the file based on the concept that forms the grain of the mart `customers`, `orders`. Note that for pure marts, there should not be a time dimension (`orders_per_day`) here, that is typically best captured via metrics. +✅ **Name by entity.** Use plain English to name the file based on the concept that forms the grain of the mart’s `customers`, `orders`. Marts that don't include any time-based rollups (pure marts) should not have a time dimension (`orders_per_day`) here, typically best captured via metrics. + ❌ **Build the same concept differently for different teams.** `finance_orders` and `marketing_orders` is typically considered an anti-pattern. There are, as always, exceptions — a common pattern we see is that, finance may have specific needs, for example reporting revenue to the government in a way that diverges from how the company as a whole measures revenue day-to-day. Just make sure that these are clearly designed and understandable as _separate_ concepts, not departmental views on the same concept: `tax_revenue` and `revenue` not `finance_revenue` and `marketing_revenue`. diff --git a/website/docs/best-practices/how-we-structure/5-the-rest-of-the-project.md b/website/docs/best-practices/how-we-structure/5-the-rest-of-the-project.md index c7522bf12eb..9358b507acc 100644 --- a/website/docs/best-practices/how-we-structure/5-the-rest-of-the-project.md +++ b/website/docs/best-practices/how-we-structure/5-the-rest-of-the-project.md @@ -102,12 +102,14 @@ We’ve focused heavily thus far on the primary area of action in our dbt projec ### Project splitting -One important, growing consideration in the analytics engineering ecosystem is how and when to split a codebase into multiple dbt projects. Our present stance on this for most projects, particularly for teams starting out, is straightforward: you should avoid it unless you have no other option or it saves you from an even more complex workaround. If you do have the need to split up your project, it’s completely possible through the use of private packages, but the added complexity and separation is, for most organizations, a hindrance, not a help, at present. That said, this is very likely subject to change! [We want to create a world where it’s easy to bring lots of dbt projects together into a cohesive lineage](https://github.com/dbt-labs/dbt-core/discussions/5244). In a world where it’s simple to break up monolithic dbt projects into multiple connected projects, perhaps inside of a modern mono repo, the calculus will be different, and the below situations we recommend against may become totally viable. So watch this space! +One important, growing consideration in the analytics engineering ecosystem is how and when to split a codebase into multiple dbt projects. Currently, our advice for most teams, especially those just starting, is fairly simple: in most cases, we recommend doing so with [dbt Mesh](/best-practices/how-we-mesh/mesh-1-intro)! dbt Mesh allows organizations to handle complexity by connecting several dbt projects rather than relying on one big, monolithic project. This approach is designed to speed up development while maintaining governance. -- ❌ **Business groups or departments.** Conceptual separations within the project are not a good reason to split up your project. Splitting up, for instance, marketing and finance modeling into separate projects will not only add unnecessary complexity but destroy the unifying effect of collaborating across your organization on cohesive definitions and business logic. -- ❌ **ML vs Reporting use cases.** Similarly to the point above, splitting a project up based on different use cases, particularly more standard BI versus ML features, is a common idea. We tend to discourage it for the time being. As with the previous point, a foundational goal of implementing dbt is to create a single source of truth in your organization. The features you’re providing to your data science teams should be coming from the same marts and metrics that serve reports on executive dashboards. +As breaking up monolithic dbt projects into smaller, connected projects, potentially within a modern mono repo becomes easier, the scenarios we currently advise against may soon become feasible. So watch this space! + +- ✅ **Business groups or departments.** Conceptual separations within the project are the primary reason to split up your project. This allows your business domains to own their own data products and still collaborate using dbt Mesh. For more information about dbt Mesh, please refer to our [dbt Mesh FAQs](/best-practices/how-we-mesh/mesh-5-faqs). - ✅ **Data governance.** Structural, organizational needs — such as data governance and security — are one of the few worthwhile reasons to split up a project. If, for instance, you work at a healthcare company with only a small team cleared to access raw data with PII in it, you may need to split out your staging models into their own projects to preserve those policies. In that case, you would import your staging project into the project that builds on those staging models as a [private package](https://docs.getdbt.com/docs/build/packages/#private-packages). - ✅ **Project size.** At a certain point, your project may grow to have simply too many models to present a viable development experience. If you have 1000s of models, it absolutely makes sense to find a way to split up your project. +- ❌ **ML vs Reporting use cases.** Similarly to the point above, splitting a project up based on different use cases, particularly more standard BI versus ML features, is a common idea. We tend to discourage it for the time being. As with the previous point, a foundational goal of implementing dbt is to create a single source of truth in your organization. The features you’re providing to your data science teams should be coming from the same marts and metrics that serve reports on executive dashboards. ## Final considerations diff --git a/website/docs/best-practices/how-we-style/2-how-we-style-our-sql.md b/website/docs/best-practices/how-we-style/2-how-we-style-our-sql.md index 8c61e63b888..35e025faf3f 100644 --- a/website/docs/best-practices/how-we-style/2-how-we-style-our-sql.md +++ b/website/docs/best-practices/how-we-style/2-how-we-style-our-sql.md @@ -8,8 +8,8 @@ id: 2-how-we-style-our-sql - ☁️ Use [SQLFluff](https://sqlfluff.com/) to maintain these style rules automatically. - Customize `.sqlfluff` configuration files to your needs. - Refer to our [SQLFluff config file](https://github.com/dbt-labs/jaffle-shop-template/blob/main/.sqlfluff) for the rules we use in our own projects. - - - Exclude files and directories by using a standard `.sqlfluffignore` file. Learn more about the syntax in the [.sqlfluffignore syntax docs](https://docs.sqlfluff.com/en/stable/configuration.html#id2). + - Exclude files and directories by using a standard `.sqlfluffignore` file. Learn more about the syntax in the [.sqlfluffignore syntax docs](https://docs.sqlfluff.com/en/stable/configuration/index.html). + - Excluding unnecessary folders and files (such as `target/`, `dbt_packages/`, and `macros/`) can speed up linting, improve run times, and help you avoid irrelevant logs. - 👻 Use Jinja comments (`{# #}`) for comments that should not be included in the compiled SQL. - ⏭️ Use trailing commas. - 4️⃣ Indents should be four spaces. diff --git a/website/docs/best-practices/how-we-style/5-how-we-style-our-yaml.md b/website/docs/best-practices/how-we-style/5-how-we-style-our-yaml.md index 8f817356334..e3b539e8b12 100644 --- a/website/docs/best-practices/how-we-style/5-how-we-style-our-yaml.md +++ b/website/docs/best-practices/how-we-style/5-how-we-style-our-yaml.md @@ -7,6 +7,7 @@ id: 5-how-we-style-our-yaml - 2️⃣ Indents should be two spaces - ➡️ List items should be indented +- 🔠 List items with a single entry can be a string. For example, `'select': 'other_user'`, but it's best practice to provide the argument as an explicit list. For example, `'select': ['other_user']` - 🆕 Use a new line to separate list items that are dictionaries where appropriate - 📏 Lines of YAML should be no longer than 80 characters. - 🛠️ Use the [dbt JSON schema](https://github.com/dbt-labs/dbt-jsonschema) with any compatible IDE and a YAML formatter (we recommend [Prettier](https://prettier.io/)) to validate your YAML files and format them automatically. diff --git a/website/docs/community/resources/oss-expectations.md b/website/docs/community/resources/oss-expectations.md index e6e5d959c96..7b518424e92 100644 --- a/website/docs/community/resources/oss-expectations.md +++ b/website/docs/community/resources/oss-expectations.md @@ -2,112 +2,122 @@ title: "Expectations for OSS contributors" --- -Whether it's a dbt package, a plugin, `dbt-core`, or this very documentation site, contributing to the open source code that supports the dbt ecosystem is a great way to level yourself up as a developer, and to give back to the community. The goal of this page is to help you understand what to expect when contributing to dbt open source software (OSS). While we can only speak for our own experience as open source maintainers, many of these guidelines apply when contributing to other open source projects, too. +Whether it's `dbt-core`, adapters, packages, or this very documentation site, contributing to the open source code that supports the dbt ecosystem is a great way to share your knowledge, level yourself up as a developer, and to give back to the community. The goal of this page is to help you understand what to expect when contributing to dbt open source software (OSS). -Have you seen things in other OSS projects that you quite like, and think we could learn from? [Open a discussion on the dbt Community Forum](https://discourse.getdbt.com), or start a conversation in the dbt Community Slack (for example: `#community-strategy`, `#dbt-core-development`, `#package-ecosystem`, `#adapter-ecosystem`). We always appreciate hearing from you! +Have you seen things in other OSS projects that you quite like, and think we could learn from? [Open a discussion on the dbt Community Forum](https://discourse.getdbt.com), or start a conversation in the [dbt Community Slack](https://www.getdbt.com/community/join-the-community) (for example: `#community-strategy`, `#dbt-core-development`, `#package-ecosystem`, `#adapter-ecosystem`). We always appreciate hearing from you! ## Principles ### Open source is participatory -Why take time out of your day to write code you don’t _have_ to? We all build dbt together. By using dbt, you’re invested in the future of the tool, and an agent in pushing forward the practice of analytics engineering. You’ve already benefited from using code contributed by community members, and documentation written by community members. Contributing to dbt OSS is your way to pay it forward, as an active participant in the thing we’re all creating together. +We all build dbt together -- whether you write code or contribute your ideas. By using dbt, you're invested in the future of the tool, and have an active role in pushing forward the standard of analytics engineering. You already benefit from using code and documentation contributed by community members. Contributing to the dbt community is your way to be an active participant in the thing we're all creating together. -There’s a very practical reason, too: OSS prioritizes our collective knowledge and experience over any one person’s. We don’t have experience using every database, operating system, security environment, ... We rely on the community of OSS users to hone our product capabilities and documentation to the wide variety of contexts in which it operates. In this way, dbt gets to be the handiwork of thousands, rather than a few dozen. +There's a very practical reason, too: OSS prioritizes our collective knowledge and experience over any one person's. We don't have experience using every database, operating system, security environment, ... We rely on the community of OSS users to hone our product capabilities and documentation to the wide variety of contexts in which it operates. In this way, dbt gets to be the handiwork of thousands, rather than a few dozen. -### We take seriously our role as maintainers +### We take seriously our role as maintainers of a standard -In that capacity, we cannot and will not fix every bug ourselves, or code up every feature worth doing. Instead, we’ll do our best to respond to new issues with context (including links to related issues), feedback, alternatives/workarounds, and (whenever possible) pointers to code that would aid a community contributor. If a change is so tricky or involved that the initiative rests solely with us, we’ll do our best to explain the complexity, and when / why we could foresee prioritizing it. Our role also includes maintenance of the backlog of issues, such as closing duplicates, proposals we don’t intend to support, or stale issues (no activity for 180 days). +As a standard, dbt must be reliable and consistent. Our first priority is ensuring the continued high quality of existing dbt capabilities before we introduce net-new capabilities. -### Initiative is everything +We also believe dbt as a framework should be extensible enough to ["make the easy things easy, and the hard things possible"](https://en.wikipedia.org/wiki/Perl#Philosophy). To that end, we _don't_ believe it's appropriate for dbt to have an out-of-the-box solution for every niche problem. Users have the flexibility to achieve many custom behaviors by defining their own macros, materializations, hooks, and more. We view it as our responsibility as maintainers to decide when something should be "possible" — via macros, packages, etc. — and when something should be "easy" — built into the dbt Core standard. -Given that we, as maintainers, will not be able to resolve every bug or flesh out every feature request, we empower you, as a community member, to initiate a change. +So when will we say "yes" to new capabilities for dbt Core? The signals we look for include: +- Upvotes on issues in our GitHub repos +- Open source dbt packages trying to close a gap +- Technical advancements in the ecosystem -- If you open the bug report, it’s more likely to be identified. -- If you open the feature request, it’s more likely to be discussed. -- If you comment on the issue, engaging with ideas and relating it to your own experience, it’s more likely to be prioritized. -- If you open a PR to fix an identified bug, it’s more likely to be fixed. -- If you contribute the code for a well-understood feature, that feature is more likely to be in the next version. -- If you review an existing PR, to confirm it solves a concrete problem for you, it’s more likely to be merged. +In the meantime — we'll do our best to respond to new issues with: +- Clarity about whether the proposed feature falls into the intended scope of dbt Core +- Context (including links to related issues) +- Alternatives and workarounds +- When possible, pointers to code that would aid a community contributor -Sometimes, this can feel like shouting into the void, especially if you aren’t met with an immediate response. We promise that there are dozens (if not hundreds) of folks who will read your comment, maintainers included. It all adds up to a real difference. +### Initiative is everything -# Practicalities +Given that we, as maintainers, will not be able to resolve every bug or flesh out every feature request, we empower you, as a community member, to initiate a change. -As dbt OSS is growing in popularity, and dbt Labs has been growing in size, we’re working to involve new people in the responsibilities of OSS maintenance. We really appreciate your patience as our newest maintainers are learning and developing habits. +- If you open the bug report, it's more likely to be identified. +- If you open the feature request, it's more likely to be discussed. +- If you comment on the issue, engaging with ideas and relating it to your own experience, it's more likely to be prioritized. +- If you open a PR to fix an identified bug, it's more likely to be fixed. +- If you comment on an existing PR, to confirm it solves the concrete problem for your team in practice, it's more likely to be merged. -## Discussions +Sometimes, this can feel like shouting into the void, especially if you aren't met with an immediate response. We promise that there are dozens (if not hundreds) of folks who will read your comment, including us as maintainers. It all adds up to a real difference. -Discussions are a relatively new GitHub feature, and we really like them! +## Practicalities -A discussion is best suited to propose a Big Idea, such as brand-new capability in dbt Core, or a new section of the product docs. Anyone can open a discussion, add a comment to an existing one, or reply in a thread. +### Discussions -What can you expect from a new Discussion? Hopefully, comments from other members of the community, who like your idea or have their own ideas for how it could be improved. The most helpful comments are ones that describe the kinds of experiences users and readers should have. Unlike an **issue**, there is no specific code change that would “resolve” a Discussion. +A discussion is best suited to propose a Big Idea, such as brand-new capability in dbt Core or an adapter. Anyone can open a discussion, comment on an existing one, or reply in a thread. -If, over the course of a discussion, we do manage to reach consensus on a way forward, we’ll open a new issue that references the discussion for context. That issue will connect desired outcomes to specific implementation details, as well as perceived limitations and open questions. It will serve as a formal proposal and request for comment. +When you open a new discussion, you might be looking for validation from other members of the community — folks who identify with your problem statement, who like your proposed idea, and who may have their own ideas for how it could be improved. The most helpful comments propose nuances or desirable user experiences to be considered in design and refinement. Unlike an **issue**, there is no specific code change that would “resolve” a discussion. -## Issues +If, over the course of a discussion, we reach a consensus on specific elements of a proposed design, we can open new implementation issues that reference the discussion for context. Those issues will connect desired user outcomes to specific implementation details, acceptance testing, and remaining questions that need answering. -An issue could be a bug you’ve identified while using the product or reading the documentation. It could also be a specific idea you’ve had for how it could be better. +### Issues -### Best practices for issues +An issue could be a bug you've identified while using the product or reading the documentation. It could also be a specific idea you've had for a narrow extension of existing functionality. + +#### Best practices for issues - Issues are **not** for support / troubleshooting / debugging help. Please see [dbt support](/docs/dbt-support) for more details and suggestions on how to get help. - Always search existing issues first, to see if someone else had the same idea / found the same bug you did. -- Many repositories offer templates for creating issues, such as when reporting a bug or requesting a new feature. If available, please select the relevant template and fill it out to the best of your ability. This will help other people understand your issue and respond. +- Many dbt repositories offer templates for creating issues, such as reporting a bug or requesting a new feature. If available, please select the relevant template and fill it out to the best of your ability. This information helps us (and others) understand your issue. -### You’ve found an existing issue that interests you. What should you do? +##### You've found an existing issue that interests you. What should you do? -Comment on it! Explain that you’ve run into the same bug, or had a similar idea for a new feature. If the issue includes a detailed proposal for a change, say which parts of the proposal you find most compelling, and which parts give you pause. +Comment on it! Explain that you've run into the same bug, or had a similar idea for a new feature. If the issue includes a detailed proposal for a change, say which parts of the proposal you find most compelling, and which parts give you pause. -### You’ve opened a new issue. What can you expect to happen? +##### You've opened a new issue. What can you expect to happen? -In our most critical repositories (such as `dbt-core`), **our goal is to respond to new issues within 2 standard work days.** While this initial response might be quite lengthy (context, feedback, and pointers that we can offer as maintainers), more often it will be a short acknowledgement that the maintainers are aware of it and don't believe it's in urgent need of resolution. Depending on the nature of your issue, it might be well suited to an external contribution, from you or another community member. +In our most critical repositories (such as `dbt-core`), our goal is to respond to new issues as soon as possible. This initial response will often be a short acknowledgement that the maintainers are aware of the issue, signalling our perception of its urgency. Depending on the nature of your issue, it might be well suited to an external contribution, from you or another community member. -**What does “triage” mean?** In some repositories, we use a `triage` label to keep track of issues that need an initial response from a maintainer. +**What if you're opening an issue in a different repository?** We have engineering teams dedicated to active maintenance of [`dbt-core`](https://github.com/dbt-labs/dbt-core) and its component libraries ([`dbt-common`](https://github.com/dbt-labs/dbt-common) + [`dbt-adapters`](https://github.com/dbt-labs/dbt-adapters)), as well as several platform-specific adapters ([`dbt-snowflake`](https://github.com/dbt-labs/dbt-snowflake), [`dbt-bigquery`](https://github.com/dbt-labs/dbt-bigquery), [`dbt-redshift`](https://github.com/dbt-labs/dbt-redshift), [`dbt-postgres`](https://github.com/dbt-labs/dbt-postgres)). We've open-sourced a number of other software projects over the years, and the majority of them do not have the same activity or maintenance guarantees. Check to see if other recent issues have responses, or when the last commit was added to the `main` branch. -**What if I’m opening an issue in a different repository?** **What if I’m opening an issue in a different repository?** We have engineering teams dedicated to active maintainence of [`dbt-core`](https://github.com/dbt-labs/dbt-core) and its component libraries ([`dbt-common`](https://github.com/dbt-labs/dbt-common) + [`dbt-adapters`](https://github.com/dbt-labs/dbt-adapters)), as well as several platform-specific adapters ([`dbt-snowflake`](https://github.com/dbt-labs/dbt-snowflake), [`dbt-bigquery`](https://github.com/dbt-labs/dbt-bigquery), [`dbt-redshift`](https://github.com/dbt-labs/dbt-redshift), [`dbt-postgres`](https://github.com/dbt-labs/dbt-postgres)). We’ve open sourced a number of other software projects over the years, and the majority of them do not have the same activity or maintenance guarantees. Check to see if other recent issues have responses, or when the last commit was added to the `main` branch. +**You're not sure about the status of your issue.** If your issue is in an actively maintained repo and has a `triage` label attached, we're aware it's something that needs a response. If the issue has been triaged, but not prioritized, this could mean: +- The intended scope or user experience of a proposed feature requires further refinement from a maintainer +- We believe the required code change is too tricky for an external contributor -**If my issue is lingering...** Sorry for the delay! If your issue is in an actively maintained repo and has a `triage` label attached, we’re aware it's something that needs a response. +We'll do our best to explain the open questions or complexity, and when / why we could foresee prioritizing it. -**Automation that can help us:** In many repositories, we use a bot that marks issues as stale if they haven’t had any activity for 180 days. This helps us keep our backlog organized and up-to-date. We encourage you to comment on older open issues that you’re interested in, to keep them from being marked stale. You’re also always welcome to comment on closed issues to say that you’re still interested in the proposal. +**Automation that can help us:** In many repositories, we use a bot that marks issues as stale if they haven't had any activity for 180 days. This helps us keep our backlog organized and up-to-date. We encourage you to comment on older open issues that you're interested in, to keep them from being marked stale. You're also always welcome to comment on closed issues to say that you're still interested in the proposal. -### Issue labels +#### Issue labels In all likelihood, the maintainer who responds will also add a number of labels. Not all of these labels are used in every repository. -In some cases, the right resolution to an open issue might be tangential to the codebase. The right path forward might be in another codebase (we'll transfer it), a documentation update, or a change that can be made in user-space code. In other cases, the issue might describe functionality that the maintainers are unwilling or unable to incorporate into the main codebase. In these cases, a maintainer will close the issue (perhaps using a `wontfix` label) and explain why. +In some cases, the right resolution to an open issue might be tangential to the codebase. The right path forward might be in another codebase (we'll transfer it), a documentation update, or a change that you can make yourself in user-space code. In other cases, the issue might describe functionality that the maintainers are unwilling or unable to incorporate into the main codebase. In these cases, a maintainer will close the issue (perhaps using a `wontfix` label) and explain why. + +Some of the most common labels are explained below: | tag | description | | ------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | `triage` | This is a new issue which has not yet been reviewed by a maintainer. This label is removed when a maintainer reviews and responds to the issue. | -| `bug` | This issue represents a defect or regression from the behavior that's documented, or that you reasonably expect | -| `enhancement` | This issue represents net-new functionality, including an extension of an existing capability | -| `good_first_issue` | This issue does not require deep knowledge of the codebase to implement. This issue is appropriate for a first-time contributor. | +| `bug` | This issue represents a defect or regression from the behavior that's documented | +| `enhancement` | This issue represents a narrow extension of an existing capability | +| `good_first_issue` | This issue does not require deep knowledge of the codebase to implement, and it is appropriate for a first-time contributor. | | `help_wanted` | This issue is trickier than a "good first issue." The required changes are scattered across the codebase, or more difficult to test. The maintainers are happy to help an experienced community contributor; they aren't planning to prioritize this issue themselves. | | `duplicate` | This issue is functionally identical to another open issue. The maintainers will close this issue and encourage community members to focus conversation on the other one. | | `stale` | This is an old issue which has not recently been updated. In repositories with a lot of activity, stale issues will periodically be closed. | | `wontfix` | This issue does not require a code change in the repository, or the maintainers are unwilling to merge a change which implements the proposed behavior. | -## Pull requests - -PRs are your surest way to make the change you want to see in dbt / packages / docs, especially when the change is straightforward. +### Pull requests -**Every PR should be associated with an issue.** Why? Before you spend a lot of time working on a contribution, we want to make sure that your proposal will be accepted. You should open an issue first, describing your desired outcome and outlining your planned change. If you've found an older issue that's already open, comment on it with an outline for your planned implementation. Exception to this rule: If you're just opening a PR for a cosmetic fix, such as a typo in documentation, an issue isn't needed. +**Every PR should be associated with an issue.** Why? Before you spend a lot of time working on a contribution, we want to make sure that your proposal will be accepted. You should open an issue first, describing your desired outcome and outlining your planned change. If you've found an older issue that's already open, comment on it with an outline for your planned implementation _before_ putting in the work to open a pull request. -**PRs must include robust testing.** Comprehensive testing within pull requests is crucial for the stability of our project. By prioritizing robust testing, we ensure the reliability of our codebase, minimize unforeseen issues, and safeguard against potential regressions. We cannot merge changes that risk the backward incompatibility of existing documented behaviors. We understand that creating thorough tests often requires significant effort, and your dedication to this process greatly contributes to the project's overall reliability. Thank you for your commitment to maintaining the integrity of our codebase and the experience of everyone using dbt! +**PRs must include robust testing.** Comprehensive testing within pull requests is crucial for the stability of dbt. By prioritizing robust testing, we ensure the reliability of our codebase, minimize unforeseen issues, and safeguard against potential regressions. **We cannot merge changes that risk the backward incompatibility of existing documented behaviors.** We understand that creating thorough tests often requires significant effort, and your dedication to this process greatly contributes to the project's overall reliability. Thank you for your commitment to maintaining the integrity of our codebase and the experience of everyone using dbt! -**PRs go through two review steps.** First, we aim to respond with feedback on whether we think the implementation is appropriate from a product & usability standpoint. At this point, we will close PRs that we believe fall outside the scope of dbt Core, or which might lead to an inconsistent user experience. This is an important part of our role as maintainers; we're always open to hearing disagreement. If a PR passes this first review, we will queue it up for code review, at which point we aim to test it ourselves and provide thorough feedback within the next month. +**PRs go through two review steps.** First, we aim to respond with feedback on whether we think the implementation is appropriate from a product & usability standpoint. At this point, we will close PRs that we believe fall outside the scope of dbt Core, or which might lead to an inconsistent user experience. This is an important part of our role as maintainers; we're always open to hearing disagreement. If a PR passes this first review, we will queue it up for code review, at which point we aim to test it ourselves and provide thorough feedback. -**We receive more PRs than we can thoroughly review, test, and merge.** Our teams have finite capacity, and our top priority is maintaining a well-scoped, high-quality framework for the tens of thousands of people who use it every week. To that end, we must prioritize overall stability and planned improvements over a long tail of niche potential features. For best results, say what in particular you’d like feedback on, and explain what would it mean to you, your team, and other community members to have the proposed change merged. Smaller PRs tackling well-scoped issues tend to be easier and faster for review. Two recent examples of community-contributed PRs: +**We receive more PRs than we can thoroughly review, test, and merge.** Our teams have finite capacity, and our top priority is maintaining a well-scoped, high-quality framework for the tens of thousands of people who use it every week. To that end, we must prioritize overall stability and planned improvements over a long tail of niche potential features. For best results, say what in particular you'd like feedback on, and explain what would it mean to you, your team, and other community members to have the proposed change merged. Smaller PRs tackling well-scoped issues tend to be easier and faster for review. Two examples of community-contributed PRs: - [(dbt-core#9347) Fix configuration of turning test warnings into failures](https://github.com/dbt-labs/dbt-core/pull/9347) - [(dbt-core#9863) Better error message when trying to select a disabled model](https://github.com/dbt-labs/dbt-core/pull/9863) -**Automation that can help us:** Many repositories have a template for pull request descriptions, which will include a checklist that must be completed before the PR can be merged. You don’t have to do all of these things to get an initial PR, but they definitely help. Those many include things like: +**Automation that can help us:** Many repositories have a template for pull request descriptions, which will include a checklist that must be completed before the PR can be merged. You don't have to do all of these things to get an initial PR, but they will delay our review process. Those include: -- **Tests!** When you open a PR, some tests and code checks will run. (For security reasons, some may need to be approved by a maintainer.) We will not merge any PRs with failing tests. If you’re not sure why a test is failing, please say so, and we’ll do our best to get to the bottom of it together. +- **Tests, tests, tests.** When you open a PR, some tests and code checks will run. (For security reasons, some may need to be approved by a maintainer.) We will not merge any PRs with failing tests. If you're not sure why a test is failing, please say so, and we'll do our best to get to the bottom of it together. - **Contributor License Agreement** (CLA): This ensures that we can merge your code, without worrying about unexpected implications for the copyright or license of open source dbt software. For more details, read: ["Contributor License Agreements"](../resources/contributor-license-agreements.md) - **Changelog:** In projects that include a number of changes in each release, we need a reliable way to signal what's been included. The mechanism for this will vary by repository, so keep an eye out for notes about how to update the changelog. -### Inclusion in release versions +#### Inclusion in release versions -Both bug fixes and backwards-compatible new features will be included in the [next minor release](/docs/dbt-versions/core#how-dbt-core-uses-semantic-versioning). Fixes for regressions and net-new bugs that were present in the minor version's original release will be backported to versions with [active support](/docs/dbt-versions/core). Other bug fixes may be backported when we have high confidence that they're narrowly scoped and won't cause unintended side effects. +Both bug fixes and backwards-compatible new features will be included in the [next minor release of dbt Core](/docs/dbt-versions/core#how-dbt-core-uses-semantic-versioning). Fixes for regressions and net-new bugs that were present in the minor version's original release will be backported to versions with [active support](/docs/dbt-versions/core). Other bug fixes may be backported when we have high confidence that they're narrowly scoped and won't cause unintended side effects. diff --git a/website/docs/community/spotlight/bruno-de-lima.md b/website/docs/community/spotlight/bruno-de-lima.md index f5ffaa6a970..3c373db06e8 100644 --- a/website/docs/community/spotlight/bruno-de-lima.md +++ b/website/docs/community/spotlight/bruno-de-lima.md @@ -2,42 +2,39 @@ id: bruno-de-lima title: Bruno de Lima description: | - Hi all! I'm a Data Engineer, deeply fascinated by the awesomeness dbt. I love talking about dbt, creating content from daily tips to blogposts and engaging with this vibrant community! - - Started my career at the beginning of 2022 at Indicium as an Analytics Engineer, working with dbt from day 1. By 2023, my path took a global trajectory as I joined phData as a Data Engineer, expanding my experiences and forging connections beyond Brazil. While dbt is at the heart of my expertise, I've also delved into data warehouses such as Snowflake, Databricks, and BigQuery; visualization tools like Power BI and Tableau; and several minor modern data stack tools. - - I actively participate in the dbt community, having attended two dbt Meetups in Brazil organized by Indicium; writing about dbt-related topics in my Medium and LinkedIn profiles; contributing to the code; and frequently checking dbt Slack and Discourse, helping (and being helped by) other dbt practitioners. If you are a community member, you may have seen me around! -image: /img/community/spotlight/bruno-de-lima.jpg + Hey all! I was born and raised in Florianopolis, Brazil, and I'm a Senior Data Engineer at phData. I live with my fiancĂŠe and I enjoy music, photography, and powerlifting. + + I started my career in early 2022 at Indicium as an Analytics Engineer, working with dbt from day 1. By 2023, my path took a global trajectory as I joined phData as a Data Engineer, expanding my experiences and creating connections beyond Brazil. While dbt is my main expertise, because of my work in consultancy I have experience with a large range of tools, specially the ones related to Snowflake, Databricks, AWS and GCP; but I have already tried several other modern data stack tools too. + + I actively participate in the dbt community, having organized dbt Meetups in Brazil (in Floripa and SĂŁo Paulo); writing about dbt-related topics in my Medium and LinkedIn profiles; contributing to the dbt Core code and to the docs; and frequently checking dbt Slack and Discourse, helping (and being helped by) other dbt practitioners. If you are a community member, you may have seen me around! +image: /img/community/spotlight/bruno-souza-de-lima-newimage.jpg pronouns: he/him location: FlorianĂłpolis, Brazil -jobTitle: Data Engineer +jobTitle: Senior Data Engineer companyName: phData -organization: "" socialLinks: - name: LinkedIn link: https://www.linkedin.com/in/brunoszdl/ - name: Medium link: https://medium.com/@bruno.szdl -dateCreated: 2023-11-05 +dateCreated: 2024-11-03 hide_table_of_contents: true communityAward: true -communityAwardYear: 2023 +communityAwardYear: 2024 --- ## When did you join the dbt community and in what way has it impacted your career? -I was not truly happy with my academic life. My career took a new turn when I enrolled in the Analytics Engineer course by Indicium. That was my first contact with dbt, and I didn't realize how much it would transform my career. After that, I was hired at the company as an Analytics Engineer and worked extensively with dbt from day one. +I was not truly happy with my academic life. My career took a new turn when I enrolled in the Analytics Engineer course by Indicium. That was my first contact with dbt, and I didn't realize how much it would transform my career. After that, I was hired at the company as an Analytics Engineer and worked extensively with dbt from day one. It took me some time to become an active member of the dbt community. I started working with dbt at the beginning of 2022 and became more involved towards the end of that year, encouraged by Daniel Avancini. I regret not doing this earlier, because being an active community member has been a game-changer for me, as my knowledge of dbt has grown exponentially just by participating in daily discussions on Slack. I have found #advice-dbt-help and #advice-dbt-for-power-users channels particularly useful, as well as the various database-specific channels. Additionally, the #i-made-this and #i-read-this channels have allowed me to learn about the innovative things that community members are doing. Inspired by other members, especially Josh Devlin and Owen Prough, I began answering questions on Slack and Discourse. For questions I couldn't answer, I would try engaging in discussions about possible solutions or provide useful links. I also started posting dbt tips on LinkedIn to help practitioners learn about new features or to refresh their memories about existing ones. -By being more involved in the community, I felt more connected and supported. I received help from other members, and now, I could help others, too. I was happy with this arrangement, but more unexpected surprises came my way. My active participation in Slack, Discourse, and LinkedIn opened doors to new connections and career opportunities. I had the pleasure of meeting a lot of incredible people and receiving exciting job offers, including the one for working at phData. +By being more involved in the community, I felt more connected and supported. I received help from other members, and now, I could help others, too. I was happy with this arrangement, but more unexpected surprises came my way. My active participation in Slack, Discourse, and LinkedIn opened doors to new connections and career opportunities. I had the pleasure of meeting a lot of incredible people and receiving exciting job offers, including the ones for working at phData and teaching at Zach Wilson's data engineering bootcamp. Thanks to the dbt community, I went from feeling uncertain about my career prospects to having a solid career and being surrounded by incredible people. -I would like to thank the Indicium folks for opening the first door for me for this career in data, and not just for me but for lots of people in Brazil trying to migrate from different fields who would not have this opportunity otherwise. - ## What dbt community leader do you identify with? How are you looking to grow your leadership in the dbt community? I identify with Gwen Windflower and Joel Labes, or at least they are the kind of leader I admire. Their strong presence and continuous interaction with all types of dbt enthusiasts make everyone feel welcomed in the community. They uplift those who contribute to the community, whether it's through a LinkedIn post or answering a question, and provide constructive feedback to help them improve. And of course they show a very strong knowledge about dbt and data in general, which is reflected in their contributions. diff --git a/website/docs/community/spotlight/christophe-oudar.md b/website/docs/community/spotlight/christophe-oudar.md new file mode 100644 index 00000000000..2381d88a381 --- /dev/null +++ b/website/docs/community/spotlight/christophe-oudar.md @@ -0,0 +1,35 @@ +--- +id: christophe-oudar +title: Christophe Oudar +description: | + I joined the dbt Community in November 2021 after exchanging some issues in Github. I currently work as a staff engineer at a scaleup in the ad tech industry called Teads, which I joined 11 years ago as a new grad. I've been using dbt Core on BigQuery since then. I write about data engineering both on Medium and Substack. I contribute on dbt-bigquery. I wrote an article that was then featured on the Developer Blog called BigQuery ingestion-time partitioning and partition copy with dbt. +image: /img/community/spotlight/christophe-oudar.jpg +pronouns: he/him +location: Montpellier, France +jobTitle: Staff Engineer +companyName: Teads +socialLinks: + - name: X + link: https://x.com/Kayrnt + - name: LinkedIn + link: https://www.linkedin.com/in/christopheoudar/ + - name: Substack + link: https://smallbigdata.substack.com/ +dateCreated: 2024-11-08 +hide_table_of_contents: true +communityAward: true +communityAwardYear: 2024 +--- + +## When did you join the dbt community and in what way has it impacted your career? + +I joined the community in November 2021 as a way to explore how to move our in-house data modeling layer to dbt. The transition took over a year while we ensured we could cover all our bases and add missing features to dbt-bigquery. That project was one of stepping stones that helped me to move from senior to staff level at my current job. + +## What dbt community leader do you identify with? How are you looking to grow your leadership in the dbt community? + +I identify with leaders that have strong convictions about how data engineering should move forward but remain open to innovation and ideas from everyone to bring the best to the field and make it as inclusive as possible to all cultures and profiles. I think that could mean people like Jordan Tigani or Mark Raasveldt. In the dbt community, my leadership has looked like helping people struggling and offering better ways to simplify one's day to day work when possible. + +## What have you learned from community members? What do you hope others can learn from you? + +I read a lot of articles about dbt, especially when I got started with it. It helped me a lot to build a proper Slim CI that could fit my company's ways of working. I also got to see how data pipelines were done in other companies and the pros and cons of my approaches. I hope I can share more of that knowledge for people to pick what's best for their needs. +​ diff --git a/website/docs/community/spotlight/fabiyi-opeyemi.md b/website/docs/community/spotlight/fabiyi-opeyemi.md index 18a311fa437..b5b4bf8c9e0 100644 --- a/website/docs/community/spotlight/fabiyi-opeyemi.md +++ b/website/docs/community/spotlight/fabiyi-opeyemi.md @@ -2,13 +2,11 @@ id: fabiyi-opeyemi title: Opeyemi Fabiyi description: | - I'm an Analytics Engineer with Data Culture, a Data Consulting firm where I use dbt regularly to help clients build quality-tested data assets. I've also got a background in financial services and supply chain. I'm passionate about helping organizations to become data-driven and I majorly use dbt for data modeling, while the other aspect of the stack is largely dependent on the client infrastructure I'm working for, so I often say I'm tool-agnostic. 😀 - - I'm the founder of Nigeria's Young Data Professional Community. I'm also the organizer of the Lagos dbt Meetup which I started, and one of the organizers of the DataFest Africa Conference. I became an active member of the dbt Community in 2021 & spoke at Coalesce 2022. + I’m an Analytics Engineer with Data Culture, a Data Consulting firm where I use dbt regularly to help clients build quality-tested data assets. Before Data Culture, I worked at Cowrywise, one of the leading Fintech companies in Nigeria, where I was a solo data team member, and that was my first introduction to dbt and Analytics Engineering. Before that, I was doing Data Science and Analytics at Deloitte Nigeria. It’s been an exciting journey since I started using dbt and joining the community.Outside of work, I’m very passionate about Community building and Data Advocacy. I founded one of Nigeria’s most vibrant Data communities, “The Young Data Professional Community.” I’m also the Founder of the Lagos dbt Meetup and one of the organizers of the Largest Data Conference in Africa, DataFest Africa Conference. I became an active member of the dbt community in 2021 & spoke at Coalesce 2022. So when I’m not actively working I’m involved in one community activity or the other. image: /img/community/spotlight/fabiyi-opeyemi.jpg pronouns: he/him location: Lagos, Nigeria -jobTitle: Senior Analytics Engineer +jobTitle: Analytics Manager companyName: Data Culture organization: Young Data Professionals (YDP) socialLinks: @@ -16,10 +14,10 @@ socialLinks: link: https://twitter.com/Opiano_1 - name: LinkedIn link: https://www.linkedin.com/in/opeyemifabiyi/ -dateCreated: 2023-11-06 +dateCreated: 2024-11-02 hide_table_of_contents: true communityAward: true -communityAwardYear: 2023 +communityAwardYear: 2024 --- ## When did you join the dbt community and in what way has it impacted your career? @@ -40,4 +38,4 @@ I've learned how to show empathy as a data professional and be a great engineer ## Anything else interesting you want to tell us? -Maybe, I will consider DevRel as a career sometime because of my innate passion and love for community and people. Several folks tell me I'm a strong DevRel talent and a valuable asset for any product-led company. If you need someone to bounce ideas off of or discuss😃 your community engagement efforts, please feel free to reach out. +Maybe I will consider DevRel as a career sometime because of my innate passion and love for community and people. Several folks tell me I’m a strong DevRel talent and a valuable asset for any product-led company. If you need someone to bounce ideas off of or discuss your community engagement efforts, please feel free to reach out. On a side note, it was really exciting for me to attend Coalesce 2024 in Vegas in person, which allowed me not only to learn but, most importantly, to meet amazing persons I’ve only interacted with online, like Bruno, Kuberjain, Dakota and many more; shout-out to Zenlytic and Lightdash for making that possible and, most importantly, a huge shout-out to the dbt Lab community team: Amada, Natasha and everyone on the community team for their constant supports to helping out with making the dbt Lagos (Nigeria) meetup a success. diff --git a/website/docs/community/spotlight/jenna-jordan.md b/website/docs/community/spotlight/jenna-jordan.md new file mode 100644 index 00000000000..86f19f125f8 --- /dev/null +++ b/website/docs/community/spotlight/jenna-jordan.md @@ -0,0 +1,36 @@ +--- +id: jenna-jordan +title: Jenna Jordan +description: | + I am a Senior Data Management Consultant with Analytics8, where I advise clients on dbt best practices (especially regarding dbt Mesh and the various shifts in governance and strategy that come with it). My experiences working within a dbt Mesh architecture and all of the difficulties organizations could run into with such a major paradigm shift inspired my peer exchange (role-playing/simulation game) at Coalesce 2024: "Governance co-lab: We the people, in order to govern data, do establish processes." I also experimented with bringing role-playing scenarios to data problems at the September 2024 Chicago dbt Meetup, hosted by Analytics8. I occasionally write long blog posts on my website, if you're up for the read. +image: /img/community/spotlight/jenna-jordan.jpg +pronouns: she/her +location: Asheville, USA +jobTitle: Senior Data Management Consultant +companyName: Analytics8 +socialLinks: + - name: LinkedIn + link: https://www.linkedin.com/in/jennajordan1/ + - name: Personal website + link: https://jennajordan.me/ +dateCreated: 2024-11-01 +hide_table_of_contents: true +communityAward: true +communityAwardYear: 2024 +--- + +## When did you join the dbt community and in what way has it impacted your career? + +My dbt learning journey kicked off with the CoRise (now Uplimit) course Analytics Engineering with dbt, with Emily Hawkins and Jake Hannan, in February 2022 – less than a month after starting as a data engineer with the City of Boston Analytics Team. About a year later, I spearheaded the adoption of dbt at the City and got to build the project and associated architecture from scratch – which is probably the best learning experience you could ask for! I saw the value dbt could bring to improving data management processes at the City, and I knew there were other cities and local governments that could benefit from dbt as well, which motivated me to find my fellow co-speakers Ian Rose and Laurie Merrell to give a talk at Coalesce 2023 called "From Coast to Coast: Implementing dbt in the public sector." As a part of our goal to identify and cultivate a community of dbt practitioners in the public (and adjacent) sectors, we also started the dbt Community Slack channel #industry-public-sector. That experience allowed me to continue to grow my career and find my current role - as well as connect with so many amazing data folks! + +## What dbt community leader do you identify with? How are you looking to grow your leadership in the dbt community? + +There are many leaders in the dbt community that I admire and identify with – I won’t list them all out because I will invariably miss someone (but… you probably know who you are). Technical prowess is always enviable, but I most admire those who bring the human element to data work: those who aren’t afraid to be their authentic selves, cultivate a practice of empathy and compassion, and are driven by curiosity and a desire to help others. I’ve never set out to be a leader, and I still don’t really consider myself to be a leader – I’m much more comfortable in the role of a librarian. I just want to help people by connecting them to the information and resources that they may need. + +## What have you learned from community members? What do you hope others can learn from you? + +Pretty much everything I’ve learned about dbt and working in a mature analytics ecosystem I’ve learned from dbt community members. The dbt Community Slack is full of useful information and advice, and has also helped me identify experts about certain topics that I can chat with to learn even more. When I find someone sharing useful information, I usually try to find and follow them on social media so I can see more of their content. If there is one piece of advice I want to share, it is this: don’t be afraid to engage. Ask for help when you need it, but also offer help freely. Engage with the community with the same respect and grace you would offer your friends and coworkers. + +## Anything else interesting you want to tell us? + +Library Science is so much more than the Dewey Decimal System (seriously, ask a librarian about Dewey for a juicy rant). RDF triples (for knowledge graphs) are queried using SPARQL (pronounced “sparkle”). An antelope can be a document. The correct way to write a date/time is ISO-8601. The oldest known table (of the spreadsheet variety) is from 5,000 years ago – record-keeping predates literature by a significant margin. Zip codes aren’t polygons – they don’t contain an area or have boundaries. Computers don’t always return 0.3 when asked to add 0.1 + 0.2. SQL was the sequel to SQUARE. Before computers, people programmed looms (weaving is binary). What? You asked!! On a more serious note – data teams: start hiring librarians. No, seriously. No degree could have prepared me better for what I do in the data field than my M.S. in Library & Information Science. I promise, you want the skillset & mindset that a librarian will bring to your team. diff --git a/website/docs/community/spotlight/meagan-palmer.md b/website/docs/community/spotlight/meagan-palmer.md index ff45a3d6b7d..fffc2a6e0d6 100644 --- a/website/docs/community/spotlight/meagan-palmer.md +++ b/website/docs/community/spotlight/meagan-palmer.md @@ -3,8 +3,11 @@ id: meagan-palmer title: Meagan Palmer description: | I first started using dbt in 2016 or 2017 (I can't remember exactly). Since then, I have moved into data and analytics consulting and have dipped in and out of the dbt Community. + Late last year, I started leading dbt Cloud training courses and spending more time in the dbt Slack. + In consulting, I get to use a range of stacks. I've used dbt with Redshift, Snowflake, and Databricks in production settings with a range of loaders & reporting tools, and I've been enjoying using DuckDB for some home experimentation. + To share some of the experiences, I regularly post to LinkedIn and have recently started Analytics Engineering Today, a twice monthly newsletter about dbt in practice. image: /img/community/spotlight/Meagan-Palmer.png pronouns: she/her @@ -14,9 +17,10 @@ companyName: Altis Consulting socialLinks: - name: LinkedIn link: https://www.linkedin.com/in/meaganpalmer/ -dateCreated: 2024-07-29 +dateCreated: 2024-11-04 hide_table_of_contents: true -communityAward: false +communityAward: true +communityAwardYear: 2024 --- ## When did you join the dbt community and in what way has it impacted your career? @@ -27,9 +31,9 @@ I was fortunate that Jon Bradley at Nearmap had the vision to engage the then Fi Being in Australia, I often see replies from Jeremy Yeo to people in the dbt Slack. His clarity of communication is impressive. -For growth, I'm hoping that others can benefit from the wide range of experience I have. My newsletter, Analytics Engineering Today on LinkedIn aims to upskill the dbt Community and shed some light on some useful features that might not be well known. +For growth, I'm hoping that others can benefit from the wide range of experience I have. My LinkedIn Newsletter, Analytics Engineering Today aims to upskill the dbt Community and shed some light on some useful features that might not be well known. -I'll be at Coalesce and am doing some webinars/events later in the year. Come say hi, I love talking dbt and analytics engineering with people. +I was at Coalesce Onlineand am doing some webinars/events later in the year. Come say hi, I love talking dbt and analytics engineering with people. ## What have you learned from community members? What do you hope others can learn from you? diff --git a/website/docs/community/spotlight/mike-stanley.md b/website/docs/community/spotlight/mike-stanley.md new file mode 100644 index 00000000000..853b0e2f704 --- /dev/null +++ b/website/docs/community/spotlight/mike-stanley.md @@ -0,0 +1,30 @@ +--- +id: mike-stanley +title: Mike Stanley +description: | + I've split my time between financial services and the video games industry. Back when I wrote code every day, I worked in marketing analytics and marketing technology. I've been in the dbt community for about two years. I haven't authored any extensions to dbt's adapters yet but I've given feedback on proposed changes! +image: /img/community/spotlight/mike-stanley.jpg +pronouns: he/him +location: London, United Kingdom +jobTitle: Manager, Data +companyName: Freetrade +socialLinks: + - name: LinkedIn + link: https://www.linkedin.com/in/mike-stanley-31616994/ +dateCreated: 2024-11-05 +hide_table_of_contents: true +communityAward: true +communityAwardYear: 2024 +--- + +## When did you join the dbt community and in what way has it impacted your career? + +I've led data teams for almost ten years now and it can be a challenge to stay current on new technology when you're spending a lot of time on leadership and management. I joined the dbt Community to learn how to get more from it, how to solve problems and use more advanced features, and to learn best practices. I find that answering questions is the way I learn best, so I started helping people! + +## Which dbt Community leader do you identify with? How are you looking to grow your leadership in the dbt community? + +I hope that we can all continue to level up our dbt skills and leave the data environments that we work in better than we found them. + +## What have you learned from community members? What do you hope others can learn from you? + +Everything! People share so much about their best practices and when and how to deviate from them, interesting extensions to dbt that they've worked on, common bugs and problems, and how to think in a "dbtish" way. I couldn't have learned any of that without the community! diff --git a/website/docs/community/spotlight/original-dbt-athena-maintainers.md b/website/docs/community/spotlight/original-dbt-athena-maintainers.md new file mode 100644 index 00000000000..b3728a71d63 --- /dev/null +++ b/website/docs/community/spotlight/original-dbt-athena-maintainers.md @@ -0,0 +1,44 @@ +--- +id: original-dbt-athena-maintainers +title: The Original dbt-athena Maintainers +description: | + The original dbt-athena Maintainers is a group of 5 people—JĂŠrĂŠmy Guiselin, Mattia, Jesse Dobbelaere, Serhii Dimchenko, and Nicola Corda—who met via dbt Slack in the #db-athena channel, with the aim to make make dbt-athena a production-ready adapter. + + In the first periods, Winter 2022 and Spring 2023, we focused on contributing directly to the adapter, adding relevant features like Iceberg and Lake Formation support, and stabilizing some internal behaviour. + + On a second iteration our role was triaging, providing community support and bug fixing. We encouraged community members to make their first contributions, and helped them to merge their PRs. +image: /img/community/spotlight/dbt-athena-groupheadshot.jpg +location: Europe +jobTitle: A group of data-engineers +companyName: Mix of companies +organization: dbt-athena (since November 2022) +socialLinks: + - name: JĂŠrĂŠmy's LinkedIn + link: https://www.linkedin.com/in/jrmyy/ + - name: Mattia's LinkedIn + link: https://www.linkedin.com/in/mattia-sappa/ + - name: Jesse's LinkedIn + link: https://www.linkedin.com/in/dobbelaerejesse/ + - name: Serhii's LinkedIn + link: https://www.linkedin.com/in/serhii-dimchenko-075b3061/ + - name: Nicola's LinkedIn + link: https://www.linkedin.com/in/nicolacorda/ +dateCreated: 2024-11-06 +hide_table_of_contents: true +communityAward: true +communityAwardYear: 2024 +--- + +## When did you join the dbt community and in what way has it impacted your career? + +The dbt community allowed the dbt-athena maintainers to meet each other, and share the common goal of making the dbt-athena adapter production-ready. + +## Which dbt Community leader do you identify with? How are you looking to grow your leadership in the dbt community? + +As we grow, we are looking to embody democratic leadership. + +## What have you learned from community members? What do you hope others can learn from you? + +We learned that the power of the community was endless. People started to share best practises, and some of the best practises were incorporated directly in dbt-athena, allowing people to run the adapter smoothly in their production environment. +We reached a point where people started to ask advice for their AWS architecture, which we found pretty awesome. + diff --git a/website/docs/community/spotlight/ruth-onyekwe.md b/website/docs/community/spotlight/ruth-onyekwe.md new file mode 100644 index 00000000000..cf07e98a4f7 --- /dev/null +++ b/website/docs/community/spotlight/ruth-onyekwe.md @@ -0,0 +1,31 @@ +--- +id: ruth-onyekwe +title: Ruth Onyekwe +description: | + I've been working in the world of Data Analytics for over 5 years and have been part of the dbt community for the last 4. With a background in International Business and Digital Marketing, I experienced first hand the need for reliable data to fuel business decisions. This inspired a career move into the technology space to be able to work with the tools and the people that were facilitating this process. Today I am leading teams to deliver data modernization projects, as well as helping grow the analytics arm of my company on a day to day basis. I also have the privilege of organising the dbt Meetups in Barcelona, Spain - and am excited to continue to grow the community across Europe. +image: /img/community/spotlight/ruth-onyekwe.jpeg +pronouns: she/her +location: Madrid, Spain +jobTitle: Data Analytics Manager +companyName: Spaulding Ridge +socialLinks: + - name: LinkedIn + link: https://www.linkedin.com/in/ruth-onyekwe/ +dateCreated: 2024-11-07 +hide_table_of_contents: true +communityAward: true +communityAwardYear: 2024 +--- + +## When did you join the dbt community and in what way has it impacted your career? + +I joined the dbt community in 2021, after meeting dbt Labs reps at a conference. Through partnering with dbt Labs and learning the technology, we (Spaulding Ridge) were able to open a whole new offering in our service catalogue, and meet the growing needs of our customers. + +## Which dbt Community leader do you identify with? How are you looking to grow your leadership in the dbt community? + +I identify with the transparent leaders - those willing to share their learnings, knowledge, and experiences. I want to encourage other dbt enthusiasts to stretch themselves professionally and actively participate in the analytics community. + +## What have you learned from community members? What do you hope others can learn from you? + +I've learnt that most of us working in data have experienced the same struggles, be it searching for the best testing frameworks, or deciding how to build optimised and scalable models, or searching for the answers to non-technical questions like how to best organise teams or how to communicate with business stakeholders and translate their needs - we're all faced with the same dilemmas. And the great thing I've learned being in the dbt community, is that if you're brave enough to share your stories, you'll connect with someone who has already gone through those experiences, and can help you reach a solution a lot faster than if you tried to start from scratch. + diff --git a/website/docs/docs/build/conversion-metrics.md b/website/docs/docs/build/conversion-metrics.md index 2ef2c3910b9..2d227f4a703 100644 --- a/website/docs/docs/build/conversion-metrics.md +++ b/website/docs/docs/build/conversion-metrics.md @@ -20,28 +20,29 @@ The specification for conversion metrics is as follows: Note that we use the double colon (::) to indicate whether a parameter is nested within another parameter. So for example, `query_params::metrics` means the `metrics` parameter is nested under `query_params`. ::: -| Parameter | Description | Type | -| --- | --- | --- | -| `name` | The name of the metric. | Required | -| `description` | The description of the metric. | Optional | -| `type` | The type of metric (such as derived, ratio, and so on.). In this case, set as 'conversion' | Required | -| `label` | Required string that defines the display value in downstream tools. Accepts plain text, spaces, and quotes (such as `orders_total` or `"orders_total"`). | Required | -| `type_params` | Specific configurations for each metric type. | Required | -| `conversion_type_params` | Additional configuration specific to conversion metrics. | Required | -| `entity` | The entity for each conversion event. | Required | -| `calculation` | Method of calculation. Either `conversion_rate` or `conversions`. Defaults to `conversion_rate`. | Optional | -| `base_measure` | A list of base measure inputs | Required | -| `base_measure:name` | The base conversion event measure. | Required | -| `base_measure:fill_nulls_with` | Set the value in your metric definition instead of null (such as zero). | Optional | -| `base_measure:join_to_timespine` | Boolean that indicates if the aggregated measure should be joined to the time spine table to fill in missing dates. Default `false`. | Optional | -| `conversion_measure` | A list of conversion measure inputs. | Required | -| `conversion_measure:name` | The base conversion event measure.| Required | -| `conversion_measure:fill_nulls_with` | Set the value in your metric definition instead of null (such as zero). | Optional | -| `conversion_measure:join_to_timespine` | Boolean that indicates if the aggregated measure should be joined to the time spine table to fill in missing dates. Default `false`. | Optional | -| `window` | The time window for the conversion event, such as 7 days, 1 week, 3 months. Defaults to infinity. | Optional | -| `constant_properties` | List of constant properties. | Optional | -| `base_property` | The property from the base semantic model that you want to hold constant. | Optional | -| `conversion_property` | The property from the conversion semantic model that you want to hold constant. | Optional | +| Parameter | Description | Required | Type | +| --- | --- | --- | --- | +| `name` | The name of the metric. | Required | String | +| `description` | The description of the metric. | Optional | String | +| `type` | The type of metric (such as derived, ratio, and so on.). In this case, set as 'conversion'. | Required | String | +| `label` | Required string that defines the display value in downstream tools. Accepts plain text, spaces, and quotes (such as `orders_total` or `"orders_total"`). | Required | String | +| `type_params` | Specific configurations for each metric type. | Required | Dict | +| `conversion_type_params` | Additional configuration specific to conversion metrics. | Required | Dict | +| `entity` | The entity for each conversion event. | Required | String | +| `calculation` | Method of calculation. Either `conversion_rate` or `conversions`. Defaults to `conversion_rate`. | Optional | String | +| `base_measure` | A list of base measure inputs. | Required | Dict | +| `base_measure:name` | The base conversion event measure. | Required | String | +| `base_measure:fill_nulls_with` | Set the value in your metric definition instead of null (such as zero). | Optional | String | +| `base_measure:join_to_timespine` | Boolean that indicates if the aggregated measure should be joined to the time spine table to fill in missing dates. Default `false`. | Optional | Boolean | +| `base_measure:filter` | Optional `filter` used to apply to the base measure. | Optional | String | +| `conversion_measure` | A list of conversion measure inputs. | Required | Dict | +| `conversion_measure:name` | The base conversion event measure.| Required | String | +| `conversion_measure:fill_nulls_with` | Set the value in your metric definition instead of null (such as zero). | Optional | String | +| `conversion_measure:join_to_timespine` | Boolean that indicates if the aggregated measure should be joined to the time spine table to fill in missing dates. Default `false`. | Optional | Boolean | +| `window` | The time window for the conversion event, such as 7 days, 1 week, 3 months. Defaults to infinity. | Optional | String | +| `constant_properties` | List of constant properties. | Optional | List | +| `base_property` | The property from the base semantic model that you want to hold constant. | Optional | String | +| `conversion_property` | The property from the conversion semantic model that you want to hold constant. | Optional | String | Refer to [additional settings](#additional-settings) to learn how to customize conversion metrics with settings for null values, calculation type, and constant properties. @@ -61,6 +62,7 @@ metrics: name: The name of the measure # Required fill_nulls_with: Set the value in your metric definition instead of null (such as zero) # Optional join_to_timespine: true/false # Boolean that indicates if the aggregated measure should be joined to the time spine table to fill in missing dates. Default `false`. # Optional + filter: The filter used to apply to the base measure. # Optional conversion_measure: name: The name of the measure # Required fill_nulls_with: Set the value in your metric definition instead of null (such as zero) # Optional @@ -105,13 +107,14 @@ Next, define a conversion metric as follows: - name: visit_to_buy_conversion_rate_7d description: "Conversion rate from visiting to transaction in 7 days" type: conversion - label: Visit to Buy Conversion Rate (7-day window) + label: Visit to buy conversion rate (7-day window) type_params: conversion_type_params: base_measure: name: visits fill_nulls_with: 0 - conversion_measure: sellers + filter: {{ Dimension('visits__referrer_id') }} = 'facebook' + conversion_measure: name: sellers entity: user window: 7 days diff --git a/website/docs/docs/build/cumulative-metrics.md b/website/docs/docs/build/cumulative-metrics.md index 056ff79c6eb..24596be8b3d 100644 --- a/website/docs/docs/build/cumulative-metrics.md +++ b/website/docs/docs/build/cumulative-metrics.md @@ -18,21 +18,21 @@ Note that we use the double colon (::) to indicate whether a parameter is nested -| Parameter |
Description
| Type | -| --------- | ----------- | ---- | -| `name` | The name of the metric. | Required | -| `description` | The description of the metric. | Optional | -| `type` | The type of the metric (cumulative, derived, ratio, or simple). | Required | -| `label` | Required string that defines the display value in downstream tools. Accepts plain text, spaces, and quotes (such as `orders_total` or `"orders_total"`). | Required | -| `type_params` | The type parameters of the metric. Supports nested parameters indicated by the double colon, such as `type_params::measure`. | Required | -| `type_params::cumulative_type_params` | Allows you to add a `window`, `period_agg`, and `grain_to_date` configuration. Nested under `type_params`. | Optional | -| `cumulative_type_params::window` | The accumulation window, such as 1 month, 7 days, 1 year. This can't be used with `grain_to_date`. | Optional | -| `cumulative_type_params::grain_to_date` | Sets the accumulation grain, such as `month`, which will accumulate data for one month and then restart at the beginning of the next. This can't be used with `window`. | Optional | -| `cumulative_type_params::period_agg` | Specifies how to aggregate the cumulative metric when summarizing data to a different granularity. Can be used with grain_to_date. Options are
- `first` (Takes the first value within the period)
- `last` (Takes the last value within the period
- `average` (Calculates the average value within the period).

Defaults to `first` if no `window` is specified. | Optional | -| `type_params::measure` | A dictionary describing the measure you will use. | Required | -| `measure::name` | The measure you are referencing. | Optional | -| `measure::fill_nulls_with` | Set the value in your metric definition instead of null (such as zero). | Optional | -| `measure::join_to_timespine` | Boolean that indicates if the aggregated measure should be joined to the time spine table to fill in missing dates. Default `false`. | Optional | +| Parameter |
Description
| Required | Type | +|-------------|---------------------------------------------------|----------|-----------| +| `name` | The name of the metric. | Required | String | +| `description` | The description of the metric. | Optional | String | +| `type` | The type of the metric (cumulative, derived, ratio, or simple). | Required | String | +| `label` | Required string that defines the display value in downstream tools. Accepts plain text, spaces, and quotes (such as `orders_total` or `"orders_total"`). | Required | String | +| `type_params` | The type parameters of the metric. Supports nested parameters indicated by the double colon, such as `type_params::measure`. | Required | Dict | +| `type_params::measure` | The measure associated with the metric. Supports both shorthand (string) and object syntax. The shorthand is used if only the name is needed, while the object syntax allows specifying additional attributes. | Required | Dict | +| `measure::name` | The name of the measure being referenced. Required if using object syntax for `type_params::measure`. | Optional | String | +| `measure::fill_nulls_with` | Sets a value (for example, 0) to replace nulls in the metric definition. | Optional | Integer or string | +| `measure::join_to_timespine` | Boolean indicating if the aggregated measure should be joined to the time spine table to fill in missing dates. Default is `false`. | Optional | Boolean | +| `type_params::cumulative_type_params` | Configures the attributes like `window`, `period_agg`, and `grain_to_date` for cumulative metrics. | Optional | Dict | +| `cumulative_type_params::window` | Specifies the accumulation window, such as `1 month`, `7 days`, or `1 year`. Cannot be used with `grain_to_date`. | Optional | String | +| `cumulative_type_params::grain_to_date` | Sets the accumulation grain, such as `month`, restarting accumulation at the beginning of each specified grain period. Cannot be used with `window`. | Optional | String | +| `cumulative_type_params::period_agg` | Defines how to aggregate the cumulative metric when summarizing data to a different granularity: `first`, `last`, or `average`. Defaults to `first` if `window` is not specified. | Optional | String |
@@ -45,15 +45,34 @@ Note that we use the double colon (::) to indicate whether a parameter is nested | `type` | The type of the metric (cumulative, derived, ratio, or simple). | Required | | `label` | Required string that defines the display value in downstream tools. Accepts plain text, spaces, and quotes (such as `orders_total` or `"orders_total"`). | Required | | `type_params` | The type parameters of the metric. Supports nested parameters indicated by the double colon, such as `type_params::measure`. | Required | -| `window` | The accumulation window, such as 1 month, 7 days, 1 year. This can't be used with `grain_to_date`. | Optional | +| `window` | The accumulation window, such as `1 month`, `7 days`, or `1 year`. This can't be used with `grain_to_date`. | Optional | | `grain_to_date` | Sets the accumulation grain, such as `month`, which will accumulate data for one month and then restart at the beginning of the next. This can't be used with `window`. | Optional | | `type_params::measure` | A list of measure inputs | Required | -| `measure:name` | The measure you are referencing. | Optional | +| `measure:name` | The name of the measure being referenced. Required if using object syntax for `type_params::measure`. | Optional | | `measure:fill_nulls_with` | Set the value in your metric definition instead of null (such as zero).| Optional | | `measure:join_to_timespine` | Boolean that indicates if the aggregated measure should be joined to the time spine table to fill in missing dates. Default `false`. | Optional |
+ + +The`type_params::measure` configuration can be written in different ways: +- Shorthand syntax — To only specify the name of the measure, use a simple string value. This is a shorthand approach when no other attributes are required. + ```yaml + type_params: + measure: revenue + ``` +- Object syntax — To add more details or attributes to the measure (such as adding a filter, handling `null` values, or specifying whether to join to a time spine), you need to use the object syntax. This allows for additional configuration beyond just the measure's name. + + ```yaml + type_params: + measure: + name: order_total + fill_nulls_with: 0 + join_to_timespine: true + ``` + + ### Complete specification The following displays the complete specification for cumulative metrics, along with an example: diff --git a/website/docs/docs/build/custom-target-names.md b/website/docs/docs/build/custom-target-names.md index ac7036de572..218fec4283d 100644 --- a/website/docs/docs/build/custom-target-names.md +++ b/website/docs/docs/build/custom-target-names.md @@ -24,6 +24,6 @@ To set a custom target name for a job in dbt Cloud, configure the **Target Name* ## dbt Cloud IDE -When developing in dbt Cloud, you can set a custom target name in your development credentials. Go to your account (from the gear menu in the top right hand corner), select the project under **Credentials**, and update the target name. +When developing in dbt Cloud, you can set a custom target name in your development credentials. Click your account name above the profile icon in the left panel, select **Account settings**, then go to **Credentials**. Choose the project to update the target name. diff --git a/website/docs/docs/build/data-tests.md b/website/docs/docs/build/data-tests.md index 59d716b4ca9..559fe468644 100644 --- a/website/docs/docs/build/data-tests.md +++ b/website/docs/docs/build/data-tests.md @@ -9,6 +9,11 @@ id: "data-tests" keywords: - test, tests, testing, dag --- + +import CopilotBeta from '/snippets/_dbt-copilot-avail.md'; + + + ## Related reference docs * [Test command](/reference/commands/test) * [Data test properties](/reference/resource-properties/data-tests) @@ -66,11 +71,27 @@ having total_amount < 0 -The name of this test is the name of the file: `assert_total_payment_amount_is_positive`. Simple enough. +The name of this test is the name of the file: `assert_total_payment_amount_is_positive`. + +Note, you won't need to include semicolons (;) at the end of the SQL statement in your singular test files as it can cause your test to fail. + +To add a description to a singular test in your project, add a `.yml` file to your `tests` directory, for example, `tests/schema.yml` with the following content: + + -Singular data tests are easy to write—so easy that you may find yourself writing the same basic structure over and over, only changing the name of a column or model. By that point, the test isn't so singular! In that case, we recommend... +```yaml +version: 2 +data_tests: + - name: assert_total_payment_amount_is_positive + description: > + Refunds have a negative amount, so the total amount should always be >= 0. + Therefore return records where total amount < 0 to make the test fail. +``` + + +Singular data tests are so easy that you may find yourself writing the same basic structure repeatedly, only changing the name of a column or model. By that point, the test isn't so singular! In that case, we recommend generic data tests. ## Generic data tests Certain data tests are generic: they can be reused over and over again. A generic data test is defined in a `test` block, which contains a parametrized query and accepts arguments. It might look like: @@ -304,7 +325,6 @@ data_tests: -To suppress warnings about the rename, add `TestsConfigDeprecation` to the `silence` block of the `warn_error_options` flag in `dbt_project.yml`, [as described in the Warnings documentation](https://docs.getdbt.com/reference/global-configs/warnings).
diff --git a/website/docs/docs/build/dbt-tips.md b/website/docs/docs/build/dbt-tips.md index 817468e5e9c..0cc83394b8b 100644 --- a/website/docs/docs/build/dbt-tips.md +++ b/website/docs/docs/build/dbt-tips.md @@ -40,7 +40,7 @@ Leverage these dbt packages to streamline your workflow: - Set `vars` in your `dbt_project.yml` to define global defaults for certain conditions, which you can then override using the `--vars` flag in your commands. - Use [for loops](/guides/using-jinja?step=3) in Jinja to DRY up repetitive logic, such as selecting a series of columns that all require the same transformations and naming patterns to be applied. - Instead of relying on post-hooks, use the [grants config](/reference/resource-configs/grants) to apply permission grants in the warehouse resiliently. -- Define [source-freshness](/docs/build/sources#snapshotting-source-data-freshness) thresholds on your sources to avoid running transformations on data that has already been processed. +- Define [source-freshness](/docs/build/sources#source-data-freshness) thresholds on your sources to avoid running transformations on data that has already been processed. - Use the `+` operator on the left of a model `dbt build --select +model_name` to run a model and all of its upstream dependencies. Use the `+` operator on the right of the model `dbt build --select model_name+` to run a model and everything downstream that depends on it. - Use `dir_name` to run all models in a package or directory. - Use the `@` operator on the left of a model in a non-state-aware CI setup to test it. This operator runs all of a selection’s parents and children, and also runs the parents of its children, which in a fresh CI schema will likely not exist yet. diff --git a/website/docs/docs/build/derived-metrics.md b/website/docs/docs/build/derived-metrics.md index d5f2221907e..b6184aaeebf 100644 --- a/website/docs/docs/build/derived-metrics.md +++ b/website/docs/docs/build/derived-metrics.md @@ -10,18 +10,18 @@ In MetricFlow, derived metrics are metrics created by defining an expression usi The parameters, description, and type for derived metrics are: -| Parameter | Description | Type | -| --------- | ----------- | ---- | -| `name` | The name of the metric. | Required | -| `description` | The description of the metric. | Optional | -| `type` | The type of the metric (cumulative, derived, ratio, or simple). | Required | -| `label` | Required string that defines the display value in downstream tools. Accepts plain text, spaces, and quotes (such as `orders_total` or `"orders_total"`). | Required | -| `type_params` | The type parameters of the metric. | Required | -| `expr` | The derived expression. You see validation warnings when the derived metric is missing an `expr` or the `expr` does not use all the input metrics. | Required | -| `metrics` | The list of metrics used in the derived metrics. | Required | -| `alias` | Optional alias for the metric that you can use in the expr. | Optional | -| `filter` | Optional filter to apply to the metric. | Optional | -| `offset_window` | Set the period for the offset window, such as 1 month. This will return the value of the metric one month from the metric time. | Optional | +| Parameter | Description | Required | Type | +| --------- | ----------- | ---- | ---- | +| `name` | The name of the metric. | Required | String | +| `description` | The description of the metric. | Optional | String | +| `type` | The type of the metric (cumulative, derived, ratio, or simple). | Required | String | +| `label` | Defines the display value in downstream tools. Accepts plain text, spaces, and quotes (such as `orders_total` or `"orders_total"`). | Required | String | +| `type_params` | The type parameters of the metric. | Required | Dict | +| `expr` | The derived expression. You'll see validation warnings when the derived metric is missing an `expr` or the `expr` does not use all the input metrics. | Required | String | +| `metrics` | The list of metrics used in the derived metrics. Each entry can include optional fields like `alias`, `filter`, or `offset_window`. | Required | List | +| `alias` | Optional alias for the metric that you can use in the `expr`. | Optional | String | +| `filter` | Optional filter to apply to the metric. | Optional | String | +| `offset_window` | Set the period for the offset window, such as 1 month. This will return the value of the metric one month from the metric time. | Optional | String | The following displays the complete specification for derived metrics, along with an example. diff --git a/website/docs/docs/build/dimensions.md b/website/docs/docs/build/dimensions.md index 6b665f4251c..975ae4d3160 100644 --- a/website/docs/docs/build/dimensions.md +++ b/website/docs/docs/build/dimensions.md @@ -12,16 +12,16 @@ Dimensions represent the non-aggregatable columns in your data set, which are th Groups are defined within semantic models, alongside entities and measures, and correspond to non-aggregatable columns in your dbt model that provides categorical or time-based context. In SQL, dimensions is typically included in the GROUP BY clause.--> -All dimensions require a `name`, `type`, and can optionally include an `expr` parameter. The `name` for your Dimension must be unique wihtin the same semantic model. +All dimensions require a `name`, `type`, and can optionally include an `expr` parameter. The `name` for your Dimension must be unique within the same semantic model. -| Parameter | Description | Type | -| --------- | ----------- | ---- | -| `name` | Refers to the name of the group that will be visible to the user in downstream tools. It can also serve as an alias if the column name or SQL query reference is different and provided in the `expr` parameter.

Dimension names should be unique within a semantic model, but they can be non-unique across different models as MetricFlow uses [joins](/docs/build/join-logic) to identify the right dimension. | Required | -| `type` | Specifies the type of group created in the semantic model. There are two types:

- **Categorical**: Describe attributes or features like geography or sales region.
- **Time**: Time-based dimensions like timestamps or dates. | Required | -| `type_params` | Specific type params such as if the time is primary or used as a partition | Required | -| `description` | A clear description of the dimension | Optional | -| `expr` | Defines the underlying column or SQL query for a dimension. If no `expr` is specified, MetricFlow will use the column with the same name as the group. You can use the column name itself to input a SQL expression. | Optional | -| `label` | A recommended string that defines the display value in downstream tools. Accepts plain text, spaces, and quotes (such as `orders_total` or `"orders_total"`). | Optional | +| Parameter | Description | Required | Type | +| --------- | ----------- | ---- | ---- | +| `name` | Refers to the name of the group that will be visible to the user in downstream tools. It can also serve as an alias if the column name or SQL query reference is different and provided in the `expr` parameter.

Dimension names should be unique within a semantic model, but they can be non-unique across different models as MetricFlow uses [joins](/docs/build/join-logic) to identify the right dimension. | Required | String | +| `type` | Specifies the type of group created in the semantic model. There are two types:

- **Categorical**: Describe attributes or features like geography or sales region.
- **Time**: Time-based dimensions like timestamps or dates. | Required | String | +| `type_params` | Specific type params such as if the time is primary or used as a partition. | Required | Dict | +| `description` | A clear description of the dimension. | Optional | String | +| `expr` | Defines the underlying column or SQL query for a dimension. If no `expr` is specified, MetricFlow will use the column with the same name as the group. You can use the column name itself to input a SQL expression. | Optional | String | +| `label` | Defines the display value in downstream tools. Accepts plain text, spaces, and quotes (such as `orders_total` or `"orders_total"`). | Optional | String | Refer to the following for the complete specification for dimensions: @@ -41,7 +41,7 @@ Refer to the following example to see how dimensions are used in a semantic mode semantic_models: - name: transactions description: A record for every transaction that takes place. Carts are considered multiple transactions for each SKU. - model: {{ ref("fact_transactions") }} + model: {{ ref('fact_transactions') }} defaults: agg_time_dimension: order_date # --- entities --- @@ -67,7 +67,7 @@ semantic_models: type: categorical ``` -Dimensions are bound to the primary entity of the semantic model they are defined in. For example the dimensoin `type` is defined in a model that has `transaction` as a primary entity. `type` is scoped to the `transaction` entity, and to reference this dimension you would use the fully qualified dimension name i.e `transaction__type`. +Dimensions are bound to the primary entity of the semantic model they are defined in. For example the dimension `type` is defined in a model that has `transaction` as a primary entity. `type` is scoped to the `transaction` entity, and to reference this dimension you would use the fully qualified dimension name i.e `transaction__type`. MetricFlow requires that all semantic models have a primary entity. This is to guarantee unique dimension names. If your data source doesn't have a primary entity, you need to assign the entity a name using the `primary_entity` key. It doesn't necessarily have to map to a column in that table and assigning the name doesn't affect query generation. We recommend making these "virtual primary entities" unique across your semantic model. An example of defining a primary entity for a data source that doesn't have a primary entity column is below: @@ -122,9 +122,9 @@ dbt sl query --metrics users_created,users_deleted --group-by metric_time__year mf query --metrics users_created,users_deleted --group-by metric_time__year --order-by metric_time__year ``` -You can set `is_partition` for time to define specific time spans. Additionally, use the `type_params` section to set `time_granularity` to adjust aggregation details (hourly, daily, weekly, and so on). +You can set `is_partition` for time to define specific time spans. Additionally, use the `type_params` section to set `time_granularity` to adjust aggregation details (daily, weekly, and so on). - + @@ -161,6 +161,8 @@ measures: + + `time_granularity` specifies the grain of a time dimension. MetricFlow will transform the underlying column to the specified granularity. For example, if you add hourly granularity to a time dimension column, MetricFlow will run a `date_trunc` function to convert the timestamp to hourly. You can easily change the time grain at query time and aggregate it to a coarser grain, for example, from hourly to monthly. However, you can't go from a coarser grain to a finer grain (monthly to hourly). Our supported granularities are: @@ -172,6 +174,7 @@ Our supported granularities are: * hour * day * week +* month * quarter * year @@ -204,6 +207,50 @@ measures: agg: sum ``` + + + + +`time_granularity` specifies the grain of a time dimension. MetricFlow will transform the underlying column to the specified granularity. For example, if you add daily granularity to a time dimension column, MetricFlow will run a `date_trunc` function to convert the timestamp to daily. You can easily change the time grain at query time and aggregate it to a coarser grain, for example, from daily to monthly. However, you can't go from a coarser grain to a finer grain (monthly to daily). + +Our supported granularities are: +* day +* week +* month +* quarter +* year + +Aggregation between metrics with different granularities is possible, with the Semantic Layer returning results at the coarsest granularity by default. For example, when querying two metrics with daily and monthly granularity, the resulting aggregation will be at the monthly level. + +```yaml +dimensions: + - name: created_at + type: time + label: "Date of creation" + expr: ts_created # ts_created is the underlying column name from the table + is_partition: True + type_params: + time_granularity: day + - name: deleted_at + type: time + label: "Date of deletion" + expr: ts_deleted # ts_deleted is the underlying column name from the table + is_partition: True + type_params: + time_granularity: day + +measures: + - name: users_deleted + expr: 1 + agg: sum + agg_time_dimension: deleted_at + - name: users_created + expr: 1 + agg: sum +``` + + + @@ -313,7 +360,7 @@ Additionally, the entity is tagged as `natural` to differentiate it from a `prim semantic_models: - name: sales_person_tiers description: SCD Type II table of tiers for salespeople - model: {{ref(sales_person_tiers)}} + model: {{ ref('sales_person_tiers') }} defaults: agg_time_dimension: tier_start @@ -355,7 +402,7 @@ semantic_models: There is a transaction, product, sales_person, and customer id for every transaction. There is only one transaction id per transaction. The `metric_time` or date is reflected in UTC. - model: {{ ref(fact_transactions) }} + model: {{ ref('fact_transactions') }} defaults: agg_time_dimension: metric_time diff --git a/website/docs/docs/build/documentation.md b/website/docs/docs/build/documentation.md index d040d3c5bef..455fc9e70e0 100644 --- a/website/docs/docs/build/documentation.md +++ b/website/docs/docs/build/documentation.md @@ -7,6 +7,10 @@ id: "documentation" Good documentation for your dbt models will help downstream consumers discover and understand the datasets you curate for them. dbt provides a way to generate documentation for your dbt project and render it as a website. +import CopilotBeta from '/snippets/_dbt-copilot-avail.md'; + + + ## Related documentation * [Declaring properties](/reference/configs-and-properties) @@ -101,7 +105,18 @@ The events in this table are recorded by [Snowplow](http://github.com/snowplow/s In the above example, a docs block named `table_events` is defined with some descriptive markdown contents. There is nothing significant about the name `table_events` — docs blocks can be named however you like, as long as the name only contains alphanumeric and underscore characters and does not start with a numeric character. ### Placement -Docs blocks should be placed in files with a `.md` file extension. By default, dbt will search in all resource paths for docs blocks (i.e. the combined list of [model-paths](/reference/project-configs/model-paths), [seed-paths](/reference/project-configs/seed-paths), [analysis-paths](/reference/project-configs/analysis-paths), [macro-paths](/reference/project-configs/macro-paths) and [snapshot-paths](/reference/project-configs/snapshot-paths)) — you can adjust this behavior using the [docs-paths](/reference/project-configs/docs-paths) config. + + + +Docs blocks should be placed in files with a `.md` file extension. By default, dbt will search in all resource paths for docs blocks (for example, the combined list of [model-paths](/reference/project-configs/model-paths), [seed-paths](/reference/project-configs/seed-paths), [analysis-paths](/reference/project-configs/analysis-paths), [test-paths](/reference/project-configs/test-paths), [macro-paths](/reference/project-configs/macro-paths), and [snapshot-paths](/reference/project-configs/snapshot-paths)) — you can adjust this behavior using the [docs-paths](/reference/project-configs/docs-paths) config. + + + + + +Docs blocks should be placed in files with a `.md` file extension. By default, dbt will search in all resource paths for docs blocks (for example, the combined list of [model-paths](/reference/project-configs/model-paths), [seed-paths](/reference/project-configs/seed-paths), [analysis-paths](/reference/project-configs/analysis-paths), [macro-paths](/reference/project-configs/macro-paths), and [snapshot-paths](/reference/project-configs/snapshot-paths)) — you can adjust this behavior using the [docs-paths](/reference/project-configs/docs-paths) config. + + ### Usage diff --git a/website/docs/docs/build/environment-variables.md b/website/docs/docs/build/environment-variables.md index 01601ce7eb8..4c0b785448e 100644 --- a/website/docs/docs/build/environment-variables.md +++ b/website/docs/docs/build/environment-variables.md @@ -32,7 +32,7 @@ There are four levels of environment variables: To set environment variables at the project and environment level, click **Deploy** in the top left, then select **Environments**. Click **Environments Variables** to add and update your environment variables. - + @@ -62,7 +62,10 @@ Every job runs in a specific, deployment environment, and by default, a job will **Overriding environment variables at the personal level** -You can also set a personal value override for an environment variable when you develop in the dbt-integrated developer environment (IDE). By default, dbt Cloud uses environment variable values set in the project's development environment. To see and override these values, click the gear icon in the top right. Under "Your Profile," click **Credentials** and select your project. Click **Edit** and make any changes in "Environment Variables." +You can also set a personal value override for an environment variable when you develop in the dbt-integrated developer environment (IDE). By default, dbt Cloud uses environment variable values set in the project's development environment. To see and override these values, from dbt Cloud: +- Click on your account name in the left side menu and select **Account settings**. +- Under the **Your profile** section, click **Credentials** and then select your project. +- Scroll to the **Environment variables** section and click **Edit** to make the necessary changes. @@ -80,7 +83,7 @@ If you change the value of an environment variable mid-session while using the I To refresh the IDE mid-development, click on either the green 'ready' signal or the red 'compilation error' message at the bottom right corner of the IDE. A new modal will pop up, and you should select the Refresh IDE button. This will load your environment variables values into your development environment. - + There are some known issues with partial parsing of a project and changing environment variables mid-session in the IDE. If you find that your dbt project is not compiling to the values you've set, try deleting the `target/partial_parse.msgpack` file in your dbt project which will force dbt to re-compile your whole project. @@ -97,49 +100,55 @@ While all environment variables are encrypted at rest in dbt Cloud, dbt Cloud ha dbt Cloud has a number of pre-defined variables built in. Variables are set automatically and cannot be changed. -**dbt Cloud IDE details** +#### dbt Cloud IDE details The following environment variable is set automatically for the dbt Cloud IDE: -- `DBT_CLOUD_GIT_BRANCH`: Provides the development Git branch name in the [dbt Cloud IDE](/docs/cloud/dbt-cloud-ide/develop-in-the-cloud). - - Available in dbt v 1.6 and later. +- `DBT_CLOUD_GIT_BRANCH` — Provides the development Git branch name in the [dbt Cloud IDE](/docs/cloud/dbt-cloud-ide/develop-in-the-cloud). - The variable changes when the branch is changed. - Doesn't require restarting the IDE after a branch change. - Currently not available in the [dbt Cloud CLI](/docs/cloud/cloud-cli-installation). Use case — This is useful in cases where you want to dynamically use the Git branch name as a prefix for a [development schema](/docs/build/custom-schemas) ( `{{ env_var ('DBT_CLOUD_GIT_BRANCH') }}` ). -**dbt Cloud context** +#### dbt Cloud context -The following environment variables are set automatically for deployment runs: +The following environment variables are set automatically: -- `DBT_ENV`: This key is reserved for the dbt Cloud application and will always resolve to 'prod' +- `DBT_ENV` — This key is reserved for the dbt Cloud application and will always resolve to 'prod'. For deployment runs only. +- `DBT_CLOUD_ENVIRONMENT_NAME` — The name of the dbt Cloud environment in which `dbt` is running. +- `DBT_CLOUD_ENVIRONMENT_TYPE` — The type of dbt Cloud environment in which `dbt` is running. The valid values are `dev`, `staging`, or `prod`. It can be unset, so use a default like `{{env_var('DBT_CLOUD_ENVIRONMENT_TYPE', '')}}`. -**Run details** -- `DBT_CLOUD_PROJECT_ID`: The ID of the dbt Cloud Project for this run -- `DBT_CLOUD_JOB_ID`: The ID of the dbt Cloud Job for this run -- `DBT_CLOUD_RUN_ID`: The ID of this particular run -- `DBT_CLOUD_RUN_REASON_CATEGORY`: The "category" of the trigger for this run (one of: `scheduled`, `github_pull_request`, `gitlab_merge_request`, `azure_pull_request`, `other`) -- `DBT_CLOUD_RUN_REASON`: The specific trigger for this run (eg. `Scheduled`, `Kicked off by `, or custom via `API`) -- `DBT_CLOUD_ENVIRONMENT_ID`: The ID of the environment for this run -- `DBT_CLOUD_ACCOUNT_ID`: The ID of the dbt Cloud account for this run +#### Run details -**Git details** +- `DBT_CLOUD_PROJECT_ID` — The ID of the dbt Cloud Project for this run +- `DBT_CLOUD_JOB_ID` — The ID of the dbt Cloud Job for this run +- `DBT_CLOUD_RUN_ID` — The ID of this particular run +- `DBT_CLOUD_RUN_REASON_CATEGORY` — The "category" of the trigger for this run (one of: `scheduled`, `github_pull_request`, `gitlab_merge_request`, `azure_pull_request`, `other`) +- `DBT_CLOUD_RUN_REASON` — The specific trigger for this run (eg. `Scheduled`, `Kicked off by `, or custom via `API`) +- `DBT_CLOUD_ENVIRONMENT_ID` — The ID of the environment for this run +- `DBT_CLOUD_ACCOUNT_ID` — The ID of the dbt Cloud account for this run + +#### Git details _The following variables are currently only available for GitHub, GitLab, and Azure DevOps PR builds triggered via a webhook_ -- `DBT_CLOUD_PR_ID`: The Pull Request ID in the connected version control system -- `DBT_CLOUD_GIT_SHA`: The git commit SHA which is being run for this Pull Request build +- `DBT_CLOUD_PR_ID` — The Pull Request ID in the connected version control system +- `DBT_CLOUD_GIT_SHA` — The git commit SHA which is being run for this Pull Request build ### Example usage Environment variables can be used in many ways, and they give you the power and flexibility to do what you want to do more easily in dbt Cloud. -#### Clone private packages + + Now that you can set secrets as environment variables, you can pass git tokens into your package HTTPS URLs to allow for on-the-fly cloning of private repositories. Read more about enabling [private package cloning](/docs/build/packages#private-packages). -#### Dynamically set your warehouse in your Snowflake connection + + + + Environment variables make it possible to dynamically change the Snowflake virtual warehouse size depending on the job. Instead of calling the warehouse name directly in your project connection, you can reference an environment variable which will get set to a specific virtual warehouse at runtime. For example, suppose you'd like to run a full-refresh job in an XL warehouse, but your incremental job only needs to run in a medium-sized warehouse. Both jobs are configured in the same dbt Cloud environment. In your connection configuration, you can use an environment variable to set the warehouse name to `{{env_var('DBT_WAREHOUSE')}}`. Then in the job settings, you can set a different value for the `DBT_WAREHOUSE` environment variable depending on the job's workload. @@ -160,7 +169,10 @@ However, there are some limitations when using env vars with Snowflake OAuth Con Something to note, if you supply an environment variable in the account/host field, Snowflake OAuth Connection will **fail** to connect. This happens because the field doesn't pass through Jinja rendering, so dbt Cloud simply passes the literal `env_var` code into a URL string like `{{ env_var("DBT_ACCOUNT_HOST_NAME") }}.snowflakecomputing.com`, which is an invalid hostname. Use [extended attributes](/docs/deploy/deploy-environments#deployment-credentials) instead. ::: -#### Audit your run metadata + + + + Here's another motivating example that uses the dbt Cloud run ID, which is set automatically at each run. This additional data field can be used for auditing and debugging: ```sql @@ -186,3 +198,13 @@ select *, from users_aggregated ``` + + + + + +import SLEnvVars from '/snippets/_sl-env-vars.md'; + + + + diff --git a/website/docs/docs/build/exposures.md b/website/docs/docs/build/exposures.md index 1a85d5fb415..a3ac7bcb3ce 100644 --- a/website/docs/docs/build/exposures.md +++ b/website/docs/docs/build/exposures.md @@ -69,7 +69,7 @@ dbt test -s +exposure:weekly_jaffle_report ``` -When we generate the dbt Explorer site, you'll see the exposure appear: +When we generate the [dbt Explorer site](/docs/collaborate/explore-projects), you'll see the exposure appear: @@ -77,5 +77,5 @@ When we generate the dbt Explorer site, you'll see the exposure appear: ## Related docs * [Exposure properties](/reference/exposure-properties) -* [`exposure:` selection method](/reference/node-selection/methods#the-exposure-method) +* [`exposure:` selection method](/reference/node-selection/methods#exposure) * [Data health tiles](/docs/collaborate/data-tile) diff --git a/website/docs/docs/build/groups.md b/website/docs/docs/build/groups.md index 890ee96901a..1be4388c246 100644 --- a/website/docs/docs/build/groups.md +++ b/website/docs/docs/build/groups.md @@ -119,4 +119,4 @@ dbt.exceptions.DbtReferenceError: Parsing Error * [Model Access](/docs/collaborate/govern/model-access#groups) * [Group configuration](/reference/resource-configs/group) -* [Group selection](/reference/node-selection/methods#the-group-method) +* [Group selection](/reference/node-selection/methods#group) diff --git a/website/docs/docs/build/hooks-operations.md b/website/docs/docs/build/hooks-operations.md index 6cec2a673c0..842d3fb99a3 100644 --- a/website/docs/docs/build/hooks-operations.md +++ b/website/docs/docs/build/hooks-operations.md @@ -40,8 +40,6 @@ Hooks are snippets of SQL that are executed at different times: Hooks are a more-advanced capability that enable you to run custom SQL, and leverage database-specific actions, beyond what dbt makes available out-of-the-box with standard materializations and configurations. - - If (and only if) you can't leverage the [`grants` resource-config](/reference/resource-configs/grants), you can use `post-hook` to perform more advanced workflows: * Need to apply `grants` in a more complex way, which the dbt Core `grants` config doesn't (yet) support. diff --git a/website/docs/docs/build/incremental-microbatch.md b/website/docs/docs/build/incremental-microbatch.md new file mode 100644 index 00000000000..4aff8b5839c --- /dev/null +++ b/website/docs/docs/build/incremental-microbatch.md @@ -0,0 +1,492 @@ +--- +title: "About microbatch incremental models" +description: "Learn about the 'microbatch' strategy for incremental models." +id: "incremental-microbatch" +--- + +# About microbatch incremental models + +:::info Microbatch + +The new `microbatch` strategy is available in beta for [dbt Cloud "Latest"](/docs/dbt-versions/cloud-release-tracks) and dbt Core v1.9. + +If you use a custom microbatch macro, set a [distinct behavior flag](/reference/global-configs/behavior-changes#custom-microbatch-strategy) in your `dbt_project.yml` to enable batched execution. If you don't have a custom microbatch macro, you don't need to set this flag as dbt will handle microbatching automatically for any model using the [microbatch strategy](#how-microbatch-compares-to-other-incremental-strategies). + +Read and participate in the discussion: [dbt-core#10672](https://github.com/dbt-labs/dbt-core/discussions/10672) + +Refer to [Supported incremental strategies by adapter](/docs/build/incremental-strategy#supported-incremental-strategies-by-adapter) for a list of supported adapters. + +::: + +## What is "microbatch" in dbt? + +Incremental models in dbt are a [materialization](/docs/build/materializations) designed to efficiently update your data warehouse tables by only transforming and loading _new or changed data_ since the last run. Instead of reprocessing an entire dataset every time, incremental models process a smaller number of rows, and then append, update, or replace those rows in the existing table. This can significantly reduce the time and resources required for your data transformations. + +Microbatch is an incremental strategy designed for large time-series datasets: +- It relies solely on a time column ([`event_time`](/reference/resource-configs/event-time)) to define time-based ranges for filtering. Set the `event_time` column for your microbatch model and its direct parents (upstream models). Note, this is different to `partition_by`, which groups rows into partitions. +- It complements, rather than replaces, existing incremental strategies by focusing on efficiency and simplicity in batch processing. +- Unlike traditional incremental strategies, microbatch enables you to [reprocess failed batches](/docs/build/incremental-microbatch#retry), auto-detect [parallel batch execution](#parallel-batch-execution), and eliminate the need to implement complex conditional logic for [backfilling](#backfills). + +- Note, microbatch might not be the best strategy for all use cases. Consider other strategies for use cases such as not having a reliable `event_time` column or if you want more control over the incremental logic. Read more in [How `microbatch` compares to other incremental strategies](#how-microbatch-compares-to-other-incremental-strategies). + +### How microbatch works + +When dbt runs a microbatch model — whether for the first time, during incremental runs, or in specified backfills — it will split the processing into multiple queries (or "batches"), based on the `event_time` and `batch_size` you configure. + +Each "batch" corresponds to a single bounded time period (by default, a single day of data). Where other incremental strategies operate only on "old" and "new" data, microbatch models treat every batch as an atomic unit that can be built or replaced on its own. Each batch is independent and . + +This is a powerful abstraction that makes it possible for dbt to run batches [separately](#backfills), concurrently, and [retry](#retry) them independently. + +## Example + +A `sessions` model aggregates and enriches data that comes from two other models: +- `page_views` is a large, time-series table. It contains many rows, new records almost always arrive after existing ones, and existing records rarely update. It uses the `page_view_start` column as its `event_time`. +- `customers` is a relatively small dimensional table. Customer attributes update often, and not in a time-based manner — that is, older customers are just as likely to change column values as newer customers. The customers model doesn't configure an `event_time` column. + +As a result: + +- Each batch of `sessions` will filter `page_views` to the equivalent time-bounded batch. +- The `customers` table isn't filtered, resulting in a full scan for every batch. + +:::tip +In addition to configuring `event_time` for the target table, you should also specify it for any upstream models that you want to filter, even if they have different time columns. +::: + + + +```yaml +models: + - name: page_views + config: + event_time: page_view_start +``` + + +We run the `sessions` model for October 1, 2024, and then again for October 2. It produces the following queries: + + + + + +The [`event_time`](/reference/resource-configs/event-time) for the `sessions` model is set to `session_start`, which marks the beginning of a user’s session on the website. This setting allows dbt to combine multiple page views (each tracked by their own `page_view_start` timestamps) into a single session. This way, `session_start` differentiates the timing of individual page views from the broader timeframe of the entire user session. + + + +```sql +{{ config( + materialized='incremental', + incremental_strategy='microbatch', + event_time='session_start', + begin='2020-01-01', + batch_size='day' +) }} + +with page_views as ( + + -- this ref will be auto-filtered + select * from {{ ref('page_views') }} + +), + +customers as ( + + -- this ref won't + select * from {{ ref('customers') }} + +), + +select + page_views.id as session_id, + page_views.page_view_start as session_start, + customers.* + from page_views + left join customers + on page_views.customer_id = customer.id +``` + + + + + + + + + +```sql + +with page_views as ( + + select * from ( + -- filtered on configured event_time + select * from "analytics"."page_views" + where page_view_start >= '2024-10-01 00:00:00' -- Oct 1 + and page_view_start < '2024-10-02 00:00:00' + ) + +), + +customers as ( + + select * from "analytics"."customers" + +), + +... +``` + + + + + + + + + +```sql + +with page_views as ( + + select * from ( + -- filtered on configured event_time + select * from "analytics"."page_views" + where page_view_start >= '2024-10-02 00:00:00' -- Oct 2 + and page_view_start < '2024-10-03 00:00:00' + ) + +), + +customers as ( + + select * from "analytics"."customers" + +), + +... +``` + + + + + + + +dbt will instruct the data platform to take the result of each batch query and insert, update, or replace the contents of the `analytics.sessions` table for the same day of data. To perform this operation, dbt will use the most efficient atomic mechanism for "full batch" replacement that is available on each data platform. + +It does not matter whether the table already contains data for that day. Given the same input data, the resulting table is the same no matter how many times a batch is reprocessed. + + + +## Relevant configs + +Several configurations are relevant to microbatch models, and some are required: + + +| Config | Description | Default | Type | Required | +|----------|---------------|---------|------|---------| +| [`event_time`](/reference/resource-configs/event-time) | The column indicating "at what time did the row occur." Required for your microbatch model and any direct parents that should be filtered. | N/A | Column | Required | +| [`begin`](/reference/resource-configs/begin) | The "beginning of time" for the microbatch model. This is the starting point for any initial or full-refresh builds. For example, a daily-grain microbatch model run on `2024-10-01` with `begin = '2023-10-01` will process 366 batches (it's a leap year!) plus the batch for "today." | N/A | Date | Required | +| [`batch_size`](/reference/resource-configs/batch-size) | The granularity of your batches. Supported values are `hour`, `day`, `month`, and `year` | N/A | String | Required | +| [`lookback`](/reference/resource-configs/lookback) | Process X batches prior to the latest bookmark to capture late-arriving records. | `1` | Integer | Optional | +| [`concurrent_batches`](/reference/resource-properties/concurrent_batches) | Overrides dbt's auto detect for running batches concurrently (at the same time). Read more about [configuring concurrent batches](/docs/build/incremental-microbatch#configure-concurrent_batches). Setting to
* `true` runs batches concurrently (in parallel).
* `false` runs batches sequentially (one after the other). | `None` | Boolean | Optional | + + + +### Required configs for specific adapters +Some adapters require additional configurations for the microbatch strategy. This is because each adapter implements the microbatch strategy differently. + +The following table lists the required configurations for the specific adapters, in addition to the standard microbatch configs: + +| Adapter | `unique_key` config | `partition_by` config | +|----------|------------------|--------------------| +| [`dbt-postgres`](/reference/resource-configs/postgres-configs#incremental-materialization-strategies) | ✅ Required | N/A | +| [`dbt-spark`](/reference/resource-configs/spark-configs#incremental-models) | N/A | ✅ Required | +| [`dbt-bigquery`](/reference/resource-configs/bigquery-configs#merge-behavior-incremental-models) | N/A | ✅ Required | + +For example, if you're using `dbt-postgres`, configure `unique_key` as follows: + + + +```sql +{{ config( + materialized='incremental', + incremental_strategy='microbatch', + unique_key='sales_id', ## required for dbt-postgres + event_time='transaction_date', + begin='2023-01-01', + batch_size='day' +) }} + +select + sales_id, + transaction_date, + customer_id, + product_id, + total_amount +from {{ source('sales', 'transactions') }} + +``` + + In this example, `unique_key` is required because `dbt-postgres` microbatch uses the `merge` strategy, which needs a `unique_key` to identify which rows in the data warehouse need to get merged. Without a `unique_key`, dbt won't be able to match rows between the incoming batch and the existing table. + + + +### Full refresh + +As a best practice, we recommend configuring `full_refresh: False` on microbatch models so that they ignore invocations with the `--full-refresh` flag. If you need to reprocess historical data, do so with a targeted backfill that specifies explicit start and end dates. + +## Usage + +**You must write your model query to process (read and return) exactly one "batch" of data**. This is a simplifying assumption and a powerful one: +- You don’t need to think about `is_incremental` filtering +- You don't need to pick among DML strategies (upserting/merging/replacing) +- You can preview your model, and see the exact records for a given batch that will appear when that batch is processed and written to the table + +When you run a microbatch model, dbt will evaluate which batches need to be loaded, break them up into a SQL query per batch, and load each one independently. + +dbt will automatically filter upstream inputs (`source` or `ref`) that define `event_time`, based on the `lookback` and `batch_size` configs for this model. + +During standard incremental runs, dbt will process batches according to the current timestamp and the configured `lookback`, with one query per batch. + + + +**Note:** If there’s an upstream model that configures `event_time`, but you *don’t* want the reference to it to be filtered, you can specify `ref('upstream_model').render()` to opt-out of auto-filtering. This isn't generally recommended — most models that configure `event_time` are fairly large, and if the reference is not filtered, each batch will perform a full scan of this input table. + +## Backfills + +Whether to fix erroneous source data or retroactively apply a change in business logic, you may need to reprocess a large amount of historical data. + +Backfilling a microbatch model is as simple as selecting it to run or build, and specifying a "start" and "end" for `event_time`. Note that `--event-time-start` and `--event-time-end` are mutually necessary, meaning that if you specify one, you must specify the other. + +As always, dbt will process the batches between the start and end as independent queries. + +```bash +dbt run --event-time-start "2024-09-01" --event-time-end "2024-09-04" +``` + + + + +## Retry + +If one or more of your batches fail, you can use `dbt retry` to reprocess _only_ the failed batches. + +![Partial retry](https://github.com/user-attachments/assets/f94c4797-dcc7-4875-9623-639f70c97b8f) + +## Timezones + +For now, dbt assumes that all values supplied are in UTC: + +- `event_time` +- `begin` +- `--event-time-start` +- `--event-time-end` + +While we may consider adding support for custom time zones in the future, we also believe that defining these values in UTC makes everyone's lives easier. + +## Parallel batch execution + +The microbatch strategy offers the benefit of updating a model in smaller, more manageable batches. Depending on your use case, configuring your microbatch models to run in parallel offers faster processing, in comparison to running batches sequentially. + +Parallel batch execution means that multiple batches are processed at the same time, instead of one after the other (sequentially) for faster processing of your microbatch models. + +dbt automatically detects whether a batch can be run in parallel in most cases, which means you don’t need to configure this setting. However, the [`concurrent_batches` config](/reference/resource-properties/concurrent_batches) is available as an override (not a gate), allowing you to specify whether batches should or shouldn’t be run in parallel in specific cases. + +For example, if you have a microbatch model with 12 batches, you can execute those batches to run in parallel. Specifically they'll run in parallel limited by the number of [available threads](/docs/running-a-dbt-project/using-threads). + +### Prerequisites + +To enable parallel execution, you must: + +- Use a supported adapter: + - Snowflake + - Databricks + - More adapters coming soon! + - We'll be continuing to test and add concurrency support for adapters. This means that some adapters might get concurrency support _after_ the 1.9 initial release. + +- Meet [additional conditions](#how-parallel-batch-execution-works) described in the following section. + +### How parallel batch execution works + +A batch can only run in parallel if all of these conditions are met: + +| Condition | Parallel execution | Sequential execution| +| ---------------| :------------------: | :----------: | +| **Not** the first batch | ✅ | - | +| **Not** the last batch | ✅ | - | +| [Adapter supports](#prerequisites) parallel batches | ✅ | - | + + +After checking for the conditions in the previous table — and if `concurrent_batches` value isn't set, dbt will intelligently auto-detect if the model invokes the [`{{ this }}`](/reference/dbt-jinja-functions/this) Jinja function. If it references `{{ this }}`, the batches will run sequentially since `{{ this }}` represents the database of the current model and referencing the same relation causes conflict. + +Otherwise, if `{{ this }}` isn't detected (and other conditions are met), the batches will run in parallel, which can be overriden when you [set a value for `concurrent_batches`](/reference/resource-properties/concurrent_batches). + +### Parallel or sequential execution + +Choosing between parallel batch execution and sequential processing depends on the specific requirements of your use case. + +- Parallel batch execution is faster but requires logic independent of batch execution order. For example, if you're developing a data pipeline for a system that processes user transactions in batches, each batch is executed in parallel for better performance. However, the logic used to process each transaction shouldn't depend on the order of how batches are executed or completed. +- Sequential processing is slower but essential for calculations like [cumulative metrics](/docs/build/cumulative) in microbatch models. It processes data in the correct order, allowing each step to build on the previous one. + + + +### Configure `concurrent_batches` + +By default, dbt auto-detects whether batches can run in parallel for microbatch models, and this works correctly in most cases. However, you can override dbt's detection by setting the [`concurrent_batches` config](/reference/resource-properties/concurrent_batches) in your `dbt_project.yml` or model `.sql` file to specify parallel or sequential execution, given you meet all the [conditions](#prerequisites): + + + + + + +```yaml +models: + +concurrent_batches: true # value set to true to run batches in parallel +``` + + + + + + + + +```sql +{{ + config( + materialized='incremental', + incremental_strategy='microbatch', + event_time='session_start', + begin='2020-01-01', + batch_size='day + concurrent_batches=true, # value set to true to run batches in parallel + ... + ) +}} + +select ... +``` + + + + +## How microbatch compares to other incremental strategies + +As data warehouses roll out new operations for concurrently replacing/upserting data partitions, we may find that the new operation for the data warehouse is more efficient than what the adapter uses for microbatch. In such instances, we reserve the right the update the default operation for microbatch, so long as it works as intended/documented for models that fit the microbatch paradigm. + +Most incremental models rely on the end user (you) to explicitly tell dbt what "new" means, in the context of each model, by writing a filter in an `{% if is_incremental() %}` conditional block. You are responsible for crafting this SQL in a way that queries [`{{ this }}`](/reference/dbt-jinja-functions/this) to check when the most recent record was last loaded, with an optional look-back window for late-arriving records. + +Other incremental strategies will control _how_ the data is being added into the table — whether append-only `insert`, `delete` + `insert`, `merge`, `insert overwrite`, etc — but they all have this in common. + +As an example: + +```sql +{{ + config( + materialized='incremental', + incremental_strategy='delete+insert', + unique_key='date_day' + ) +}} + +select * from {{ ref('stg_events') }} + + {% if is_incremental() %} + -- this filter will only be applied on an incremental run + -- add a lookback window of 3 days to account for late-arriving records + where date_day >= (select {{ dbt.dateadd("day", -3, "max(date_day)") }} from {{ this }}) + {% endif %} + +``` + +For this incremental model: + +- "New" records are those with a `date_day` greater than the maximum `date_day` that has previously been loaded +- The lookback window is 3 days +- When there are new records for a given `date_day`, the existing data for `date_day` is deleted and the new data is inserted + +Let’s take our same example from before, and instead use the new `microbatch` incremental strategy: + + + +```sql +{{ + config( + materialized='incremental', + incremental_strategy='microbatch', + event_time='event_occured_at', + batch_size='day', + lookback=3, + begin='2020-01-01', + full_refresh=false + ) +}} + +select * from {{ ref('stg_events') }} -- this ref will be auto-filtered +``` + + + +Where you’ve also set an `event_time` for the model’s direct parents - in this case, `stg_events`: + + + +```yaml +models: + - name: stg_events + config: + event_time: my_time_field +``` + + + +And that’s it! + +When you run the model, each batch templates a separate query. For example, if you were running the model on October 1, dbt would template separate queries for each day between September 28 and October 1, inclusive — four batches in total. + +The query for `2024-10-01` would look like: + + + +```sql +select * from ( + select * from "analytics"."stg_events" + where my_time_field >= '2024-10-01 00:00:00' + and my_time_field < '2024-10-02 00:00:00' +) +``` + + + +Based on your data platform, dbt will choose the most efficient atomic mechanism to insert, update, or replace these four batches (`2024-09-28`, `2024-09-29`, `2024-09-30`, and `2024-10-01`) in the existing table. diff --git a/website/docs/docs/build/incremental-models-overview.md b/website/docs/docs/build/incremental-models-overview.md index 16c950eb331..bddc6b0a55d 100644 --- a/website/docs/docs/build/incremental-models-overview.md +++ b/website/docs/docs/build/incremental-models-overview.md @@ -42,4 +42,5 @@ Transaction management, a process used in certain data platforms, ensures that a ## Related docs - [Incremental models](/docs/build/incremental-models) to learn how to configure incremental models in dbt. - [Incremental strategies](/docs/build/incremental-strategy) to understand how dbt implements incremental models on different databases. +- [Microbatch](/docs/build/incremental-strategy) to understand a new incremental strategy intended for efficient and resilient processing of very large time-series datasets. - [Materializations best practices](/best-practices/materializations/1-guide-overview) to learn about the best practices for using materializations in dbt. diff --git a/website/docs/docs/build/incremental-models.md b/website/docs/docs/build/incremental-models.md index 2f8bbc46c3a..0560797c9bc 100644 --- a/website/docs/docs/build/incremental-models.md +++ b/website/docs/docs/build/incremental-models.md @@ -94,7 +94,7 @@ Not specifying a `unique_key` will result in append-only behavior, which means d The optional `unique_key` parameter specifies a field (or combination of fields) that defines the grain of your model. That is, the field(s) identify a single unique row. You can define `unique_key` in a configuration block at the top of your model, and it can be a single column name or a list of column names. -The `unique_key` should be supplied in your model definition as a string representing a single column or a list of single-quoted column names that can be used together, for example, `['col1', 'col2', …])`. Columns used in this way should not contain any nulls, or the incremental model run may fail. Either ensure that each column has no nulls (for example with `coalesce(COLUMN_NAME, 'VALUE_IF_NULL')`), or define a single-column [surrogate key](/terms/surrogate-key) (for example with [`dbt_utils.generate_surrogate_key`](https://github.com/dbt-labs/dbt-utils#generate_surrogate_key-source)). +The `unique_key` should be supplied in your model definition as a string representing a single column or a list of single-quoted column names that can be used together, for example, `['col1', 'col2', …])`. Columns used in this way should not contain any nulls, or the incremental model may fail to match rows and generate duplicate rows. Either ensure that each column has no nulls (for example with `coalesce(COLUMN_NAME, 'VALUE_IF_NULL')`) or define a single-column [surrogate key](https://www.getdbt.com/blog/guide-to-surrogate-key) (for example with [`dbt_utils.generate_surrogate_key`](https://github.com/dbt-labs/dbt-utils#generate_surrogate_key-source)). :::tip In cases where you need multiple columns in combination to uniquely identify each row, we recommend you pass these columns as a list (`unique_key = ['user_id', 'session_number']`), rather than a string expression (`unique_key = 'concat(user_id, session_number)'`). @@ -103,7 +103,7 @@ By using the first syntax, which is more universal, dbt can ensure that the colu When you pass a list in this way, please ensure that each column does not contain any nulls, or the incremental model run may fail. -Alternatively, you can define a single-column [surrogate key](/terms/surrogate-key), for example with [`dbt_utils.generate_surrogate_key`](https://github.com/dbt-labs/dbt-utils#generate_surrogate_key-source). +Alternatively, you can define a single-column [surrogate key](https://www.getdbt.com/blog/guide-to-surrogate-key), for example with [`dbt_utils.generate_surrogate_key`](https://github.com/dbt-labs/dbt-utils#generate_surrogate_key-source). ::: When you define a `unique_key`, you'll see this behavior for each row of "new" data returned by your dbt model: @@ -111,10 +111,10 @@ When you define a `unique_key`, you'll see this behavior for each row of "new" d * If the same `unique_key` is present in the "new" and "old" model data, dbt will update/replace the old row with the new row of data. The exact mechanics of how that update/replace takes place will vary depending on your database, [incremental strategy](/docs/build/incremental-strategy), and [strategy specific configs](/docs/build/incremental-strategy#strategy-specific-configs). * If the `unique_key` is _not_ present in the "old" data, dbt will insert the entire row into the table. -Please note that if there's a unique_key with more than one row in either the existing target table or the new incremental rows, the incremental model may fail depending on your database and [incremental strategy](/docs/build/incremental-strategy). If you're having issues running an incremental model, it's a good idea to double check that the unique key is truly unique in both your existing database table and your new incremental rows. You can [learn more about surrogate keys here](/terms/surrogate-key). +Please note that if there's a unique_key with more than one row in either the existing target table or the new incremental rows, the incremental model may fail depending on your database and [incremental strategy](/docs/build/incremental-strategy). If you're having issues running an incremental model, it's a good idea to double check that the unique key is truly unique in both your existing database table and your new incremental rows. You can [learn more about surrogate keys here](https://www.getdbt.com/blog/guide-to-surrogate-key). :::info -While common incremental strategies, such as`delete+insert` + `merge`, might use `unique_key`, others don't. For example, the `insert_overwrite` strategy does not use `unique_key`, because it operates on partitions of data rather than individual rows. For more information, see [About incremental_strategy](/docs/build/incremental-strategy). +While common incremental strategies, such as `delete+insert` + `merge`, might use `unique_key`, others don't. For example, the `insert_overwrite` strategy does not use `unique_key`, because it operates on partitions of data rather than individual rows. For more information, see [About incremental_strategy](/docs/build/incremental-strategy). ::: #### `unique_key` example @@ -156,15 +156,17 @@ Building this model incrementally without the `unique_key` parameter would resul ## How do I rebuild an incremental model? If your incremental model logic has changed, the transformations on your new rows of data may diverge from the historical transformations, which are stored in your target table. In this case, you should rebuild your incremental model. -To force dbt to rebuild the entire incremental model from scratch, use the `--full-refresh` flag on the command line. This flag will cause dbt to drop the existing target table in the database before rebuilding it for all-time. +To force dbt to rebuild the entire incremental model from scratch, use the `--full-refresh` flag on the command line. This flag will cause dbt to drop the existing target table in the database before rebuilding it for all-time. ```bash $ dbt run --full-refresh --select my_incremental_model+ ``` + It's also advisable to rebuild any downstream models, as indicated by the trailing `+`. -For detailed usage instructions, check out the [dbt run](/reference/commands/run) documentation. +You can optionally use the [`full_refresh config`](/reference/resource-configs/full_refresh) to set a resource to always or never full-refresh at the project or resource level. If specified as true or false, the `full_refresh` config will take precedence over the presence or absence of the `--full-refresh` flag. +For detailed usage instructions, check out the [dbt run](/reference/commands/run) documentation. ## What if the columns of my incremental model change? @@ -212,11 +214,11 @@ Currently, `on_schema_change` only tracks top-level column changes. It does not ### Default behavior -This is the behavior if `on_schema_change: ignore`, which is set by default, and on older versions of dbt. +This is the behavior of `on_schema_change: ignore`, which is set by default. If you add a column to your incremental model, and execute a `dbt run`, this column will _not_ appear in your target table. -Similarly, if you remove a column from your incremental model, and execute a `dbt run`, this column will _not_ be removed from your target table. +If you remove a column from your incremental model and execute a `dbt run`, `dbt run` will fail. Instead, whenever the logic of your incremental changes, execute a full-refresh run of both your incremental model and any downstream models. diff --git a/website/docs/docs/build/incremental-strategy.md b/website/docs/docs/build/incremental-strategy.md index 8e86da0eba8..9176e962a3a 100644 --- a/website/docs/docs/build/incremental-strategy.md +++ b/website/docs/docs/build/incremental-strategy.md @@ -10,32 +10,31 @@ There are various strategies to implement the concept of incremental materializa * The reliability of your `unique_key`. * The support of certain features in your data platform. -An optional `incremental_strategy` config is provided in some adapters that controls the code that dbt uses -to build incremental models. +An optional `incremental_strategy` config is provided in some adapters that controls the code that dbt uses to build incremental models. -### Supported incremental strategies by adapter - -Click the name of the adapter in the below table for more information about supported incremental strategies. +:::info Microbatch -The `merge` strategy is available in dbt-postgres and dbt-redshift beginning in dbt v1.6. +The [`microbatch` incremental strategy](/docs/build/incremental-microbatch) is intended for large time-series datasets. dbt will process the incremental model in multiple queries (or "batches") based on a configured `event_time` column. Depending on the volume and nature of your data, this can be more efficient and resilient than using a single query for adding new data. -| data platform adapter | `append` | `merge` | `delete+insert` | `insert_overwrite` | -|-----------------------------------------------------------------------------------------------------|:--------:|:-------:|:---------------:|:------------------:| -| [dbt-postgres](/reference/resource-configs/postgres-configs#incremental-materialization-strategies) | ✅ | ✅ | ✅ | | -| [dbt-redshift](/reference/resource-configs/redshift-configs#incremental-materialization-strategies) | ✅ | ✅ | ✅ | | -| [dbt-bigquery](/reference/resource-configs/bigquery-configs#merge-behavior-incremental-models) | | ✅ | | ✅ | -| [dbt-spark](/reference/resource-configs/spark-configs#incremental-models) | ✅ | ✅ | | ✅ | -| [dbt-databricks](/reference/resource-configs/databricks-configs#incremental-models) | ✅ | ✅ | | ✅ | -| [dbt-snowflake](/reference/resource-configs/snowflake-configs#merge-behavior-incremental-models) | ✅ | ✅ | ✅ | | -| [dbt-trino](/reference/resource-configs/trino-configs#incremental) | ✅ | ✅ | ✅ | | -| [dbt-fabric](/reference/resource-configs/fabric-configs#incremental) | ✅ | | ✅ | | +::: +### Supported incremental strategies by adapter -:::note Snowflake Configurations +This table represents the availability of each incremental strategy, based on the latest version of dbt Core and each adapter. -dbt has changed the default materialization for incremental table merges from `temporary table` to `view`. For more information about this change and instructions for setting the configuration to a temp table, please read about [Snowflake temporary tables](/reference/resource-configs/snowflake-configs#temporary-tables). +Click the name of the adapter in the below table for more information about supported incremental strategies. -::: +| Data platform adapter | `append` | `merge` | `delete+insert` | `insert_overwrite` | `microbatch` | +|-----------------------|:--------:|:-------:|:---------------:|:------------------:|:-------------------:| +| [dbt-postgres](/reference/resource-configs/postgres-configs#incremental-materialization-strategies) | ✅ | ✅ | ✅ | | ✅ | +| [dbt-redshift](/reference/resource-configs/redshift-configs#incremental-materialization-strategies) | ✅ | ✅ | ✅ | | ✅ | +| [dbt-bigquery](/reference/resource-configs/bigquery-configs#merge-behavior-incremental-models) | | ✅ | | ✅ | ✅ | +| [dbt-spark](/reference/resource-configs/spark-configs#incremental-models) | ✅ | ✅ | | ✅ | ✅ | +| [dbt-databricks](/reference/resource-configs/databricks-configs#incremental-models) | ✅ | ✅ | | ✅ | ✅ | +| [dbt-snowflake](/reference/resource-configs/snowflake-configs#merge-behavior-incremental-models) | ✅ | ✅ | ✅ | | ✅ | +| [dbt-trino](/reference/resource-configs/trino-configs#incremental) | ✅ | ✅ | ✅ | | | +| [dbt-fabric](/reference/resource-configs/fabric-configs#incremental) | ✅ | | ✅ | | | +| [dbt-athena](/reference/resource-configs/athena-configs#incremental-models) | ✅ | ✅ | | ✅ | | ### Configuring incremental strategy @@ -200,6 +199,7 @@ Before diving into [custom strategies](#custom-strategies), it's important to un | `delete+insert` | `get_incremental_delete_insert_sql` | | `merge` | `get_incremental_merge_sql` | | `insert_overwrite` | `get_incremental_insert_overwrite_sql` | +| `microbatch` | `get_incremental_microbatch_sql` | For example, a built-in strategy for the `append` can be defined and used with the following files: @@ -241,7 +241,13 @@ select * from {{ ref("some_model") }} ### Custom strategies -Starting from dbt version 1.2 and onwards, users have an easier alternative to [creating an entirely new materialization](/guides/create-new-materializations). They define and use their own "custom" incremental strategies by: +:::note limited support + +Custom strategies are not currently supported on the BigQuery and Spark adapters. + +::: + +From dbt v1.2 and onwards, users have an easier alternative to [creating an entirely new materialization](/guides/create-new-materializations). They define and use their own "custom" incremental strategies by: 1. Defining a macro named `get_incremental_STRATEGY_sql`. Note that `STRATEGY` is a placeholder and you should replace it with the name of your custom incremental strategy. 2. Configuring `incremental_strategy: STRATEGY` within an incremental model. @@ -289,6 +295,8 @@ For example, a user-defined strategy named `insert_only` can be defined and used +If you use a custom microbatch macro, set a [`require_batched_execution_for_custom_microbatch_strategy` behavior flag](/reference/global-configs/behavior-changes#custom-microbatch-strategy) in your `dbt_project.yml` to enable batched execution of your custom strategy. + ### Custom strategies from a package To use the `merge_null_safe` custom incremental strategy from the `example` package: diff --git a/website/docs/docs/build/materializations.md b/website/docs/docs/build/materializations.md index 5deb1e7ce92..2ed30c7126a 100644 --- a/website/docs/docs/build/materializations.md +++ b/website/docs/docs/build/materializations.md @@ -18,7 +18,11 @@ You can also configure [custom materializations](/guides/create-new-materializat ## Configuring materializations -By default, dbt models are materialized as "views". Models can be configured with a different materialization by supplying the `materialized` configuration parameter as shown below. +By default, dbt models are materialized as "views". Models can be configured with a different materialization by supplying the [`materialized` configuration](/reference/resource-configs/materialized) parameter as shown in the following tabs. + + + + @@ -49,6 +53,10 @@ models: + + + + Alternatively, materializations can be configured directly inside of the model sql files. This can be useful if you are also setting [Performance Optimization] configs for specific models (for example, [Redshift specific configurations](/reference/resource-configs/redshift-configs) or [BigQuery specific configurations](/reference/resource-configs/bigquery-configs)). @@ -63,6 +71,29 @@ from ... + + + + +Materializations can also be configured in the model's `properties.yml` file. The following example shows the `table` materialization type. For a complete list of materialization types, refer to [materializations](/docs/build/materializations#materializations). + + + +```yaml +version: 2 + +models: + - name: events + config: + materialized: table +``` + + + + + + + ## Materializations @@ -111,7 +142,7 @@ When using the `table` materialization, your model is rebuilt as a + -- MetricFlow [commands](#metricflow-commands) are embedded in the dbt Cloud CLI. This means you can immediately run them once you install the dbt Cloud CLI and don't need to install MetricFlow separately. -- You don't need to manage versioning — your dbt Cloud account will automatically manage the versioning for you. - - +In dbt Cloud, run MetricFlow commands directly in the [dbt Cloud IDE](/docs/cloud/dbt-cloud-ide/develop-in-the-cloud) or in the [dbt Cloud CLI](/docs/cloud/cloud-cli-installation). - - -:::info -You can create metrics using MetricFlow in the dbt Cloud IDE and run the [dbt sl validate](/docs/build/validation#validations-command) command. Support for running more MetricFlow commands in the IDE will be available soon. -::: +For dbt Cloud CLI users, MetricFlow commands are embedded in the dbt Cloud CLI, which means you can immediately run them once you install the dbt Cloud CLI and don't need to install MetricFlow separately. You don't need to manage versioning because your dbt Cloud account will automatically manage the versioning for you. - - -:::tip Use dbt Cloud CLI for semantic layer development - -You can use the dbt Cloud CLI for the experience in defining and querying metrics in your dbt project. - -A benefit to using the dbt Cloud is that you won't need to manage versioning — your dbt Cloud account will automatically manage the versioning. -::: + You can install [MetricFlow](https://github.com/dbt-labs/metricflow#getting-started) from [PyPI](https://pypi.org/project/dbt-metricflow/). You need to use `pip` to install MetricFlow on Windows or Linux operating systems: + + 1. Create or activate your virtual environment `python -m venv venv` 2. Run `pip install dbt-metricflow` * You can install MetricFlow using PyPI as an extension of your dbt adapter in the command line. To install the adapter, run `python -m pip install "dbt-metricflow[your_adapter_name]"` and add the adapter name at the end of the command. For example, for a Snowflake adapter run `python -m pip install "dbt-metricflow[snowflake]"` -**Note**, you'll need to manage versioning between dbt Core, your adapter, and MetricFlow. + - + + +1. Create or activate your virtual environment `python -m venv venv` +2. Run `pip install dbt-metricflow` + * You can install MetricFlow using PyPI as an extension of your dbt adapter in the command line. To install the adapter, run `python -m pip install "dbt-metricflow[adapter_package_name]"` and add the adapter name at the end of the command. For example, for a Snowflake adapter run `python -m pip install "dbt-metricflow[dbt-snowflake]"` -
+ + +**Note**, you'll need to manage versioning between dbt Core, your adapter, and MetricFlow. Something to note, MetricFlow `mf` commands return an error if you have a Metafont latex package installed. To run `mf` commands, uninstall the package. + + + ## MetricFlow commands MetricFlow provides the following commands to retrieve metadata and query metrics. - + -You can use the `dbt sl` prefix before the command name to execute them in the dbt Cloud CLI. For example, to list all metrics, run `dbt sl list metrics`. For a complete list of the MetricFlow commands and flags, run the `dbt sl --help` command in your terminal. +You can use the `dbt sl` prefix before the command name to execute them in the dbt Cloud IDE or dbt Cloud CLI. For example, to list all metrics, run `dbt sl list metrics`. + +dbt Cloud CLI users can run `dbt sl --help` in the terminal for a complete list of the MetricFlow commands and flags. + +The following table lists the commands compatible with the dbt Cloud IDE and dbt Cloud CLI: + +|
Command
|
Description
| dbt Cloud IDE | dbt Cloud CLI | +|---------|-------------|---------------|---------------| +| [`list metrics`](#list-metrics) | Lists metrics with dimensions. | ✅ | ✅ | +| [`list dimensions`](#list) | Lists unique dimensions for metrics. | ✅ | ✅ | +| [`list dimension-values`](#list-dimension-values) | List dimensions with metrics. | ✅ | ✅ | +| [`list entities`](#list-entities) | Lists all unique entities. | ✅ | ✅ | +| [`list saved-queries`](#list-saved-queries) | Lists available saved queries. Use the `--show-exports` flag to display each export listed under a saved query or `--show-parameters` to show the full query parameters each saved query uses. | ✅ | ✅ | +| [`query`](#query) | Query metrics, saved queries, and dimensions you want to see in the command line interface. Refer to [query examples](#query-examples) to help you get started. | ✅ | ✅ | +| [`validate`](#validate) | Validates semantic model configurations. | ✅ | ✅ | +| [`export`](#export) | Runs exports for a singular saved query for testing and generating exports in your development environment. You can also use the `--select` flag to specify particular exports from a saved query. | ❌ | ✅ | +| [`export-all`](#export-all) | Runs exports for multiple saved queries at once, saving time and effort. | ❌ | ✅ | -- [`list`](#list) — Retrieves metadata values. -- [`list metrics`](#list-metrics) — Lists metrics with dimensions. -- [`list dimensions`](#list) — Lists unique dimensions for metrics. -- [`list dimension-values`](#list-dimension-values) — List dimensions with metrics. -- [`list entities`](#list-entities) — Lists all unique entities. -- [`list saved-queries`](#list-saved-queries) — Lists available saved queries. Use the `--show-exports` flag to display each export listed under a saved query. -- [`query`](#query) — Query metrics, saved queries, and dimensions you want to see in the command line interface. Refer to [query examples](#query-examples) to help you get started. -- [`export`](#export) — Runs exports for a singular saved query for testing and generating exports in your development environment. You can also use the `--select` flag to specify particular exports from a saved query. -- [`export-all`](#export-all) — Runs exports for multiple saved queries at once, saving time and effort. -- [`validate`](#validate) — Validates semantic model configurations. **Query** ```bash # In dbt Core -mf query --metrics order_total --group-by metric_time,is_food_order --limit 10 --order-by -metric_time --where "is_food_order = True" --start-time '2017-08-22' --end-time '2017-08-27' +mf query --metrics order_total --group-by order_id__is_food_order --limit 10 --order-by -metric_time --where "is_food_order = True" --start-time '2017-08-22' --end-time '2017-08-27' ``` **Result** @@ -502,8 +496,6 @@ The following tabs present additional query examples, like exporting to a CSV. S - - Add `--compile` (or `--explain` for dbt Core users) to your query to view the SQL generated by MetricFlow. @@ -522,24 +514,24 @@ mf query --metrics order_total --group-by metric_time,is_food_order --limit 10 - ```bash ✔ Success 🦄 - query completed after 0.28 seconds 🔎 SQL (remove --compile to see data or add --show-dataflow-plan to see the generated dataflow plan): -SELECT +select metric_time , is_food_order - , SUM(order_cost) AS order_total -FROM ( - SELECT - cast(ordered_at as date) AS metric_time + , sum(order_cost) as order_total +from ( + select + cast(ordered_at as date) as metric_time , is_food_order , order_cost - FROM ANALYTICS.js_dbt_sl_demo.orders orders_src_1 - WHERE cast(ordered_at as date) BETWEEN CAST('2017-08-22' AS TIMESTAMP) AND CAST('2017-08-27' AS TIMESTAMP) + from analytics.js_dbt_sl_demo.orders orders_src_1 + where cast(ordered_at as date) between cast('2017-08-22' as timestamp) and cast('2017-08-27' as timestamp) ) subq_3 -WHERE is_food_order = True -GROUP BY +where is_food_order = True +group by metric_time , is_food_order -ORDER BY metric_time DESC -LIMIT 10 +order by metric_time desc +limit 10 ``` diff --git a/website/docs/docs/build/metricflow-time-spine.md b/website/docs/docs/build/metricflow-time-spine.md index 18acf451a12..5499c61a8e4 100644 --- a/website/docs/docs/build/metricflow-time-spine.md +++ b/website/docs/docs/build/metricflow-time-spine.md @@ -1,55 +1,128 @@ --- title: MetricFlow time spine id: metricflow-time-spine -description: "MetricFlow expects a default timespine table called metricflow_time_spine" +description: "MetricFlow expects a default time spine table called metricflow_time_spine" sidebar_label: "MetricFlow time spine" tags: [Metrics, Semantic Layer] --- + -It's common in analytics engineering to have a date dimension or "time spine" table as a base table for different types of time-based joins and aggregations. The structure of this table is typically a base column of daily or hourly dates, with additional columns for other time grains, like fiscal quarter, defined based on the base column. You can join other tables to the time spine on the base column to calculate metrics like revenue at a point in time, or to aggregate to a specific time grain. + -MetricFlow requires you to define a time spine table as a project level configuration, which then is used for various time-based joins and aggregations, like cumulative metrics. At a minimum, you need to define a time spine table for a daily grain. You can optionally define a time spine table for a different granularity, like hourly. +It's common in analytics engineering to have a date dimension or "time spine" table as a base table for different types of time-based joins and aggregations. The structure of this table is typically a base column of daily or hourly dates, with additional columns for other time grains, like fiscal quarters, defined based on the base column. You can join other tables to the time spine on the base column to calculate metrics like revenue at a point in time, or to aggregate to a specific time grain. -If you already have a date dimension or time spine table in your dbt project, you can point MetricFlow to this table by updating the `model` configuration to use this table in the Semantic Layer. For example, given the following directory structure, you can create two time spine configurations, `time_spine_hourly` and `time_spine_daily`. +MetricFlow requires you to define at least one dbt model which provides a time-spine, and then specify (in YAML) the columns to be used for time-based joins. MetricFlow will join against the time-spine model for the following types of metrics and dimensions: -:::tip -Previously, you were required to create a model called `metricflow_time_spine` in your dbt project. This is no longer required. However, you can build your time spine model from this table if you don't have another date dimension table you want to use in your project. +- [Cumulative metrics](/docs/build/cumulative) +- [Metric offsets](/docs/build/derived#derived-metric-offset) +- [Conversion metrics](/docs/build/conversion) +- [Slowly Changing Dimensions](/docs/build/dimensions#scd-type-ii) +- [Metrics](/docs/build/metrics-overview) with the `join_to_timespine` configuration set to true +To see the generated SQL for the metric and dimension types that use time spine joins, refer to the respective documentation or add the `compile=True` flag when querying the Semantic Layer to return the compiled SQL. + +## Configuring time spine in YAML + + Time spine models are normal dbt models with extra configurations that tell dbt and MetricFlow how to use specific columns by defining their properties. Add the [`models` key](/reference/model-properties) for the time spine in your `models/` directory. If your project already includes a calendar table or date dimension, you can configure that table as a time spine. Otherwise, review the [example time-spine tables](#example-time-spine-tables) to create one. + + Some things to note when configuring time spine models: + +- Add the configurations under the `time_spine` key for that [model's properties](/reference/model-properties), just as you would add a description or tests. +- You only need to configure time-spine models that the Semantic Layer should recognize. +- At a minimum, define a time-spine table for a daily grain. +- You can optionally define additional time-spine tables for different granularities, like hourly. Review the [granularity considerations](#granularity-considerations) when deciding which tables to create. +- If you're looking to specify the grain of a time dimension so that MetricFlow can transform the underlying column to the required granularity, refer to the [Time granularity documentation](/docs/build/dimensions?dimension=time_gran) + +:::tip +If you previously used a model called `metricflow_time_spine`, you no longer need to create this specific model. You can now configure MetricFlow to use any date dimension or time spine table already in your project by updating the `model` setting in the Semantic Layer. + +If you don’t have a date dimension table, you can still create one by using the code snippet in the [next section](#creating-a-time-spine-table) to build your time spine model. ::: - +### Creating a time spine table + +MetricFlow supports granularities ranging from milliseconds to years. Refer to the [Dimensions page](/docs/build/dimensions?dimension=time_gran#time) (time_granularity tab) to find the full list of supported granularities. + +To create a time spine table from scratch, you can do so by adding the following code to your dbt project. +This example creates a time spine at an hourly grain and a daily grain: `time_spine_hourly` and `time_spine_daily`. + + + + +```yaml +[models:](/reference/model-properties) +# Hourly time spine + - name: time_spine_hourly + description: my favorite time spine + time_spine: + standard_granularity_column: date_hour # column for the standard grain of your table, must be date time type. + custom_granularities: + - name: fiscal_year + column_name: fiscal_year_column + columns: + - name: date_hour + granularity: hour # set granularity at column-level for standard_granularity_column + +# Daily time spine + - name: time_spine_daily + time_spine: + standard_granularity_column: date_day # column for the standard grain of your table + columns: + - name: date_day + granularity: day # set granularity at column-level for standard_granularity_column +``` + + + + + -Now, break down the configuration above. It's pointing to a model called `time_spine_daily`. It sets the time spine configurations under the `time_spine` key. The `standard_granularity_column` is the lowest grain of the table, in this case, it's hourly. It needs to reference a column defined under the columns key, in this case, `date_hour`. Use the `standard_granularity_column` as the join key for the time spine table when joining tables in MetricFlow. Here, the granularity of the `standard_granularity_column` is set at the column level, in this case, `hour`. +- This example configuration shows a time spine model called `time_spine_hourly` and `time_spine_daily`. It sets the time spine configurations under the `time_spine` key. +- The `standard_granularity_column` is the column that maps to one of our [standard granularities](/docs/build/dimensions?dimension=time_gran). This column must be set under the `columns` key and should have a grain that is finer or equal to any custom granularity columns defined in the same model. + - It needs to reference a column defined under the `columns` key, in this case, `date_hour` and `date_day`, respectively. + - It sets the granularity at the column-level using the `granularity` key, in this case, `hour` and `day`, respectively. +- MetricFlow will use the `standard_granularity_column` as the join key when joining the time spine table to another source table. +- [The `custom_granularities` field](#custom-calendar), (available in dbt Cloud Latest and dbt Core v1.9 and higher) lets you specify non-standard time periods like `fiscal_year` or `retail_month` that your organization may use. +For an example project, refer to our [Jaffle shop](https://github.com/dbt-labs/jaffle-sl-template/blob/main/models/marts/_models.yml) example. -If you need to create a time spine table from scratch, you can do so by adding the following code to your dbt project. -The example creates a time spine at a daily grain and an hourly grain. A few things to note when creating time spine models: -* MetricFlow will use the time spine with the largest compatible granularity for a given query to ensure the most efficient query possible. For example, if you have a time spine at a monthly grain, and query a dimension at a monthly grain, MetricFlow will use the monthly time spine. If you only have a daily time spine, MetricFlow will use the daily time spine and date_trunc to month. -* You can add a time spine for each granularity you intend to use if query efficiency is more important to you than configuration time, or storage constraints. For most engines, the query performance difference should be minimal and transforming your time spine to a coarser grain at query time shouldn't add significant overhead to your queries. -* We recommend having a time spine at the finest grain used in any of your dimensions to avoid unexpected errors. i.e., if you have dimensions at an hourly grain, you should have a time spine at an hourly grain. +### Considerations when choosing which granularities to create{#granularity-considerations} - +- MetricFlow will use the time spine with the largest compatible granularity for a given query to ensure the most efficient query possible. For example, if you have a time spine at a monthly grain, and query a dimension at a monthly grain, MetricFlow will use the monthly time spine. If you only have a daily time spine, MetricFlow will use the daily time spine and date_trunc to month. +- You can add a time spine for each granularity you intend to use if query efficiency is more important to you than configuration time, or storage constraints. For most engines, the query performance difference should be minimal and transforming your time spine to a coarser grain at query time shouldn't add significant overhead to your queries. +- We recommend having a time spine at the finest grain used in any of your dimensions to avoid unexpected errors. For example, if you have dimensions at an hourly grain, you should have a time spine at an hourly grain. - +## Example time spine tables + +### Daily + + ```sql {{ @@ -61,7 +134,7 @@ The example creates a time spine at a daily grain and an hourly grain. A few thi with days as ( {{ - dbt_utils.date_spine( + dbt.date_spine( 'day', "to_date('01/01/2000','mm/dd/yyyy')", "to_date('01/01/2025','mm/dd/yyyy')" @@ -76,14 +149,47 @@ final as ( ) select * from final --- filter the time spine to a specific range where date_day > dateadd(year, -4, current_timestamp()) -and date_hour < dateadd(day, 30, current_timestamp()) +and date_day < dateadd(day, 30, current_timestamp()) ``` - +### Daily (BigQuery) - +Use this model if you're using BigQuery. BigQuery supports `DATE()` instead of `TO_DATE()`: + + + +```sql + +{{config(materialized='table')}} +with days as ( + {{dbt.date_spine( + 'day', + "DATE(2000,01,01)", + "DATE(2025,01,01)" + ) + }} +), + +final as ( + select cast(date_day as date) as date_day + from days +) + +select * +from final +-- filter the time spine to a specific range +where date_day > date_add(DATE(current_timestamp()), INTERVAL -4 YEAR) +and date_day < date_add(DATE(current_timestamp()), INTERVAL 30 DAY) +``` + + + + + +### Hourly + + ```sql {{ @@ -92,11 +198,11 @@ and date_hour < dateadd(day, 30, current_timestamp()) ) }} -with days as ( +with hours as ( {{ dbt.date_spine( - 'day', + 'hour', "to_date('01/01/2000','mm/dd/yyyy')", "to_date('01/01/2025','mm/dd/yyyy')" ) @@ -105,31 +211,51 @@ with days as ( ), final as ( - select cast(date_day as date) as date_day - from days + select cast(date_hour as timestamp) as date_hour + from hours ) select * from final +-- filter the time spine to a specific range where date_day > dateadd(year, -4, current_timestamp()) and date_hour < dateadd(day, 30, current_timestamp()) ``` + + + + + + + + +MetricFlow uses a time spine table to construct cumulative metrics. By default, MetricFlow expects the time spine table to be named `metricflow_time_spine` and doesn't support using a different name. For supported granularities, refer to the [dimensions](/docs/build/dimensions?dimension=time_gran#time) page. + +To create this table, you need to create a model in your dbt project called `metricflow_time_spine` and add the following code: + +### Daily + + -Use this model if you're using BigQuery. BigQuery supports `DATE()` instead of `TO_DATE()`: - - - ```sql -{{config(materialized='table')}} -with days as ( - {{dbt_utils.date_spine( - 'day', - "DATE(2000,01,01)", - "DATE(2025,01,01)" +{{ + config( + materialized = 'table', ) +}} + +with days as ( + + {{ + dbt.date_spine( + 'day', + "to_date('01/01/2000','mm/dd/yyyy')", + "to_date('01/01/2025','mm/dd/yyyy')" + ) }} + ), final as ( @@ -137,21 +263,20 @@ final as ( from days ) -select * -from final --- filter the time spine to a specific range +select * from final where date_day > dateadd(year, -4, current_timestamp()) -and date_hour < dateadd(day, 30, current_timestamp()) +and date_day < dateadd(day, 30, current_timestamp()) ``` + - - +### Daily (BigQuery) + +Use this model if you're using BigQuery. BigQuery supports `DATE()` instead of `TO_DATE()`: ```sql - {{config(materialized='table')}} with days as ( {{dbt.date_spine( @@ -171,43 +296,67 @@ select * from final -- filter the time spine to a specific range where date_day > dateadd(year, -4, current_timestamp()) -and date_hour < dateadd(day, 30, current_timestamp()) +and date_day < dateadd(day, 30, current_timestamp()) ``` + + +You only need to include the `date_day` column in the table. MetricFlow can handle broader levels of detail, but finer grains are only supported in versions 1.9 and higher. + - -## Hourly time spine - +## Custom calendar -```sql -{{ - config( - materialized = 'table', - ) -}} + -with hours as ( +The ability to configure custom calendars, such as a fiscal calendar, is available now in [the "Latest" release track in dbt Cloud](/docs/dbt-versions/cloud-release-tracks), and it will be available in [dbt Core v1.9+](/docs/dbt-versions/core-upgrade/upgrading-to-v1.9). - {{ - dbt.date_spine( - 'hour', - "to_date('01/01/2000','mm/dd/yyyy')", - "to_date('01/01/2025','mm/dd/yyyy')" - ) - }} + -), + -final as ( - select cast(date_hour as timestamp) as date_hour - from hours -) +Custom date transformations can be complex, and organizations often have unique needs that can’t be easily generalized. Creating a custom calendar model allows you to define these transformations in SQL, offering more flexibility than native transformations in MetricFlow. This approach lets you map custom columns back to MetricFlow granularities, ensuring consistency while giving you control over the transformations. -select * from final --- filter the time spine to a specific range -where date_day > dateadd(year, -4, current_timestamp()) -and date_hour < dateadd(day, 30, current_timestamp()) +For example, if you use a custom calendar in your organization, such as a fiscal calendar, you can configure it in MetricFlow using its date and time operations. + +- This is useful for calculating metrics based on a custom calendar, such as fiscal quarters or weeks. +- Use the `custom_granularities` key to define a non-standard time period for querying data, such as a `retail_month` or `fiscal_week`, instead of standard options like `day`, `month`, or `year`. +- This feature provides more control over how time-based metrics are calculated. + + + +When working with custom calendars in MetricFlow, it's important to ensure: + +- Consistent data types — Both your dimension column and the time spine column should use the same data type to allow accurate comparisons. Functions like `DATE_TRUNC` don't change the data type of the input in some databases (like Snowflake). Using different data types can lead to mismatches and inaccurate results. + + We recommend using `DATETIME` or `TIMESTAMP` data types for your time dimensions and time spine, as they support all granularities. The `DATE` data type may not support smaller granularities like hours or minutes. + +- Time zones — MetricFlow currently doesn't perform any timezone manipulation. When working with timezone-aware data, inconsistent time zones may lead to unexpected results during aggregations and comparisons. + +For example, if your time spine column is `TIMESTAMP` type and your dimension column is `DATE` type, comparisons between these columns might not work as intended. To fix this, convert your `DATE` column to `TIMESTAMP`, or make sure both columns are the same data type. + + + +### Add custom granularities + +To add custom granularities, the Semantic Layer supports custom calendar configurations that allow users to query data using non-standard time periods like `fiscal_year` or `retail_month`. You can define these custom granularities (all lowercased) by modifying your model's YAML configuration like this: + + + +```yaml +models: + - name: my_time_spine + description: my favorite time spine + time_spine: + standard_granularity_column: date_day + custom_granularities: + - name: fiscal_year + column_name: fiscal_year_column ``` + +#### Coming soon +Note that features like calculating offsets and period-over-period will be supported soon! + + diff --git a/website/docs/docs/build/metrics-overview.md b/website/docs/docs/build/metrics-overview.md index 38b9b22bdb2..e874dced63a 100644 --- a/website/docs/docs/build/metrics-overview.md +++ b/website/docs/docs/build/metrics-overview.md @@ -15,15 +15,15 @@ This article explains the different supported metric types you can add to your d -| Parameter | Description | Type | -| --------- | ----------- | ---- | -| `name` | Provide the reference name for the metric. This name must be a unique metric name and can consist of lowercase letters, numbers, and underscores. | Required | -| `description` | Describe your metric. | Optional | -| `type` | Define the type of metric, which can be `conversion`, `cumulative`, `derived`, `ratio`, or `simple`. | Required | -| `type_params` | Additional parameters used to configure metrics. `type_params` are different for each metric type. | Required | -| `label` | Required string that defines the display value in downstream tools. Accepts plain text, spaces, and quotes (such as `orders_total` or `"orders_total"`). | Required | -| `config` | Use the [`config`](/reference/resource-properties/config) property to specify configurations for your metric. Supports [`meta`](/reference/resource-configs/meta), [`group`](/reference/resource-configs/group), and [`enabled`](/reference/resource-configs/enabled) configurations. | Optional | -| `filter` | You can optionally add a [filter](#filters) string to any metric type, applying filters to dimensions, entities, time dimensions, or other metrics during metric computation. Consider it as your WHERE clause. | Optional | +| Parameter | Description | Required | Type | +| --------- | ----------- | ---- | ---- | +| `name` | Provide the reference name for the metric. This name must be a unique metric name and can consist of lowercase letters, numbers, and underscores. | Required | String | +| `description` | Describe your metric. | Optional | String | +| `type` | Define the type of metric, which can be `conversion`, `cumulative`, `derived`, `ratio`, or `simple`. | Required | String | +| `type_params` | Additional parameters used to configure metrics. `type_params` are different for each metric type. | Required | Dict | +| `label` | Required string that defines the display value in downstream tools. Accepts plain text, spaces, and quotes (such as `orders_total` or `"orders_total"`). | Required | String | +| `config` | Use the [`config`](/reference/resource-properties/config) property to specify configurations for your metric. Supports [`meta`](/reference/resource-configs/meta), [`group`](/reference/resource-configs/group), and [`enabled`](/reference/resource-configs/enabled) configurations. | Optional | Dict | +| `filter` | You can optionally add a [filter](#filters) string to any metric type, applying filters to dimensions, entities, time dimensions, or other metrics during metric computation. Consider it as your WHERE clause. | Optional | String | Here's a complete example of the metrics spec configuration: @@ -52,16 +52,16 @@ metrics: -| Parameter | Description | Type | -| --------- | ----------- | ---- | -| `name` | Provide the reference name for the metric. This name must be unique amongst all metrics. | Required | -| `description` | Describe your metric. | Optional | -| `type` | Define the type of metric, which can be `simple`, `ratio`, `cumulative`, or `derived`. | Required | -| `type_params` | Additional parameters used to configure metrics. `type_params` are different for each metric type. | Required | -| `config` | Provide the specific configurations for your metric. | Optional | -| `meta` | Use the [`meta` config](/reference/resource-configs/meta) to set metadata for a resource. | Optional | -| `label` | Required string that defines the display value in downstream tools. Accepts plain text, spaces, and quotes (such as `orders_total` or `"orders_total"`). | Required | -| `filter` | You can optionally add a filter string to any metric type, applying filters to dimensions, entities, or time dimensions during metric computation. Consider it as your WHERE clause. | Optional | +| Parameter | Description | Required | Type | +| --------- | ----------- | ---- | ---- | +| `name` | Provide the reference name for the metric. This name must be unique amongst all metrics. | Required | String | +| `description` | Describe your metric. | Optional | String | +| `type` | Define the type of metric, which can be `simple`, `ratio`, `cumulative`, or `derived`. | Required | String | +| `type_params` | Additional parameters used to configure metrics. `type_params` are different for each metric type. | Required | Dict | +| `config` | Provide the specific configurations for your metric. | Optional | Dict | +| `meta` | Use the [`meta` config](/reference/resource-configs/meta) to set metadata for a resource. | Optional | String | +| `label` | Required string that defines the display value in downstream tools. Accepts plain text, spaces, and quotes (such as `orders_total` or `"orders_total"`). | Required | String | +| `filter` | You can optionally add a filter string to any metric type, applying filters to dimensions, entities, or time dimensions during metric computation. Consider it as your WHERE clause. | Optional | String | Here's a complete example of the metrics spec configuration: @@ -92,7 +92,19 @@ import SLCourses from '/snippets/_sl-course.md'; ## Default granularity for metrics -It's possible to define a default time granularity for metrics if it's different from the granularity of the default aggregation time dimensions (`metric_time`). This is useful if your time dimension has a very fine grain, like second or hour, but you typically query metrics rolled up at a coarser grain. The granularity can be set using the `time_granularity` parameter on the metric, and defaults to `day`. If day is not available because the dimension is defined at a coarser granularity, it will default to the defined granularity for the dimension. + +Default time granularity for metrics is useful if your time dimension has a very fine grain, like second or hour, but you typically query metrics rolled up at a coarser grain. + +Default time granularity for metrics is available now in [the "Latest" release track in dbt Cloud](/docs/dbt-versions/cloud-release-tracks), and it will be available in [dbt Core v1.9+](/docs/dbt-versions/core-upgrade/upgrading-to-v1.9). + + + + + + +It's possible to define a default time granularity for metrics if it's different from the granularity of the default aggregation time dimensions (`metric_time`). This is useful if your time dimension has a very fine grain, like second or hour, but you typically query metrics rolled up at a coarser grain. + +The granularity can be set using the `time_granularity` parameter on the metric, and defaults to `day`. If day is not available because the dimension is defined at a coarser granularity, it will default to the defined granularity for the dimension. ### Example You have a semantic model called `orders` with a time dimension called `order_time`. You want the `orders` metric to roll up to `monthly` by default; however, you want the option to look at these metrics hourly. You can set the `time_granularity` parameter on the `order_time` dimension to `hour`, and then set the `time_granularity` parameter in the metric to `month`. @@ -117,6 +129,7 @@ semantic_models: name: orders time_granularity: month -- Optional, defaults to day ``` + ## Conversion metrics @@ -258,9 +271,9 @@ metrics: measure: name: cancellations_usd # Specify the measure you are creating a proxy for. fill_nulls_with: 0 + join_to_timespine: true filter: | {{ Dimension('order__value')}} > 100 and {{Dimension('user__acquisition')}} is not null - join_to_timespine: true ``` @@ -270,6 +283,8 @@ A filter is configured using Jinja templating. Use the following syntax to refer Refer to [Metrics as dimensions](/docs/build/ref-metrics-in-filters) for details on how to use metrics as dimensions with metric filters: + + ```yaml @@ -283,10 +298,30 @@ filter: | {{ TimeDimension('time_dimension', 'granularity') }} filter: | - {{ Metric('metric_name', group_by=['entity_name']) }} # Available in v1.8 or with [versionless (/docs/dbt-versions/upgrade-dbt-version-in-cloud#versionless) dbt Cloud. + {{ Metric('metric_name', group_by=['entity_name']) }} + ``` + + + + + + + +```yaml +filter: | + {{ Entity('entity_name') }} + +filter: | + {{ Dimension('primary_entity__dimension_name') }} + +filter: | + {{ TimeDimension('time_dimension', 'granularity') }} + +``` + For example, if you want to filter for the order date dimension grouped by month, use the following syntax: diff --git a/website/docs/docs/build/packages.md b/website/docs/docs/build/packages.md index 0b69d10cee6..7a2c08d3e70 100644 --- a/website/docs/docs/build/packages.md +++ b/website/docs/docs/build/packages.md @@ -20,9 +20,10 @@ In dbt, libraries like these are called _packages_. dbt's packages are so powerf * Models to understand [Redshift](https://hub.getdbt.com/dbt-labs/redshift/latest/) privileges. * Macros to work with data loaded by [Stitch](https://hub.getdbt.com/dbt-labs/stitch_utils/latest/). -dbt _packages_ are in fact standalone dbt projects, with models and macros that tackle a specific problem area. As a dbt user, by adding a package to your project, the package's models and macros will become part of your own project. This means: +dbt _packages_ are in fact standalone dbt projects, with models, macros, and other resources that tackle a specific problem area. As a dbt user, by adding a package to your project, all of the package's resources will become part of your own project. This means: * Models in the package will be materialized when you `dbt run`. * You can use `ref` in your own models to refer to models from the package. +* You can use `source` to refer to sources in the package. * You can use macros in the package in your own project. * It's important to note that defining and installing dbt packages is different from [defining and installing Python packages](/docs/build/python-models#using-pypi-packages) @@ -82,11 +83,7 @@ packages: version: [">=0.7.0", "<0.8.0"] ``` - - -Beginning in v1.7, `dbt deps` "pins" each package by default. See ["Pinning packages"](#pinning-packages) for details. - - +`dbt deps` "pins" each package by default. See ["Pinning packages"](#pinning-packages) for details. Where possible, we recommend installing packages via dbt Hub, since this allows dbt to handle duplicate dependencies. This is helpful in situations such as: * Your project uses both the dbt-utils and Snowplow packages, and the Snowplow package _also_ uses the dbt-utils package. @@ -145,18 +142,8 @@ packages: revision: 4e28d6da126e2940d17f697de783a717f2503188 ``` - - -We **strongly recommend** ["pinning" your packages](#pinning-packages) to a specific release by specifying a release name. - - - - - By default, `dbt deps` "pins" each package. See ["Pinning packages"](#pinning-packages) for details. - - ### Internally hosted tarball URL Some organizations have security requirements to pull resources only from internal services. To address the need to install packages from hosted environments such as Artifactory or cloud storage buckets, dbt Core enables you to install packages from internally-hosted tarball URLs. @@ -170,12 +157,60 @@ packages: Where `name: 'dbt_utils'` specifies the subfolder of `dbt_packages` that's created for the package source code to be installed within. -### Private packages +## Private packages + +### Native private packages + +dbt Cloud supports private packages from [supported](#prerequisites) Git repos leveraging an existing [configuration](/docs/cloud/git/git-configuration-in-dbt-cloud) in your environment. Previously, you had to configure a [token](#git-token-method) to retrieve packages from your private repos. + +#### Prerequisites + +To use native private packages, you must have one of the following Git providers configured in the **Integrations** section of your **Account settings**: +- [GitHub](/docs/cloud/git/connect-github) +- [Azure DevOps](/docs/cloud/git/connect-azure-devops) +- Support for GitLab is coming soon. + + +#### Configuration + +Use the `private` key in your `packages.yml` or `dependencies.yml` to clone package repos using your existing dbt Cloud Git integration without having to provision an access token or create a dbt Cloud environment variable: + + + +```yaml +packages: + - private: dbt-labs/awesome_repo + - package: normal packages + + [...] +``` + + + +You can pin private packages similar to regular dbt packages: -#### SSH Key Method (Command Line only) +```yaml +packages: + - private: dbt-labs/awesome_repo + revision: "0.9.5" # Pin to a tag, branch, or complete 40-character commit hash + +``` + +If you are using multiple Git integrations, disambiguate by adding the provider key: + +```yaml +packages: + - private: dbt-labs/awesome_repo + provider: "github" # GitHub and Azure are currently supported. GitLab is coming soon. + +``` + +With this method, you can retrieve private packages from an integrated Git provider without any additional steps to connect. + +### SSH key method (command line only) If you're using the Command Line, private packages can be cloned via SSH and an SSH key. -When you use SSH keys to authenticate to your git remote server, you don’t need to supply your username and password each time. Read more about SSH keys, how to generate them, and how to add them to your git provider here: [Github](https://docs.github.com/en/github/authenticating-to-github/connecting-to-github-with-ssh) and [GitLab](https://docs.gitlab.com/ee/ssh/). +When you use SSH keys to authenticate to your git remote server, you don’t need to supply your username and password each time. Read more about SSH keys, how to generate them, and how to add them to your git provider here: [Github](https://docs.github.com/en/github/authenticating-to-github/connecting-to-github-with-ssh) and [GitLab](https://docs.gitlab.com/ee/user/ssh.html). @@ -190,7 +225,14 @@ packages: If you're using dbt Cloud, the SSH key method will not work, but you can use the [HTTPS Git Token Method](https://docs.getdbt.com/docs/build/packages#git-token-method). -#### Git token method +### Git token method + +:::note + +dbt Cloud has [native support](#native-private-packages) for Git hosted private packages with GitHub and Azure DevOps (GitLab coming soon). If you are using a supported [integrated Git environment](/docs/cloud/git/git-configuration-in-dbt-cloud), you no longer need to configure Git tokens to retrieve private packages. + +::: + This method allows the user to clone via HTTPS by passing in a git token via an environment variable. Be careful of the expiration date of any token you use, as an expired token could cause a scheduled run to fail. Additionally, user tokens can create a challenge if the user ever loses access to a specific repo. @@ -259,7 +301,7 @@ Read more about creating a Personal Access Token [here](https://confluence.atlas -#### Configure subdirectory for packaged projects +## Configure subdirectory for packaged projects In general, dbt expects `dbt_project.yml` to be located as a top-level file in a package. If the packaged project is instead nested in a subdirectory—perhaps within a much larger mono repo—you can optionally specify the folder path as `subdirectory`. dbt will attempt a [sparse checkout](https://git-scm.com/docs/git-sparse-checkout) of just the files located within that subdirectory. Note that you must be using a recent version of `git` (`>=2.26.0`). @@ -318,18 +360,6 @@ When you remove a package from your `packages.yml` file, it isn't automatically ### Pinning packages - - -We **strongly recommend** "pinning" your package to a specific release by specifying a tagged release name or a specific commit hash. - -If you do not provide a revision, or if you use the main branch, then any updates to the package will be incorporated into your project the next time you run `dbt deps`. While we generally try to avoid making breaking changes to these packages, they are sometimes unavoidable. Pinning a package revision helps prevent your code from changing without your explicit approval. - -To find the latest release for a package, navigate to the `Releases` tab in the relevant GitHub repository. For example, you can find all of the releases for the dbt-utils package [here](https://github.com/dbt-labs/dbt-utils/releases). - - - - - Beginning with v1.7, running [`dbt deps`](/reference/commands/deps) "pins" each package by creating or updating the `package-lock.yml` file in the _project_root_ where `packages.yml` is recorded. - The `package-lock.yml` file contains a record of all packages installed. @@ -337,8 +367,6 @@ Beginning with v1.7, running [`dbt deps`](/reference/commands/deps) "pins" each For example, if you use a branch name, the `package-lock.yml` file pins to the head commit. If you use a version range, it pins to the latest release. In either case, subsequent commits or versions will **not** be installed. To get new commits or versions, run `dbt deps --upgrade` or add `package-lock.yml` to your .gitignore file. - - As of v0.14.0, dbt will warn you if you install a package using the `git` syntax without specifying a revision (see below). ### Configuring packages diff --git a/website/docs/docs/build/projects.md b/website/docs/docs/build/projects.md index a65d4773ac6..4732dbe6da7 100644 --- a/website/docs/docs/build/projects.md +++ b/website/docs/docs/build/projects.md @@ -22,6 +22,8 @@ At a minimum, all a project needs is the `dbt_project.yml` project configuration | [metrics](/docs/build/build-metrics-intro) | A way for you to define metrics for your project. | | [groups](/docs/build/groups) | Groups enable collaborative node organization in restricted collections. | | [analysis](/docs/build/analyses) | A way to organize analytical SQL queries in your project such as the general ledger from your QuickBooks. | +| [semantic models](/docs/build/semantic-models) | Semantic models define the foundational data relationships in [MetricFlow](/docs/build/about-metricflow) and the [dbt Semantic Layer](/docs/use-dbt-semantic-layer/dbt-sl), enabling you to query metrics using a semantic graph. | +| [saved queries](/docs/build/saved-queries) | Saved queries organize reusable queries by grouping metrics, dimensions, and filters into nodes visible in the dbt DAG. | When building out the structure of your project, you should consider these impacts on your organization's workflow: diff --git a/website/docs/docs/build/python-models.md b/website/docs/docs/build/python-models.md index 09ed7a1c881..575757ab4a1 100644 --- a/website/docs/docs/build/python-models.md +++ b/website/docs/docs/build/python-models.md @@ -598,6 +598,34 @@ Python models have capabilities that SQL models do not. They also have some draw - **These capabilities are very new.** As data warehouses develop new features, we expect them to offer cheaper, faster, and more intuitive mechanisms for deploying Python transformations. **We reserve the right to change the underlying implementation for executing Python models in future releases.** Our commitment to you is around the code in your model `.py` files, following the documented capabilities and guidance we're providing here. - **Lack of `print()` support.** The data platform runs and compiles your Python model without dbt's oversight. This means it doesn't display the output of commands such as Python's built-in [`print()`](https://docs.python.org/3/library/functions.html#print) function in dbt's logs. +- + + The following explains other methods you can use for debugging, such as writing messages to a dataframe column: + + - Using platform logs: Use your data platform's logs to debug your Python models. + - Return logs as a dataframe: Create a dataframe containing your logs and build it into the warehouse. + - Develop locally with DuckDB: Test and debug your models locally using DuckDB before deploying them. + + Here's an example of debugging in a Python model: + + ```python + def model(dbt, session): + dbt.config( + materialized = "table" + ) + + df = dbt.ref("my_source_table").df() + + # One option for debugging: write messages to temporary table column + # Pros: visibility + # Cons: won't work if table isn't building for some reason + msg = "something" + df["debugging"] = f"My debug message here: {msg}" + + return df + ``` + + As a general rule, if there's a transformation you could write equally well in SQL or Python, we believe that well-written SQL is preferable: it's more accessible to a greater number of colleagues, and it's easier to write code that's performant at scale. If there's a transformation you _can't_ write in SQL, or where ten lines of elegant and well-annotated Python could save you 1000 lines of hard-to-read Jinja-SQL, Python is the way to go. ## Specific data platforms {#specific-data-platforms} @@ -613,7 +641,8 @@ In their initial launch, Python models are supported on three of the most popula **Installing packages:** Snowpark supports several popular packages via Anaconda. Refer to the [complete list](https://repo.anaconda.com/pkgs/snowflake/) for more details. Packages are installed when your model is run. Different models can have different package dependencies. If you use third-party packages, Snowflake recommends using a dedicated virtual warehouse for best performance rather than one with many concurrent users. **Python version:** To specify a different python version, use the following configuration: -``` + +```python def model(dbt, session): dbt.config( materialized = "table", @@ -625,7 +654,7 @@ def model(dbt, session): **External access integrations and secrets**: To query external APIs within dbt Python models, use Snowflake’s [external access](https://docs.snowflake.com/en/developer-guide/external-network-access/external-network-access-overview) together with [secrets](https://docs.snowflake.com/en/developer-guide/external-network-access/secret-api-reference). Here are some additional configurations you can use: -``` +```python import pandas import snowflake.snowpark as snowpark @@ -645,20 +674,43 @@ def model(dbt, session: snowpark.Session): -**About "sprocs":** dbt submits Python models to run as _stored procedures_, which some people call _sprocs_ for short. By default, dbt will create a named sproc containing your model's compiled Python code, and then _call_ it to execute. Snowpark has an Open Preview feature for _temporary_ or _anonymous_ stored procedures ([docs](https://docs.snowflake.com/en/sql-reference/sql/call-with.html)), which are faster and leave a cleaner query history. You can switch this feature on for your models by configuring `use_anonymous_sproc: True`. We plan to switch this on for all dbt + Snowpark Python models starting with the release of dbt Core version 1.4. +**About "sprocs":** dbt submits Python models to run as _stored procedures_, which some people call _sprocs_ for short. By default, dbt will use Snowpark's _temporary_ or _anonymous_ stored procedures ([docs](https://docs.snowflake.com/en/sql-reference/sql/call-with.html)), which are faster and keep query history cleaner than named sprocs containing your model's compiled Python code. To disable this feature, set `use_anonymous_sproc: False` in your model configuration. - +**Docs:** ["Developer Guide: Snowpark Python"](https://docs.snowflake.com/en/developer-guide/snowpark/python/index.html) + +#### Third-party Snowflake packages + +To use a third-party Snowflake package that isn't available in Snowflake Anaconda, upload your package by following [this example](https://docs.snowflake.com/en/developer-guide/udf/python/udf-python-packages#importing-packages-through-a-snowflake-stage), and then configure the `imports` setting in the dbt Python model to reference to the zip file in your Snowflake staging. + +Here’s a complete example configuration using a zip file, including using `imports` in a Python model: + +```python + +def model(dbt, session): + # Configure the model + dbt.config( + materialized="table", + imports=["@mystage/mycustompackage.zip"], # Specify the external package location + ) + + # Example data transformation using the imported package + # (Assuming `some_external_package` has a function we can call) + data = { + "name": ["Alice", "Bob", "Charlie"], + "score": [85, 90, 88] + } + df = pd.DataFrame(data) + + # Process data with the external package + df["adjusted_score"] = df["score"].apply(lambda x: some_external_package.adjust_score(x)) + + # Return the DataFrame as the model output + return df -```yml -# I asked Snowflake Support to enable this Private Preview feature, -# and now my dbt-py models run even faster! -models: - use_anonymous_sproc: True ``` - +For more information on using this configuration, refer to [Snowflake's documentation](https://community.snowflake.com/s/article/how-to-use-other-python-packages-in-snowpark) on uploading and using other python packages in Snowpark not published on Snowflake's Anaconda channel. -**Docs:** ["Developer Guide: Snowpark Python"](https://docs.snowflake.com/en/developer-guide/snowpark/python/index.html) diff --git a/website/docs/docs/build/ratio-metrics.md b/website/docs/docs/build/ratio-metrics.md index cc1d13b7835..a34dec29d71 100644 --- a/website/docs/docs/build/ratio-metrics.md +++ b/website/docs/docs/build/ratio-metrics.md @@ -10,20 +10,22 @@ Ratio allows you to create a ratio between two metrics. You simply specify a num The parameters, description, and type for ratio metrics are: -| Parameter | Description | Type | -| --------- | ----------- | ---- | -| `name` | The name of the metric. | Required | -| `description` | The description of the metric. | Optional | -| `type` | The type of the metric (cumulative, derived, ratio, or simple). | Required | -| `label` | Required string that defines the display value in downstream tools. Accepts plain text, spaces, and quotes (such as `orders_total` or `"orders_total"`). | Required | -| `type_params` | The type parameters of the metric. | Required | -| `numerator` | The name of the metric used for the numerator, or structure of properties. | Required | -| `denominator` | The name of the metric used for the denominator, or structure of properties. | Required | -| `filter` | Optional filter for the numerator or denominator. | Optional | -| `alias` | Optional alias for the numerator or denominator. | Optional | +| Parameter | Description | Required | Type | +| --------- | ----------- | ---- | ---- | +| `name` | The name of the metric. | Required | String | +| `description` | The description of the metric. | Optional | String | +| `type` | The type of the metric (cumulative, derived, ratio, or simple). | Required | String | +| `label` | Defines the display value in downstream tools. Accepts plain text, spaces, and quotes (such as `orders_total` or `"orders_total"`). | Required | String | +| `type_params` | The type parameters of the metric. | Required | Dict | +| `numerator` | The name of the metric used for the numerator, or structure of properties. | Required | String or dict | +| `denominator` | The name of the metric used for the denominator, or structure of properties. | Required | String or dict | +| `filter` | Optional filter for the numerator or denominator. | Optional | String | +| `alias` | Optional alias for the numerator or denominator. | Optional | String | The following displays the complete specification for ratio metrics, along with an example. + + ```yaml metrics: - name: The metric name # Required @@ -40,11 +42,19 @@ metrics: filter: Filter for the denominator # Optional alias: Alias for the denominator # Optional ``` + For advanced data modeling, you can use `fill_nulls_with` and `join_to_timespine` to [set null metric values to zero](/docs/build/fill-nulls-advanced), ensuring numeric values for every data row. ## Ratio metrics example +These examples demonstrate how to create ratio metrics in your model. They cover basic and advanced use cases, including applying filters to the numerator and denominator metrics. + +#### Example 1 +This example is a basic ratio metric that calculates the ratio of food orders to total orders: + + + ```yaml metrics: - name: food_order_pct @@ -55,6 +65,30 @@ metrics: numerator: food_orders denominator: orders ``` + + +#### Example 2 +This example is a ratio metric that calculates the ratio of food orders to total orders, with a filter and alias applied to the numerator. Note that in order to add these attributes, you'll need to use an explicit key for the name attribute too. + + + +```yaml +metrics: + - name: food_order_pct + description: "The food order count as a ratio of the total order count, filtered by location" + label: Food order ratio by location + type: ratio + type_params: + numerator: + name: food_orders + filter: location = 'New York' + alias: ny_food_orders + denominator: + name: orders + filter: location = 'New York' + alias: ny_orders +``` + ## Ratio metrics using different semantic models @@ -109,6 +143,8 @@ on Users can define constraints on input metrics for a ratio metric by applying a filter directly to the input metric, like so: + + ```yaml metrics: - name: frequent_purchaser_ratio @@ -123,6 +159,7 @@ metrics: denominator: name: distinct_purchasers ``` + Note the `filter` and `alias` parameters for the metric referenced in the numerator. - Use the `filter` parameter to apply a filter to the metric it's attached to. diff --git a/website/docs/docs/build/saved-queries.md b/website/docs/docs/build/saved-queries.md index e9beffca15f..ed56d13dcc9 100644 --- a/website/docs/docs/build/saved-queries.md +++ b/website/docs/docs/build/saved-queries.md @@ -102,24 +102,7 @@ saved_queries: ``` -Note, you can set `export_as` to both the saved query and the exports [config](/reference/resource-properties/config), with the exports config value taking precedence. If a key isn't set in the exports config, it will inherit the saved query config value. - -#### Project-level saved queries - -To enable saved queries at the project level, you can set the `saved-queries` configuration in the [`dbt_project.yml` file](/reference/dbt_project.yml). This saves you time in configuring saved queries in each file: - - - -```yaml -saved-queries: - my_saved_query: - config: - +cache: - enabled: true -``` - - -For more information on `dbt_project.yml` and config naming conventions, see the [dbt_project.yml reference page](/reference/dbt_project.yml#naming-convention). +Note, that you can set `export_as` to both the saved query and the exports [config](/reference/resource-properties/config), with the exports config value taking precedence. If a key isn't set in the exports config, it will inherit the saved query config value. #### Where clause @@ -171,6 +154,22 @@ saved_queries: +#### Project-level saved queries + +To enable saved queries at the project level, you can set the `saved-queries` configuration in the [`dbt_project.yml` file](/reference/dbt_project.yml). This saves you time in configuring saved queries in each file: + + + +```yaml +saved-queries: + my_saved_query: + +cache: + enabled: true +``` + + +For more information on `dbt_project.yml` and config naming conventions, see the [dbt_project.yml reference page](/reference/dbt_project.yml#naming-convention). + To build `saved_queries`, use the [`--resource-type` flag](/reference/global-configs/resource-type) and run the command `dbt build --resource-type saved_query`. ## Configure exports diff --git a/website/docs/docs/build/semantic-models.md b/website/docs/docs/build/semantic-models.md index d683d7cd020..4edc6b0a422 100644 --- a/website/docs/docs/build/semantic-models.md +++ b/website/docs/docs/build/semantic-models.md @@ -9,6 +9,10 @@ tags: [Metrics, Semantic Layer] pagination_next: "docs/build/dimensions" --- +import CopilotBeta from '/snippets/_dbt-copilot-avail.md'; + + + Semantic models are the foundation for data definition in MetricFlow, which powers the dbt Semantic Layer: - Think of semantic models as nodes connected by entities in a semantic graph. @@ -26,18 +30,18 @@ import SLCourses from '/snippets/\_sl-course.md'; Here we describe the Semantic model components with examples: -| Component | Description | Type | -| --------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | -------- | -| [Name](#name) | Choose a unique name for the semantic model. Avoid using double underscores (\_\_) in the name as they're not supported. | Required | -| [Description](#description) | Includes important details in the description | Optional | -| [Model](#model) | Specifies the dbt model for the semantic model using the `ref` function | Required | -| [Defaults](#defaults) | The defaults for the model, currently only `agg_time_dimension` is supported. | Required | -| [Entities](#entities) | Uses the columns from entities as join keys and indicate their type as primary, foreign, or unique keys with the `type` parameter | Required | -| [Primary Entity](#primary-entity) | If a primary entity exists, this component is Optional. If the semantic model has no primary entity, then this property is required. | Optional | -| [Dimensions](#dimensions) | Different ways to group or slice data for a metric, they can be `time` or `categorical` | Required | -| [Measures](#measures) | Aggregations applied to columns in your data model. They can be the final metric or used as building blocks for more complex metrics | Optional | -| Label | The display name for your semantic model `node`, `dimension`, `entity`, and/or `measures` | Optional | -| `config` | Use the [`config`](/reference/resource-properties/config) property to specify configurations for your metric. Supports [`meta`](/reference/resource-configs/meta), [`group`](/reference/resource-configs/group), and [`enabled`](/reference/resource-configs/enabled) configs. | Optional | +| Component | Description | Required | Type | +| ------------ | ---------------- | -------- | -------- | +| [Name](#name) | Choose a unique name for the semantic model. Avoid using double underscores (\_\_) in the name as they're not supported. | Required | String | +| [Description](#description) | Includes important details in the description. | Optional | String | +| [Model](#model) | Specifies the dbt model for the semantic model using the `ref` function. | Required | String | +| [Defaults](#defaults) | The defaults for the model, currently only `agg_time_dimension` is supported. | Required | Dict | +| [Entities](#entities) | Uses the columns from entities as join keys and indicate their type as primary, foreign, or unique keys with the `type` parameter. | Required | List | +| [Primary Entity](#primary-entity) | If a primary entity exists, this component is Optional. If the semantic model has no primary entity, then this property is required. | Optional | String | +| [Dimensions](#dimensions) | Different ways to group or slice data for a metric, they can be `time` or `categorical`. | Required | List | +| [Measures](#measures) | Aggregations applied to columns in your data model. They can be the final metric or used as building blocks for more complex metrics. | Optional | List | +| [Label](#label) | The display name for your semantic model `node`, `dimension`, `entity`, and/or `measures`. | Optional | String | +| `config` | Use the [`config`](/reference/resource-properties/config) property to specify configurations for your metric. Supports [`meta`](/reference/resource-configs/meta), [`group`](/reference/resource-configs/group), and [`enabled`](/reference/resource-configs/enabled) configs. | Optional | Dict | ## Semantic models components @@ -119,8 +123,6 @@ semantic_models: type: categorical ``` - - Semantic models support [`meta`](/reference/resource-configs/meta), [`group`](/reference/resource-configs/group), and [`enabled`](/reference/resource-configs/enabled) [`config`](/reference/resource-properties/config) property in either the schema file or at the project level: - Semantic model config in `models/semantic.yml`: @@ -148,8 +150,6 @@ Semantic models support [`meta`](/reference/resource-configs/meta), [`group`](/r For more information on `dbt_project.yml` and config naming conventions, see the [dbt_project.yml reference page](/reference/dbt_project.yml#naming-convention). - - ### Name Define the name of the semantic model. You must define a unique name for the semantic model. The semantic graph will use this name to identify the model, and you can update it at any time. Avoid using double underscores (\_\_) in the name as they're not supported. diff --git a/website/docs/docs/build/simple.md b/website/docs/docs/build/simple.md index f57d498d290..2deb718d780 100644 --- a/website/docs/docs/build/simple.md +++ b/website/docs/docs/build/simple.md @@ -15,17 +15,19 @@ Simple metrics are metrics that directly reference a single measure, without any Note that we use the double colon (::) to indicate whether a parameter is nested within another parameter. So for example, `query_params::metrics` means the `metrics` parameter is nested under `query_params`. ::: -| Parameter | Description | Type | -| --------- | ----------- | ---- | -| `name` | The name of the metric. | Required | -| `description` | The description of the metric. | Optional | -| `type` | The type of the metric (cumulative, derived, ratio, or simple). | Required | -| `label` | Required string that defines the display value in downstream tools. Accepts plain text, spaces, and quotes (such as `orders_total` or `"orders_total"`). | Required | -| `type_params` | The type parameters of the metric. | Required | -| `measure` | A list of measure inputs | Required | -| `measure:name` | The measure you're referencing. | Required | -| `measure:fill_nulls_with` | Set the value in your metric definition instead of null (such as zero). | Optional | -| `measure:join_to_timespine` | Boolean that indicates if the aggregated measure should be joined to the time spine table to fill in missing dates. Default `false`. | Optional | +| Parameter | Description | Required | Type | +| --------- | ----------- | ---- | ---- | +| `name` | The name of the metric. | Required | String | +| `description` | The description of the metric. | Optional | String | +| `type` | The type of the metric (cumulative, derived, ratio, or simple). | Required | String | +| `label` | Defines the display value in downstream tools. Accepts plain text, spaces, and quotes (such as `orders_total` or `"orders_total"`). | Required | String | +| `type_params` | The type parameters of the metric. | Required | Dict | +| `measure` | A list of measure inputs. | Required | List | +| `measure:name` | The measure you're referencing. | Required | String | +| `measure:alias` | Optional [`alias`](/reference/resource-configs/alias) to rename the measure. | Optional | String | +| `measure:filter` | Optional `filter` applied to the measure. | Optional | String | +| `measure:fill_nulls_with` | Set the value in your metric definition instead of null (such as zero). | Optional | String | +| `measure:join_to_timespine` | Indicates if the aggregated measure should be joined to the time spine table to fill in missing dates. Default `false`. | Optional | Boolean | The following displays the complete specification for simple metrics, along with an example. @@ -38,6 +40,8 @@ metrics: type_params: # Required measure: name: The name of your measure # Required + alias: The alias applied to the measure. # Optional + filter: The filter applied to the measure. # Optional fill_nulls_with: Set value instead of null (such as zero) # Optional join_to_timespine: true/false # Boolean that indicates if the aggregated measure should be joined to the time spine table to fill in missing dates. # Optional @@ -65,9 +69,11 @@ If you've already defined the measure using the `create_metric: true` parameter, name: customers # The measure you are creating a proxy of. fill_nulls_with: 0 join_to_timespine: true + alias: customer_count + filter: {{ Dimension('customer__customer_total') }} >= 20 - name: large_orders description: "Order with order values over 20." - type: SIMPLE + type: simple label: Large orders type_params: measure: diff --git a/website/docs/docs/build/snapshots.md b/website/docs/docs/build/snapshots.md index 82b5104fcef..f72f1eb75de 100644 --- a/website/docs/docs/build/snapshots.md +++ b/website/docs/docs/build/snapshots.md @@ -10,223 +10,225 @@ id: "snapshots" * [Snapshot properties](/reference/snapshot-properties) * [`snapshot` command](/reference/commands/snapshot) - -### What are snapshots? +## What are snapshots? Analysts often need to "look back in time" at previous data states in their mutable tables. While some source data systems are built in a way that makes accessing historical data possible, this is not always the case. dbt provides a mechanism, **snapshots**, which records changes to a mutable over time. Snapshots implement [type-2 Slowly Changing Dimensions](https://en.wikipedia.org/wiki/Slowly_changing_dimension#Type_2:_add_new_row) over mutable source tables. These Slowly Changing Dimensions (or SCDs) identify how a row in a table changes over time. Imagine you have an `orders` table where the `status` field can be overwritten as the order is processed. | id | status | updated_at | | -- | ------ | ---------- | -| 1 | pending | 2019-01-01 | +| 1 | pending | 2024-01-01 | Now, imagine that the order goes from "pending" to "shipped". That same record will now look like: | id | status | updated_at | | -- | ------ | ---------- | -| 1 | shipped | 2019-01-02 | +| 1 | shipped | 2024-01-02 | This order is now in the "shipped" state, but we've lost the information about when the order was last in the "pending" state. This makes it difficult (or impossible) to analyze how long it took for an order to ship. dbt can "snapshot" these changes to help you understand how values in a row change over time. Here's an example of a snapshot table for the previous example: | id | status | updated_at | dbt_valid_from | dbt_valid_to | | -- | ------ | ---------- | -------------- | ------------ | -| 1 | pending | 2019-01-01 | 2019-01-01 | 2019-01-02 | -| 1 | shipped | 2019-01-02 | 2019-01-02 | `null` | - -In dbt, snapshots are `select` statements, defined within a snapshot block in a `.sql` file (typically in your `snapshots` directory). You'll also need to configure your snapshot to tell dbt how to detect record changes. - - - - - -```sql -{% snapshot orders_snapshot %} +| 1 | pending | 2024-01-01 | 2024-01-01 | 2024-01-02 | +| 1 | shipped | 2024-01-02 | 2024-01-02 | `null` | -{{ - config( - target_database='analytics', - target_schema='snapshots', - unique_key='id', - strategy='timestamp', - updated_at='updated_at', - ) -}} +## Configuring snapshots -select * from {{ source('jaffle_shop', 'orders') }} + -{% endsnapshot %} -``` +- To configure snapshots in versions 1.8 and earlier, refer to [Configure snapshots in versions 1.8 and earlier](#configure-snapshots-in-versions-18-and-earlier). These versions use an older syntax where snapshots are defined within a snapshot block in a `.sql` file, typically located in your `snapshots` directory. +- Note that defining multiple resources in a single file can significantly slow down parsing and compilation. For faster and more efficient management, consider the updated snapshot YAML syntax, [available now in the "Latest" release track in dbt Cloud](/docs/dbt-versions/cloud-release-tracks) or [dbt Core v1.9 and later](/docs/dbt-versions/core). + - For more information on how to migrate from the legacy snapshot configurations to the updated snapshot YAML syntax, refer to [Snapshot configuration migration](/reference/snapshot-configs#snapshot-configuration-migration). - - - -```sql -{% snapshot orders_snapshot %} - -{{ - config( - unique_key='id', - schema='snapshots', - strategy='timestamp', - updated_at='updated_at', - ) -}} - -select * from {{ source('jaffle_shop', 'orders') }} - -{% endsnapshot %} +Configure your snapshots in YAML files to tell dbt how to detect record changes. Define snapshots configurations in YAML files, alongside your models, for a cleaner, faster, and more consistent set up. + + + +```yaml +snapshots: + - name: string + relation: relation # source('my_source', 'my_table') or ref('my_model') + [description](/reference/resource-properties/description): markdown_string + config: + [database](/reference/resource-configs/database): string + [schema](/reference/resource-configs/schema): string + [alias](/reference/resource-configs/alias): string + [strategy](/reference/resource-configs/strategy): timestamp | check + [unique_key](/reference/resource-configs/unique_key): column_name_or_expression + [check_cols](/reference/resource-configs/check_cols): [column_name] | all + [updated_at](/reference/resource-configs/updated_at): column_name + [snapshot_meta_column_names](/reference/resource-configs/snapshot_meta_column_names): dictionary + [dbt_valid_to_current](/reference/resource-configs/dbt_valid_to_current): string + [hard_deletes](/reference/resource-configs/hard-deletes): ignore | invalidate | new_record ``` - +The following table outlines the configurations available for snapshots: -:::info Preview or Compile Snapshots in IDE +| Config | Description | Required? | Example | +| ------ | ----------- | --------- | ------- | +| [database](/reference/resource-configs/database) | Specify a custom database for the snapshot | No | analytics | +| [schema](/reference/resource-configs/schema) | Specify a custom schema for the snapshot | No | snapshots | +| [alias](/reference/resource-configs/alias) | Specify an alias for the snapshot | No | your_custom_snapshot | +| [strategy](/reference/resource-configs/strategy) | The snapshot strategy to use. Valid values: `timestamp` or `check` | Yes | timestamp | +| [unique_key](/reference/resource-configs/unique_key) | A column(s) (string or array) or expression for the record | Yes | `id` or `[order_id, product_id]` | +| [check_cols](/reference/resource-configs/check_cols) | If using the `check` strategy, then the columns to check | Only if using the `check` strategy | ["status"] | +| [updated_at](/reference/resource-configs/updated_at) | If using the `timestamp` strategy, the timestamp column to compare | Only if using the `timestamp` strategy | updated_at | +| [dbt_valid_to_current](/reference/resource-configs/dbt_valid_to_current) | Set a custom indicator for the value of `dbt_valid_to` in current snapshot records (like a future date). By default, this value is `NULL`. When configured, dbt will use the specified value instead of `NULL` for `dbt_valid_to` for current records in the snapshot table.| No | string | +| [snapshot_meta_column_names](/reference/resource-configs/snapshot_meta_column_names) | Customize the names of the snapshot meta fields | No | dictionary | +| [hard_deletes](/reference/resource-configs/hard-deletes) | Specify how to handle deleted rows from the source. Supported options are `ignore` (default), `invalidate` (replaces the legacy `invalidate_hard_deletes=true`), and `new_record`.| No | string | -It is not possible to "preview data" or "compile sql" for snapshots in dbt Cloud. Instead, run the `dbt snapshot` command in the IDE by completing the following steps. -::: +- In versions prior to v1.9, the `target_schema` (required) and `target_database` (optional) configurations defined a single schema or database to build a snapshot across users and environment. This created problems when testing or developing a snapshot, as there was no clear separation between development and production environments. In v1.9, `target_schema` became optional, allowing snapshots to be environment-aware. By default, without `target_schema` or `target_database` defined, snapshots now use the `generate_schema_name` or `generate_database_name` macros to determine where to build. Developers can still set a custom location with [`schema`](/reference/resource-configs/schema) and [`database`](/reference/resource-configs/database) configs, consistent with other resource types. +- A number of other configurations are also supported (for example, `tags` and `post-hook`). For the complete list, refer to [Snapshot configurations](/reference/snapshot-configs). +- You can configure snapshots from both the `dbt_project.yml` file and a `config` block. For more information, refer to the [configuration docs](/reference/snapshot-configs). -When you run the [`dbt snapshot` command](/reference/commands/snapshot): -* **On the first run:** dbt will create the initial snapshot table — this will be the result set of your `select` statement, with additional columns including `dbt_valid_from` and `dbt_valid_to`. All records will have a `dbt_valid_to = null`. -* **On subsequent runs:** dbt will check which records have changed or if any new records have been created: - - The `dbt_valid_to` column will be updated for any existing records that have changed - - The updated record and any new records will be inserted into the snapshot table. These records will now have `dbt_valid_to = null` +### Add a snapshot to your project -Snapshots can be referenced in downstream models the same way as referencing models — by using the [ref](/reference/dbt-jinja-functions/ref) function. +To add a snapshot to your project follow these steps. For users on versions 1.8 and earlier, refer to [Configure snapshots in versions 1.8 and earlier](#configure-snapshots-in-versions-18-and-earlier). -## Example +1. Create a YAML file in your `snapshots` directory: `snapshots/orders_snapshot.yml` and add your configuration details. You can also configure your snapshot from your `dbt_project.yml` file ([docs](/reference/snapshot-configs)). -To add a snapshot to your project: + -1. Create a file in your `snapshots` directory with a `.sql` file extension, e.g. `snapshots/orders.sql` -2. Use a `snapshot` block to define the start and end of a snapshot: + ```yaml + snapshots: + - name: orders_snapshot + relation: source('jaffle_shop', 'orders') + config: + schema: snapshots + database: analytics + unique_key: id + strategy: timestamp + updated_at: updated_at + dbt_valid_to_current: "to_date('9999-12-31')" # Specifies that current records should have `dbt_valid_to` set to `'9999-12-31'` instead of `NULL`. - + ``` + -```sql -{% snapshot orders_snapshot %} +2. Since snapshots focus on configuration, the transformation logic is minimal. Typically, you'd select all data from the source. If you need to apply transformations (like filters, deduplication), it's best practice to define an ephemeral model and reference it in your snapshot configuration. -{% endsnapshot %} -``` + - + ```yaml + {{ config(materialized='ephemeral') }} -3. Write a `select` statement within the snapshot block (tips for writing a good snapshot query are below). This select statement defines the results that you want to snapshot over time. You can use `sources` and `refs` here. + select * from {{ source('jaffle_shop', 'orders') }} + ``` + - +3. Check whether the result set of your query includes a reliable timestamp column that indicates when a record was last updated. For our example, the `updated_at` column reliably indicates record changes, so we can use the `timestamp` strategy. If your query result set does not have a reliable timestamp, you'll need to instead use the `check` strategy — more details on this below. -```sql -{% snapshot orders_snapshot %} +4. Run the `dbt snapshot` [command](/reference/commands/snapshot) — for our example, a new table will be created at `analytics.snapshots.orders_snapshot`. The [`schema`](/reference/resource-configs/schema) config will utilize the `generate_schema_name` macro. -select * from {{ source('jaffle_shop', 'orders') }} + ``` + $ dbt snapshot + Running with dbt=1.9.0 -{% endsnapshot %} -``` + 15:07:36 | Concurrency: 8 threads (target='dev') + 15:07:36 | + 15:07:36 | 1 of 1 START snapshot snapshots.orders_snapshot...... [RUN] + 15:07:36 | 1 of 1 OK snapshot snapshots.orders_snapshot..........[SELECT 3 in 1.82s] + 15:07:36 | + 15:07:36 | Finished running 1 snapshots in 0.68s. - + Completed successfully -4. Check whether the result set of your query includes a reliable timestamp column that indicates when a record was last updated. For our example, the `updated_at` column reliably indicates record changes, so we can use the `timestamp` strategy. If your query result set does not have a reliable timestamp, you'll need to instead use the `check` strategy — more details on this below. + Done. PASS=2 ERROR=0 SKIP=0 TOTAL=1 + ``` -5. Add configurations to your snapshot using a `config` block (more details below). You can also configure your snapshot from your `dbt_project.yml` file ([docs](/reference/snapshot-configs)). +5. Inspect the results by selecting from the table dbt created (`analytics.snapshots.orders_snapshot`). After the first run, you should see the results of your query, plus the [snapshot meta fields](#snapshot-meta-fields) as described later on. - +6. Run the `dbt snapshot` command again and inspect the results. If any records have been updated, the snapshot should reflect this. - +7. Select from the `snapshot` in downstream models using the `ref` function. -```sql -{% snapshot orders_snapshot %} + -{{ - config( - target_database='analytics', - target_schema='snapshots', - unique_key='id', + ```sql + select * from {{ ref('orders_snapshot') }} + ``` + - strategy='timestamp', - updated_at='updated_at', - ) -}} +8. Snapshots are only useful if you run them frequently — schedule the `dbt snapshot` command to run regularly. -select * from {{ source('jaffle_shop', 'orders') }} + -{% endsnapshot %} -``` +### Configuration best practices - + -6. Run the `dbt snapshot` [command](/reference/commands/snapshot) — for our example a new table will be created at `analytics.snapshots.orders_snapshot`. You can change the `target_database` configuration, the `target_schema` configuration and the name of the snapshot (as defined in `{% snapshot .. %}`) will change how dbt names this table. +This strategy handles column additions and deletions better than the `check` strategy. - + - - + -```sql -{% snapshot orders_snapshot %} +By default, `dbt_valid_to` is `NULL` for current records. However, if you set the [`dbt_valid_to_current` configuration](/reference/resource-configs/dbt_valid_to_current) (available in dbt Core v1.9+), `dbt_valid_to` will be set to your specified value (such as `9999-12-31`) for current records. -{{ - config( - schema='snapshots', - unique_key='id', - strategy='timestamp', - updated_at='updated_at', - ) -}} +This allows for straightforward date range filtering. -select * from {{ source('jaffle_shop', 'orders') }} + -{% endsnapshot %} -``` + - +The unique key is used by dbt to match rows up, so it's extremely important to make sure this key is actually unique! If you're snapshotting a source, I'd recommend adding a uniqueness test to your source ([example](https://github.com/dbt-labs/jaffle_shop/blob/8e7c853c858018180bef1756ec93e193d9958c5b/models/staging/schema.yml#L26)). + -6. Run the `dbt snapshot` [command](/reference/commands/snapshot) — for our example, a new table will be created at `analytics.snapshots.orders_snapshot`. The [`schema`](/reference/resource-configs/schema) config will utilize the `generate_schema_name` macro. + + + +Snapshots cannot be rebuilt. As such, it's a good idea to put snapshots in a separate schema so end users know they are special. From there, you may want to set different privileges on your snapshots compared to your models, and even run them as a different user (or role, depending on your warehouse) to make it very difficult to drop a snapshot unless you really want to. + + -``` -$ dbt snapshot -Running with dbt=1.8.0 + -15:07:36 | Concurrency: 8 threads (target='dev') -15:07:36 | -15:07:36 | 1 of 1 START snapshot snapshots.orders_snapshot...... [RUN] -15:07:36 | 1 of 1 OK snapshot snapshots.orders_snapshot..........[SELECT 3 in 1.82s] -15:07:36 | -15:07:36 | Finished running 1 snapshots in 0.68s. + -Completed successfully +Snapshots can't be rebuilt. Because of this, it's a good idea to put snapshots in a separate schema so end users know they're special. From there, you may want to set different privileges on your snapshots compared to your models, and even run them as a different user (or role, depending on your warehouse) to make it very difficult to drop a snapshot unless you really want to. -Done. PASS=2 ERROR=0 SKIP=0 TOTAL=1 -``` + -7. Inspect the results by selecting from the table dbt created. After the first run, you should see the results of your query, plus the [snapshot meta fields](#snapshot-meta-fields) as described below. + -8. Run the `snapshot` command again, and inspect the results. If any records have been updated, the snapshot should reflect this. + If you need to clean or transform your data before snapshotting, create an ephemeral model or a staging model that applies the necessary transformations. Then, reference this model in your snapshot configuration. This approach keeps your snapshot definitions clean and allows you to test and run transformations separately. -9. Select from the `snapshot` in downstream models using the `ref` function. + + - +### How snapshots work -```sql -select * from {{ ref('orders_snapshot') }} -``` +When you run the [`dbt snapshot` command](/reference/commands/snapshot): +* **On the first run:** dbt will create the initial snapshot table — this will be the result set of your `select` statement, with additional columns including `dbt_valid_from` and `dbt_valid_to`. All records will have a `dbt_valid_to = null` or the value specified in [`dbt_valid_to_current`](/reference/resource-configs/dbt_valid_to_current) (available in dbt Core 1.9+) if configured. +* **On subsequent runs:** dbt will check which records have changed or if any new records have been created: + - The `dbt_valid_to` column will be updated for any existing records that have changed. + - The updated record and any new records will be inserted into the snapshot table. These records will now have `dbt_valid_to = null` or the value configured in `dbt_valid_to_current` (available in dbt Core v1.9+). - + -10. Schedule the `snapshot` command to run regularly — snapshots are only useful if you run them frequently. +#### Note +- These column names can be customized to your team or organizational conventions using the [snapshot_meta_column_names](#snapshot-meta-fields) config. +- Use the `dbt_valid_to_current` config to set a custom indicator for the value of `dbt_valid_to` in current snapshot records (like a future date such as `9999-12-31`). By default, this value is `NULL`. When set, dbt will use this specified value instead of `NULL` for `dbt_valid_to` for current records in the snapshot table. +- Use the [`hard_deletes`](/reference/resource-configs/hard-deletes) config to track hard deletes by adding a new record when row become "deleted" in source. Supported options are `ignore`, `invalidate`, and `new_record`. + +Snapshots can be referenced in downstream models the same way as referencing models — by using the [ref](/reference/dbt-jinja-functions/ref) function. ## Detecting row changes -Snapshot "strategies" define how dbt knows if a row has changed. There are two strategies built-in to dbt — `timestamp` and `check`. +Snapshot "strategies" define how dbt knows if a row has changed. There are two strategies built-in to dbt: +- [Timestamp](#timestamp-strategy-recommended) — Uses an `updated_at` column to determine if a row has changed. +- [Check](#check-strategy) — Compares a list of columns between their current and historical values to determine if a row has changed. ### Timestamp strategy (recommended) The `timestamp` strategy uses an `updated_at` field to determine if a row has changed. If the configured `updated_at` column for a row is more recent than the last time the snapshot ran, then dbt will invalidate the old record and record the new one. If the timestamps are unchanged, then dbt will not take any action. @@ -266,27 +268,19 @@ The `timestamp` strategy requires the following configurations: - - -```sql -{% snapshot orders_snapshot_timestamp %} - - {{ - config( - schema='snapshots', - strategy='timestamp', - unique_key='id', - updated_at='updated_at', - ) - }} - - select * from {{ source('jaffle_shop', 'orders') }} - -{% endsnapshot %} + + +```yaml +snapshots: + - name: orders_snapshot_timestamp + relation: source('jaffle_shop', 'orders') + config: + schema: snapshots + unique_key: id + strategy: timestamp + updated_at: updated_at ``` - - ### Check strategy @@ -298,16 +292,13 @@ The `check` strategy requires the following configurations: | ------ | ----------- | ------- | | check_cols | A list of columns to check for changes, or `all` to check all columns | `["name", "email"]` | - - :::caution check_cols = 'all' The `check` snapshot strategy can be configured to track changes to _all_ columns by supplying `check_cols = 'all'`. It is better to explicitly enumerate the columns that you want to check. Consider using a to condense many columns into a single column. ::: - -**Example Usage** +**Example usage** @@ -336,30 +327,75 @@ The `check` snapshot strategy can be configured to track changes to _all_ column - + + +```yaml +snapshots: + - name: orders_snapshot_check + relation: source('jaffle_shop', 'orders') + config: + schema: snapshots + unique_key: id + strategy: check + check_cols: + - status + - is_cancelled +``` -```sql -{% snapshot orders_snapshot_check %} + - {{ - config( - schema='snapshots', - strategy='check', - unique_key='id', - check_cols=['status', 'is_cancelled'], - ) - }} + - select * from {{ source('jaffle_shop', 'orders') }} +### Hard deletes (opt-in) -{% endsnapshot %} + + +In dbt v1.9 and higher, the [`hard_deletes`](/reference/resource-configs/hard-deletes) config replaces the `invalidate_hard_deletes` config to give you more control on how to handle deleted rows from the source. The `hard_deletes` config is not a separate strategy but an additional opt-in feature that can be used with any snapshot strategy. + +The `hard_deletes` config has three options/fields: +| Field | Description | +| --------- | ----------- | +| `ignore` (default) | No action for deleted records. | +| `invalidate` | Behaves the same as the existing `invalidate_hard_deletes=true`, where deleted records are invalidated by setting `dbt_valid_to`. | +| `new_record` | Tracks deleted records as new rows using the `dbt_is_deleted` [meta field](#snapshot-meta-fields) when records are deleted.| + +import HardDeletes from '/snippets/_hard-deletes.md'; + + + +#### Example usage + + + +```yaml +snapshots: + - name: orders_snapshot_hard_delete + relation: source('jaffle_shop', 'orders') + config: + schema: snapshots + unique_key: id + strategy: timestamp + updated_at: updated_at + hard_deletes: new_record # options are: 'ignore', 'invalidate', or 'new_record' ``` +In this example, the `hard_deletes: new_record` config will add a new row for deleted records with the `dbt_is_deleted` column set to `True`. +Any restored records are added as new rows with the `dbt_is_deleted` field set to `False`. + +The resulting table will look like this: + +| id | status | updated_at | dbt_valid_from | dbt_valid_to | dbt_is_deleted | +| -- | ------ | ---------- | -------------- | ------------ | -------------- | +| 1 | pending | 2024-01-01 10:47 | 2024-01-01 10:47 | 2024-01-01 11:05 | False | +| 1 | shipped | 2024-01-01 11:05 | 2024-01-01 11:05 | 2024-01-01 11:20 | False | +| 1 | deleted | 2024-01-01 11:20 | 2024-01-01 11:20 | 2024-01-01 12:00 | True | +| 1 | restored | 2024-01-01 12:00 | 2024-01-01 12:00 | | False | + -### Hard deletes (opt-in) + Rows that are deleted from the source query are not invalidated by default. With the config option `invalidate_hard_deletes`, dbt can track rows that no longer exist. This is done by left joining the snapshot table with the source table, and filtering the rows that are still valid at that point, but no longer can be found in the source table. `dbt_valid_to` will be set to the current snapshot time. @@ -367,9 +403,9 @@ This configuration is not a different strategy as described above, but is an add For this configuration to work with the `timestamp` strategy, the configured `updated_at` column must be of timestamp type. Otherwise, queries will fail due to mixing data types. -**Example Usage** +Note, in v1.9 and higher, the [`hard_deletes`](/reference/resource-configs/hard-deletes) config replaces the `invalidate_hard_deletes` config for better control over how to handle deleted rows from the source. - +#### Example usage @@ -395,37 +431,153 @@ For this configuration to work with the `timestamp` strategy, the configured `up +## Snapshot meta-fields + +Snapshot tables will be created as a clone of your source dataset, plus some additional meta-fields*. + +In dbt Core v1.9+ (or available sooner in [the "Latest" release track in dbt Cloud](/docs/dbt-versions/cloud-release-tracks)): +- These column names can be customized to your team or organizational conventions using the [`snapshot_meta_column_names`](/reference/resource-configs/snapshot_meta_column_names) config. +ess) +- Use the [`dbt_valid_to_current` config](/reference/resource-configs/dbt_valid_to_current) to set a custom indicator for the value of `dbt_valid_to` in current snapshot records (like a future date such as `9999-12-31`). By default, this value is `NULL`. When set, dbt will use this specified value instead of `NULL` for `dbt_valid_to` for current records in the snapshot table. +- Use the [`hard_deletes`](/reference/resource-configs/hard-deletes) config to track deleted records as new rows with the `dbt_is_deleted` meta field when using the `hard_deletes='new_record'` field. + + +| Field | Meaning | Usage | +| -------------- | ------- | ----- | +| dbt_valid_from | The timestamp when this snapshot row was first inserted | This column can be used to order the different "versions" of a record. | +| dbt_valid_to | The timestamp when this row became invalidated.
For current records, this is `NULL` by default or the value specified in `dbt_valid_to_current`. | The most recent snapshot record will have `dbt_valid_to` set to `NULL` or the specified value. | +| dbt_scd_id | A unique key generated for each snapshotted record. | This is used internally by dbt | +| dbt_updated_at | The updated_at timestamp of the source record when this snapshot row was inserted. | This is used internally by dbt | +| dbt_is_deleted | A boolean value indicating if the record has been deleted. `True` if deleted, `False` otherwise. | Added when `hard_deletes='new_record'` is configured. This is used internally by dbt | + +*The timestamps used for each column are subtly different depending on the strategy you use: + +For the `timestamp` strategy, the configured `updated_at` column is used to populate the `dbt_valid_from`, `dbt_valid_to` and `dbt_updated_at` columns. + +
+ Details for the timestamp strategy + +Snapshot query results at `2024-01-01 11:00` + +| id | status | updated_at | +| -- | ------- | ---------------- | +| 1 | pending | 2024-01-01 10:47 | + +Snapshot results (note that `11:00` is not used anywhere): + +| id | status | updated_at | dbt_valid_from | dbt_valid_to | dbt_updated_at | +| -- | ------- | ---------------- | ---------------- | ---------------- | ---------------- | +| 1 | pending | 2024-01-01 10:47 | 2024-01-01 10:47 | | 2024-01-01 10:47 | + +Query results at `2024-01-01 11:30`: + +| id | status | updated_at | +| -- | ------- | ---------------- | +| 1 | shipped | 2024-01-01 11:05 | + +Snapshot results (note that `11:30` is not used anywhere): + +| id | status | updated_at | dbt_valid_from | dbt_valid_to | dbt_updated_at | +| -- | ------- | ---------------- | ---------------- | ---------------- | ---------------- | +| 1 | pending | 2024-01-01 10:47 | 2024-01-01 10:47 | 2024-01-01 11:05 | 2024-01-01 10:47 | +| 1 | shipped | 2024-01-01 11:05 | 2024-01-01 11:05 | | 2024-01-01 11:05 | + +Snapshot results with `hard_deletes='new_record'`: + +| id | status | updated_at | dbt_valid_from | dbt_valid_to | dbt_updated_at | dbt_is_deleted | +|----|---------|------------------|------------------|------------------|------------------|----------------| +| 1 | pending | 2024-01-01 10:47 | 2024-01-01 10:47 | 2024-01-01 11:05 | 2024-01-01 10:47 | False | +| 1 | shipped | 2024-01-01 11:05 | 2024-01-01 11:05 | 2024-01-01 11:20 | 2024-01-01 11:05 | False | +| 1 | deleted | 2024-01-01 11:20 | 2024-01-01 11:20 | | 2024-01-01 11:20 | True | + + +
+ +
+ +For the `check` strategy, the current timestamp is used to populate each column. If configured, the `check` strategy uses the `updated_at` column instead, as with the timestamp strategy. + +
+ Details for the check strategy + +Snapshot query results at `2024-01-01 11:00` + +| id | status | +| -- | ------- | +| 1 | pending | + +Snapshot results: + +| id | status | dbt_valid_from | dbt_valid_to | dbt_updated_at | +| -- | ------- | ---------------- | ---------------- | ---------------- | +| 1 | pending | 2024-01-01 11:00 | | 2024-01-01 11:00 | + +Query results at `2024-01-01 11:30`: + +| id | status | +| -- | ------- | +| 1 | shipped | + +Snapshot results: + +| id | status | dbt_valid_from | dbt_valid_to | dbt_updated_at | +| --- | ------- | ---------------- | ---------------- | ---------------- | +| 1 | pending | 2024-01-01 11:00 | 2024-01-01 11:30 | 2024-01-01 11:00 | +| 1 | shipped | 2024-01-01 11:30 | | 2024-01-01 11:30 | + +Snapshot results with `hard_deletes='new_record'`: + +| id | status | dbt_valid_from | dbt_valid_to | dbt_updated_at | dbt_is_deleted | +|----|---------|------------------|------------------|------------------|----------------| +| 1 | pending | 2024-01-01 11:00 | 2024-01-01 11:30 | 2024-01-01 11:00 | False | +| 1 | shipped | 2024-01-01 11:30 | 2024-01-01 11:40 | 2024-01-01 11:30 | False | +| 1 | deleted | 2024-01-01 11:40 | | 2024-01-01 11:40 | True | + +
+ +## Configure snapshots in versions 1.8 and earlier + - +For information about configuring snapshots in dbt versions 1.8 and earlier, select **1.8** from the documentation version picker, and it will appear in this section. + +To configure snapshots in versions 1.9 and later, refer to [Configuring snapshots](#configuring-snapshots). The latest versions use an updated snapshot configuration syntax that optimizes performance. + + + + + +- In dbt versions 1.8 and earlier, snapshots are `select` statements, defined within a snapshot block in a `.sql` file (typically in your `snapshots` directory). You'll also need to configure your snapshot to tell dbt how to detect record changes. +- The earlier dbt versions use an older syntax that allows for defining multiple resources in a single file. This syntax can significantly slow down parsing and compilation. +- For faster and more efficient management, consider [choosing the "Latest" release track in dbt Cloud](/docs/dbt-versions/cloud-release-tracks) or the [latest version of dbt Core](/docs/dbt-versions/core), which introduces an updated snapshot configuration syntax that optimizes performance. + - For more information on how to migrate from the legacy snapshot configurations to the updated snapshot YAML syntax, refer to [Snapshot configuration migration](/reference/snapshot-configs#snapshot-configuration-migration). + +The following example shows how to configure a snapshot: + + ```sql -{% snapshot orders_snapshot_hard_delete %} +{% snapshot orders_snapshot %} - {{ - config( - schema='snapshots', - strategy='timestamp', - unique_key='id', - updated_at='updated_at', - invalidate_hard_deletes=True, - ) - }} +{{ + config( + target_database='analytics', + target_schema='snapshots', + unique_key='id', - select * from {{ source('jaffle_shop', 'orders') }} + strategy='timestamp', + updated_at='updated_at', + ) +}} + +select * from {{ source('jaffle_shop', 'orders') }} {% endsnapshot %} ``` - - -## Configuring snapshots -### Snapshot configurations -There are a number of snapshot-specific configurations: - - +The following table outlines the configurations available for snapshots in versions 1.8 and earlier: | Config | Description | Required? | Example | | ------ | ----------- | --------- | ------- | @@ -437,156 +589,135 @@ There are a number of snapshot-specific configurations: | [updated_at](/reference/resource-configs/updated_at) | If using the `timestamp` strategy, the timestamp column to compare | Only if using the `timestamp` strategy | updated_at | | [invalidate_hard_deletes](/reference/resource-configs/invalidate_hard_deletes) | Find hard deleted records in source, and set `dbt_valid_to` current time if no longer exists | No | True | -A number of other configurations are also supported (e.g. `tags` and `post-hook`), check out the full list [here](/reference/snapshot-configs). - -Snapshots can be configured from both your `dbt_project.yml` file and a `config` block, check out the [configuration docs](/reference/snapshot-configs) for more information. +- A number of other configurations are also supported (e.g. `tags` and `post-hook`), check out the full list [here](/reference/snapshot-configs). +- Snapshots can be configured from both your `dbt_project.yml` file and a `config` block, check out the [configuration docs](/reference/snapshot-configs) for more information. +- Note: BigQuery users can use `target_project` and `target_dataset` as aliases for `target_database` and `target_schema`, respectively. -Note: BigQuery users can use `target_project` and `target_dataset` as aliases for `target_database` and `target_schema`, respectively. +### Configuration example - - - - -| Config | Description | Required? | Example | -| ------ | ----------- | --------- | ------- | -| [database](/reference/resource-configs/database) | Specify a custom database for the snapshot | No | analytics | -| [schema](/reference/resource-configs/schema) | Specify a custom schema for the snapshot | No | snapshots | -| [alias](/reference/resource-configs/alias) | Specify an alias for the snapshot | No | your_custom_snapshot | -| [strategy](/reference/resource-configs/strategy) | The snapshot strategy to use. Valid values: `timestamp` or `check` | Yes | timestamp | -| [unique_key](/reference/resource-configs/unique_key) | A column or expression for the record | Yes | id | -| [check_cols](/reference/resource-configs/check_cols) | If using the `check` strategy, then the columns to check | Only if using the `check` strategy | ["status"] | -| [updated_at](/reference/resource-configs/updated_at) | If using the `timestamp` strategy, the timestamp column to compare | Only if using the `timestamp` strategy | updated_at | -| [invalidate_hard_deletes](/reference/resource-configs/invalidate_hard_deletes) | Find hard deleted records in source and set `dbt_valid_to` to current time if the record no longer exists | No | True | - -In versions prior to v1.9, the `target_schema` (required) and `target_database` (optional) configurations defined a single schema or database to build a snapshot into across users and environments. This created problems when testing or developing a snapshot, as there was no clear separation between development and production environments. In v1.9, support was added for environment-aware snapshots by making `target_schema` optional. Snapshots, by default with no `target_schema` or `target_database` config defined, now resolve the schema or database to build the snapshot into using the `generate_schema_name` or `generate_database_name` macros. Developers can optionally define a custom location for snapshots to build to with the [`schema`](/reference/resource-configs/schema) and [`database`](/reference/resource-configs/database) configs, as is consistent with other resource types. - -A number of other configurations are also supported (for example, `tags` and `post-hook`). For the complete list, refer to [Snapshot configurations](/reference/snapshot-configs). +To add a snapshot to your project: -You can configure snapshots from both the `dbt_project.yml` file and a `config` block. For more information, refer to the [configuration docs](/reference/snapshot-configs). +1. Create a file in your `snapshots` directory with a `.sql` file extension, e.g. `snapshots/orders.sql` +2. Use a `snapshot` block to define the start and end of a snapshot: - + -### Configuration best practices -#### Use the `timestamp` strategy where possible -This strategy handles column additions and deletions better than the `check` strategy. +```sql +{% snapshot orders_snapshot %} -#### Ensure your unique key is really unique -The unique key is used by dbt to match rows up, so it's extremely important to make sure this key is actually unique! If you're snapshotting a source, I'd recommend adding a uniqueness test to your source ([example](https://github.com/dbt-labs/jaffle_shop/blob/8e7c853c858018180bef1756ec93e193d9958c5b/models/staging/schema.yml#L26)). +{% endsnapshot %} +``` - + -#### Use a `target_schema` that is separate to your analytics schema -Snapshots cannot be rebuilt. As such, it's a good idea to put snapshots in a separate schema so end users know they are special. From there, you may want to set different privileges on your snapshots compared to your models, and even run them as a different user (or role, depending on your warehouse) to make it very difficult to drop a snapshot unless you really want to. +3. Write a `select` statement within the snapshot block (tips for writing a good snapshot query are below). This select statement defines the results that you want to snapshot over time. You can use `sources` and `refs` here. -
+ - +```sql +{% snapshot orders_snapshot %} -#### Use a schema that is separate to your models' schema -Snapshots can't be rebuilt. Because of this, it's a good idea to put snapshots in a separate schema so end users know they're special. From there, you may want to set different privileges on your snapshots compared to your models, and even run them as a different user (or role, depending on your warehouse) to make it very difficult to drop a snapshot unless you really want to. +select * from {{ source('jaffle_shop', 'orders') }} - +{% endsnapshot %} +``` -## Snapshot query best practices + -#### Snapshot source data. -Your models should then select from these snapshots, treating them like regular data sources. As much as possible, snapshot your source data in its raw form and use downstream models to clean up the data +4. Check whether the result set of your query includes a reliable timestamp column that indicates when a record was last updated. For our example, the `updated_at` column reliably indicates record changes, so we can use the `timestamp` strategy. If your query result set does not have a reliable timestamp, you'll need to instead use the `check` strategy — more details on this below. -#### Use the `source` function in your query. -This helps when understanding data lineage in your project. +5. Add configurations to your snapshot using a `config` block (more details below). You can also configure your snapshot from your `dbt_project.yml` file ([docs](/reference/snapshot-configs)). -#### Include as many columns as possible. -In fact, go for `select *` if performance permits! Even if a column doesn't feel useful at the moment, it might be better to snapshot it in case it becomes useful – after all, you won't be able to recreate the column later. + -#### Avoid joins in your snapshot query. -Joins can make it difficult to build a reliable `updated_at` timestamp. Instead, snapshot the two tables separately, and join them in downstream models. + -#### Limit the amount of transformation in your query. -If you apply business logic in a snapshot query, and this logic changes in the future, it can be impossible (or, at least, very difficult) to apply the change in logic to your snapshots. +```sql +{% snapshot orders_snapshot %} -Basically – keep your query as simple as possible! Some reasonable exceptions to these recommendations include: -* Selecting specific columns if the table is wide. -* Doing light transformation to get data into a reasonable shape, for example, unpacking a blob to flatten your source data into columns. +{{ + config( + target_database='analytics', + target_schema='snapshots', + unique_key='id', -## Snapshot meta-fields + strategy='timestamp', + updated_at='updated_at', + ) +}} -Snapshot tables will be created as a clone of your source dataset, plus some additional meta-fields*. +select * from {{ source('jaffle_shop', 'orders') }} -| Field | Meaning | Usage | -| -------------- | ------- | ----- | -| dbt_valid_from | The timestamp when this snapshot row was first inserted | This column can be used to order the different "versions" of a record. | -| dbt_valid_to | The timestamp when this row became invalidated. | The most recent snapshot record will have `dbt_valid_to` set to `null`. | -| dbt_scd_id | A unique key generated for each snapshotted record. | This is used internally by dbt | -| dbt_updated_at | The updated_at timestamp of the source record when this snapshot row was inserted. | This is used internally by dbt | +{% endsnapshot %} +``` -*The timestamps used for each column are subtly different depending on the strategy you use: + -For the `timestamp` strategy, the configured `updated_at` column is used to populate the `dbt_valid_from`, `dbt_valid_to` and `dbt_updated_at` columns. +6. Run the `dbt snapshot` [command](/reference/commands/snapshot) — for our example a new table will be created at `analytics.snapshots.orders_snapshot`. You can change the `target_database` configuration, the `target_schema` configuration and the name of the snapshot (as defined in `{% snapshot .. %}`) will change how dbt names this table. -
- Details for the timestamp strategy + -Snapshot query results at `2019-01-01 11:00` + -| id | status | updated_at | -| -- | ------- | ---------------- | -| 1 | pending | 2019-01-01 10:47 | + -Snapshot results (note that `11:00` is not used anywhere): +```sql +{% snapshot orders_snapshot %} -| id | status | updated_at | dbt_valid_from | dbt_valid_to | dbt_updated_at | -| -- | ------- | ---------------- | ---------------- | ---------------- | ---------------- | -| 1 | pending | 2019-01-01 10:47 | 2019-01-01 10:47 | | 2019-01-01 10:47 | +{{ + config( + schema='snapshots', + unique_key='id', + strategy='timestamp', + updated_at='updated_at', + ) +}} -Query results at `2019-01-01 11:30`: +select * from {{ source('jaffle_shop', 'orders') }} -| id | status | updated_at | -| -- | ------- | ---------------- | -| 1 | shipped | 2019-01-01 11:05 | +{% endsnapshot %} +``` -Snapshot results (note that `11:30` is not used anywhere): + -| id | status | updated_at | dbt_valid_from | dbt_valid_to | dbt_updated_at | -| -- | ------- | ---------------- | ---------------- | ---------------- | ---------------- | -| 1 | pending | 2019-01-01 10:47 | 2019-01-01 10:47 | 2019-01-01 11:05 | 2019-01-01 10:47 | -| 1 | shipped | 2019-01-01 11:05 | 2019-01-01 11:05 | | 2019-01-01 11:05 | +6. Run the `dbt snapshot` [command](/reference/commands/snapshot) — for our example, a new table will be created at `analytics.snapshots.orders_snapshot`. The [`schema`](/reference/resource-configs/schema) config will utilize the `generate_schema_name` macro. -
+
-
+``` +$ dbt snapshot +Running with dbt=1.8.0 -For the `check` strategy, the current timestamp is used to populate each column. If configured, the `check` strategy uses the `updated_at` column instead, as with the timestamp strategy. +15:07:36 | Concurrency: 8 threads (target='dev') +15:07:36 | +15:07:36 | 1 of 1 START snapshot snapshots.orders_snapshot...... [RUN] +15:07:36 | 1 of 1 OK snapshot snapshots.orders_snapshot..........[SELECT 3 in 1.82s] +15:07:36 | +15:07:36 | Finished running 1 snapshots in 0.68s. -
- Details for the check strategy +Completed successfully -Snapshot query results at `2019-01-01 11:00` +Done. PASS=2 ERROR=0 SKIP=0 TOTAL=1 +``` -| id | status | -| -- | ------- | -| 1 | pending | +7. Inspect the results by selecting from the table dbt created. After the first run, you should see the results of your query, plus the [snapshot meta fields](#snapshot-meta-fields) as described earlier. -Snapshot results: +8. Run the `dbt snapshot` command again, and inspect the results. If any records have been updated, the snapshot should reflect this. -| id | status | dbt_valid_from | dbt_valid_to | dbt_updated_at | -| -- | ------- | ---------------- | ---------------- | ---------------- | -| 1 | pending | 2019-01-01 11:00 | | 2019-01-01 11:00 | +9. Select from the `snapshot` in downstream models using the `ref` function. -Query results at `2019-01-01 11:30`: + -| id | status | -| -- | ------- | -| 1 | shipped | +```sql +select * from {{ ref('orders_snapshot') }} +``` -Snapshot results: + -| id | status | dbt_valid_from | dbt_valid_to | dbt_updated_at | -| --- | ------- | ---------------- | ---------------- | ---------------- | -| 1 | pending | 2019-01-01 11:00 | 2019-01-01 11:30 | 2019-01-01 11:00 | -| 1 | shipped | 2019-01-01 11:30 | | 2019-01-01 11:30 | +10. Snapshots are only useful if you run them frequently — schedule the `snapshot` command to run regularly. -
+
## FAQs diff --git a/website/docs/docs/build/sources.md b/website/docs/docs/build/sources.md index 4926601f3b2..aad1ac42c8e 100644 --- a/website/docs/docs/build/sources.md +++ b/website/docs/docs/build/sources.md @@ -130,11 +130,11 @@ You can find more details on the available properties for sources in the [refere -## Snapshotting source data freshness -With a couple of extra configs, dbt can optionally snapshot the "freshness" of the data in your source tables. This is useful for understanding if your data pipelines are in a healthy state, and is a critical component of defining SLAs for your warehouse. +## Source data freshness +With a couple of extra configs, dbt can optionally capture the "freshness" of the data in your source tables. This is useful for understanding if your data pipelines are in a healthy state, and is a critical component of defining SLAs for your warehouse. ### Declaring source freshness -To configure sources to snapshot freshness information, add a `freshness` block to your source and `loaded_at_field` to your table declaration: +To configure source freshness information, add a `freshness` block to your source and `loaded_at_field` to your table declaration: @@ -164,14 +164,14 @@ sources: -In the `freshness` block, one or both of `warn_after` and `error_after` can be provided. If neither is provided, then dbt will not calculate freshness snapshots for the tables in this source. +In the `freshness` block, one or both of `warn_after` and `error_after` can be provided. If neither is provided, then dbt will not calculate freshness for the tables in this source. Additionally, the `loaded_at_field` is required to calculate freshness for a table. If a `loaded_at_field` is not provided, then dbt will not calculate freshness for the table. These configs are applied hierarchically, so `freshness` and `loaded_at_field` values specified for a `source` will flow through to all of the `tables` defined in that source. This is useful when all of the tables in a source have the same `loaded_at_field`, as the config can just be specified once in the top-level source definition. ### Checking source freshness -To snapshot freshness information for your sources, use the `dbt source freshness` command ([reference docs](/reference/commands/source)): +To obtain freshness information for your sources, use the `dbt source freshness` command ([reference docs](/reference/commands/source)): ``` $ dbt source freshness @@ -182,7 +182,7 @@ Behind the scenes, dbt uses the freshness properties to construct a `select` que ```sql select max(_etl_loaded_at) as max_loaded_at, - convert_timezone('UTC', current_timestamp()) as snapshotted_at + convert_timezone('UTC', current_timestamp()) as calculated_at from raw.jaffle_shop.orders ``` @@ -198,7 +198,7 @@ Some databases can have tables where a filter over certain columns are required, ```sql select max(_etl_loaded_at) as max_loaded_at, - convert_timezone('UTC', current_timestamp()) as snapshotted_at + convert_timezone('UTC', current_timestamp()) as calculated_at from raw.jaffle_shop.orders where _etl_loaded_at >= date_sub(current_date(), interval 1 day) ``` diff --git a/website/docs/docs/build/unit-tests.md b/website/docs/docs/build/unit-tests.md index 1d7143d7476..a85ffa07ed2 100644 --- a/website/docs/docs/build/unit-tests.md +++ b/website/docs/docs/build/unit-tests.md @@ -8,15 +8,14 @@ keywords: - unit test, unit tests, unit testing, dag --- -:::note + + -This functionality is only supported in dbt Core v1.8+ or accounts that have opted for a ["Versionless"](/docs/dbt-versions/upgrade-dbt-version-in-cloud#versionless) dbt Cloud experience. -::: Historically, dbt's test coverage was confined to [“data” tests](/docs/build/data-tests), assessing the quality of input data or resulting datasets' structure. However, these tests could only be executed _after_ building a model. -With dbt Core v1.8 and dbt Cloud environments that have gone versionless by selecting the **Versionless** option, we have introduced an additional type of test to dbt - unit tests. In software programming, unit tests validate small portions of your functional code, and they work much the same way here. Unit tests allow you to validate your SQL modeling logic on a small set of static inputs _before_ you materialize your full model in production. Unit tests enable test-driven development, benefiting developer efficiency and code reliability. +Starting in dbt Core v1.8, we have introduced an additional type of test to dbt - unit tests. In software programming, unit tests validate small portions of your functional code, and they work much the same way here. Unit tests allow you to validate your SQL modeling logic on a small set of static inputs _before_ you materialize your full model in production. Unit tests enable test-driven development, benefiting developer efficiency and code reliability. ## Before you begin @@ -24,11 +23,15 @@ With dbt Core v1.8 and dbt Cloud environments that have gone versionless by sele - We currently only support adding unit tests to models in your _current_ project. - We currently _don't_ support unit testing models that use the [`materialized view`](/docs/build/materializations#materialized-view) materialization. - We currently _don't_ support unit testing models that use recursive SQL. -- You must specify all fields in a BigQuery STRUCT in a unit test. You cannot use only a subset of fields in a STRUCT. +- We currently _don't_ support unit testing models that use introspective queries. - If your model has multiple versions, by default the unit test will run on *all* versions of your model. Read [unit testing versioned models](/reference/resource-properties/unit-testing-versions) for more information. -- Unit tests must be defined in a YML file in your `models/` directory. -- Table names must be [aliased](/docs/build/custom-aliases) in order to unit test `join` logic. -- Redshift customers need to be aware of a [limitation when building unit tests](/reference/resource-configs/redshift-configs#unit-test-limitations) that requires a workaround. +- Unit tests must be defined in a YML file in your [`models/` directory](/reference/project-configs/model-paths). +- Table names must be aliased in order to unit test `join` logic. +- Include all [`ref`](/reference/dbt-jinja-functions/ref) or [`source`](/reference/dbt-jinja-functions/source) model references in the unit test configuration as `input`s to avoid "node not found" errors during compilation. + +#### Adapter-specific caveats +- You must specify all fields in a BigQuery `STRUCT` in a unit test. You cannot use only a subset of fields in a `STRUCT`. +- Redshift customers need to be aware of a [limitation when building unit tests](/reference/resource-configs/redshift-configs#unit-test-limitations) that requires a workaround. Read the [reference doc](/reference/resource-properties/unit-tests) for more details about formatting your unit tests. diff --git a/website/docs/docs/cloud-integrations/about-snowflake-native-app.md b/website/docs/docs/cloud-integrations/about-snowflake-native-app.md index 86ee6a7d630..9eb1179897e 100644 --- a/website/docs/docs/cloud-integrations/about-snowflake-native-app.md +++ b/website/docs/docs/cloud-integrations/about-snowflake-native-app.md @@ -46,7 +46,7 @@ App users are able to access all information that's available to the API service ## Procurement The dbt Snowflake Native App is available on the [Snowflake Marketplace](https://app.snowflake.com/marketplace/listing/GZTYZSRT2R3). Purchasing it includes access to the Native App and a dbt Cloud account that's on the Enterprise plan. Existing dbt Cloud Enterprise customers can also access it. If interested, contact your Enterprise account manager. -If you're interested, please [contact us](matilto:sales_snowflake_marketplace@dbtlabs.com) for more information. +If you're interested, please [contact us](mailto:sales_snowflake_marketplace@dbtlabs.com) for more information. ## Support If you have any questions about the dbt Snowflake Native App, you may [contact our Support team](mailto:dbt-snowflake-marketplace@dbtlabs.com) for help. Please provide information about your installation of the Native App, including your dbt Cloud account ID and Snowflake account identifier. diff --git a/website/docs/docs/cloud-integrations/avail-sl-integrations.md b/website/docs/docs/cloud-integrations/avail-sl-integrations.md index 04d9d55acb4..acc36623ab5 100644 --- a/website/docs/docs/cloud-integrations/avail-sl-integrations.md +++ b/website/docs/docs/cloud-integrations/avail-sl-integrations.md @@ -29,7 +29,7 @@ import AvailIntegrations from '/snippets/_sl-partner-links.md'; - {frontMatter.meta.api_name} to learn how to integrate and query your metrics in downstream tools. - [dbt Semantic Layer API query syntax](/docs/dbt-cloud-apis/sl-jdbc#querying-the-api-for-metric-metadata) -- [Hex dbt Semantic Layer cells](https://learn.hex.tech/docs/logic-cell-types/transform-cells/dbt-metrics-cells) to set up SQL cells in Hex. +- [Hex dbt Semantic Layer cells](https://learn.hex.tech/docs/explore-data/cells/data-cells/dbt-metrics-cells) to set up SQL cells in Hex. - [Resolve 'Failed APN'](/faqs/Troubleshooting/sl-alpn-error) error when connecting to the dbt Semantic Layer. - [dbt Semantic Layer on-demand course](https://learn.getdbt.com/courses/semantic-layer) - [dbt Semantic Layer FAQs](/docs/use-dbt-semantic-layer/sl-faqs) diff --git a/website/docs/docs/cloud-integrations/configure-auto-exposures.md b/website/docs/docs/cloud-integrations/configure-auto-exposures.md index 41448dd5f9e..9692249240a 100644 --- a/website/docs/docs/cloud-integrations/configure-auto-exposures.md +++ b/website/docs/docs/cloud-integrations/configure-auto-exposures.md @@ -6,12 +6,10 @@ description: "Import and auto-generate exposures from dashboards and understand image: /img/docs/cloud-integrations/auto-exposures/explorer-lineage2.jpg --- -# Configure auto-exposures +# Configure auto-exposures As a data team, it’s critical that you have context into the downstream use cases and users of your data products. [Auto-exposures](/docs/collaborate/auto-exposures) integrates natively with Tableau and [auto-generates downstream lineage](/docs/collaborate/auto-exposures#view-auto-exposures-in-dbt-explorer) in dbt Explorer for a richer experience. -:::info Available in beta -Auto-exposures are currently available in beta to a limited group of users and are gradually being rolled out. If you're interested in gaining access or learning more, stay tuned for updates! -::: + Auto-exposures help data teams optimize their efficiency and ensure data quality by: - Helping users understand how their models are used in downstream analytics tools to inform investments and reduce incidents — ultimately building trust and confidence in data products. @@ -22,23 +20,24 @@ Auto-exposures help data teams optimize their efficiency and ensure data quality To access the features, you should meet the following: -1. Your environment and jobs are on [Versionless](/docs/dbt-versions/versionless-cloud) dbt. +1. Your environment and jobs are on a supported [release track](/docs/dbt-versions/cloud-release-tracks) dbt. 2. You have a dbt Cloud account on the [Enterprise plan](https://www.getdbt.com/pricing/). 3. You have set up a [production](/docs/deploy/deploy-environments#set-as-production-environment) deployment environment for each project you want to explore, with at least one successful job run. 4. You have [admin permissions](/docs/cloud/manage-access/enterprise-permissions) in dbt Cloud to edit project settings or production environment settings. 5. Use Tableau as your BI tool and enable metadata permissions or work with an admin to do so. Compatible with Tableau Cloud or Tableau Server with the Metadata API enabled. -6. Run a production job _after_ saving the Tableau collections. + - If you're using Tableau Server, you need to [allowlist dbt Cloud's IP addresses](/docs/cloud/about-cloud/access-regions-ip-addresses) for your dbt Cloud region. + - Currently, you can only connect to a single Tableau site on the same server. ## Set up in Tableau This section of the document explains the steps you need to set up the auto-exposures integration with Tableau. Once you've set this up in Tableau and dbt Cloud, you can view the [auto-exposures](/docs/collaborate/auto-exposures#view-auto-exposures-in-dbt-explorer) in dbt Explorer. -To set up [personal access tokens (PATs)](/docs/dbt-cloud-apis/user-tokens#using-the-new-personal-access-tokens) needed for auto exposures, ask a site admin to configure it for the account. +To set up [personal access tokens (PATs)](https://help.tableau.com/current/server/en-us/security_personal_access_tokens.htm) needed for auto exposures, ask a site admin to configure it for the account. 1. Ensure you or a site admin enables PATs for the account in Tableau. -2. Create a PAT that you can add to dbt Cloud to pull in Tableau metadata for auto exposures. +2. Create a PAT that you can add to dbt Cloud to pull in Tableau metadata for auto exposures. Ensure the user creating the PAT has access to collections/folders, as the PAT only grants access matching the creator's existing privileges. 3. Copy the **Secret** and the **Token name** and enter them in dbt Cloud. The secret is only displayed once, so store it in a safe location (like a password manager). @@ -62,10 +61,15 @@ To set up [personal access tokens (PATs)](/docs/dbt-cloud-apis/user-tokens#using 4. Select the collections you want to include for auto exposures. - dbt Cloud automatically imports and syncs any workbook within the selected collections. New additions to the collections will be added to the lineage in dbt Cloud during the next automatic sync (usually once per day). + + :::info + dbt Cloud automatically imports and syncs any workbook within the selected collections. New additions to the collections will be added to the lineage in dbt Cloud during the next sync (automatically once per day). + + dbt Cloud immediately starts a sync when you update the selected collections list, capturing new workbooks and removing irrelevant ones. + ::: + 5. Click **Save**. -6. Run a production job _after_ saving the Tableau collections. dbt Cloud imports everything in the collection(s) and you can continue to view them in Explorer. For more information on how to view and use auto-exposures, refer to [View auto-exposures from dbt Explorer](/docs/collaborate/auto-exposures) page. @@ -74,5 +78,5 @@ dbt Cloud imports everything in the collection(s) and you can continue to view t ## Refresh auto-exposures in jobs :::info Coming soon -Soon, you’ll also be able to use auto-exposures trigger refresh of the data used in your Tableau dashboards from within dbt Cloud. Stay tuned for more on this soon! +Soon, you’ll also be able to use auto-exposures to trigger the refresh of the data used in your Tableau dashboards from within dbt Cloud. Stay tuned for more on this soon! ::: diff --git a/website/docs/docs/cloud-integrations/overview.md b/website/docs/docs/cloud-integrations/overview.md index d925e3e52a7..f5208c8d754 100644 --- a/website/docs/docs/cloud-integrations/overview.md +++ b/website/docs/docs/cloud-integrations/overview.md @@ -13,7 +13,7 @@ Many data applications integrate with dbt Cloud, enabling you to leverage the po
diff --git a/website/docs/docs/cloud-integrations/semantic-layer/excel.md b/website/docs/docs/cloud-integrations/semantic-layer/excel.md index e666bda0e58..c80040dce01 100644 --- a/website/docs/docs/cloud-integrations/semantic-layer/excel.md +++ b/website/docs/docs/cloud-integrations/semantic-layer/excel.md @@ -6,8 +6,6 @@ tags: [Semantic Layer] sidebar_label: "Microsoft Excel" --- -# Microsoft Excel - The dbt Semantic Layer offers a seamless integration with Excel Online and Desktop through a custom menu. This add-on allows you to build dbt Semantic Layer queries and return data on your metrics directly within Excel. ## Prerequisites @@ -18,14 +16,15 @@ The dbt Semantic Layer offers a seamless integration with Excel Online and Deskt - You must have a dbt Cloud Team or Enterprise [account](https://www.getdbt.com/pricing). Suitable for both Multi-tenant and Single-tenant deployment. - Single-tenant accounts should contact their account representative for necessary setup and enablement. -import SLCourses from '/snippets/_sl-course.md'; +:::tip - +📹 For on-demand video learning, explore the [Querying the Semantic Layer with Excel](https://learn.getdbt.com/courses/querying-the-semantic-layer-with-excel) course to learn how to query metrics with Excel. +::: ## Installing the add-on -The dbt Semantic Layer Microsoft Excel integration is available to download directly on [Microsoft AppSource](https://appsource.microsoft.com/en-us/marketplace/apps?product=office). You can choose to download this add in for both [Excel Desktop](https://pages.store.office.com/addinsinstallpage.aspx?assetid=WA200007100&rs=en-US&correlationId=4132ecd1-425d-982d-efb4-de94ebc83f26) and [Excel Online](https://pages.store.office.com/addinsinstallpage.aspx?assetid=WA200007100&rs=en-US&correlationid=4132ecd1-425d-982d-efb4-de94ebc83f26&isWac=True) +The dbt Semantic Layer Microsoft Excel integration is available to download directly on [Microsoft AppSource](https://appsource.microsoft.com/en-us/product/office/WA200007100?tab=Overview). You can choose to download this add-on in for both [Excel Desktop](https://pages.store.office.com/addinsinstallpage.aspx?assetid=WA200007100&rs=en-US&correlationId=4132ecd1-425d-982d-efb4-de94ebc83f26) and [Excel Online](https://pages.store.office.com/addinsinstallpage.aspx?assetid=WA200007100&rs=en-US&correlationid=4132ecd1-425d-982d-efb4-de94ebc83f26&isWac=True) 1. In Excel, authenticate with your host, dbt Cloud environment ID, and service token. - Access your Environment ID, Host, and URLs in your dbt Cloud Semantic Layer settings. Generate a service token in the Semantic Layer settings or API tokens settings diff --git a/website/docs/docs/cloud-integrations/semantic-layer/tableau.md b/website/docs/docs/cloud-integrations/semantic-layer/tableau.md index 15a0a92cf39..1f6755c38fa 100644 --- a/website/docs/docs/cloud-integrations/semantic-layer/tableau.md +++ b/website/docs/docs/cloud-integrations/semantic-layer/tableau.md @@ -46,8 +46,8 @@ Alternatively, you can follow these steps to install the Connector: ## Using the integration 1. **Authentication** — Once you authenticate, the system will direct you to the data source page. -2. **Access all Semantic Layer Objects** — Use the "ALL" data source to access all the metrics, dimensions, and entities configured in your dbt Semantic Layer. Note that the "METRICS_AND_DIMENSIONS" data source has been deprecated and replaced by "ALL". -3. **Access saved queries** — You can optionally access individual [saved queries](/docs/build/saved-queries) that you've defined. These will also show up as unique data sources when you log in. +2. **Access all Semantic Layer Objects** — Use the "ALL" data source to access all the metrics, dimensions, and entities configured in your dbt Semantic Layer. Note that the "METRICS_AND_DIMENSIONS" data source has been deprecated and replaced by "ALL". Be sure to use a live connection since extracts are not supported at this time. +3. **Access saved queries** — You can optionally access individual [saved queries](/docs/build/saved-queries) that you've defined. These will also show up as unique data sources when you log in. 4. **Access worksheet** — From your data source selection, go directly to a worksheet in the bottom left-hand corner. 5. **Query metrics and dimensions** — Then, you'll find all the metrics, dimensions, and entities that are available to query on the left side of your window based on your selection. diff --git a/website/docs/docs/cloud-integrations/set-up-snowflake-native-app.md b/website/docs/docs/cloud-integrations/set-up-snowflake-native-app.md index cffd034ac33..ff151d4636e 100644 --- a/website/docs/docs/cloud-integrations/set-up-snowflake-native-app.md +++ b/website/docs/docs/cloud-integrations/set-up-snowflake-native-app.md @@ -45,7 +45,10 @@ The following are the prerequisites for dbt Cloud and Snowflake. Configure dbt Cloud and Snowflake Cortex to power the **Ask dbt** chatbot. 1. In dbt Cloud, browse to your Semantic Layer configurations. - 1. From the gear menu, select **Account settings**. In the left sidebar, select **Projects** and choose your dbt project from the project list. + + 1. Navigate to the left hand side panel and click your account name. From there, select **Account settings**. + 1. In the left sidebar, select **Projects** and choose your dbt project from the project list. + 1. In the **Project details** panel, click the **Edit Semantic Layer Configuration** link (which is below the **GraphQL URL** option). 1. In the **Semantic Layer Configuration Details** panel, identify the Snowflake credentials (which you'll use to access Snowflake Cortex) and the environment against which the Semantic Layer is run. Save the username, role, and the environment in a temporary location to use later on. @@ -67,7 +70,7 @@ Configure dbt Cloud and Snowflake Cortex to power the **Ask dbt** chatbot. ## Configure dbt Cloud Collect the following pieces of information from dbt Cloud to set up the application. -1. From the gear menu in dbt Cloud, select **Account settings**. In the left sidebar, select **API tokens > Service tokens**. Create a service token with access to all the projects you want to access in the dbt Snowflake Native App. Grant these permission sets: +1. Navigate to the left-hand side panel and click your account name. From there, select **Account settings**. Then click **API tokens > Service tokens**. Create a service token with access to all the projects you want to access in the dbt Snowflake Native App. Grant these permission sets: - **Manage marketplace apps** - **Job Admin** - **Metadata Only** @@ -144,7 +147,7 @@ Check that the SL user has been granted access to the `dbt_sl_llm` schema and ma -If there's been an update to the dbt Cloud account ID, access URL, or API service token, you need to update the configuration for the dbt Snowflake Native App. In Snowflake, navigate to the app's configuration page and delete the existing configurations. Add the new configuration and then run `CALL app_public.restart_ap ();` in the application database in Snowsight. +If there's been an update to the dbt Cloud account ID, access URL, or API service token, you need to update the configuration for the dbt Snowflake Native App. In Snowflake, navigate to the app's configuration page and delete the existing configurations. Add the new configuration and then run `CALL app_public.restart_app();` in the application database in Snowsight. diff --git a/website/docs/docs/cloud/about-cloud-develop-defer.md b/website/docs/docs/cloud/about-cloud-develop-defer.md index 4e2f70b7b82..d1685c42cba 100644 --- a/website/docs/docs/cloud/about-cloud-develop-defer.md +++ b/website/docs/docs/cloud/about-cloud-develop-defer.md @@ -13,11 +13,13 @@ Both the dbt Cloud IDE and the dbt Cloud CLI enable users to natively defer to p -By default, dbt follows these rules: +When using `--defer`, dbt Cloud will follow this order of execution for resolving the `{{ ref() }}` functions. -- dbt uses the production locations of parent models to resolve `{{ ref() }}` functions, based on metadata from the production environment. -- If a development version of a deferred model exists, dbt preferentially uses the development database location when resolving the reference. -- Passing the [`--favor-state`](/reference/node-selection/defer#favor-state) flag overrides the default behavior and _always_ resolve refs using production metadata, regardless of the presence of a development relation. +1. If a development version of a deferred relation exists, dbt preferentially uses the development database location when resolving the reference. +2. If a development version doesn't exist, dbt uses the staging locations of parent relations based on metadata from the staging environment. +3. If both a development and staging version doesn't exist, dbt uses the production locations of parent relations based on metadata from the production environment. + +**Note:** Passing the `--favor-state` flag will always resolve refs using staging metadata if available; otherwise, it defaults to production metadata regardless of the presence of a development relation, skipping step #1. For a clean slate, it's a good practice to drop the development schema at the start and end of your development cycle. @@ -50,8 +52,11 @@ The dbt Cloud CLI offers additional flexibility by letting you choose the source - ```yml -defer-env-id: '123456' +```yml +context: + active-host: ... + active-project: ... + defer-env-id: '123456' ``` @@ -60,7 +65,7 @@ defer-env-id: '123456' ```yml -dbt_cloud: +dbt-cloud: defer-env-id: '123456' ``` diff --git a/website/docs/docs/cloud/about-cloud/about-dbt-cloud.md b/website/docs/docs/cloud/about-cloud/about-dbt-cloud.md index 02f950111ea..1a7e59dd5c2 100644 --- a/website/docs/docs/cloud/about-cloud/about-dbt-cloud.md +++ b/website/docs/docs/cloud/about-cloud/about-dbt-cloud.md @@ -24,10 +24,16 @@ dbt Cloud's [flexible plans](https://www.getdbt.com/pricing/) and features make + + **Cell based:** ACCOUNT_PREFIX.us1.dbt.com | 52.45.144.63
54.81.134.249
52.22.161.231
52.3.77.232
3.214.191.130
34.233.79.135 | ✅ | ✅ | ✅ | +| North America [^1] | Azure
East US 2 (Virginia) | **Cell based:** ACCOUNT_PREFIX.us2.dbt.com | 20.10.67.192/26 | ❌ | ❌ | ✅ | | EMEA [^1] | AWS eu-central-1 (Frankfurt) | emea.dbt.com | 3.123.45.39
3.126.140.248
3.72.153.148 | ❌ | ❌ | ✅ | | EMEA [^1] | Azure
North Europe (Ireland) | **Cell based:** ACCOUNT_PREFIX.eu2.dbt.com | 20.13.190.192/26 | ❌ | ❌ | ✅ | | APAC [^1] | AWS ap-southeast-2 (Sydney)| au.dbt.com | 52.65.89.235
3.106.40.33
13.239.155.206
| ❌ | ❌ | ✅ | @@ -45,7 +46,7 @@ dbt Cloud, like many cloud services, relies on underlying AWS cloud infrastructu * Dynamic IP addresses — dbt Cloud infrastructure uses Amazon Web Services (AWS). dbt Cloud offers static URLs for streamlined access, but the dynamic nature of cloud services means the underlying IP addresses change occasionally. AWS manages the IP ranges and may change them according to their operational and security needs. -* Using hostnames for consistent access — To ensure uninterrupted access, we recommend that you dbt Cloud services using hostnames. Hostnames provide a consistent reference point, regardless of any changes in underlying IP addresses. We are aligning with an industry-standard practice employed by organizations such as Snowflake. +* Using hostnames for consistent access — To ensure uninterrupted access, we recommend that you use dbt Cloud services using hostnames. Hostnames provide a consistent reference point, regardless of any changes in underlying IP addresses. We are aligning with an industry-standard practice employed by organizations such as Snowflake. * Optimizing VPN connections — You should integrate a proxy alongside VPN for users who leverage VPN connections. This strategy enables steady IP addresses for your connections, facilitating smooth traffic flow through the VPN and onward to dbt Cloud. By employing a proxy and a VPN, you can direct traffic through the VPN and then to dbt Cloud. It's crucial to set up the proxy if you need to integrate with additional services. diff --git a/website/docs/docs/cloud/about-develop-dbt.md b/website/docs/docs/cloud/about-develop-dbt.md index 9568d70bb27..33d12b89e0f 100644 --- a/website/docs/docs/cloud/about-develop-dbt.md +++ b/website/docs/docs/cloud/about-develop-dbt.md @@ -9,9 +9,9 @@ hide_table_of_contents: true Develop dbt projects using dbt Cloud, which offers a fast and reliable way to work on your dbt project. It runs dbt Core in a hosted (single or multi-tenant) environment. -You can develop in your browser using an integrated development environment (IDE) or in a dbt Cloud-powered command line interface (CLI). +You can develop in your browser using an integrated development environment (IDE), a dbt Cloud-powered command line interface (CLI), or visual editor. -
+
+ +

To get started, you'll need a [dbt Cloud](https://www.getdbt.com/signup) account and a developer seat. For a more comprehensive guide about developing in dbt, refer to the [quickstart guides](/docs/get-started-dbt). diff --git a/website/docs/docs/cloud/account-integrations.md b/website/docs/docs/cloud/account-integrations.md new file mode 100644 index 00000000000..e5ff42cb900 --- /dev/null +++ b/website/docs/docs/cloud/account-integrations.md @@ -0,0 +1,103 @@ +--- +title: "Account integrations in dbt Cloud" +sidebar_label: "Account integrations" +description: "Learn how to configure account integrations for your dbt Cloud account." +--- + +The following sections describe the different **Account integrations** available from your dbt Cloud account under the account **Settings** section. + + + +## Git integrations + +Connect your dbt Cloud account to your Git provider to enable dbt Cloud users to authenticate your personal accounts. dbt Cloud will perform Git actions on behalf of your authenticated self, against repositories to which you have access according to your Git provider permissions. + +To configure a Git account integration: +1. Navigate to **Account settings** in the side menu. +2. Under the **Settings** section, click on **Integrations**. +3. Click on the Git provider from the list and select the **Pencil** icon to the right of the provider. +4. dbt Cloud [natively connects](/docs/cloud/git/git-configuration-in-dbt-cloud) to the following Git providers: + + - [GitHub](/docs/cloud/git/connect-github) + - [GitLab](/docs/cloud/git/connect-gitlab) + - [Azure DevOps](/docs/cloud/git/connect-azure-devops) + +You can connect your dbt Cloud account to additional Git providers by importing a git repository from any valid git URL. Refer to [Import a git repository](/docs/cloud/git/import-a-project-by-git-url) for more information. + + + +## OAuth integrations + +Connect your dbt Cloud account to an OAuth provider that are integrated with dbt Cloud. + +To configure an OAuth account integration: +1. Navigate to **Account settings** in the side menu. +2. Under the **Settings** section, click on **Integrations**. +3. Under **OAuth**, and click on **Link** to connect your Slack account. +4. For custom OAuth providers, under **Custom OAuth integrations**, click on **Add integration** and select the OAuth provider from the list. Fill in the required fields and click **Save**. + + + +## AI integrations + +Once AI features have been [enabled](/docs/cloud/enable-dbt-copilot#enable-dbt-copilot), you can use dbt Labs' AI integration or bring-your-own provider to support AI-powered dbt Cloud features like [dbt Copilot](/docs/cloud/dbt-copilot) and [Ask dbt](/docs/cloud-integrations/snowflake-native-app) (both available on [dbt Cloud Enterprise plans](https://www.getdbt.com/pricing)). + +dbt Cloud supports AI integrations for dbt Labs-managed OpenAI keys, Self-managed OpenAI keys, or Self-managed Azure OpenAI keys . + +Note, if you bring-your-own provider, you will incur API calls and associated charges for features used in dbt Cloud. + +:::info +dbt Cloud's AI is optimized for OpenAIs gpt-4o. Using other models can affect performance and accuracy, and functionality with other models isn't guaranteed. +::: + +To configure the AI integration in your dbt Cloud account, a dbt Cloud admin can perform the following steps: +1. Navigate to **Account settings** in the side menu. +2. Select **Integrations** and scroll to the **AI** section. +3. Click on the **Pencil** icon to the right of **OpenAI** to configure the AI integration. + +4. Configure the AI integration for either **dbt Labs OpenAI**, **OpenAI**, or **Azure OpenAI**. + + + + + 1. Select the toggle for **dbt Labs** to use dbt Labs' managed OpenAI key. + 2. Click **Save**. + + + + + + + 1. Select the toggle for **OpenAI** to use your own OpenAI key. + 2. Enter the API key. + 3. Click **Save**. + + + + + + To learn about deploying your own OpenAI model on Azure, refer to [Deploy models on Azure OpenAI](https://learn.microsoft.com/en-us/azure/ai-studio/how-to/deploy-models-openai). Configure credentials for your Azure OpenAI deployment in dbt Cloud in the following two ways: + - [From a Target URI](#from-a-target-uri) + - [Manually providing the credentials](#manually-providing-the-credentials) + + #### From a Target URI + + 1. Locate your Azure OpenAI deployment URI in your Azure Deployment details page. + 2. In the dbt Cloud **Azure OpenAI** section, select the tab **From Target URI**. + 3. Paste the URI into the **Target URI** field. + 4. Enter your Azure OpenAI API key. + 5. Verify the **Endpoint**, **API Version**, and **Deployment Name** are correct. + 6. Click **Save**. + + + #### Manually providing the credentials + + 1. Locate your Azure OpenAI configuration in your Azure Deployment details page. + 2. In the dbt Cloud **Azure OpenAI** section, select the tab **Manual Input**. + 2. Enter your Azure OpenAI API key. + 3. Enter the **Endpoint**, **API Version**, and **Deployment Name**. + 4. Click **Save**. + + + + diff --git a/website/docs/docs/cloud/account-settings.md b/website/docs/docs/cloud/account-settings.md index 6d35da3b5f3..aaad9b28e5c 100644 --- a/website/docs/docs/cloud/account-settings.md +++ b/website/docs/docs/cloud/account-settings.md @@ -39,12 +39,12 @@ To use, select the **Enable partial parsing between deployment runs** option fro -## Account access to Advanced CI features +## Account access to Advanced CI features [Advanced CI](/docs/deploy/advanced-ci) features, such as [compare changes](/docs/deploy/advanced-ci#compare-changes), allow dbt Cloud account members to view details about the changes between what's in the production environment and the pull request. To use Advanced CI features, your dbt Cloud account must have access to them. Ask your dbt Cloud administrator to enable Advanced CI features on your account, which they can do by choosing the **Enable account access to Advanced CI** option from the account settings. -Once enabled, the **Run compare changes** option becomes available in the CI job settings for you to select. +Once enabled, the **dbt compare** option becomes available in the CI job settings for you to select. - \ No newline at end of file + diff --git a/website/docs/docs/cloud/billing.md b/website/docs/docs/cloud/billing.md index ad0834c6c98..2c80648d1f9 100644 --- a/website/docs/docs/cloud/billing.md +++ b/website/docs/docs/cloud/billing.md @@ -149,7 +149,7 @@ dbt Labs may institute use limits if reasonable use is exceeded. Additional feat ## Managing usage -From anywhere in the dbt Cloud account, click the **gear icon** and click **Account settings**. The **Billing** option will be on the left side menu under the **Account Settings** heading. Here, you can view individual available plans and the features provided for each. +From dbt Cloud, click on your account name in the left side menu and select **Account settings**. The **Billing** option will be on the left side menu under the **Settings** heading. Here, you can view individual available plans and the features provided for each. ### Usage notifications diff --git a/website/docs/docs/cloud/cloud-cli-installation.md b/website/docs/docs/cloud/cloud-cli-installation.md index 8a058cbb90f..a80f1a587e0 100644 --- a/website/docs/docs/cloud/cloud-cli-installation.md +++ b/website/docs/docs/cloud/cloud-cli-installation.md @@ -21,8 +21,6 @@ dbt commands are run against dbt Cloud's infrastructure and benefit from: ## Prerequisites The dbt Cloud CLI is available in all [deployment regions](/docs/cloud/about-cloud/access-regions-ip-addresses) and for both multi-tenant and single-tenant accounts. -- You are on dbt version 1.5 or higher. Alternatively, set it to [**Versionless**](/docs/dbt-versions/upgrade-dbt-version-in-cloud#versionless) to automatically stay up to date. - ## Install dbt Cloud CLI You can install the dbt Cloud CLI on the command line by using one of these methods. @@ -321,3 +319,10 @@ This alias will allow you to use the dbt-cloud command to invoke th If you've ran a dbt command and receive a Session occupied error, you can reattach to your existing session with dbt reattach and then press Control-C and choose to cancel the invocation. + + + + +The Cloud CLI allows only one command that writes to the data warehouse at a time. If you attempt to run multiple write commands simultaneously (for example, `dbt run` and `dbt build`), you will encounter a `stuck session` error. To resolve this, cancel the specific invocation by passing its ID to the cancel command. For more information, refer to [parallel execution](/reference/dbt-commands#parallel-execution). + + \ No newline at end of file diff --git a/website/docs/docs/cloud/configure-cloud-cli.md b/website/docs/docs/cloud/configure-cloud-cli.md index 854950f5d8c..5e0a285c5c5 100644 --- a/website/docs/docs/cloud/configure-cloud-cli.md +++ b/website/docs/docs/cloud/configure-cloud-cli.md @@ -52,21 +52,29 @@ Once you install the dbt Cloud CLI, you need to configure it to connect to a dbt The config file looks like this: - ```yaml - version: "1" - context: - active-project: "" - active-host: "" - defer-env-id: "" - projects: - - project-id: "" - account-host: "" - api-key: "" - - - project-id: "" - account-host: "" - api-key: "" - ``` + ```yaml + version: "1" + context: + active-project: "" + active-host: "" + defer-env-id: "" + projects: + - project-name: "" + project-id: "" + account-name: "" + account-id: "" + account-host: "" # for example, "cloud.getdbt.com" + token-name: "" + token-value: "" + + - project-name: "" + project-id: "" + account-name: "" + account-id: "" + account-host: "" # for example, "cloud.getdbt.com" + token-name: "" + token-value: "" + ``` 3. After downloading the config file and creating your directory, navigate to a dbt project in your terminal: @@ -96,11 +104,10 @@ With your repo recloned, you can add, edit, and sync files with your repo. To set environment variables in the dbt Cloud CLI for your dbt project: -1. Select the gear icon on the upper right of the page. -2. Then select **Profile Settings**, then **Credentials**. -3. Click on your project and scroll to the **Environment Variables** section. +1. From dbt Cloud, click on your account name in the left side menu and select **Account settings**. +2. Under the **Your profile** section, select **Credentials**. +3. Click on your project and scroll to the **Environment variables** section. 4. Click **Edit** on the lower right and then set the user-level environment variables. - - Note, when setting up the [dbt Semantic Layer](/docs/use-dbt-semantic-layer/dbt-sl), using [environment variables](/docs/build/environment-variables) like `{{env_var('DBT_WAREHOUSE')}}` is not supported. You should use the actual credentials instead. ## Use the dbt Cloud CLI @@ -195,4 +202,4 @@ This command moves the `dbt_cloud.yml` from the `Downloads` folder to the `.dbt` By default, [all artifacts](/reference/artifacts/dbt-artifacts) are downloaded when you execute dbt commands from the dbt Cloud CLI. To skip these files from being downloaded, add `--download-artifacts=false` to the command you want to run. This can help improve run-time performance but might break workflows that depend on assets like the [manifest](/reference/artifacts/manifest-json). - \ No newline at end of file + diff --git a/website/docs/docs/cloud/connect-data-platform/about-connections.md b/website/docs/docs/cloud/connect-data-platform/about-connections.md index 83ba503657d..b9b2c18aced 100644 --- a/website/docs/docs/cloud/connect-data-platform/about-connections.md +++ b/website/docs/docs/cloud/connect-data-platform/about-connections.md @@ -8,7 +8,7 @@ pagination_prev: null --- dbt Cloud can connect with a variety of data platform providers including: - [AlloyDB](/docs/cloud/connect-data-platform/connect-redshift-postgresql-alloydb) -- [Amazon Athena](/docs/cloud/connect-data-platform/connect-amazon-athena) +- [Amazon Athena](/docs/cloud/connect-data-platform/connect-amazon-athena) - [Amazon Redshift](/docs/cloud/connect-data-platform/connect-redshift-postgresql-alloydb) - [Apache Spark](/docs/cloud/connect-data-platform/connect-apache-spark) - [Azure Synapse Analytics](/docs/cloud/connect-data-platform/connect-azure-synapse-analytics) @@ -18,10 +18,14 @@ dbt Cloud can connect with a variety of data platform providers including: - [PostgreSQL](/docs/cloud/connect-data-platform/connect-redshift-postgresql-alloydb) - [Snowflake](/docs/cloud/connect-data-platform/connect-snowflake) - [Starburst or Trino](/docs/cloud/connect-data-platform/connect-starburst-trino) +- [Teradata](/docs/cloud/connect-data-platform/connect-teradata) -You can connect to your database in dbt Cloud by clicking the gear in the top right and selecting **Account Settings**. From the Account Settings page, click **+ New Project**. +To connect to your database in dbt Cloud: - +1. Click your account name at the bottom of the left-side menu and click **Account settings** +2. Select **Projects** from the top left, and from there click **New Project** + + These connection instructions provide the basic fields required for configuring a data platform connection in dbt Cloud. For more detailed guides, which include demo project data, read our [Quickstart guides](https://docs.getdbt.com/guides) @@ -33,13 +37,14 @@ Starting July 2024, connection management has moved from the project level to th -The following connection management section describes these changes. +Connections created with APIs before this change cannot be accessed with the [latest APIs](https://docs.getdbt.com/dbt-cloud/api-v3#/operations/List%20Account%20Connections). dbt Labs recommends [recreating the connections](https://docs.getdbt.com/dbt-cloud/api-v3#/operations/Create%20Account%20Connection) with the latest APIs. + ::: Warehouse connections are an account-level resource. As such you can find them under **Accounts Settings** > **Connections**: - + Warehouse connections can be re-used across projects. If multiple projects all connect to the same warehouse, you should re-use the same connection to streamline your management operations. Connections are assigned to a project via an [environment](/docs/dbt-cloud-environments). @@ -83,7 +88,7 @@ Please consider the following actions, as the steps you take will depend on the - Normalization - - Undertsand how new connections should be created to avoid local overrides. If you currently use extended attributes to override the warehouse instance in your production environment - you should instead create a new connection for that instance, and wire your production environment to it, removing the need for the local overrides + - Understand how new connections should be created to avoid local overrides. If you currently use extended attributes to override the warehouse instance in your production environment - you should instead create a new connection for that instance, and wire your production environment to it, removing the need for the local overrides - Create new connections, update relevant environments to target these connections, removing now unecessary local overrides (which may not be all of them!) - Test the new wiring by triggering jobs or starting IDE sessions @@ -95,4 +100,4 @@ dbt Cloud will always connect to your data platform from the IP addresses specif Be sure to allow traffic from these IPs in your firewall, and include them in any database grants. -Allowing these IP addresses only enables the connection to your . However, you might want to send API requests from your restricted network to the dbt Cloud API. For example, you could use the API to send a POST request that [triggers a job to run](https://docs.getdbt.com/dbt-cloud/api-v2-legacy#operation/triggerRun). Using the dbt Cloud API requires that you allow the `cloud.getdbt.com` subdomain. For more on the dbt Cloud architecture, see [Deployment architecture](/docs/cloud/about-cloud/architecture). +Allowing these IP addresses only enables the connection to your . However, you might want to send API requests from your restricted network to the dbt Cloud API. Using the dbt Cloud API requires allowing the `cloud.getdbt.com` subdomain. For more on the dbt Cloud architecture, see [Deployment architecture](/docs/cloud/about-cloud/architecture). diff --git a/website/docs/docs/cloud/connect-data-platform/connect-amazon-athena.md b/website/docs/docs/cloud/connect-data-platform/connect-amazon-athena.md index 0b2f844ccac..e3645500b9e 100644 --- a/website/docs/docs/cloud/connect-data-platform/connect-amazon-athena.md +++ b/website/docs/docs/cloud/connect-data-platform/connect-amazon-athena.md @@ -5,9 +5,9 @@ description: "Configure the Amazon Athena data platform connection in dbt Cloud. sidebar_label: "Connect Amazon Athena" --- -# Connect Amazon Athena +# Connect Amazon Athena -Your environment(s) must be on ["Versionless"](/docs/dbt-versions/versionless-cloud) to use the Amazon Athena connection. +Your environment(s) must be on a supported [release track](/docs/dbt-versions/cloud-release-tracks) to use the Amazon Athena connection. Connect dbt Cloud to Amazon's Athena interactive query service to build your dbt project. The following are the required and optional fields for configuring the Athena connection: diff --git a/website/docs/docs/cloud/connect-data-platform/connect-redshift-postgresql-alloydb.md b/website/docs/docs/cloud/connect-data-platform/connect-redshift-postgresql-alloydb.md index 4719095b87f..5be802cae77 100644 --- a/website/docs/docs/cloud/connect-data-platform/connect-redshift-postgresql-alloydb.md +++ b/website/docs/docs/cloud/connect-data-platform/connect-redshift-postgresql-alloydb.md @@ -118,7 +118,7 @@ Once the connection is saved, a public key will be generated and displayed for t To configure the SSH tunnel in dbt Cloud, you'll need to provide the hostname/IP of your bastion server, username, and port, of your choosing, that dbt Cloud will connect to. Review the following steps: - Verify the bastion server has its network security rules set up to accept connections from the [dbt Cloud IP addresses](/docs/cloud/about-cloud/access-regions-ip-addresses) on whatever port you configured. -- Set up the user account by using the bastion servers instance's CLI, The following example uses the username `dbtcloud:` +- Set up the user account by using the bastion servers instance's CLI, The following example uses the username `dbtcloud`: ```shell sudo groupadd dbtcloud diff --git a/website/docs/docs/cloud/connect-data-platform/connect-snowflake.md b/website/docs/docs/cloud/connect-data-platform/connect-snowflake.md index d8dd8dfec11..6b749ced186 100644 --- a/website/docs/docs/cloud/connect-data-platform/connect-snowflake.md +++ b/website/docs/docs/cloud/connect-data-platform/connect-snowflake.md @@ -5,6 +5,14 @@ description: "Configure Snowflake connection." sidebar_label: "Connect Snowflake" --- +:::note + +dbt Cloud connections and credentials inherit the permissions of the accounts configured. You can customize roles and associated permissions in Snowflake to fit your company's requirements and fine-tune access to database objects in your account. See [Snowflake permissions](/reference/database-permissions/snowflake-permissions) for more information about customizing roles in Snowflake. + +Refer to [Snowflake permissions](/reference/database-permissions/snowflake-permissions) for more information about customizing roles in Snowflake. + +::: + The following fields are required when creating a Snowflake connection | Field | Description | Examples | @@ -14,12 +22,9 @@ The following fields are required when creating a Snowflake connection | Database | The logical database to connect to and run queries against. | `analytics` | | Warehouse | The virtual warehouse to use for running queries. | `transforming` | - -**Note:** A crucial part of working with dbt atop Snowflake is ensuring that users (in development environments) and/or service accounts (in deployment to production environments) have the correct permissions to take actions on Snowflake! Here is documentation of some [example permissions to configure Snowflake access](/reference/database-permissions/snowflake-permissions). - ## Authentication methods -This section describes the different authentication methods available for connecting dbt Cloud to Snowflake. +This section describes the different authentication methods for connecting dbt Cloud to Snowflake. Configure Deployment environment (Production, Staging, General) credentials globally in the [**Connections**](/docs/deploy/deploy-environments#deployment-connection) area of **Account settings**. Individual users configure their development credentials in the [**Credentials**](/docs/cloud/dbt-cloud-ide/develop-in-the-cloud#get-started-with-the-cloud-ide) area of their user profile. ### Username / Password diff --git a/website/docs/docs/cloud/connect-data-platform/connect-starburst-trino.md b/website/docs/docs/cloud/connect-data-platform/connect-starburst-trino.md index db0d3f61728..4c460f0d705 100644 --- a/website/docs/docs/cloud/connect-data-platform/connect-starburst-trino.md +++ b/website/docs/docs/cloud/connect-data-platform/connect-starburst-trino.md @@ -11,7 +11,7 @@ The following are the required fields for setting up a connection with a [Starbu | **Host** | The hostname of your cluster. Don't include the HTTP protocol prefix. | `mycluster.mydomain.com` | | **Port** | The port to connect to your cluster. By default, it's 443 for TLS enabled clusters. | `443` | | **User** | The username (of the account) to log in to your cluster. When connecting to Starburst Galaxy clusters, you must include the role of the user as a suffix to the username.

| Format for Starburst Enterprise or Trino depends on your configured authentication method.
Format for Starburst Galaxy:
  • `user.name@mydomain.com/role`
| -| **Password** | The user's password. | | +| **Password** | The user's password. | - | | **Database** | The name of a catalog in your cluster. | `example_catalog` | | **Schema** | The name of a schema that exists within the specified catalog.  | `example_schema` | diff --git a/website/docs/docs/cloud/connect-data-platform/connect-teradata.md b/website/docs/docs/cloud/connect-data-platform/connect-teradata.md new file mode 100644 index 00000000000..8663a181645 --- /dev/null +++ b/website/docs/docs/cloud/connect-data-platform/connect-teradata.md @@ -0,0 +1,29 @@ +--- +title: "Connect Teradata" +id: connect-teradata +description: "Configure the Teradata platform connection in dbt Cloud." +sidebar_label: "Connect Teradata" +--- + +# Connect Teradata + +Your environment(s) must be on a supported [release track](/docs/dbt-versions/cloud-release-tracks) to use the Teradata connection. + +| Field | Description | Type | Required? | Example | +| ----------------------------- | --------------------------------------------------------------------------------------------- | -------------- | --------- | ------- | +| Host | Host name of your Teradata environment. | String | Required | host-name.env.clearscape.teradata.com | +| Port | The database port number. Equivalent to the Teradata JDBC Driver DBS_PORT connection parameter.| Quoted integer | Optional | 1025 | +| Retries | Number of times to retry to connect to database upon error. | Integer | optional | 10 | +| Request timeout | The waiting period between connections attempts in seconds. Default is "1" second. | Quoted integer | Optional | 3 | + + + +### Development and deployment credentials + +| Field | Description | Type | Required? | Example | +| ------------------------------|-----------------------------------------------------------------------------------------------|----------------|-----------|--------------------| +| Username | The database username. Equivalent to the Teradata JDBC Driver USER connection parameter. | String | Required | database_username | +| Password | The database password. Equivalent to the Teradata JDBC Driver PASSWORD connection parameter. | String | Required | DatabasePassword123 | +| Schema | Specifies the initial database to use after login, rather than the user's default database. | String | Required | dbtlabsdocstest | + + diff --git a/website/docs/docs/cloud/connect-data-platform/connnect-bigquery.md b/website/docs/docs/cloud/connect-data-platform/connnect-bigquery.md index 0243bc619b1..ffe7e468bd2 100644 --- a/website/docs/docs/cloud/connect-data-platform/connnect-bigquery.md +++ b/website/docs/docs/cloud/connect-data-platform/connnect-bigquery.md @@ -11,7 +11,12 @@ sidebar_label: "Connect BigQuery" :::info Uploading a service account JSON keyfile -While the fields in a BigQuery connection can be specified manually, we recommend uploading a service account keyfile to quickly and accurately configure a connection to BigQuery. +While the fields in a BigQuery connection can be specified manually, we recommend uploading a service account keyfile to quickly and accurately configure a connection to BigQuery. + +You can provide the JSON keyfile in one of two formats: + +- JSON keyfile upload — Upload the keyfile directly in its normal JSON format. +- Base64-encoded string — Provide the keyfile as a base64-encoded string. When you provide a base64-encoded string, dbt decodes it automatically and populates the necessary fields. ::: @@ -52,6 +57,123 @@ As an end user, if your organization has set up BigQuery OAuth, you can link a p To learn how to optimize performance with data platform-specific configurations in dbt Cloud, refer to [BigQuery-specific configuration](/reference/resource-configs/bigquery-configs). +### Optional configurations + +In BigQuery, optional configurations let you tailor settings for tasks such as query priority, dataset location, job timeout, and more. These options give you greater control over how BigQuery functions behind the scenes to meet your requirements. + +To customize your optional configurations in dbt Cloud: + +1. Click your name at the bottom left-hand side bar menu in dbt Cloud +2. Select **Your profile** from the menu +3. From there, click **Projects** and select your BigQuery project +5. Go to **Development Connection** and select BigQuery +6. Click **Edit** and then scroll down to **Optional settings** + + + +The following are the optional configurations you can set in dbt Cloud: + +| Configuration |
Information
| Type |
Example
| +|---------------------------|-----------------------------------------|---------|--------------------| +| [Priority](#priority) | Sets the priority for BigQuery jobs (either `interactive` or queued for `batch` processing) | String | `batch` or `interactive` | +| [Retries](#retries) | Specifies the number of retries for failed jobs due to temporary issues | Integer | `3` | +| [Location](#location) | Location for creating new datasets | String | `US`, `EU`, `us-west2` | +| [Maximum bytes billed](#maximum-bytes-billed) | Limits the maximum number of bytes that can be billed for a query | Integer | `1000000000` | +| [Execution project](#execution-project) | Specifies the project ID to bill for query execution | String | `my-project-id` | +| [Impersonate service account](#impersonate-service-account) | Allows users authenticated locally to access BigQuery resources under a specified service account | String | `service-account@project.iam.gserviceaccount.com` | +| [Job retry deadline seconds](#job-retry-deadline-seconds) | Sets the total number of seconds BigQuery will attempt to retry a job if it fails | Integer | `600` | +| [Job creation timeout seconds](#job-creation-timeout-seconds) | Specifies the maximum timeout for the job creation step | Integer | `120` | +| [Google cloud storage-bucket](#google-cloud-storage-bucket) | Location for storing objects in Google Cloud Storage | String | `my-bucket` | +| [Dataproc region](#dataproc-region) | Specifies the cloud region for running data processing jobs | String | `US`, `EU`, `asia-northeast1` | +| [Dataproc cluster name](#dataproc-cluster-name) | Assigns a unique identifier to a group of virtual machines in Dataproc | String | `my-cluster` | + + + + +The `priority` for the BigQuery jobs that dbt executes can be configured with the `priority` configuration in your BigQuery profile. The priority field can be set to one of `batch` or `interactive`. For more information on query priority, consult the [BigQuery documentation](https://cloud.google.com/bigquery/docs/running-queries). + + + + + +Retries in BigQuery help to ensure that jobs complete successfully by trying again after temporary failures, making your operations more robust and reliable. + + + + + +The `location` of BigQuery datasets can be set using the `location` setting in a BigQuery profile. As per the [BigQuery documentation](https://cloud.google.com/bigquery/docs/locations), `location` may be either a multi-regional location (for example, `EU`, `US`), or a regional location (like `us-west2`). + + + + + +When a `maximum_bytes_billed` value is configured for a BigQuery profile, that allows you to limit how much data your query can process. It’s a safeguard to prevent your query from accidentally processing more data than you expect, which could lead to higher costs. Queries executed by dbt will fail if they exceed the configured maximum bytes threshhold. This configuration should be supplied as an integer number of bytes. + +If your `maximum_bytes_billed` is 1000000000, you would enter that value in the `maximum_bytes_billed` field in dbt cloud. + + + + + + +By default, dbt will use the specified `project`/`database` as both: + +1. The location to materialize resources (models, seeds, snapshots, and so on), unless they specify a custom project/database config +2. The GCP project that receives the bill for query costs or slot usage + +Optionally, you may specify an execution project to bill for query execution, instead of the project/database where you materialize most resources. + + + + + +This feature allows users authenticating using local OAuth to access BigQuery resources based on the permissions of a service account. + +For a general overview of this process, see the official docs for [Creating Short-lived Service Account Credentials](https://cloud.google.com/iam/docs/create-short-lived-credentials-direct). + + + + + +Job retry deadline seconds is the maximum amount of time BigQuery will spend retrying a job before it gives up. + + + + + +Job creation timeout seconds is the maximum time BigQuery will wait to start the job. If the job doesn’t start within that time, it times out. + + + +#### Run dbt python models on Google Cloud Platform + +import BigQueryDataproc from '/snippets/_bigquery-dataproc.md'; + + + + + +Everything you store in Cloud Storage must be placed inside a [bucket](https://cloud.google.com/storage/docs/buckets). Buckets help you organize your data and manage access to it. + + + + + +A designated location in the cloud where you can run your data processing jobs efficiently. This region must match the location of your BigQuery dataset if you want to use Dataproc with BigQuery to ensure data doesn't move across regions, which can be inefficient and costly. + +For more information on [Dataproc regions](https://cloud.google.com/bigquery/docs/locations), refer to the BigQuery documentation. + + + + + +A unique label you give to your group of virtual machines to help you identify and manage your data processing tasks in the cloud. When you integrate Dataproc with BigQuery, you need to provide the cluster name so BigQuery knows which specific set of resources (the cluster) to use for running the data jobs. + +Have a look at [Dataproc's document on Create a cluster](https://cloud.google.com/dataproc/docs/guides/create-cluster) for an overview on how clusters work. + + + ### Account level connections and credential management You can re-use connections across multiple projects with [global connections](/docs/cloud/connect-data-platform/about-connections#migration-from-project-level-connections-to-account-level-connections). Connections are attached at the environment level (formerly project level), so you can utilize multiple connections inside of a single project (to handle dev, staging, production, etc.). @@ -147,3 +269,7 @@ For a project, you will first create an environment variable to store the secret "extended_attributes_id": FFFFF }' ``` + + + + diff --git a/website/docs/docs/cloud/dbt-assist-data.md b/website/docs/docs/cloud/dbt-assist-data.md deleted file mode 100644 index ad32c304ca8..00000000000 --- a/website/docs/docs/cloud/dbt-assist-data.md +++ /dev/null @@ -1,29 +0,0 @@ ---- -title: "dbt Assist privacy and data" -sidebar_label: "dbt Assist privacy" -description: "dbt Assist’s powerful AI feature helps you deliver data that works." ---- - -# dbt Assist privacy and data - -dbt Labs is committed to protecting your privacy and data. This page provides information about how dbt Labs handles your data when you use dbt Assist. - -#### Is my data used by dbt Labs to train AI models? - -No, dbt Assist does not use client warehouse data to train any AI models. It uses API calls to an AI provider. - -#### Does dbt Labs share my personal data with third parties - -dbt Labs only shares client personal information as needed to perform the services, under client instructions, or for legal, tax, or compliance reasons. - -#### Does dbt Assist store or use personal data? - -The user clicks the AI assist button, and the user does not otherwise enter data. - -#### Does dbt Assist access my warehouse data? - -dbt Assist utilizes metadata, including column names, model SQL, the model's name, and model documentation. The row-level data from the warehouse is never used or sent to a third-party provider. Such output must be double-checked by the user for completeness and accuracy. - -#### Can dbt Assist data be deleted upon client written request? - -dbt Assist data, aside from usage data, does not persist on dbt Labs systems. Usage data is retained by dbt Labs. dbt Labs does not have possession of any personal or sensitive data. To the extent client identifies personal or sensitive information uploaded by or on behalf of client to dbt Labs systems, such data can be deleted within 30 days of written request. diff --git a/website/docs/docs/cloud/dbt-assist.md b/website/docs/docs/cloud/dbt-assist.md deleted file mode 100644 index eafe7d05821..00000000000 --- a/website/docs/docs/cloud/dbt-assist.md +++ /dev/null @@ -1,25 +0,0 @@ ---- -title: "About dbt Assist" -sidebar_label: "About dbt Assist" -description: "dbt Assist’s powerful AI co-pilot feature helps you deliver data that works." -pagination_next: "docs/cloud/enable-dbt-assist" -pagination_prev: null ---- - -# About dbt Assist - -dbt Assist is a powerful artificial intelligence (AI) co-pilot feature that helps automate development in dbt Cloud, allowing you to focus on delivering data that works. dbt Assist’s AI co-pilot generates [documentation](/docs/build/documentation) and [tests](/docs/build/data-tests) for your dbt SQL models directly in the dbt Cloud IDE, with a click of a button, and helps you accomplish more in less time. - -:::tip Beta feature -dbt Assist is an AI tool meant to _help_ developers generate documentation and tests in dbt Cloud. It's available in beta, in the dbt Cloud IDE only. - -To use dbt Assist, you must have an active [dbt Cloud Enterprise account](https://www.getdbt.com/pricing) and agree to use dbt Labs' OpenAI key. [Register your interest](https://docs.google.com/forms/d/e/1FAIpQLScPjRGyrtgfmdY919Pf3kgqI5E95xxPXz-8JoVruw-L9jVtxg/viewform) to join the private beta or reach out to your account team to begin this process. -::: - - - -## Feedback - -Please note: Always review AI-generated code and content as it may produce incorrect results. dbt Assist features and/or functionality may be added or eliminated as part of the beta trial. - -To give feedback, please reach out to your dbt Labs account team. We appreciate your feedback and suggestions as we improve dbt Assist. diff --git a/website/docs/docs/cloud/dbt-cloud-ide/develop-in-the-cloud.md b/website/docs/docs/cloud/dbt-cloud-ide/develop-in-the-cloud.md index af303d0d9a0..de44de67b33 100644 --- a/website/docs/docs/cloud/dbt-cloud-ide/develop-in-the-cloud.md +++ b/website/docs/docs/cloud/dbt-cloud-ide/develop-in-the-cloud.md @@ -13,7 +13,7 @@ The dbt Cloud integrated development environment (IDE) is a single web-based int The dbt Cloud IDE offers several [keyboard shortcuts](/docs/cloud/dbt-cloud-ide/keyboard-shortcuts) and [editing features](/docs/cloud/dbt-cloud-ide/ide-user-interface#editing-features) for faster and efficient development and governance: - Syntax highlighting for SQL — Makes it easy to distinguish different parts of your code, reducing syntax errors and enhancing readability. -- AI co-pilot — Use [dbt Assist](/docs/cloud/dbt-assist), a powerful AI co-pilot feature, to generate documentation and tests for your dbt SQL models. +- AI copilot — Use [dbt Copilot](/docs/cloud/dbt-copilot), a powerful AI engine that can [generate code](/docs/cloud/use-dbt-copilot#generate-and-edit-code) using natural language, and [generate documentation](/docs/build/documentation), [tests](/docs/build/data-tests), and [semantic models](/docs/build/semantic-models) for you with the click of a button. - Auto-completion — Suggests table names, arguments, and column names as you type, saving time and reducing typos. - Code [formatting and linting](/docs/cloud/dbt-cloud-ide/lint-format) — Helps standardize and fix your SQL code effortlessly. - Navigation tools — Easily move around your code, jump to specific lines, find and replace text, and navigate between project files. @@ -53,9 +53,9 @@ To understand how to navigate the IDE and its user interface elements, refer to | Feature | Description | |---|---| | [**Keyboard shortcuts**](/docs/cloud/dbt-cloud-ide/keyboard-shortcuts) | You can access a variety of [commands and actions](/docs/cloud/dbt-cloud-ide/keyboard-shortcuts) in the IDE by choosing the appropriate keyboard shortcut. Use the shortcuts for common tasks like building modified models or resuming builds from the last failure. | -| **IDE version control** | The IDE version control section and git button allow you to apply the concept of [version control](/docs/collaborate/git/version-control-basics) to your project directly into the IDE.

- Create or change branches, execute git commands using the git button.
- Commit or revert individual files by right-clicking the edited file
- [Resolve merge conflicts](/docs/collaborate/git/merge-conflicts)
- Link to the repo directly by clicking the branch name
- Edit, format, or lint files and execute dbt commands in your primary protected branch, and commit to a new branch.
- Use Git diff view to view what has been changed in a file before you make a pull request.
- From dbt version 1.6 and higher, use the **Prune branches** [button](/docs/cloud/dbt-cloud-ide/ide-user-interface#prune-branches-modal) to delete local branches that have been deleted from the remote repository, keeping your branch management tidy. | +| **IDE version control** | The IDE version control section and git button allow you to apply the concept of [version control](/docs/collaborate/git/version-control-basics) to your project directly into the IDE.

- Create or change branches, execute git commands using the git button.
- Commit or revert individual files by right-clicking the edited file
- [Resolve merge conflicts](/docs/collaborate/git/merge-conflicts)
- Link to the repo directly by clicking the branch name
- Edit, format, or lint files and execute dbt commands in your primary protected branch, and commit to a new branch.
- Use Git diff view to view what has been changed in a file before you make a pull request.
- Use the **Prune branches** [button](/docs/cloud/dbt-cloud-ide/ide-user-interface#prune-branches-modal) (dbt v1.6 and higher) to delete local branches that have been deleted from the remote repository, keeping your branch management tidy.
- Sign your [git commits](/docs/cloud/dbt-cloud-ide/git-commit-signing) to mark them as 'Verified'. | | **Preview and Compile button** | You can [compile or preview](/docs/cloud/dbt-cloud-ide/ide-user-interface#console-section) code, a snippet of dbt code, or one of your dbt models after editing and saving. | -| [**dbt Assist**](/docs/cloud/dbt-assist) | A powerful AI co-pilot feature that generates documentation and tests for your dbt SQL models. Available for dbt Cloud Enterprise plans. | +| [**dbt Copilot**](/docs/cloud/dbt-copilot) | A powerful AI engine that can generate documentation, tests, and semantic models for your dbt SQL models. Available for dbt Cloud Enterprise plans. | | **Build, test, and run button** | Build, test, and run your project with a button click or by using the Cloud IDE command bar. | **Command bar** | You can enter and run commands from the command bar at the bottom of the IDE. Use the [rich model selection syntax](/reference/node-selection/syntax) to execute [dbt commands](/reference/dbt-commands) directly within dbt Cloud. You can also view the history, status, and logs of previous runs by clicking History on the left of the bar. | **Drag and drop** | Drag and drop files located in the file explorer, and use the file breadcrumb on the top of the IDE for quick, linear navigation. Access adjacent files in the same file by right-clicking on the breadcrumb file. @@ -101,6 +101,7 @@ Nice job, you're ready to start developing and building models 🎉! ### Considerations - To improve your experience using dbt Cloud, we suggest that you turn off ad blockers. This is because some project file names, such as `google_adwords.sql`, might resemble ad traffic and trigger ad blockers. - To preserve performance, there's a file size limitation for repositories over 6 GB. If you have a repo over 6 GB, please contact [dbt Support](mailto:support@getdbt.com) before running dbt Cloud. +- The IDE's idle session timeout is one hour. - ### Start-up process @@ -127,8 +128,9 @@ Nice job, you're ready to start developing and building models 🎉! - If a model or test fails, dbt Cloud makes it easy for you to view and download the run logs for your dbt invocations to fix the issue. - Use dbt's [rich model selection syntax](/reference/node-selection/syntax) to [run dbt commands](/reference/dbt-commands) directly within dbt Cloud. - Starting from dbt v1.6, leverage [environments variables](/docs/build/environment-variables#special-environment-variables) to dynamically use the Git branch name. For example, using the branch name as a prefix for a development schema. + - Run [MetricFlow commands](/docs/build/metricflow-commands) to create and manage metrics in your project with the [dbt Semantic Layer](/docs/use-dbt-semantic-layer/dbt-sl). -- **Generate your YAML configurations with dbt Assist** — [dbt Assist](/docs/cloud/dbt-assist) is a powerful artificial intelligence (AI) co-pilot feature that helps automate development in dbt Cloud. It generates documentation and tests for your dbt SQL models directly in the dbt Cloud IDE, with a click of a button, and helps you accomplish more in less time. Available for dbt Cloud Enterprise plans. +- **Generate your YAML configurations with dbt Copilot** — [dbt Copilot](/docs/cloud/dbt-copilot) is a powerful artificial intelligence (AI) feature that helps automate development in dbt Cloud. It can generate documentation, tests, and semantic models for your dbt SQL models directly in the dbt Cloud IDE, with a click of a button, and helps you accomplish more in less time. Available for dbt Cloud Enterprise plans. - **Build and view your project's docs** — The dbt Cloud IDE makes it possible to [build and view](/docs/collaborate/build-and-view-your-docs) documentation for your dbt project while your code is still in development. With this workflow, you can inspect and verify what your project's generated documentation will look like before your changes are released to production. diff --git a/website/docs/docs/cloud/dbt-cloud-ide/git-commit-signing.md b/website/docs/docs/cloud/dbt-cloud-ide/git-commit-signing.md new file mode 100644 index 00000000000..afaa0751669 --- /dev/null +++ b/website/docs/docs/cloud/dbt-cloud-ide/git-commit-signing.md @@ -0,0 +1,80 @@ +--- +title: "Git commit signing" +description: "Learn how to sign your Git commits when using the IDE for development." +sidebar_label: Git commit signing +--- + +# Git commit signing + +To prevent impersonation and enhance security, you can sign your Git commits before pushing them to your repository. Using your signature, a Git provider can cryptographically verify a commit and mark it as "verified", providing increased confidence about its origin. + +You can configure dbt Cloud to sign your Git commits when using the IDE for development. To set up, enable the feature in dbt Cloud, follow the flow to generate a keypair, and upload the public key to your Git provider to use for signature verification. + + +## Prerequisites + +- GitHub or GitLab is your Git provider. Currently, Azure DevOps is not supported. +- You have a dbt Cloud account on the [Enterprise plan](https://www.getdbt.com/pricing/). + +## Generate GPG keypair in dbt Cloud + +To generate a GPG keypair in dbt Cloud, follow these steps: +1. Go to your **Personal profile** page in dbt Cloud. +2. Navigate to **Signed Commits** section. +3. Enable the **Sign commits originating from this user** toggle. +4. This will generate a GPG keypair. The private key will be used to sign all future Git commits. The public key will be displayed, allowing you to upload it to your Git provider. + + + +## Upload public key to Git provider + +To upload the public key to your Git provider, follow the detailed documentation provided by the supported Git provider: + +- [GitHub instructions](https://docs.github.com/en/authentication/managing-commit-signature-verification/adding-a-gpg-key-to-your-github-account) +- [GitLab instructions](https://docs.gitlab.com/ee/user/project/repository/signed_commits/gpg.html) + +Once you have uploaded the public key to your Git provider, your Git commits will be marked as "Verified" after you push the changes to the repository. + + + +## Considerations + +- The GPG keypair is tied to the user, not a specific account. There is a 1:1 relationship between the user and keypair. The same key will be used for signing commits on any accounts the user is a member of. +- The GPG keypair generated in dbt Cloud is linked to the email address associated with your account at the time of keypair creation. This email identifies the author of signed commits. +- For your Git commits to be marked as "verified", your dbt Cloud email address must be a verified email address with your Git provider. The Git provider (such as, GitHub, GitLab) checks that the commit's signed email matches a verified email in your Git provider account. If they don’t match, the commit won't be marked as "verified." +- Keep your dbt Cloud email and Git provider's verified email in sync to avoid verification issues. If you change your dbt Cloud email address: + - Generate a new GPG keypair with the updated email, following the [steps mentioned earlier](/docs/cloud/dbt-cloud-ide/git-commit-signing#generate-gpg-keypair-in-dbt-cloud). + - Add and verify the new email in your Git provider. + + + +## FAQs + + + + + +If you delete your GPG keypair in dbt Cloud, your Git commits will no longer be signed. You can generate a new GPG keypair by following the [steps mentioned earlier](/docs/cloud/dbt-cloud-ide/git-commit-signing#generate-gpg-keypair-in-dbt-cloud). + + + + +GitHub and GitLab support commit signing, while Azure DevOps does not. Commit signing is a [git feature](https://git-scm.com/book/ms/v2/Git-Tools-Signing-Your-Work), and is independent of any specific provider. However, not all providers support the upload of public keys, or the display of verification badges on commits. + + + + + +If your Git Provider does not explicitly support the uploading of public GPG keys, then +commits will still be signed using the private key, but no verification information will +be displayed by the provider. + + + + + +If your Git provider is configured to enforce commit verification, then unsigned commits +will be rejected. To avoid this, ensure that you have followed all previous steps to generate +a keypair, and uploaded the public key to the provider. + + diff --git a/website/docs/docs/cloud/dbt-cloud-ide/ide-user-interface.md b/website/docs/docs/cloud/dbt-cloud-ide/ide-user-interface.md index 8d80483485c..36c6cc898dc 100644 --- a/website/docs/docs/cloud/dbt-cloud-ide/ide-user-interface.md +++ b/website/docs/docs/cloud/dbt-cloud-ide/ide-user-interface.md @@ -35,7 +35,7 @@ The IDE streamlines your workflow, and features a popular user interface layout * Added (A) — The IDE detects added files * Deleted (D) — The IDE detects deleted files. - + 5. **Command bar —** The Command bar, located in the lower left of the IDE, is used to invoke [dbt commands](/reference/dbt-commands). When a command is invoked, the associated logs are shown in the Invocation History Drawer. @@ -107,15 +107,19 @@ Starting from dbt v1.6 or higher, when you save changes to a model, you can comp 3. **Build button —** The build button allows users to quickly access dbt commands related to the active model in the File Editor. The available commands include dbt build, dbt test, and dbt run, with options to include only the current resource, the resource and its upstream dependencies, the resource, and its downstream dependencies, or the resource with all dependencies. This menu is available for all executable nodes. -4. **Format button —** The editor has a **Format** button that can reformat the contents of your files. For SQL files, it uses either `sqlfmt` or `sqlfluff`, and for Python files, it uses `black`. +4. **Lint button** — The **Lint** button runs the [linter](/docs/cloud/dbt-cloud-ide/lint-format) on the active file in the File Editor. The linter checks for syntax errors and style issues in your code and displays the results in the **Code quality** tab. -5. **Results tab —** The Results console tab displays the most recent Preview results in tabular format. +5. **dbt Copilot** — [dbt Copilot](/docs/cloud/dbt-copilot) is a powerful artificial intelligence engine that can generate documentation, tests, and semantic models for you. dbt Copilot is available in the IDE for Enterprise plans. + +6. **Results tab —** The Results console tab displays the most recent Preview results in tabular format. -6. **Compiled Code tab —** The Compile button triggers a compile invocation that generates compiled code, which is displayed in the Compiled Code tab. +7. **Code quality tab** — The Code Quality tab displays the results of the linter on the active file in the File Editor. It allows you to view code errors, provides code quality visibility and management, and displays the SQLFluff version used. + +8. **Compiled Code tab —** The Compile generates the compiled code when the Compile button is executed. The Compiled Code tab displays the compiled SQL code for the active file in the File Editor. -7. **Lineage tab —** The Lineage tab in the File Editor displays the active model's lineage or . By default, it shows two degrees of lineage in both directions (`2+model_name+2`), however, you can change it to +model+ (full DAG). +9. **Lineage tab —** The Lineage tab in the File Editor displays the active model's lineage or . By default, it shows two degrees of lineage in both directions (`2+model_name+2`), however, you can change it to +model+ (full DAG). To use the lineage: - Double-click a node in the DAG to open that file in a new tab - Expand or shrink the DAG using node selection syntax. - Note, the `--exclude` flag isn't supported. @@ -158,11 +162,11 @@ Use menus and modals to interact with IDE and access useful options to help your - #### File Search You can easily search for and navigate between files using the File Navigation menu, which can be accessed by pressing Command-O or Control-O or clicking on the 🔍 icon in the File Explorer. - + - #### Global Command Palette The Global Command Palette provides helpful shortcuts to interact with the IDE, such as git actions, specialized dbt commands, and compile, and preview actions, among others. To open the menu, use Command-P or Control-P. - + - #### IDE Status modal The IDE Status modal shows the current error message and debug logs for the server. This also contains an option to restart the IDE. Open this by clicking on the IDE Status button. @@ -193,7 +197,7 @@ Use menus and modals to interact with IDE and access useful options to help your * Toggling between dark or light mode for a better viewing experience * Restarting the IDE - * Fully recloning your repository to refresh your git state and view status details + * Rollback your repo to remote, to refresh your git state and view status details * Viewing status details, including the IDE Status modal. - + diff --git a/website/docs/docs/cloud/dbt-cloud-ide/lint-format.md b/website/docs/docs/cloud/dbt-cloud-ide/lint-format.md index d14435a97e0..abd3c86d4a8 100644 --- a/website/docs/docs/cloud/dbt-cloud-ide/lint-format.md +++ b/website/docs/docs/cloud/dbt-cloud-ide/lint-format.md @@ -81,7 +81,7 @@ To configure your own linting rules: :::tip Configure dbtonic linting rules -Refer to the [SQLFluff config file](https://github.com/dbt-labs/jaffle-shop-template/blob/main/.sqlfluff) to add the dbt code (or dbtonic) rules we use for our own projects: +Refer to the [Jaffle shop SQLFluff config file](https://github.com/dbt-labs/jaffle-shop-template/blob/main/.sqlfluff) for dbt-specific (or dbtonic) linting rules we use for our own projects:
dbtonic config code example provided by dbt Labs @@ -231,3 +231,4 @@ To avoid this, break up your model into smaller models (files) so that they are - [User interface](/docs/cloud/dbt-cloud-ide/ide-user-interface) - [Keyboard shortcuts](/docs/cloud/dbt-cloud-ide/keyboard-shortcuts) +- [SQL linting in CI jobs](/docs/deploy/continuous-integration#sql-linting) diff --git a/website/docs/docs/cloud/dbt-copilot-data.md b/website/docs/docs/cloud/dbt-copilot-data.md new file mode 100644 index 00000000000..b55681542e3 --- /dev/null +++ b/website/docs/docs/cloud/dbt-copilot-data.md @@ -0,0 +1,29 @@ +--- +title: "dbt Copilot privacy and data" +sidebar_label: "dbt Copilot privacy" +description: "dbt Copilot is a powerful AI engine to help you deliver data that works." +--- + +# dbt Copilot privacy and data + +dbt Labs is committed to protecting your privacy and data. This page provides information about how the dbt Copilot AI engine handles your data. + +#### Is my data used by dbt Labs to train AI models? + +No, dbt Copilot does not use client warehouse data to train any AI models. It uses API calls to an AI provider. + +#### Does dbt Labs share my personal data with third parties + +dbt Labs only shares client personal information as needed to perform the services, under client instructions, or for legal, tax, or compliance reasons. + +#### Does dbt Copilot store or use personal data? + +The user clicks the dbt Copilot button, and the user does not otherwise enter data. + +#### Does dbt Copilot access my warehouse data? + +dbt Copilot utilizes metadata, including column names, model SQL, the model's name, and model documentation. The row-level data from the warehouse is never used or sent to a third-party provider. Such output must be double-checked by the user for completeness and accuracy. + +#### Can dbt Copilot data be deleted upon client written request? + +The data from using dbt Copilot, aside from usage data, _doesn't_ persist on dbt Labs systems. Usage data is retained by dbt Labs. dbt Labs doesn't have possession of any personal or sensitive data. To the extent client identifies personal or sensitive information uploaded by or on behalf of client to dbt Labs systems, such data can be deleted within 30 days of written request. diff --git a/website/docs/docs/cloud/dbt-copilot.md b/website/docs/docs/cloud/dbt-copilot.md new file mode 100644 index 00000000000..bd2573e0ff8 --- /dev/null +++ b/website/docs/docs/cloud/dbt-copilot.md @@ -0,0 +1,27 @@ +--- +title: "About dbt Copilot" +sidebar_label: "About dbt Copilot" +description: "dbt Copilot is a powerful AI engine designed to accelerate your analytics workflows throughout your entire ADLC." +pagination_next: "docs/cloud/enable-dbt-copilot" +pagination_prev: null +--- + +# About dbt Copilot + +dbt Copilot is a powerful artificial intelligence (AI) engine that's fully integrated into your dbt Cloud experience and designed to accelerate your analytics workflows. dbt Copilot embeds AI-driven assistance across every stage of the analytics development life cycle (ADLC), empowering data practitioners to deliver data products faster, improve data quality, and enhance data accessibility. + +With automatic code generation, let dbt Copilot [generate code](/docs/cloud/use-dbt-copilot#generate-and-edit-code) using natural language, and [generate documentation](/docs/build/documentation), [tests](/docs/build/data-tests), and [semantic models](/docs/build/semantic-models) for you with the click of a button. + +:::tip Beta feature +dbt Copilot is designed to _help_ developers generate documentation, tests, and semantic models, as well as [code](/docs/cloud/use-dbt-copilot#generate-and-edit-code) using natural language, in dbt Cloud. It's available in beta, in the dbt Cloud IDE only. + +To use dbt Copilot, you must have an active [dbt Cloud Enterprise account](https://www.getdbt.com/pricing) and either agree to use dbt Labs' OpenAI key or provide your own Open AI API key. [Register here](https://docs.google.com/forms/d/e/1FAIpQLScPjRGyrtgfmdY919Pf3kgqI5E95xxPXz-8JoVruw-L9jVtxg/viewform) or reach out to the Account Team if you're interested in joining the private beta. +::: + + + +## Feedback + +Please note: Always review AI-generated code and content as it may produce incorrect results. The features and/or functionality of dbt Copilot may be added or eliminated as part of the beta trial. + +To give feedback, please contact your dbt Labs account team. We appreciate your feedback and suggestions as we improve dbt Copilot. diff --git a/website/docs/docs/cloud/enable-dbt-assist.md b/website/docs/docs/cloud/enable-dbt-assist.md deleted file mode 100644 index 9432f858001..00000000000 --- a/website/docs/docs/cloud/enable-dbt-assist.md +++ /dev/null @@ -1,35 +0,0 @@ ---- -title: "Enable dbt Assist" -sidebar_label: "Enable dbt Assist" -description: "Enable dbt Assist in dbt Cloud and leverage AI to speed up your development." ---- - -# Enable dbt Assist - -This page explains how to enable dbt Assist in dbt Cloud to leverage AI to speed up your development and allow you to focus on delivering quality data. - -## Prerequisites - -- Available in the dbt Cloud IDE only. -- Must have an active [dbt Cloud Enterprise account](https://www.getdbt.com/pricing). -- Development environment be ["Versionless"](/docs/dbt-versions/upgrade-dbt-version-in-cloud#versionless). -- Current dbt Assist deployments use a central OpenAI API key managed by dbt Labs. In the future, you may provide your own key for Azure OpenAI or OpenAI. -- Accept and sign legal agreements. Reach out to your account team to begin this process. - -## Enable dbt Assist - -dbt Assist will only be available at an account level after your organization has signed the legal requirements. It will be disabled by default. Your dbt Cloud Admin(s) will enable it by following these steps: - -1. Navigate to **Account Settings** in the navigation menu. - -2. Under **Settings**, confirm the account you're enabling. - -3. Click **Edit** in the top right corner. - -4. To turn on dbt Assist, toggle the **Enable account access to AI-powered features** switch to the right. The toggle will slide to the right side, activating dbt Assist. - -5. Click **Save** and you should now have dbt Assist AI enabled to use. - -Note: To disable (only after enabled), repeat steps 1 to 3, toggle off in step 4, and repeat step 5. - - diff --git a/website/docs/docs/cloud/enable-dbt-copilot.md b/website/docs/docs/cloud/enable-dbt-copilot.md new file mode 100644 index 00000000000..2b954d1db5d --- /dev/null +++ b/website/docs/docs/cloud/enable-dbt-copilot.md @@ -0,0 +1,46 @@ +--- +title: "Enable dbt Copilot" +sidebar_label: "Enable dbt Copilot" +description: "Enable the dbt Copilot AI engine in dbt Cloud to speed up your development." +--- + +# Enable dbt Copilot + +This page explains how to enable the dbt Copilot engine in dbt Cloud, leveraging AI to speed up your development and allowing you to focus on delivering quality data. + +## Prerequisites + +- Available in the dbt Cloud IDE only. +- Must have an active [dbt Cloud Enterprise account](https://www.getdbt.com/pricing). +- Development environment is on a supported [release track](/docs/dbt-versions/cloud-release-tracks) to receive ongoing updates. +- By default, dbt Copilot deployments use a central OpenAI API key managed by dbt Labs. Alternatively, you can [provide your own OpenAI API key](#bringing-your-own-openai-api-key-byok). +- Accept and sign legal agreements. Reach out to your Account team to begin this process. + +## Enable dbt Copilot + +dbt Copilot is only available to your account after your organization has signed the required legal documents. It's disabled by default. A dbt Cloud admin can enable it by following these steps: + +1. Navigate to **Account settings** in the navigation menu. + +2. Under **Settings**, confirm the account you're enabling. + +3. Click **Edit** in the top right corner. + +4. Enable the **Enable account access to AI-powered features** option. + +5. Click **Save**. You should now have the dbt Copilot AI engine enabled for use. + +Note: To disable (only after enabled), repeat steps 1 to 3, toggle off in step 4, and repeat step 5. + + + +## Bringing your own OpenAI API key (BYOK) + +Once AI features have been enabled, you can provide your organization's OpenAI API key. dbt Cloud will then leverage your OpenAI account and terms to power dbt Copilot. This will incur billing charges to your organization from OpenAI for requests made by dbt Copilot. + +Configure AI keys using: +- [dbt Labs-managed OpenAI API key](/docs/cloud/account-integrations?ai-integration=dbtlabs#ai-integrations) +- Your own [OpenAI API key](/docs/cloud/account-integrations?ai-integration=openai#ai-integrations) +- [Azure OpenAI](/docs/cloud/account-integrations?ai-integration=azure#ai-integrations) + +For configuration details, see [Account integrations](/docs/cloud/account-integrations#ai-integrations). diff --git a/website/docs/docs/cloud/git/authenticate-azure.md b/website/docs/docs/cloud/git/authenticate-azure.md index 42028bf993b..5278c134f72 100644 --- a/website/docs/docs/cloud/git/authenticate-azure.md +++ b/website/docs/docs/cloud/git/authenticate-azure.md @@ -13,9 +13,9 @@ If you use the dbt Cloud IDE or dbt Cloud CLI to collaborate on your team's Azur Connect your dbt Cloud profile to Azure DevOps using OAuth: -1. Click the gear icon at the top right and select **Profile settings**. -2. Click **Linked Accounts**. -3. Next to Azure DevOps, click **Link**. +1. Click your account name at the bottom of the left-side menu and click **Account settings** +2. Scroll down to **Your profile** and select **Personal profile**. +3. Go to the **Linked accounts** section in the middle of the page. 4. Once you're redirected to Azure DevOps, sign into your account. diff --git a/website/docs/docs/cloud/git/connect-azure-devops.md b/website/docs/docs/cloud/git/connect-azure-devops.md index f6c0ee634fc..f3bb07a12d0 100644 --- a/website/docs/docs/cloud/git/connect-azure-devops.md +++ b/website/docs/docs/cloud/git/connect-azure-devops.md @@ -4,6 +4,8 @@ id: "connect-azure-devops" pagination_next: "docs/cloud/git/setup-azure" --- +# Connect to Azure DevOps + diff --git a/website/docs/docs/cloud/git/connect-github.md b/website/docs/docs/cloud/git/connect-github.md index f230f70e1f6..df5c6cb0728 100644 --- a/website/docs/docs/cloud/git/connect-github.md +++ b/website/docs/docs/cloud/git/connect-github.md @@ -17,7 +17,7 @@ Connecting your GitHub account to dbt Cloud provides convenience and another lay * **Note** — [Single tenant](/docs/cloud/about-cloud/tenancy#single-tenant) accounts offer enhanced connection options for integrating with an On-Premises GitHub deployment setup using the native integration. This integration allows you to use all the features of the integration, such as triggering CI builds. The dbt Labs infrastructure team will coordinate with you to ensure any additional networking configuration requirements are met and completed. To discuss details, contact dbt Labs support or your dbt Cloud account team. - You _must_ be a **GitHub organization owner** in order to [install the dbt Cloud application](/docs/cloud/git/connect-github#installing-dbt-cloud-in-your-github-account) in your GitHub organization. To learn about GitHub organization roles, see the [GitHub documentation](https://docs.github.com/en/organizations/managing-peoples-access-to-your-organization-with-roles/roles-in-an-organization). - The GitHub organization owner requires [_Owner_](/docs/cloud/manage-access/self-service-permissions) or [_Account Admin_](/docs/cloud/manage-access/enterprise-permissions) permissions when they log into dbt Cloud to integrate with a GitHub environment using organizations. -- You may need to temporarily provide an extra dbt Cloud user account with _Owner_ or _Account Admin_ [permissions](/docs/cloud/manage-access/self-service-permissions) for your GitHub organization owner until they complete the installation. +- You may need to temporarily provide an extra dbt Cloud user account with _Owner_ or _Account Admin_ [permissions](/docs/cloud/manage-access/enterprise-permissions) for your GitHub organization owner until they complete the installation. ## Installing dbt Cloud in your GitHub account @@ -25,19 +25,21 @@ Connecting your GitHub account to dbt Cloud provides convenience and another lay You can connect your dbt Cloud account to GitHub by installing the dbt Cloud application in your GitHub organization and providing access to the appropriate repositories. To connect your dbt Cloud account to your GitHub account: -1. Navigate to **Your Profile** settings by clicking the gear icon in the top right. +1. From dbt Cloud, click on your account name in the left side menu and select **Account settings**. -2. Select **Linked Accounts** from the left menu. +2. Select **Personal profile** under the **Your profile** section. - +3. Scroll down to **Linked accounts**. -3. In the **Linked Accounts** section, set up your GitHub account connection to dbt Cloud by clicking **Link** to the right of GitHub. This redirects you to your account on GitHub where you will be asked to install and configure the dbt Cloud application. + -4. Select the GitHub organization and repositories dbt Cloud should access. +4. In the **Linked accounts** section, set up your GitHub account connection to dbt Cloud by clicking **Link** to the right of GitHub. This redirects you to your account on GitHub where you will be asked to install and configure the dbt Cloud application. + +5. Select the GitHub organization and repositories dbt Cloud should access. -5. Assign the dbt Cloud GitHub App the following permissions: +6. Assign the dbt Cloud GitHub App the following permissions: - Read access to metadata - Read and write access to Checks - Read and write access to Commit statuses @@ -46,8 +48,8 @@ To connect your dbt Cloud account to your GitHub account: - Read and write access to Webhooks - Read and write access to Workflows -6. Once you grant access to the app, you will be redirected back to dbt Cloud and shown a linked account success state. You are now personally authenticated. -7. Ask your team members to individually authenticate by connecting their [personal GitHub profiles](#authenticate-your-personal-github-account). +7. Once you grant access to the app, you will be redirected back to dbt Cloud and shown a linked account success state. You are now personally authenticated. +8. Ask your team members to individually authenticate by connecting their [personal GitHub profiles](#authenticate-your-personal-github-account). ## Limiting repository access in GitHub If you are your GitHub organization owner, you can also configure the dbt Cloud GitHub application to have access to only select repositories. This configuration must be done in GitHub, but we provide an easy link in dbt Cloud to start this process. @@ -67,14 +69,16 @@ After the dbt Cloud administrator [sets up a connection](/docs/cloud/git/connect To connect a personal GitHub account: -1. Navigate to **Your Profile** settings by clicking the gear icon in the top right. +1. From dbt Cloud, click on your account name in the left side menu and select **Account settings**. + +2. Select **Personal profile** under the **Your profile** section. -2. Select **Linked Accounts** in the left menu. If your GitHub account is not connected, you’ll see "No connected account". +3. Scroll down to **Linked accounts**. If your GitHub account is not connected, you’ll see "No connected account". -3. Select **Link** to begin the setup process. You’ll be redirected to GitHub, and asked to authorize dbt Cloud in a grant screen. +4. Select **Link** to begin the setup process. You’ll be redirected to GitHub, and asked to authorize dbt Cloud in a grant screen. -4. Once you approve authorization, you will be redirected to dbt Cloud, and you should now see your connected account. +5. Once you approve authorization, you will be redirected to dbt Cloud, and you should now see your connected account. You can now use the dbt Cloud IDE or dbt Cloud CLI. diff --git a/website/docs/docs/cloud/git/connect-gitlab.md b/website/docs/docs/cloud/git/connect-gitlab.md index f68f09ae73d..d16cdb15b8e 100644 --- a/website/docs/docs/cloud/git/connect-gitlab.md +++ b/website/docs/docs/cloud/git/connect-gitlab.md @@ -10,6 +10,7 @@ Connecting your GitLab account to dbt Cloud provides convenience and another lay - Clone repos using HTTPS rather than SSH. - Carry GitLab user permissions through to dbt Cloud or dbt Cloud CLI's git actions. - Trigger [Continuous integration](/docs/deploy/continuous-integration) builds when merge requests are opened in GitLab. + - GitLab automatically registers a webhook in your GitLab repository to enable seamless integration with dbt Cloud. The steps to integrate GitLab in dbt Cloud depend on your plan. If you are on: - the Developer or Team plan, read these [instructions](#for-dbt-cloud-developer-and-team-tiers). @@ -18,11 +19,12 @@ The steps to integrate GitLab in dbt Cloud depend on your plan. If you are on: ## For dbt Cloud Developer and Team tiers To connect your GitLab account: -1. Navigate to Your Profile settings by clicking the gear icon in the top right. -2. Select **Linked Accounts** in the left menu. -3. Click **Link** to the right of your GitLab account. +1. From dbt Cloud, click on your account name in the left side menu and select **Account settings**. +2. Select **Personal profile** under the **Your profile** section. +3. Scroll down to **Linked accounts**. +4. Click **Link** to the right of your GitLab account. - + When you click **Link**, you will be redirected to GitLab and prompted to sign into your account. GitLab will then ask for your explicit authorization: @@ -60,8 +62,8 @@ In GitLab, when creating your Group Application, input the following: | ------ | ----- | | **Name** | dbt Cloud | | **Redirect URI** | `https://YOUR_ACCESS_URL/complete/gitlab` | -| **Confidential** | ✔️ | -| **Scopes** | ✔️ api | +| **Confidential** | ✅ | +| **Scopes** | ✅ api | Replace `YOUR_ACCESS_URL` with the [appropriate Access URL](/docs/cloud/about-cloud/access-regions-ip-addresses) for your region and plan. @@ -99,7 +101,13 @@ Once you've accepted, you should be redirected back to dbt Cloud, and your integ ### Personally authenticating with GitLab dbt Cloud developers on the Enterprise plan must each connect their GitLab profiles to dbt Cloud, as every developer's read / write access for the dbt repo is checked in the dbt Cloud IDE or dbt Cloud CLI. -To connect a personal GitLab account, dbt Cloud developers should navigate to Your Profile settings by clicking the gear icon in the top right, then select **Linked Accounts** in the left menu. +To connect a personal GitLab account: + +1. From dbt Cloud, click on your account name in the left side menu and select **Account settings**. + +2. Select **Personal profile** under the **Your profile** section. + +3. Scroll down to **Linked accounts**. If your GitLab account is not connected, you’ll see "No connected account". Select **Link** to begin the setup process. You’ll be redirected to GitLab, and asked to authorize dbt Cloud in a grant screen. @@ -107,20 +115,10 @@ If your GitLab account is not connected, you’ll see "No connected account". Se Once you approve authorization, you will be redirected to dbt Cloud, and you should see your connected account. You're now ready to start developing in the dbt Cloud IDE or dbt Cloud CLI. - ## Troubleshooting -### Errors when importing a repository on dbt Cloud project set up -If you do not see your repository listed, double-check that: -- Your repository is in a Gitlab group you have access to. dbt Cloud will not read repos associated with a user. - -If you do see your repository listed, but are unable to import the repository successfully, double-check that: -- You are a maintainer of that repository. Only users with maintainer permissions can set up repository connections. - -If you imported a repository using the dbt Cloud native integration with GitLab, you should be able to see the clone strategy is using a `deploy_token`. If it's relying on an SSH key, this means the repository was not set up using the native GitLab integration, but rather using the generic git clone option. The repository must be reconnected in order to get the benefits described above. - -## FAQs - + + diff --git a/website/docs/docs/cloud/git/import-a-project-by-git-url.md b/website/docs/docs/cloud/git/import-a-project-by-git-url.md index 90c54dbb1b1..2b499b39cb7 100644 --- a/website/docs/docs/cloud/git/import-a-project-by-git-url.md +++ b/website/docs/docs/cloud/git/import-a-project-by-git-url.md @@ -14,8 +14,8 @@ You must use the `git@...` or `ssh:..`. version of your git URL, not the `https: After importing a project by Git URL, dbt Cloud will generate a Deploy Key for your repository. To find the deploy key in dbt Cloud: -1. Click the gear icon in the upper right-hand corner. -2. Click **Account Settings** --> **Projects** and select a project. +1. From dbt Cloud, click on your account name in the left side menu and select **Account settings**. +2. Go to **Projects** and select a project. 3. Click the **Repository** link to the repository details page. 4. Copy the key under the **Deploy Key** section. @@ -49,7 +49,7 @@ If you use GitLab, you can import your repo directly using [dbt Cloud's GitLab A - To add a deploy key to a GitLab account, navigate to the [SSH keys](https://gitlab.com/profile/keys) tab in the User Settings page of your GitLab account. - Next, paste in the deploy key generated by dbt Cloud for your repository. - After saving this SSH key, dbt Cloud will be able to read and write files in your GitLab repository. -- Refer to [Adding a read only deploy key in GitLab](https://docs.gitlab.com/ee/ssh/#per-repository-deploy-keys) +- Refer to [Adding a read only deploy key in GitLab](https://docs.gitlab.com/ee/user/project/deploy_keys/) diff --git a/website/docs/docs/cloud/git/setup-azure.md b/website/docs/docs/cloud/git/setup-azure.md index 6fdb2517f1a..c6213b49453 100644 --- a/website/docs/docs/cloud/git/setup-azure.md +++ b/website/docs/docs/cloud/git/setup-azure.md @@ -17,7 +17,7 @@ To use our native integration with Azure DevOps in dbt Cloud, an account admin n 4. [Connect Azure DevOps to your new app](#connect-azure-devops-to-your-new-app). 5. [Add your Entra ID app to dbt Cloud](#add-your-azure-ad-app-to-dbt-cloud). -Once the Microsoft Entra ID app is added to dbt Cloud, an account admin must also [connect a service user](#connecting-a-service-user) via OAuth, which will be used to power headless actions in dbt Cloud such as deployment runs and CI. +Once the Microsoft Entra ID app is added to dbt Cloud, an account admin must also [connect a service user](/docs/cloud/git/setup-azure#connect-a-service-user) via OAuth, which will be used to power headless actions in dbt Cloud such as deployment runs and CI. Once the Microsoft Entra ID app is added to dbt Cloud and the service user is connected, then dbt Cloud developers can personally authenticate in dbt Cloud from Azure DevOps. For more on this, see [Authenticate with Azure DevOps](/docs/cloud/git/authenticate-azure). @@ -89,7 +89,7 @@ An Azure admin will need one of the following permissions in both the Microsoft - Azure Service Administrator - Azure Co-administrator -If your Azure DevOps account is connected to Entra ID, then you can proceed to [Connecting a service user](#connecting-a-service-user). However, if you're just getting set up, connect Azure DevOps to the Microsoft Entra ID app you just created: +If your Azure DevOps account is connected to Entra ID, then you can proceed to [Connect a service user](#connect-a-service-user). However, if you're just getting set up, connect Azure DevOps to the Microsoft Entra ID app you just created: 1. From your Azure DevOps account, select **Organization settings** in the bottom left. 2. Navigate to Microsoft Entra ID. @@ -155,7 +155,7 @@ The service user's permissions will also power which repositories a team can sel While it's common to enforce multi-factor authentication (MFA) for normal user accounts, service user authentication must not need an extra factor. If you enable a second factor for the service user, this can interrupt production runs and cause a failure to clone the repository. In order for the OAuth access token to work, the best practice is to remove any more burden of proof of identity for service users. -As a result, MFA must be explicity disabled in the Office 365 or Microsoft Entra ID administration panel for the service user. Just having it "un-connected" will not be sufficient, as dbt Cloud will be prompted to set up MFA instead of allowing the credentials to be used as intended. +As a result, MFA must be explicitly disabled in the Office 365 or Microsoft Entra ID administration panel for the service user. Just having it "un-connected" will not be sufficient, as dbt Cloud will be prompted to set up MFA instead of allowing the credentials to be used as intended. **To disable MFA for a single user using the Office 365 Administration console:** @@ -373,6 +373,16 @@ A dbt Cloud account admin with access to the service user's Azure DevOps account Once connected, dbt Cloud displays the email address of the service user so you know which user's permissions are enabling headless actions in deployment environments. To change which account is connected, disconnect the profile in dbt Cloud, sign into the alternative Azure DevOps service account, and re-link the account in dbt Cloud. +### Using Azure AD for SSO with dbt Cloud and Microsoft tools + +If you're using Azure AD for SSO with dbt Cloud and Microsoft tools, the SSO flow may sometimes direct your account admin to their personal user account instead of the service user. If this happens, follow these steps to resolve it: + +1. Sign in to the service user's Azure DevOps account (ensure they are also connected to dbt Cloud through SSO). +2. When connected to dbt Cloud, sign out of Azure AD through the [Azure portal](https://portal.azure.com/). +3. Disconnect the service user in dbt Cloud, and follow the steps to set it up again. +4. You should then be prompted to enter service user credentials. + + :::info Personal Access Tokens (PATs) dbt Cloud leverages the service user to generate temporary access tokens called [PATs](https://learn.microsoft.com/en-us/azure/devops/organizations/accounts/use-personal-access-tokens-to-authenticate?toc=%2Fazure%2Fdevops%2Fmarketplace-extensibility%2Ftoc.json&view=azure-devops&tabs=Windows). diff --git a/website/docs/docs/cloud/manage-access/about-access.md b/website/docs/docs/cloud/manage-access/about-access.md index 64826531245..b9d23b28add 100644 --- a/website/docs/docs/cloud/manage-access/about-access.md +++ b/website/docs/docs/cloud/manage-access/about-access.md @@ -8,142 +8,211 @@ pagination_prev: null :::info "User access" is not "Model access" -**User groups and access** and **model groups and access** mean two different things. "Model groups and access" is a specific term used in the language of dbt-core. Refer to [Model access](/docs/collaborate/govern/model-access) for more info on what it means in dbt-core. +This page is specific to user groups and access, which includes: +- User licenses, permissions, and group memberships +- Role-based access controls for projects and environments +- Single sign-on and secure authentication -::: +"Model groups and access" is a feature specific to models and their availability across projects. Refer to [Model access](/docs/collaborate/govern/model-access) for more info on what it means for your dbt projects. -dbt Cloud administrators can use dbt Cloud's permissioning model to control -user-level access in a dbt Cloud account. This access control comes in two flavors: -License-based and Role-based. +::: -- **License-based Access Controls:** User are configured with account-wide - license types. These licenses control the specific parts of the dbt Cloud application - that a given user can access. -- **Role-based Access Control (RBAC):** Users are assigned to _groups_ that have - specific permissions on specific projects or the entire account. A user may be - a member of multiple groups, and those groups may have permissions on multiple - projects. +# About user access +You can regulate access to dbt Cloud by various measures, including licenses, groups, permissions, and role-based access control (RBAC). To understand the possible approaches to user access to dbt Cloud features and functionality, you should first know how we approach users and groups. -## License-based access control +### Users -Each user on an account is assigned a license type when the user is first -invited to a given account. This license type may change over time, but a -user can only have one type of license at any given time. +Individual users in dbt Cloud can be people you [manually invite](/docs/cloud/manage-access/invite-users) or grant access via an external identity provider (IdP), such as Microsoft Entra ID, Okta, or Google Workspace. -A user's license type controls the features in dbt Cloud that the user is able -to access. dbt Cloud's three license types are: +In either scenario, when you add a user to dbt Cloud, they are assigned a [license](#licenses). You assign licenses at the individual user or group levels. When you manually invite a user, you will assign the license in the invitation window. - - **Developer** — User may be granted _any_ permissions. - - **Read-Only** — User has read-only permissions applied to all dbt Cloud resources regardless of the role-based permissions that the user is assigned. - - **IT** — User has [Security Admin](/docs/cloud/manage-access/enterprise-permissions#security-admin) and [Billing Admin](/docs/cloud/manage-access/enterprise-permissions#billing-admin) permissions applied regardless of the role-based permissions that the user is assigned. + -For more information on these license types, see [Seats & Users](/docs/cloud/manage-access/seats-and-users). +You can edit an existing user's license by navigating to the **Users** section of the **Account settings**, clicking on a user, and clicking **Edit** on the user pane. Delete users from this same window to free up licenses for new users. -## Role-based access control + -:::info dbt Cloud Enterprise -Role-based access control is a feature of the dbt Cloud Enterprise plan +### Groups -::: +Groups in dbt Cloud serve much of the same purpose as they do in traditional directory tools — to gather individual users together to make bulk assignment of permissions easier. Admins use groups in dbt Cloud to assign [licenses](#licenses) and [permissions](#permissions). The permissions are more granular than licenses, and you only assign them at the group level; _you can’t assign permissions at the user level._ Every user in dbt Cloud must be assigned to at least one group. -Role-based access control allows for fine-grained permissioning in the dbt Cloud -application. With role-based access control, users can be assigned varying -permissions to different projects within a dbt Cloud account. For teams on the -Enterprise tier, role-based permissions can be generated dynamically from -configurations in an [Identity Provider](sso-overview). +There are three default groups available as soon as you create your dbt Cloud account (the person who created the account is added to all three automatically): -Role-based permissions are applied to _groups_ and pertain to _projects_. The -assignable permissions themselves are granted via _permission sets_. +- **Owner:** This group is for individuals responsible for the entire account and will give them elevated account admin privileges. You cannot change the permissions. +- **Member:** This group is for the general members of your organization, who will also have full access to the account. You cannot change the permissions. By default, dbt Cloud adds new users to this group. +- **Everyone:** A general group for all members of your organization. Customize the permissions to fit your organizational needs. By default, dbt Cloud adds new users to this group. +We recommend deleting the default `Owner`, `Member`, and `Everyone` groups before deploying and replacing them with your organizational groups. This prevents users from receiving more elevated privileges than they should and helps admins ensure they are properly placed. -### Groups +Create new groups from the **Groups & Licenses** section of the **Account settings**. If you use an external IdP for SSO, you can sync those SSO groups to dbt Cloud from the **Group details** pane when creating or editing existing groups. -A group is a collection of users. Users may belong to multiple groups. Members -of a group inherit any permissions applied to the group itself. + -Users can be added to a dbt Cloud group based on their group memberships in the -configured [Identity Provider](sso-overview) for the account. In this way, dbt -Cloud administrators can manage access to dbt Cloud resources via identity -management software like Microsoft Entra ID (formerly Azure AD), Okta, or GSuite. See _SSO Mappings_ below for -more information. +:::important -You can view the groups in your account or create new groups from the **Groups & Licenses** -page in your Account Settings.
+If a user is assigned licenses and permissions from multiple groups, the group that grants the most access will take precedence. You must assign a permission set to any groups created beyond the three defaults, or users assigned will not have access to features beyond their user profile. - +::: -### SSO mappings +#### SSO mappings -SSO Mappings connect Identity Provider (IdP) group membership to dbt Cloud group -membership. When a user logs into dbt Cloud via a supported identity provider, -their IdP group memberships are synced with dbt Cloud. Upon logging in -successfully, the user's group memberships (and therefore, permissions) are -adjusted accordingly within dbt Cloud automatically. +SSO Mappings connect an identity provider (IdP) group membership to a dbt Cloud group. When users log into dbt Cloud via a supported identity provider, their IdP group memberships sync with dbt Cloud. Upon logging in successfully, the user's group memberships (and permissions) will automatically adjust within dbt Cloud. :::tip Creating SSO Mappings -While dbt Cloud supports mapping multiple IdP groups to a single dbt Cloud -group, we recommend using a 1:1 mapping to make administration as simple as -possible. Consider using the same name for your dbt Cloud groups and your IdP -groups. +While dbt Cloud supports mapping multiple IdP groups to a single dbt Cloud group, we recommend using a 1:1 mapping to make administration as simple as possible. Use the same names for your dbt Cloud groups and your IdP groups. ::: +Create an SSO mapping in the group view: + +1. Open an existing group to edit or create a new group. +2. In the **SSO** portion of the group screen, enter the name of the SSO group exactly as it appears in the IdP. If the name is not the same, the users will not be properly placed into the group. +3. In the **Users** section, ensure the **Add all users by default** option is disabled. +4. Save the group configuration. New SSO users will be added to the group upon login, and existing users will be added to the group upon their next login. + + + +Refer to [role-based access control](#role-based-access-control) for more information about mapping SSO groups for user assignment to dbt Cloud groups. + +## Grant access + +dbt Cloud users have both a license (assigned to an individual user or by group membership) and permissions (by group membership only) that determine what actions they can take. Licenses are account-wide, and permissions provide more granular access or restrictions to specific features. + +### Licenses + +Every user in dbt Cloud will have a license assigned. Licenses consume "seats" which impact how your account is [billed](/docs/cloud/billing), depending on your [service plan](https://www.getdbt.com/pricing). + +There are three license types in dbt Cloud: + +- **Developer** — User can be granted _any_ permissions. +- **Read-Only** — User has read-only permissions applied to all dbt Cloud resources regardless of the role-based permissions that the user is assigned. +- **IT** — User has Security Admin and Billing Admin [permissions](/docs/cloud/manage-access/enterprise-permissions) applied, regardless of the group permissions assigned. + +Developer licenses will make up a majority of the users in your environment and have the highest impact on billing, so it's important to monitor how many you have at any given time. + +For more information on these license types, see [Seats & Users](/docs/cloud/manage-access/seats-and-users) + +### Permissions + +Permissions determine what a developer-licensed user can do in your dbt Cloud account. By default, members of the `Owner` and `Member` groups have full access to all areas and features. When you want to restrict access to features, assign users to groups with stricter permission sets. Keep in mind that if a user belongs to multiple groups, the most permissive group will take precedence. + +The permissions available depends on whether you're on an [Enterprise](/docs/cloud/manage-access/enterprise-permissions) or [self-service Team](/docs/cloud/manage-access/self-service-permissions) plan. Developer accounts only have a single user, so permissions aren't applicable. + + + +Some permissions (those that don't grant full access, like admins) allow groups to be "assigned" to specific projects and environments only. Read about [environment-level permissions](/docs/cloud/manage-access/environment-permissions-setup) for more information on restricting environment access. + + + +## Role-based access control + +Role-based access control (RBAC) allows you to grant users access to features and functionality based on their group membership. With this method, you can grant users varying access levels to different projects and environments. You can take access and security to the next level by integrating dbt Cloud with a third-party identity provider (IdP) to grant users access when they authenticate with your SSO or OAuth service. + +There are a few things you need to know before you configure RBAC for SSO users: +- New SSO users join any groups with the **Add all new users by default** option enabled. By default, the `Everyone` and `Member` groups have this option enabled. Disable this option across all groups for the best RBAC experience. +- You must have the appropriate SSO groups configured in the group details SSO section. If the SSO group name does not match _exactly_, users will not be placed in the group correctly. + +- dbt Labs recommends that your dbt Cloud group names match the IdP group names. + +Let's say you have a new employee being onboarded into your organization using [Okta](/docs/cloud/manage-access/set-up-sso-okta) as the IdP and dbt Cloud groups with SSO mappings. In this scenario, users are working on `The Big Project` and a new analyst named `Euclid Ean` is joining the group. + +Check out the following example configurations for an idea of how you can implement RBAC for your organization (these examples assume you have already configured [SSO](/docs/cloud/manage-access/sso-overview)): + + -### Permission sets +You and your IdP team add `Euclid Ean` to your Okta environment and assign them to the `dbt Cloud` SSO app via a group called `The Big Project`. -Permission sets are predefined collections of granular permissions. Permission -sets combine low-level permission grants into high-level roles that can be -assigned to groups. Some examples of existing permission sets are: - - Account Admin - - Git Admin - - Job Admin - - Job Viewer - - ...and more + -For a full list of enterprise permission sets, see [Enterprise Permissions](/docs/cloud/manage-access/enterprise-permissions). -These permission sets are available for assignment to groups and control the ability -for users in these groups to take specific actions in the dbt Cloud application. +Configure the group attribute statements the `dbt Cloud` application in Okta. The group statements in the following example are set to the group name exactly (`The Big Project`), but yours will likely be a much broader configuration. Companies often use the same prefix across all dbt groups in their IdP. For example `DBT_GROUP_` -In the following example, the _dbt Cloud Owners_ group is configured with the -**Account Admin** permission set on _All Projects_ and the **Job Admin** permission -set on the _Internal Analytics_ project. + - + + -### Manual assignment +You and your dbt Cloud admin team configure the groups in your account's settings: +1. Navigate to the **Account settings** and click **Groups & Licenses** on the left-side menu. +2. Click **Create group** or select an existing group and click **Edit**. +3. Enter the group name in the **SSO** field. +4. Configure the **Access and permissions** fields to your needs. Select a [permission set](/docs/cloud/manage-access/enterprise-permissions), the project they can access, and [environment-level access](/docs/cloud/manage-access/environment-permissions). -dbt Cloud administrators can manually assign users to groups independently of -IdP attributes. If a dbt Cloud group is configured _without_ any -SSO Mappings, then the group will be _unmanaged_ and dbt Cloud will not adjust -group membership automatically when users log into dbt Cloud via an identity -provider. This behavior may be desirable for teams that have connected an identity -provider, but have not yet configured SSO Mappings between dbt Cloud and the -IdP. + -If an SSO Mapping is added to an _unmanaged_ group, then it will become -_managed_, and dbt Cloud may add or remove users to the group automatically at -sign-in time based on the user's IdP-provided group membership information. +Euclid is limited to the `Analyst` role, the `Jaffle Shop` project, and the `Development`, `Staging`, and `General` environments of that project. Euclid has no access to the `Production` environment in their role. + + + + +Euclid takes the following steps to log in: + +1. Access the SSO URL or the dbt Cloud app in their Okta account. The URL can be found on the **Single sign-on** configuration page in the **Account settings**. + + + +2. Login with their Okta credentials. + + + +3. Since it's their first time logging in with SSO, Euclid Ean is presented with a message and no option to move forward until they check the email address associated with their Okta account. + + + +4. They now open their email and click the link to join dbt Labs, which completes the process. + + + +Euclid is now logged in to their account. They only have access to the `Jaffle Shop` pr, and the project selection option is removed from their UI entirely. + + + +They can now configure development credentials. The `Production` environment is visible, but it is `read-only`, and they have full access in the `Staging` environment. + + + + + + + +With RBAC configured, you now have granular control over user access to features across dbt Cloud. ## FAQs -- **When are IdP group memberships updated for SSO Mapped groups?**
- Group memberships are updated whenever a user logs into dbt Cloud via a supported SSO provider. If you've changed group memberships in your identity provider or dbt Cloud, ask your users to log back into dbt Cloud to synchronize these group memberships. -- **Can I set up SSO without RBAC?**
+ + +Group memberships are updated whenever a user logs into dbt Cloud via a supported SSO provider. If you've changed group memberships in your identity provider or dbt Cloud, ask your users to log back into dbt Cloud to synchronize these group memberships. + + + + + Yes, see the documentation on [Manual Assignment](#manual-assignment) above for more information on using SSO without RBAC. -- **Can I configure a user's License Type based on IdP Attributes?**
- Yes, see the docs on [managing license types](/docs/cloud/manage-access/seats-and-users#managing-license-types) for more information. -- **Why can't I edit a user's group membership?**
-Make sure you're not trying to edit your own user as this isn't allowed for security reasons. To edit the group membership of your own user, you'll need a different user to make those changes. +
+ + -- **How do I add or remove users**?
-Each dbt Cloud plan comes with a base number of Developer and Read-Only licenses. You can add or remove licenses by modifying the number of users in your account settings. - - If you're on an Enterprise plans and have the correct [permissions](/docs/cloud/manage-access/enterprise-permissions), you can add or remove developers by adjusting your developer user seat count in **Account settings** -> **Users**. +Yes, see the docs on [managing license types](/docs/cloud/manage-access/seats-and-users#managing-license-types) for more information. + +
+ + + +Don't try to edit your own user, as this isn't allowed for security reasons. You'll need a different user to make changes to your own user's group membership. + + + + + +Each dbt Cloud plan has a base number of Developer and Read-Only licenses. You can add or remove licenses by modifying the number of users in your account settings. + - If you're on an Enterprise plan and have the correct [permissions](/docs/cloud/manage-access/enterprise-permissions), you can add or remove developers by adjusting your developer user seat count in **Account settings** -> **Users**. - If you're on a Team plan and have the correct [permissions](/docs/cloud/manage-access/self-service-permissions), you can add or remove developers by making two changes: adjust your developer user seat count AND your developer billing seat count in **Account settings** -> **Users** and then in **Account settings** -> **Billing**. - Refer to [Users and licenses](/docs/cloud/manage-access/seats-and-users#licenses) for detailed steps. +For detailed steps, refer to [Users and licenses](/docs/cloud/manage-access/seats-and-users#licenses). + + \ No newline at end of file diff --git a/website/docs/docs/cloud/manage-access/audit-log.md b/website/docs/docs/cloud/manage-access/audit-log.md index 70ef4d66f8e..de52434be06 100644 --- a/website/docs/docs/cloud/manage-access/audit-log.md +++ b/website/docs/docs/cloud/manage-access/audit-log.md @@ -18,7 +18,7 @@ The dbt Cloud audit log stores all the events that occurred in your organization ## Accessing the audit log -To access the audit log, click the gear icon in the top right, then click **Audit Log**. +To access the audit log, click on your account name in the left side menu and select **Account settings**. @@ -32,7 +32,7 @@ On the audit log page, you will see a list of various events and their associate ### Event details -Click the event card to see the details about the activity that triggered the event. This view provides important details, including when it happened and what type of event was triggered. For example, if someone changes the settings for a job, you can use the event details to see which job was changed (type of event: `job_definition.Changed`), by whom (person who triggered the event: `actor`), and when (time it was triggered: `created_at_utc`). For types of events and their descriptions, see [Events in audit log](#events-in-audit-log). +Click the event card to see the details about the activity that triggered the event. This view provides important details, including when it happened and what type of event was triggered. For example, if someone changes the settings for a job, you can use the event details to see which job was changed (type of event: `job_definition.Changed`), by whom (person who triggered the event: `actor`), and when (time it was triggered: `created_at_utc`). For types of events and their descriptions, see [Events in audit log](#audit-log-events). The event details provide the key factors of an event: @@ -60,10 +60,9 @@ The audit log supports various events for different objects in dbt Cloud. You wi | Event Name | Event Type | Description | | -------------------------- | ---------------------------------------- | ------------------------------------------------------ | | Auth Provider Changed | auth_provider.Changed | Authentication provider settings changed | -| Credential Login Failed | auth.CredentialsLoginFailed | User login via username and password failed | | Credential Login Succeeded | auth.CredentialsLoginSucceeded | User successfully logged in with username and password | | SSO Login Failed | auth.SsoLoginFailed | User login via SSO failed | -| SSO Login Succeeded | auth.SsoLoginSucceeded | User successfully logged in via SSO +| SSO Login Succeeded | auth.SsoLoginSucceeded | User successfully logged in via SSO | ### Environment @@ -94,7 +93,7 @@ The audit log supports various events for different objects in dbt Cloud. You wi | ------------- | ----------------------------- | ------------------------------ | | Group Added | user_group.Added | New Group successfully created | | Group Changed | user_group.Changed | Group settings changed | -| Group Removed | user_group.Changed | Group successfully removed | +| Group Removed | user_group.Removed | Group successfully removed | ### User @@ -150,12 +149,65 @@ The audit log supports various events for different objects in dbt Cloud. You wi ### Credentials -| Event Name | Event Type | Description | -| -------------------------------- | ----------------------------- | -------------------------------- | +| Event Name | Event Type | Description | +| -------------------------------- | ----------------------------- | -----------------------| | Credentials Added to Project | credentials.Added | Project credentials added | | Credentials Changed in Project | credentials.Changed | Credentials changed in project | | Credentials Removed from Project | credentials.Removed | Credentials removed from project | + +### Git integration + +| Event Name | Event Type | Description | +| -------------------------------- | ----------------------------- | -----------------------| +| GitLab Application Changed | gitlab_application.changed | GitLab configuration in dbt Cloud changed | + +### Webhooks + +| Event Name | Event Type | Description | +| -------------------------------- | ----------------------------- | -----------------------| +| Webhook Subscriptions Added | webhook_subscription.added | New webhook configured in settings | +| Webhook Subscriptions Changed | webhook_subscription.changed | Existing webhook configuration altered | +| Webhook Subscriptions Removed | webhook_subscription.removed | Existing webhook deleted | + + +### Semantic Layer + +| Event Name | Event Type | Description | +| -------------------------------- | ----------------------------- | -----------------------| +| Semantic Layer Config Added | semantic_layer_config.added | Semantic Layer config added | +| Semantic Layer Config Changed | semantic_layer_config.changed | Semantic Layer config (not related to credentials) changed | +| Semantic Layer Config Removed | semantic_layer_config.removed | Semantic Layer config removed | +| Semantic Layer Credentials Added | semantic_layer_credentials.added | Semantic Layer credentials added | +| Semantic Layer Credentials Changed| semantic_layer_credentials.changed | Semantic Layer credentials changed. Does not trigger semantic_layer_config.changed| +| Semantic Layer Credentials Removed| semantic_layer_credentials.removed | Semantic Layer credentials removed | + +### Extended attributes + +| Event Name | Event Type | Description | +| -------------------------------- | ----------------------------- | -----------------------| +| Extended Attribute Added | extended_attributes.added | Extended attribute added to a project | +| Extended Attribute Changed | extended_attributes.changed | Extended attribute changed or removed | + + +### Account-scoped personal access token + +| Event Name | Event Type | Description | +| -------------------------------- | ----------------------------- | -----------------------| +| Account Scoped Personal Access Token Created | account_scoped_pat.created | An account-scoped PAT was created | +| Account Scoped Personal Access Token Deleted | account_scoped_pat.deleted | An account-scoped PAT was deleted | + +### IP restrictions + +| Event Name | Event Type | Description | +| -------------------------------- | ----------------------------- | -----------------------| +| IP Restrictions Toggled | ip_restrictions.toggled | IP restrictions feature enabled or disabled | +| IP Restrictions Rule Added | ip_restrictions.rule.added | IP restriction rule created | +| IP Restrictions Rule Changed | ip_restrictions.rule.changed | IP restriction rule edited | +| IP Restrictions Rule Removed | ip_restrictions.rule.removed | IP restriction rule deleted | + + + ## Searching the audit log You can search the audit log to find a specific event or actor, which is limited to the ones listed in [Events in audit log](#events-in-audit-log). The audit log successfully lists historical events spanning the last 90 days. You can search for an actor or event using the search bar, and then narrow your results using the time window. diff --git a/website/docs/docs/cloud/manage-access/auth0-migration.md b/website/docs/docs/cloud/manage-access/auth0-migration.md index b7bab836810..2f45ad7dcc8 100644 --- a/website/docs/docs/cloud/manage-access/auth0-migration.md +++ b/website/docs/docs/cloud/manage-access/auth0-migration.md @@ -5,22 +5,10 @@ sidebar: "SSO Auth0 Migration" description: "Required actions for migrating to Auth0 for SSO services on dbt Cloud." --- -:::note - -This migration is a feature of the dbt Cloud Enterprise plan. To learn more about an Enterprise plan, contact us at [sales@getdbt.com](mailto::sales@getdbt.com). - -For single-tenant Virtual Private Cloud, you should [email dbt Cloud Support](mailto::support@getdbt.com) to set up or update your SSO configuration. - -::: - dbt Labs is partnering with Auth0 to bring enhanced features to dbt Cloud's single sign-on (SSO) capabilities. Auth0 is an identity and access management (IAM) platform with advanced security features, and it will be leveraged by dbt Cloud. These changes will require some action from customers with SSO configured in dbt Cloud today, and this guide will outline the necessary changes for each environment. If you have not yet configured SSO in dbt Cloud, refer instead to our setup guides for [SAML](/docs/cloud/manage-access/set-up-sso-saml-2.0), [Okta](/docs/cloud/manage-access/set-up-sso-okta), [Google Workspace](/docs/cloud/manage-access/set-up-sso-google-workspace), or [Microsoft Entra ID (formerly Azure AD)](/docs/cloud/manage-access/set-up-sso-microsoft-entra-id) single sign-on services. -## Auth0 Multi-tenant URIs - - - ## Start the migration The Auth0 migration feature is being rolled out incrementally to customers who have SSO features already enabled. When the migration option has been enabled on your account, you will see **SSO Updates Available** on the right side of the menu bar, near the settings icon. diff --git a/website/docs/docs/cloud/manage-access/cloud-seats-and-users.md b/website/docs/docs/cloud/manage-access/cloud-seats-and-users.md index f636be796d3..5628314c922 100644 --- a/website/docs/docs/cloud/manage-access/cloud-seats-and-users.md +++ b/website/docs/docs/cloud/manage-access/cloud-seats-and-users.md @@ -3,7 +3,7 @@ title: "Users and licenses" description: "Learn how dbt Cloud administrators can use licenses and seats to control access in a dbt Cloud account." id: "seats-and-users" sidebar: "Users and licenses" -pagination_next: "docs/cloud/manage-access/self-service-permissions" +pagination_next: "docs/cloud/manage-access/enterprise-permissions" pagination_prev: null --- @@ -49,13 +49,13 @@ The following tabs detail steps on how to modify your user license count: If you're on an Enterprise plan and have the correct [permissions](/docs/cloud/manage-access/enterprise-permissions), you can add or remove licenses by adjusting your user seat count. Note, an IT license does not count toward seat usage. -- To remove a user, go to **Account Settings** and select **Users**. +- To remove a user, click on your account name in the left side menu, click **Account settings** and select **Users**. - Select the user you want to remove, click **Edit**, and then **Delete**. - This action cannot be undone. However, you can re-invite the user with the same info if you deleted the user in error.
- To add a user, go to **Account Settings** and select **Users**. - Click the [**Invite Users**](/docs/cloud/manage-access/invite-users) button. - - For fine-grained permission configuration, refer to [Role based access control](/docs/cloud/manage-access/enterprise-permissions). + - For fine-grained permission configuration, refer to [Role based access control](/docs/cloud/manage-access/about-user-access#role-based-access-control-). @@ -64,7 +64,7 @@ If you're on an Enterprise plan and have the correct [permissions](/docs/cloud/m If you're on a Team plan and have the correct [permissions](/docs/cloud/manage-access/self-service-permissions), you can add or remove developers. You'll need to make two changes: -- Adjust your developer user seat count, which manages the users invited to your dbt Cloud project. AND +- Adjust your developer user seat count, which manages the users invited to your dbt Cloud project. - Adjust your developer billing seat count, which manages the number of billable seats. @@ -75,7 +75,7 @@ You can add or remove developers by increasing or decreasing the number of users To add a user in dbt Cloud, you must be an account owner or have admin privileges. -1. From dbt Cloud, click the gear icon at the top right and select **Account Settings**. +1. From dbt Cloud, click on your account name in the left side menu and select **Account settings**. @@ -95,11 +95,11 @@ Great work! After completing those these steps, your dbt Cloud user count and bi To delete a user in dbt Cloud, you must be an account owner or have admin privileges. If the user has a `developer` license type, this will open up their seat for another user or allow the admins to lower the total number of seats. -1. From dbt Cloud, click the gear icon at the top right and select **Account Settings**. +1. From dbt Cloud, click on your account name in the left side menu and select **Account settings**. -2. In **Account Settings** and select **Users**. +2. In **Account Settings**, select **Users**. 3. Select the user you want to delete, then click **Edit**. 4. Click **Delete** in the bottom left. Click **Confirm Delete** to immediately delete the user without additional password prompts. This action cannot be undone. However, you can re-invite the user with the same information if the deletion was made in error. @@ -124,9 +124,7 @@ Great work! After completing these steps, your dbt Cloud user count and billing ## Managing license types -Licenses can be assigned manually, or automatically based on IdP configuration -(enterprise only). By default, new users in an account will be assigned a -Developer license. +Licenses can be assigned to users individually or through group membership. To assign a license via group membership, you can manually add a user to a group during the invitation process or assign them to a group after they’ve enrolled in dbt Cloud. Alternatively, with [SSO configuration](/docs/cloud/manage-access/sso-overview) and [role-based access control](/docs/cloud/manage-access/about-user-access#role-based-access-control-) (Enterprise only), users can be automatically assigned to groups. By default, new users in an account are assigned a Developer license. ### Manual configuration @@ -142,16 +140,9 @@ change. -### Mapped configuration +### Mapped configuration -**Note:** This feature is only available on the Enterprise plan. - -If your account is connected to an Identity Provider (IdP) for [Single Sign -On](/docs/cloud/manage-access/sso-overview), you can automatically map IdP user -groups to specific license types in dbt Cloud. To configure license mappings, -navigate to the Account Settings > Team > License Mappings page. From -here, you can create or edit SSO mappings for both Read-Only and Developer -license types. +If your account is connected to an Identity Provider (IdP) for [Single Sign On](/docs/cloud/manage-access/sso-overview), you can automatically map IdP user groups to specific groups in dbt Cloud and assign license types to those groups. To configure license mappings, navigate to the **Account Settings** > **Groups & Licenses** > **License Mappings** page. From here, you can create or edit SSO mappings for both Read-Only and Developer license types. By default, all new members of a dbt Cloud account will be assigned a Developer license. To assign Read-Only licenses to certain groups of users, create a new diff --git a/website/docs/docs/cloud/manage-access/enterprise-permissions.md b/website/docs/docs/cloud/manage-access/enterprise-permissions.md index a1f6d795c23..5a56900d529 100644 --- a/website/docs/docs/cloud/manage-access/enterprise-permissions.md +++ b/website/docs/docs/cloud/manage-access/enterprise-permissions.md @@ -22,22 +22,14 @@ The following roles and permission sets are available for assignment in dbt Clou :::tip Licenses or Permission sets -The user's [license](/docs/cloud/manage-access/seats-and-users) type always overrides their assigned permission set. This means that even if a user belongs to a dbt Cloud group with 'Account Admin' permissions, having a 'Read-Only' license would still prevent them from performing administrative actions on the account. +The user's [license](/docs/cloud/manage-access/about-user-access) type always overrides their assigned permission set. This means that even if a user belongs to a dbt Cloud group with 'Account Admin' permissions, having a 'Read-Only' license would still prevent them from performing administrative actions on the account. ::: -## How to set up RBAC Groups in dbt Cloud +## Additional resources -Role-Based Access Control (RBAC) is helpful for automatically assigning permissions to dbt admins based on their SSO provider group associations. RBAC does not apply to [model groups](/docs/collaborate/govern/model-access#groups). +- [Grant users access](/docs/cloud/manage-access/about-user-access#grant-access) +- [Role-based access control](/docs/cloud/manage-access/about-user-access#role-based-access-control-) +- [Environment-level permissions](/docs/cloud/manage-access/environment-permissions) -1. Click the gear icon to the top right and select **Account Settings**. Click **Groups & Licenses** - - - -2. Select an existing group or create a new group to add RBAC. Name the group (this can be any name you like, but it's recommended to keep it consistent with the SSO groups). If you have configured SSO with SAML 2.0, you may have to use the GroupID instead of the name of the group. -3. Configure the SSO provider groups you want to add RBAC by clicking **Add** in the **SSO** section. These fields are case-sensitive and must match the source group formatting. -4. Configure the permissions for users within those groups by clicking **Add** in the **Access** section of the window. - - -5. When you've completed your configurations, click **Save**. Users will begin to populate the group automatically once they have signed in to dbt Cloud with their SSO credentials. diff --git a/website/docs/docs/cloud/manage-access/environment-permissions-setup.md b/website/docs/docs/cloud/manage-access/environment-permissions-setup.md index 1a3f2724819..5b41477e456 100644 --- a/website/docs/docs/cloud/manage-access/environment-permissions-setup.md +++ b/website/docs/docs/cloud/manage-access/environment-permissions-setup.md @@ -15,7 +15,7 @@ Environment-level permissions are not the same as account-level [role-based acce In your dbt Cloud account: -1. Open the **gear menu** and select **Account settings**. From the left-side menu, select **Groups & Licenses**. While you can edit existing groups, we recommend not altering the default `Everyone`, `Member`, and `Owner` groups. +1. Click your account name, above your profile icon on the left side panel, then select **Account settings**. From there, select **Groups & Licenses**. While you can edit existing groups, we recommend not altering the default `Everyone`, `Member`, and `Owner` groups. diff --git a/website/docs/docs/cloud/manage-access/environment-permissions.md b/website/docs/docs/cloud/manage-access/environment-permissions.md index 44cf2dc9a64..20acfae51f7 100644 --- a/website/docs/docs/cloud/manage-access/environment-permissions.md +++ b/website/docs/docs/cloud/manage-access/environment-permissions.md @@ -17,8 +17,8 @@ Environment-level permissions give dbt Cloud admins more flexibility to protect - Environment-level permissions do not allow you to create custom roles and permissions for each resource type in dbt Cloud. - You can only select environment types, and can’t specify a particular environment within a project. -- You can't select specific resources within environments. dbt Cloud jobs, runs, and environment variables are all environment resources. - - For example, you can't specify that a user only has access to jobs but not environment variables. Access to a given environment gives the user access to everything within that environment. +- You can't select specific resources within environments. dbt Cloud jobs and runs are environment resources. + - For example, you can't specify that a user only has access to jobs but not runs. Access to a given environment gives the user access to everything within that environment. ## Environments and roles @@ -77,4 +77,4 @@ If the user has the same roles across projects, you can apply environment access ## Related docs --[Environment-level permissions setup](/docs/cloud/manage-access/environment-permissions-setup) +- [Environment-level permissions setup](/docs/cloud/manage-access/environment-permissions-setup) diff --git a/website/docs/docs/cloud/manage-access/external-oauth.md b/website/docs/docs/cloud/manage-access/external-oauth.md index 7ed9e4ef446..c25b44d1513 100644 --- a/website/docs/docs/cloud/manage-access/external-oauth.md +++ b/website/docs/docs/cloud/manage-access/external-oauth.md @@ -1,20 +1,17 @@ --- -title: "Set up external Oauth" +title: "Set up external OAuth" id: external-oauth -description: "Configuration instructions for dbt Cloud and external Oauth connections" -sidebar_label: "Set up external Oauth" -unlisted: true +description: "Configuration instructions for dbt Cloud and external OAuth connections" +sidebar_label: "Set up external OAuth" pagination_next: null pagination_prev: null --- -# Set up external Oauth +# Set up external OAuth -:::note Beta feature +:::note -External OAuth for authentication is available in a limited beta. If you are interested in joining the beta, please contact your account manager. - -This feature is currently only available for the Okta and Entra ID identity providers and Snowflake connections. Only available to Enterprise accounts. +This feature is currently only available for the Okta and Entra ID identity providers and [Snowflake connections](/docs/cloud/connect-data-platform/connect-snowflake). ::: @@ -23,7 +20,7 @@ dbt Cloud Enterprise supports [external OAuth authentication](https://docs.snow ## Getting started -The process of setting up external Oauth will require a little bit of back-and-forth between your dbt Cloud, IdP, and Snowflake accounts, and having them open in multiple browser tabs will help speed up the configuration process: +The process of setting up external OAuth will require a little bit of back-and-forth between your dbt Cloud, IdP, and Snowflake accounts, and having them open in multiple browser tabs will help speed up the configuration process: - **dbt Cloud:** You’ll primarily be working in the **Account Settings** —> **Integrations** page. You will need [proper permission](/docs/cloud/manage-access/enterprise-permissions) to set up the integration and create the connections. - **Snowflake:** Open a worksheet in an account that has permissions to [create a security integration](https://docs.snowflake.com/en/sql-reference/sql/create-security-integration). @@ -34,7 +31,7 @@ If the admins that handle these products are all different people, it’s better ### Snowflake commands -The following is a template for creating the Oauth configurations in the Snowflake environment: +The following is a template for creating the OAuth configurations in the Snowflake environment: ```sql @@ -53,41 +50,45 @@ external_oauth_any_role_mode = 'ENABLE' The `external_oauth_token_user_mapping_claim` and `external_oauth_snowflake_user_mapping_attribute` can be modified based on the your organizations needs. These values point to the claim in the users’ token. In the example, Snowflake will look up the Snowflake user whose `email` matches the value in the `sub` claim. -**Note:** The Snowflake default roles ACCOUNTADMIN, ORGADMIN, or SECURITYADMIN, are blocked from external Oauth by default and they will likely fail to authenticate. See the [Snowflake documentation](https://docs.snowflake.com/en/sql-reference/sql/create-security-integration-oauth-external) for more information. +**Note:** The Snowflake default roles ACCOUNTADMIN, ORGADMIN, or SECURITYADMIN, are blocked from external OAuth by default and they will likely fail to authenticate. See the [Snowflake documentation](https://docs.snowflake.com/en/sql-reference/sql/create-security-integration-oauth-external) for more information. + +## Identity provider configuration -## Set up with Okta +Select a supported identity provider (IdP) for instructions on configuring external OAuth in their environment and completing the integration in dbt Cloud. + + ### 1. Initialize the dbt Cloud settings -1. In your dbt Cloud account, navigate to **Account settings** —> **Integrations**. +1. In your dbt Cloud account, navigate to **Account settings** —> **Integrations**. 2. Scroll down to **Custom integrations** and click **Add integrations** -3. Leave this window open. You can set the **Integration type** to Okta and make a note of the **Redirect URI** at the bottom of the page. Copy this to your clipboard for use in the next steps. +3. Leave this window open. You can set the **Integration type** to Okta and note the **Redirect URI** at the bottom of the page. Copy this to your clipboard for use in the next steps. ### 2. Create the Okta app -1. From the Okta dashboard, expand the **Applications** section and click **Applications.** Click the **Create app integration** button. +1. Expand the **Applications** section from the Okta dashboard and click **Applications.** Click the **Create app integration** button. 2. Select **OIDC** as the sign-in method and **Web applications** as the application type. Click **Next**. -3. Give the application an appropriate name, something like “External Oauth app for dbt Cloud” that will make it easily identifiable. +3. Give the application an appropriate name, something like “External OAuth app for dbt Cloud,” that will make it easily identifiable. 4. In the **Grant type** section, enable the **Refresh token** option. -5. Scroll down to the **Sign-in redirect URIs** option. Here, you’ll need to paste the redirect URI you gathered from dbt Cloud in step 1.3. +5. Scroll down to the **Sign-in redirect URIs** option. You’ll need to paste the redirect URI you gathered from dbt Cloud in step 1.3. - + -6. Save the app configuration. You’ll come back to it, but for now move on to the next steps. +6. Save the app configuration. You’ll come back to it, but move on to the next steps for now. ### 3. Create the Okta API -1. From the Okta sidebar menu, expand the **Security** section and clicl **API**. -2. On the API screen, click **Add authorization server**. Give the authorizations server a name (a nickname for your Snowflake account would be appropriate). For the **Audience** field, copy and paste your Snowflake login URL (for example, https://abdc-ef1234.snowflakecomputing.com). Give the server an appropriate description and click **Save**. +1. Expand the **Security** section and click **API** from the Okta sidebar menu. +2. On the API screen, click **Add authorization server**. Give the authorization server a name (a nickname for your Snowflake account would be appropriate). For the **Audience** field, copy and paste your Snowflake login URL (for example, https://abdc-ef1234.snowflakecomputing.com). Give the server an appropriate description and click **Save**. -3. On the authorization server config screen, open the **Metadata URI** in a new tab. You’ll need information from this screen in later steps. +3. On the authorization server config screen, open the **Metadata URI** in a new tab. You’ll need information from this screen in later steps. @@ -97,7 +98,7 @@ The `external_oauth_token_user_mapping_claim` and `external_oauth_snowflake_u -5. Open the **Access policies** tab and click **Add policy**. Give the policy a **Name** and **Description** and set **Assign to** as **The following clients**. Start typing the name of the app you created in step 2.3 and you’ll see it autofill. Select the app and click **Create Policy**. +5. Open the **Access policies** tab and click **Add policy**. Give the policy a **Name** and **Description** and set **Assign to** as **The following clients**. Start typing the name of the app you created in step 2.3, and you’ll see it autofill. Select the app and click **Create Policy**. @@ -105,13 +106,13 @@ The `external_oauth_token_user_mapping_claim` and `external_oauth_snowflake_u -7. Give the rule a descriptive name and scroll down to **token lifetimes**. Configure the **Access token lifetime is**, **Refresh token lifetime is**, and **but will expire if not used every** settings according to your organizational policies. We recommend the defaults of 1 hour and 90 days. Stricter rules increases the odds of your users having to re-authenticate. +7. Give the rule a descriptive name and scroll down to **token lifetimes**. Configure the **Access token lifetime is**, **Refresh token lifetime is**, and **but will expire if not used every** settings according to your organizational policies. We recommend the defaults of 1 hour and 90 days. Stricter rules increase the odds of your users having to re-authenticate. 8. Navigate back to the **Settings** tab and leave it open in your browser. You’ll need some of the information in later steps. -### 4. Create the Oauth settings in Snowflake +### 4. Create the OAuth settings in Snowflake 1. Open up a Snowflake worksheet and copy/paste the following: @@ -130,9 +131,9 @@ external_oauth_any_role_mode = 'ENABLE' ``` -2. Change `your_integration_name` to something appropriately descriptive. For example, `dev_OktaAccountNumber_okta`. Copy the `external_oauth_issuer` and `external_oauth_jws_keys_url` from the metadate URI in step 3.3. Use the same Snowflake URL that you entered in step 3.2 as the `external_oauth_audience_list`. +2. Change `your_integration_name` to something appropriately descriptive. For example, `dev_OktaAccountNumber_okta`. Copy the `external_oauth_issuer` and `external_oauth_jws_keys_url` from the metadata URI in step 3.3. Use the same Snowflake URL you entered in step 3.2 as the `external_oauth_audience_list`. -Adjust the other settings as needed to meet your organizations configurations in Okta and Snowflake. +Adjust the other settings as needed to meet your organization's configurations in Okta and Snowflake. @@ -140,39 +141,47 @@ Adjust the other settings as needed to meet your organizations configurations in ### 5. Configuring the integration in dbt Cloud -1. Navigate back to the dbt Cloud **Account settings** —> **Integrations** page you were on at the beginning. It’s time to start filling out all of the fields. - 1. `Integration name`: Give the integration a descriptive name that includes identifying information about the Okta environment so future users won’t have to guess where it belongs. - 2. `Client ID` and `Client secrets`: Retrieve these from the Okta application page. - - 3. Authorize URL and Token URL: Found in the metadata URI. - +1. Navigate back to the dbt Cloud **Account settings** —> **Integrations** page you were on at the beginning. It’s time to start filling out all of the fields. + 1. `Integration name`: Give the integration a descriptive name that includes identifying information about the Okta environment so future users won’t have to guess where it belongs. + 2. `Client ID` and `Client secrets`: Retrieve these from the Okta application page. + + 3. Authorize URL and Token URL: Found in the metadata URI. + 2. **Save** the configuration + ### 6. Create a new connection in dbt Cloud -1. Navigate the **Account settings** and click **Connections** from the menu. Click **Add connection**. -2. Configure the `Account`, `Database`, and `Warehouse` as you normally would and for the `Oauth method` select the external Oauth you just created. - +1. Navigate the **Account settings** and click **Connections** from the menu. Click **Add connection**. +2. Configure the `Account`, `Database`, and `Warehouse` as you normally would, and for the `OAuth method`, select the external OAuth you just created. + + + + + +3. Scroll down to the **External OAuth** configurations box and select the config from the list. -3. Scroll down to the **External Oauth** configurations box and select the config from the list. - + -4. **Save** the connection and you have now configured External Oauth with Okta and Snowflake! -## Set up with Entra ID +4. **Save** the connection, and you have now configured External OAuth with Okta and Snowflake! + + + + ### 1. Initialize the dbt Cloud settings -1. In your dbt Cloud account, navigate to **Account settings** —> **Integrations**. +1. In your dbt Cloud account, navigate to **Account settings** —> **Integrations**. 2. Scroll down to **Custom integrations** and click **Add integrations**. -3. Leave this window open. You can set the **Integration type** to Entra ID and make a note of the **Redirect URI** at the bottom of the page. Copy this to your clipboard for use in the next steps. +3. Leave this window open. You can set the **Integration type** to Entra ID and note the **Redirect URI** at the bottom of the page. Copy this to your clipboard for use in the next steps. ### Entra ID -You’ll create two different `apps` in the Azure portal — A resource server and a client app. +You’ll create two apps in the Azure portal: A resource server and a client app. :::important @@ -187,68 +196,78 @@ In your Azure portal, open the **Entra ID** and click **App registrations** from ### 1. Create a resource server 1. From the app registrations screen, click **New registration**. - 1. Give the app a name. - 2. Ensure **Supported account types** are set to “Accounts in this organizational directory only (`Org name` - Single Tenant).” - 3. Click **Register** and you will be taken to the apps overview. + 1. Give the app a name. + 2. Ensure **Supported account types** are set to “Accounts in this organizational directory only (`Org name` - Single Tenant).” + 3. Click **Register**to see the application’s overview. 2. From the app overview page, click **Expose an API** from the left menu. -3. Click **Add** next to **Application ID URI**. The field will automatically populate. Click **Save**. -4. Record the `value` field as it will be used in a future step. *This is only displayed once. Be sure to record it immediately. It will be hidden when you leave the page and come back.* +3. Click **Add** next to **Application ID URI**. The field will automatically populate. Click **Save**. +4. Record the `value` field for use in a future step. _This is only displayed once. Be sure to record it immediately. Microsoft hides the field when you leave the page and come back._ 5. From the same screen, click **Add scope**. - 1. Give the scope a name. - 2. Set “Who can consent?” to **Admins and users**. - 3. Set **Admin consent display name** session:role-any and give it a description. - 4. Ensure **State** is set to **Enabled**. - 5. Click **Add scope**. + 1. Give the scope a name. + 2. Set “Who can consent?” to **Admins and users**. + 3. Set **Admin consent display name** session:role-any and give it a description. + 4. Ensure **State** is set to **Enabled**. + 5. Click **Add scope**. ### 2. Create a client app 1. From the **App registration page**, click **New registration**. - 1. Give the app a name that uniquely identifies it as the client app. - 2. Ensure **Supported account types** are set to “Accounts in this organizational directory only (`Org name` - Single Tenant).” - 3. Set the **Redirect URI** to **Web** and copy/paste the **Redirect URI** from dbt Cloud into the field. - 4. Click **Register**. + 1. Give the app a name that uniquely identifies it as the client app. + 2. Ensure **Supported account types** are set to “Accounts in this organizational directory only (`Org name` - Single Tenant).” + 3. Set the **Redirect URI** to **Web** and copy/paste the **Redirect URI** from dbt Cloud into the field. + 4. Click **Register**. 2. From the app overview page, click **API permissions** from the left menu, and click **Add permission**. 3. From the pop-out screen, click **APIs my organization uses**, search for the resource server name from the previous steps, and click it. 4. Ensure the box for the **Permissions** `session:role-any` is enabled and click **Add permissions**. 5. Click **Grant admin consent** and from the popup modal click **Yes**. -6. From the left menu, click **Certificates and secrets** and cllick **New client secret**. Name the secret, set an expiration, and click **Add**. -**Note**: Microsoft does not allow “forever” as an expiration. The maximum time is two years. It’s essential to document the expiration date so that the secret can be refreshed before the expiration or user authorization will fail. -7. Record the `value` for use in a future step and record it immediately. -**Note**: This value will not be displayed again once you navigate away from this screen. +6. From the left menu, click **Certificates and secrets** and click **New client secret**. Name the secret, set an expiration, and click **Add**. +**Note**: Microsoft does not allow “forever” as an expiration date. The maximum time is two years. Documenting the expiration date so you can refresh the secret before the expiration or user authorization fails is essential. +7. Record the `value` for use in a future step and record it immediately. +**Note**: Entra ID will not display this value again once you navigate away from this screen. ### 3. Snowflake configuration -You'll be switching between the Entra ID site and Snowflake. Keep your Entra ID account open for this process. +You'll be switching between the Entra ID site and Snowflake. Keep your Entra ID account open for this process. Copy and paste the following as a template in a Snowflake worksheet: ```sql + create or replace security integration - type = external_oauth - enabled = true - external_oauth_type = azure - external_oauth_issuer = '' - external_oauth_jws_keys_url = '' - external_oauth_audience_list = ('') - external_oauth_token_user_mapping_claim = 'upn' - external_oauth_any_role_mode = 'ENABLE' - external_oauth_snowflake_user_mapping_attribute = 'login_name'; + type = external_oauth + enabled = true + external_oauth_type = azure + external_oauth_issuer = '' + external_oauth_jws_keys_url = '' + external_oauth_audience_list = ('') + external_oauth_token_user_mapping_claim = 'upn' + external_oauth_any_role_mode = 'ENABLE' + external_oauth_snowflake_user_mapping_attribute = 'login_name'; + ``` + On the Entra ID site: -1. From the Client ID app in Entra ID, click **Endpoints** and open the **Federation metadata document** in a new tab. - - The **entity ID** on this page maps to the `external_oauth_issuer` field in the Snowflake config. +1. From the Client ID +app in Entra ID, click **Endpoints** and open the **Federation metadata document** in a new tab. + - The **entity ID** on this page maps to the `external_oauth_issuer` field in the Snowflake config. 2. Back on the list of endpoints, open the **OpenID Connect metadata document** in a new tab. - - The **jwks_uri** field maps to the `external_oauth_jws_keys_url` field in Snowflake. + - The **jwks_uri** field maps to the `external_oauth_jws_keys_url` field in Snowflake. 3. Navigate to the resource server in previous steps. - - The **Application ID URI** maps to teh `external_oauth_audience_list` field in Snowflake. -4. Run the configurations. Be sure the admin who created the Microsoft apps is also a user in Snowflake, or the configuration will fail. + - The **Application ID URI** maps to the `external_oauth_audience_list` field in Snowflake. +4. Run the configurations. Be sure the admin who created the Microsoft apps is also a user in Snowflake, or the configuration will fail. ### 4. Configuring the integration in dbt Cloud -1. Navigate back to the dbt Cloud **Account settings** —> **Integrations** page you were on at the beginning. It’s time to start filling out all of the fields. There will be some back-and-forth between the Entra ID account and dbt Cloud. -2. `Integration name`: Give the integration a descriptive name that includes identifying information about the Entra ID environment so future users won’t have to guess where it belongs. -3. `Client secrets`: These are found in the Client ID from the **Certificates and secrets** page. `Value` is the `Client secret` . Note that it only appears when created; if you return later, it will be hidden, and you must recreate the secret. +1. Navigate back to the dbt Cloud **Account settings** —> **Integrations** page you were on at the beginning. It’s time to start filling out all of the fields. There will be some back-and-forth between the Entra ID account and dbt Cloud. +2. `Integration name`: Give the integration a descriptive name that includes identifying information about the Entra ID environment so future users won’t have to guess where it belongs. +3. `Client secrets`: Found in the Client ID from the **Certificates and secrets** page. `Value` is the `Client secret`. Note that it only appears when created; _Microsoft hides the secret if you return later, and you must recreate it._ 4. `Client ID`: Copy the’ Application (client) ID’ on the overview page for the client ID app. -5. `Authorization URL` and `Token URL`: From the client ID app, open the `Endpoints` tab. The `Oauth 2.0 authorization endpoint (v2)` and `Oauth 2.0 token endpoint (v2)` fields map to these. *You must use v2 of the `Oauth 2.0 authorization endpoint`. Do not use V1.* You can use either version of the `Oauth 2.0 token endpoint`. +5. `Authorization URL` and `Token URL`: From the client ID app, open the `Endpoints` tab. These URLs map to the `OAuth 2.0 authorization endpoint (v2)` and `OAuth 2.0 token endpoint (v2)` fields. *You must use v2 of the `OAuth 2.0 authorization endpoint`. Do not use V1.* You can use either version of the `OAuth 2.0 token endpoint`. 6. `Application ID URI`: Copy the `Application ID URI` field from the resource server’s Overview screen. + + + +## FAQs + + diff --git a/website/docs/docs/cloud/manage-access/invite-users.md b/website/docs/docs/cloud/manage-access/invite-users.md index c82e15fd48f..b9a12bae7c6 100644 --- a/website/docs/docs/cloud/manage-access/invite-users.md +++ b/website/docs/docs/cloud/manage-access/invite-users.md @@ -17,19 +17,16 @@ You must have proper permissions to invite new users: ## Invite new users -1. In your dbt Cloud account, select the gear menu in the upper right corner and then select **Account Settings**. -2. From the left sidebar, select **Users**. - - - -3. Click on **Invite Users**. +1. In your dbt Cloud account, select your account name in the bottom left corner. Then select **Account settings**. +2. Under **Settings**, select **Users**. +3. Click on **Invite users**. -4. In the **Email Addresses** field, enter the email addresses of the users you would like to invite separated by comma, semicolon, or a new line. +4. In the **Email Addresses** field, enter the email addresses of the users you want to invite separated by a comma, semicolon, or a new line. 5. Select the license type for the batch of users from the **License** dropdown. -6. Select the group(s) you would like the invitees to belong to. -7. Click **Send Invitations**. +6. Select the group(s) you want the invitees to belong to. +7. Click **Send invitations**. - If the list of invitees exceeds the number of licenses your account has available, you will receive a warning when you click **Send Invitations** and the invitations will not be sent. @@ -69,7 +66,7 @@ Once the user completes this process, their email and user information will popu * Is there a limit to the number of users I can invite? _Your ability to invite users is limited to the number of licenses you have available._ * Why are users are clicking the invitation link and getting an `Invalid Invitation Code` error? _We have seen scenarios where embedded secure link technology (such as enterprise Outlooks [Safe Link](https://learn.microsoft.com/en-us/microsoft-365/security/office-365-security/safe-links-about?view=o365-worldwide) feature) can result in errors when clicking on the email link. Be sure to include the `getdbt.com` URL in the allowlists for these services._ -* Can I have a mixure of users with SSO and username/password authentication? _Once SSO is enabled, you will no longer be able to add local users. If you have contractors or similar contingent workers, we recommend you add them to your SSO service._ +* Can I have a mixture of users with SSO and username/password authentication? _Once SSO is enabled, you will no longer be able to add local users. If you have contractors or similar contingent workers, we recommend you add them to your SSO service._ * What happens if I need to resend the invitation? _From the Users page, click on the invite record, and you will be presented with the option to resend the invitation._ * What can I do if I entered an email address incorrectly? _From the Users page, click on the invite record, and you will be presented with the option to revoke it. Once revoked, generate a new invitation to the correct email address._ diff --git a/website/docs/docs/cloud/manage-access/licenses-and-groups.md b/website/docs/docs/cloud/manage-access/licenses-and-groups.md deleted file mode 100644 index b91af80f9b3..00000000000 --- a/website/docs/docs/cloud/manage-access/licenses-and-groups.md +++ /dev/null @@ -1,145 +0,0 @@ ---- -title: "Licenses and groups" -id: "licenses-and-groups" ---- - -## Overview - -dbt Cloud administrators can use dbt Cloud's permissioning model to control -user-level access in a dbt Cloud account. This access control comes in two flavors: -License-based and Role-based. - -- **License-based Access Controls:** User are configured with account-wide - license types. These licenses control the specific parts of the dbt Cloud application - that a given user can access. -- **Role-based Access Control (RBAC):** Users are assigned to _groups_ that have - specific permissions on specific projects or the entire account. A user may be - a member of multiple groups, and those groups may have permissions on multiple - projects. - -## License-based access control - -Each user on an account is assigned a license type when the user is first -invited to a given account. This license type may change over time, but a -user can only have one type of license at any given time. - -A user's license type controls the features in dbt Cloud that the user is able -to access. dbt Cloud's three license types are: - - **Read-Only** - - **Developer** - - **IT** - -For more information on these license types, see [Seats & Users](/docs/cloud/manage-access/seats-and-users). -At a high-level, Developers may be granted _any_ permissions, whereas Read-Only -users will have read-only permissions applied to all dbt Cloud resources -regardless of the role-based permissions that the user is assigned. IT users will have Security Admin and Billing Admin permissions applied regardless of the role-based permissions that the user is assigned. - -## Role-based access control - -:::info dbt Cloud Enterprise - -Role-based access control is a feature of the dbt Cloud Enterprise plan - -::: - -Role-based access control allows for fine-grained permissioning in the dbt Cloud -application. With role-based access control, users can be assigned varying -permissions to different projects within a dbt Cloud account. For teams on the -Enterprise tier, role-based permissions can be generated dynamically from -configurations in an [Identity Provider](sso-overview). - -Role-based permissions are applied to _groups_ and pertain to _projects_. The -assignable permissions themselves are granted via _permission sets_. - - -### Groups - -A group is a collection of users. Users may belong to multiple groups. Members -of a group inherit any permissions applied to the group itself. - -Users can be added to a dbt Cloud group based on their group memberships in the -configured [Identity Provider](sso-overview) for the account. In this way, dbt -Cloud administrators can manage access to dbt Cloud resources via identity -management software like Microsoft Entra ID (formerly Azure AD), Okta, or GSuite. See _SSO Mappings_ below for -more information. - -You can view the groups in your account or create new groups from the **Team > Groups** -page in your Account Settings. - - - - -### SSO Mappings - -SSO Mappings connect Identity Provider (IdP) group membership to dbt Cloud group -membership. When a user logs into dbt Cloud via a supported identity provider, -their IdP group memberships are synced with dbt Cloud. Upon logging in -successfully, the user's group memberships (and therefore, permissions) are -adjusted accordingly within dbt Cloud automatically. - -:::tip Creating SSO Mappings - -While dbt Cloud supports mapping multiple IdP groups to a single dbt Cloud -group, we recommend using a 1:1 mapping to make administration as simple as -possible. Consider using the same name for your dbt Cloud groups and your IdP -groups. - -::: - - -### Permission Sets - -Permission sets are predefined collections of granular permissions. Permission -sets combine low-level permission grants into high-level roles that can be -assigned to groups. Some examples of existing permission sets are: - - Account Admin - - Git Admin - - Job Admin - - Job Viewer - - ...and more - -For a full list of enterprise permission sets, see [Enterprise Permissions](/docs/cloud/manage-access/enterprise-permissions). -These permission sets are available for assignment to groups and control the ability -for users in these groups to take specific actions in the dbt Cloud application. - -In the following example, the _dbt Cloud Owners_ group is configured with the -**Account Admin** permission set on _All Projects_ and the **Job Admin** permission -set on the _Internal Analytics_ project. - - - - -### Manual assignment - -dbt Cloud administrators can manually assign users to groups independently of -IdP attributes. If a dbt Cloud group is configured _without_ any -SSO Mappings, then the group will be _unmanaged_ and dbt Cloud will not adjust -group membership automatically when users log into dbt Cloud via an identity -provider. This behavior may be desirable for teams that have connected an identity -provider, but have not yet configured SSO Mappings between dbt Cloud and the -IdP. - -If an SSO Mapping is added to an _unmanaged_ group, then it will become -_managed_, and dbt Cloud may add or remove users to the group automatically at -sign-in time based on the user's IdP-provided group membership information. - - -## FAQs -- **When are IdP group memberships updated for SSO Mapped groups?** Group memberships - are updated every time a user logs into dbt Cloud via a supported SSO provider. If - you've changed group memberships in your identity provider or dbt Cloud, ask your - users to log back into dbt Cloud for these group memberships to be synchronized. - -- **Can I set up SSO without RBAC?** Yes, see the documentation on - [Manual Assignment](#manual-assignment) above for more information on using - SSO without RBAC. - -- **Can I configure a user's License Type based on IdP Attributes?** Yes, see - the docs on [managing license types](/docs/cloud/manage-access/seats-and-users#managing-license-types) - for more information. diff --git a/website/docs/docs/cloud/manage-access/mfa.md b/website/docs/docs/cloud/manage-access/mfa.md index a06251e6468..644fcdb32c2 100644 --- a/website/docs/docs/cloud/manage-access/mfa.md +++ b/website/docs/docs/cloud/manage-access/mfa.md @@ -7,6 +7,13 @@ sidebar: null # Multi-factor authentication +:::important + + +dbt Cloud enforces multi-factor authentication (MFA) for all users with username and password credentials. If MFA is not set up, you will see a notification bar prompting you to configure one of the supported methods when you log in. If you do not, you will have to configure MFA upon subsequent logins, or you will be unable to access dbt Cloud. + +::: + dbt Cloud provides multiple options for multi-factor authentication (MFA). MFA provides an additional layer of security to username and password logins for Developer and Team plan accounts. The available MFA methods are: - SMS verification code (US-based phone numbers only) @@ -51,7 +58,7 @@ Choose the next steps based on your preferred enrollment selection: 2. Follow the instructions in the modal window and click **Use security key**. - + 3. Scan the QR code or insert and touch activate your USB key to begin the process. Follow the on-screen prompts. diff --git a/website/docs/docs/cloud/manage-access/self-service-permissions.md b/website/docs/docs/cloud/manage-access/self-service-permissions.md index 24e1283b126..6b326645d44 100644 --- a/website/docs/docs/cloud/manage-access/self-service-permissions.md +++ b/website/docs/docs/cloud/manage-access/self-service-permissions.md @@ -1,42 +1,84 @@ --- -title: "Self-service permissions" -description: "Learn how dbt Cloud administrators can use self-service permissions to control access in a dbt Cloud account." +title: "Self-service Team account permissions" +description: "Learn how dbt Cloud administrators can use self-service permissions to control access in a dbt Cloud Team account." +sidebar_label: "Team permissions" id: "self-service-permissions" --- -import Permissions from '/snippets/_self-service-permissions-table.md'; +Self-service Team accounts are a quick and easy way to get dbt Cloud up and running for a small team. For teams looking to scale and access advanced features like SSO, group management, and support for larger user bases, upgrading to an [Enterprise](/docs/cloud/manage-access/enterprise-permissions) account unlocks these capabilities. +If you're interested in upgrading, contact [dbt Labs today](https://www.getdbt.com/contact) - +## Groups and permissions -## Read-Only vs. Developer License Types +Groups determine a user's permission and there are three groups are available for Team plan dbt Cloud accounts: Owner, Member, and Everyone. The first Owner user is the person who created the dbt Cloud account. -Users configured with Read-Only license types will experience a restricted set of permissions in dbt Cloud. If a user is associated with a _Member_ permission set and a Read-Only seat license, then they will only have access to what a Read-Only seat allows. See [Seats and Users](/docs/cloud/manage-access/seats-and-users) for more information on the impact of licenses on these permissions. +New users are added to the Member and Everyone groups when they onboard but this can be changed when the invitation is created. These groups only affect users with a [Developer license](#licenses) assigned. -## Owner and Member Groups in dbt Cloud Enterprise +The group access permissions are as follows: -By default, new users are added to the Member and Owner groups when they onboard to a new dbt Cloud account. Member and Owner groups are included with every new dbt Cloud account because they provide access for administrators to add users and groups, and to apply permission sets. +- **Owner** — Full access to account features. +- **Member** — Robust access to the account with restrictions on features that can alter billing or security. +- **Everyone** — A catch-all group for all users in the account. This group does not have any permission assignments beyond the user's profile. Users must be assigned to either the Member or Owner group to work in dbt Cloud. -You will need owner and member groups to help with account onboarding, but these groups can create confusion when initially setting up SSO and RBAC for dbt Cloud Enterprise accounts as described in the [Enterprise Permissions](enterprise-permissions) guide. Owner and Member groups are **account level** groups, so their permissions override any project-level permissions you wish to apply. +## Licenses -After onboarding administrative users and configuring RBAC/SSO groups, we recommend the following steps for onboarding users to a dbt Cloud Enterprise account. +You assign licenses to every user onboarded into dbt Cloud. You only assign Developer-licensed users to the Owner and Member groups. The groups have no impact on Read-only or IT licensed users. +There are three license types: -### Prerequisites +- **Developer** — The default license. Developer licenses don't restrict access to any features, so users with this license should be assigned to either the Owner or Member group. You're allotted up to 8 developer licenses per account. +- **Read-Only** — Read-only access to your project, including environments dbt Explorer. Doesn't have access to account settings at all. Functions the same regardless of group assignments. You're allotted up to 5 read-only licenses per account. +- **IT** — Partial access to the account settings including users, integrations, billing, and API settings. Cannot create or edit connects or access the project at all. Functions the same regardless of group assignments. You're allocated 1 seat per account. -You need to create an Account Admins group before removing any other groups. +See [Seats and Users](/docs/cloud/manage-access/seats-and-users) for more information on the impact of licenses on these permissions. -1. Create an Account Admins group. -2. Assign at least one user to the Account Admins group. The assigned user can manage future group, SSO mapping, and user or group assignment. +## Table of groups, licenses, and permissions -### Remove the Owner and Member groups +Key: -Follow these steps for both Owner and Member groups: +* (W)rite — Create new or modify existing. Includes `send`, `create`, `delete`, `allocate`, `modify`, and `read`. +* (R)ead — Can view but can not create or change any fields. +* No value — No access to the feature. + +Permissions: + +* [Account-level permissions](#account-permissions-for-account-roles) — Permissions related to management of the dbt Cloud account. For example, billing and account settings. +* [Project-level permissions](#project-permissions-for-account-roles) — Permissions related to the projects in dbt Cloud. For example, Explorer and the IDE. + +The following tables outline the access that users have if they are assigned a Developer license and the Owner or Member group, Read-only license, or IT license. + +#### Account permissions for account roles + +| Account-level permission| Owner | Member | Read-only license| IT license | +|:------------------------|:-----:|:------:|:----------------:|:------------:| +| Account settings | W | W | - | W | +| Billing | W | - | - | W | +| Invitations | W | W | - | W | +| Licenses | W | R | - | W | +| Users | W | R | - | W | +| Project (create) | W | W | - | W | +| Connections | W | W | - | W | +| Service tokens | W | - | - | W | +| Webhooks | W | W | - | - | + +#### Project permissions for account roles + +|Project-level permission | Owner | Member | Read-only | IT license | +|:------------------------|:-----:|:-------:|:---------:|:----------:| +| Adapters | W | W | R | - | +| Connections | W | W | R | - | +| Credentials | W | W | R | - | +| Custom env. variables | W | W | R | - | +| Develop (IDE or dbt Cloud CLI)| W | W | - | - | +| Environments | W | W | R | - | +| Jobs | W | W | R | - | +| dbt Explorer | W | W | R | - | +| Permissions | W | R | - | - | +| Profile | W | W | R | - | +| Projects | W | W | R | - | +| Repositories | W | W | R | - | +| Runs | W | W | R | - | +| Semantic Layer Config | W | W | R | - | -1. Log into dbt Cloud. -2. Click the gear icon at the top right and select **Account settings**. -3. Select **Groups** then select **OWNER** or **MEMBER**** group. -4. Click **Edit**. -5. At the bottom of the Group page, click **Delete**. -The Account Admin can add additional SSO mapping groups, permission sets, and users as needed. diff --git a/website/docs/docs/cloud/manage-access/set-up-bigquery-oauth.md b/website/docs/docs/cloud/manage-access/set-up-bigquery-oauth.md index 9a356814111..e528e2ebc1f 100644 --- a/website/docs/docs/cloud/manage-access/set-up-bigquery-oauth.md +++ b/website/docs/docs/cloud/manage-access/set-up-bigquery-oauth.md @@ -25,13 +25,14 @@ To use BigQuery in the dbt Cloud IDE, all developers must: ### Locate the redirect URI value To get started, locate the connection's redirect URI for configuring BigQuery OAuth. To do so: - - Select the gear menu in the upper left corner and choose **Account settings** + - Navigate to your account name, above your profile icon on the left side panel + - Select **Account settings** from the menu - From the left sidebar, select **Projects** - Choose the project from the list - Select **Connection** to edit the connection details - Locate the **Redirect URI** field under the **OAuth 2.0 Settings** section. Copy this value to your clipboard to use later on. - + ### Creating a BigQuery OAuth 2.0 client ID and secret To get started, you need to create a client ID and secret for [authentication](https://cloud.google.com/bigquery/docs/authentication) with BigQuery. This client ID and secret will be stored in dbt Cloud to manage the OAuth connection between dbt Cloud users and BigQuery. @@ -64,10 +65,12 @@ Now that you have an OAuth app set up in BigQuery, you'll need to add the client ### Authenticating to BigQuery Once the BigQuery OAuth app is set up for a dbt Cloud project, each dbt Cloud user will need to authenticate with BigQuery in order to use the IDE. To do so: -- Select the gear menu in the upper left corner and choose **Profile settings** +- Navigate to your account name, above your profile icon on the left side panel +- Select **Account settings** from the menu - From the left sidebar, select **Credentials** - Choose the project from the list - Select **Authenticate BigQuery Account** + You will then be redirected to BigQuery and asked to approve the drive, cloud platform, and BigQuery scopes, unless the connection is less privileged. diff --git a/website/docs/docs/cloud/manage-access/set-up-databricks-oauth.md b/website/docs/docs/cloud/manage-access/set-up-databricks-oauth.md index e5c42c3fa59..067d51513b7 100644 --- a/website/docs/docs/cloud/manage-access/set-up-databricks-oauth.md +++ b/website/docs/docs/cloud/manage-access/set-up-databricks-oauth.md @@ -45,11 +45,11 @@ You can use the following table to set up the redirect URLs for your application ### Configure the Connection in dbt Cloud (dbt Cloud project admin) Now that you have an OAuth app set up in Databricks, you'll need to add the client ID and secret to dbt Cloud. To do so: - - go to Settings by clicking the gear in the top right. - - on the left, select **Projects** under **Account Settings** - - choose your project from the list - - select **Connection** to edit the connection details - - add the `OAuth Client ID` and `OAuth Client Secret` from the Databricks OAuth app under the **Optional Settings** section + - From dbt Cloud, click on your account name in the left side menu and select **Account settings** + - Select **Projects** from the menu + - Choose your project from the list + - Select **Connection** to edit the connection details + - Add the `OAuth Client ID` and `OAuth Client Secret` from the Databricks OAuth app under the **Optional Settings** section @@ -57,7 +57,8 @@ Now that you have an OAuth app set up in Databricks, you'll need to add the clie Once the Databricks connection via OAuth is set up for a dbt Cloud project, each dbt Cloud user will need to authenticate with Databricks in order to use the IDE. To do so: -- Click the gear icon at the top right and select **Profile settings**. +- From dbt Cloud, click on your account name in the left side menu and select **Account settings** +- Select **Profile settings**. - Select **Credentials**. - Choose your project from the list - Select `OAuth` as the authentication method, and click **Save** diff --git a/website/docs/docs/cloud/manage-access/set-up-snowflake-oauth.md b/website/docs/docs/cloud/manage-access/set-up-snowflake-oauth.md index e9c4236438e..27c09cbca09 100644 --- a/website/docs/docs/cloud/manage-access/set-up-snowflake-oauth.md +++ b/website/docs/docs/cloud/manage-access/set-up-snowflake-oauth.md @@ -10,7 +10,7 @@ This guide describes a feature of the dbt Cloud Enterprise plan. If you’re int ::: -dbt Cloud Enterprise supports [OAuth authentication](https://docs.snowflake.net/manuals/user-guide/oauth-intro.html) with Snowflake. When Snowflake OAuth is enabled, users can authorize their Development credentials using Single Sign On (SSO) via Snowflake rather than submitting a username and password to dbt Cloud. If Snowflake is setup with SSO through a third-party identity provider, developers can use this method to log into Snowflake and authorize the dbt Development credentials without any additional setup. +dbt Cloud Enterprise supports [OAuth authentication](https://docs.snowflake.net/manuals/user-guide/oauth-intro.html) with Snowflake. When Snowflake OAuth is enabled, users can authorize their Development credentials using Single Sign On (SSO) via Snowflake rather than submitting a username and password to dbt Cloud. If Snowflake is set up with SSO through a third-party identity provider, developers can use this method to log into Snowflake and authorize the dbt Development credentials without any additional setup. To set up Snowflake OAuth in dbt Cloud, admins from both are required for the following steps: 1. [Locate the redirect URI value](#locate-the-redirect-uri-value) in dbt Cloud. @@ -22,10 +22,10 @@ To use Snowflake in the dbt Cloud IDE, all developers must [authenticate with Sn ### Locate the redirect URI value To get started, copy the connection's redirect URI from dbt Cloud: -1. Navigate to **Account settings** -1. Select **Projects** and choose a project from the list -1. Select the connection to view its details abd set the **OAuth method** to "Snowflake SSO" -1. Copy the **Redirect URI** for use in later steps +1. Navigate to **Account settings**. +1. Select **Projects** and choose a project from the list. +1. Select the connection to view its details and set the **OAuth method** to "Snowflake SSO". +1. Copy the **Redirect URI** to use in the later steps. ` value with the Redirect URI (also referred to as the [access URL](/docs/cloud/about-cloud/access-regions-ip-addresses)) copied in dbt Cloud. To locate the Redirect URI, refer to the previous [locate the redirect URI value](#locate-the-redirect-uri-value) section. ``` CREATE OR REPLACE SECURITY INTEGRATION DBT_CLOUD @@ -43,7 +45,7 @@ CREATE OR REPLACE SECURITY INTEGRATION DBT_CLOUD ENABLED = TRUE OAUTH_CLIENT = CUSTOM OAUTH_CLIENT_TYPE = 'CONFIDENTIAL' - OAUTH_REDIRECT_URI = 'LOCATED_REDIRECT_URI' + OAUTH_REDIRECT_URI = '' OAUTH_ISSUE_REFRESH_TOKENS = TRUE OAUTH_REFRESH_TOKEN_VALIDITY = 7776000; ``` @@ -88,11 +90,11 @@ Enter the Client ID and Client Secret into dbt Cloud to complete the creation of -### Authorize Developer Credentials +### Authorize developer credentials Once Snowflake SSO is enabled, users on the project will be able to configure their credentials in their Profiles. By clicking the "Connect to Snowflake Account" button, users will be redirected to Snowflake to authorize with the configured SSO provider, then back to dbt Cloud to complete the setup process. At this point, users should now be able to use the dbt IDE with their development credentials. -### SSO OAuth Flow Diagram +### SSO OAuth flow diagram diff --git a/website/docs/docs/cloud/manage-access/set-up-sso-google-workspace.md b/website/docs/docs/cloud/manage-access/set-up-sso-google-workspace.md index e4ff998015c..2b2575efc57 100644 --- a/website/docs/docs/cloud/manage-access/set-up-sso-google-workspace.md +++ b/website/docs/docs/cloud/manage-access/set-up-sso-google-workspace.md @@ -117,7 +117,7 @@ If the verification information looks appropriate, then you have completed the c ## Setting up RBAC Now you have completed setting up SSO with GSuite, the next steps will be to set up -[RBAC groups](/docs/cloud/manage-access/enterprise-permissions) to complete your access control configuration. +[RBAC groups](/docs/cloud/manage-access/about-user-access#role-based-access-control-) to complete your access control configuration. ## Troubleshooting diff --git a/website/docs/docs/cloud/manage-access/set-up-sso-microsoft-entra-id.md b/website/docs/docs/cloud/manage-access/set-up-sso-microsoft-entra-id.md index 4658141034c..81463cf9ee5 100644 --- a/website/docs/docs/cloud/manage-access/set-up-sso-microsoft-entra-id.md +++ b/website/docs/docs/cloud/manage-access/set-up-sso-microsoft-entra-id.md @@ -61,6 +61,13 @@ Depending on your Microsoft Entra ID settings, your App Registration page might ### Azure <-> dbt Cloud User and Group mapping +:::important + +There is a [limitation](https://learn.microsoft.com/en-us/entra/identity/hybrid/connect/how-to-connect-fed-group-claims#important-caveats-for-this-functionality) on the number of groups Azure will emit (capped at 150) via the SSO token, meaning if a user belongs to more than 150 groups, it will appear as though they belong to none. To prevent this, configure [group assignments](https://learn.microsoft.com/en-us/entra/identity/enterprise-apps/assign-user-or-group-access-portal?pivots=portal) with the dbt Cloud app in Azure and set a [group claim](https://learn.microsoft.com/en-us/entra/identity/hybrid/connect/how-to-connect-fed-group-claims#add-group-claims-to-tokens-for-saml-applications-using-sso-configuration) so Azure emits only the relevant groups. + +::: + + The Azure users and groups you will create in the following steps are mapped to groups created in dbt Cloud based on the group name. Reference the docs on [enterprise permissions](enterprise-permissions) for additional information on how users, groups, and permission sets are configured in dbt Cloud. ### Adding users to an Enterprise application @@ -120,8 +127,9 @@ To complete setup, follow the steps below in the dbt Cloud application. ### Supplying credentials -25. Click the gear icon at the top right and select **Profile settings**. To the left, select **Single Sign On** under **Account Settings**. -26. Click the **Edit** button and supply the following SSO details: +25. From dbt Cloud, click on your account name in the left side menu and select **Account settings**. +26. Click **Single sign-on** from the menu. +27. Click the **Edit** button and supply the following SSO details: | Field | Value | | ----- | ----- | diff --git a/website/docs/docs/cloud/manage-access/set-up-sso-okta.md b/website/docs/docs/cloud/manage-access/set-up-sso-okta.md index 53986513ce2..fda32f118ef 100644 --- a/website/docs/docs/cloud/manage-access/set-up-sso-okta.md +++ b/website/docs/docs/cloud/manage-access/set-up-sso-okta.md @@ -190,4 +190,4 @@ configured in the steps above. ## Setting up RBAC Now you have completed setting up SSO with Okta, the next steps will be to set up -[RBAC groups](/docs/cloud/manage-access/enterprise-permissions) to complete your access control configuration. +[RBAC groups](/docs/cloud/manage-access/about-user-access#role-based-access-control-) to complete your access control configuration. diff --git a/website/docs/docs/cloud/manage-access/set-up-sso-saml-2.0.md b/website/docs/docs/cloud/manage-access/set-up-sso-saml-2.0.md index 7083e7ac5f8..34c1a91fbee 100644 --- a/website/docs/docs/cloud/manage-access/set-up-sso-saml-2.0.md +++ b/website/docs/docs/cloud/manage-access/set-up-sso-saml-2.0.md @@ -16,7 +16,7 @@ Currently supported features include: This document details the steps to integrate dbt Cloud with an identity provider in order to configure Single Sign On and [role-based access control](/docs/cloud/manage-access/about-user-access#role-based-access-control). -## Auth0 Multi-tenant URIs +## Auth0 URIs diff --git a/website/docs/docs/cloud/manage-access/sso-overview.md b/website/docs/docs/cloud/manage-access/sso-overview.md index 560be72e31d..e922a073fc8 100644 --- a/website/docs/docs/cloud/manage-access/sso-overview.md +++ b/website/docs/docs/cloud/manage-access/sso-overview.md @@ -12,7 +12,7 @@ dbt Cloud supports JIT (Just-in-Time) provisioning and IdP-initiated login. You - You have a dbt Cloud account enrolled in the Enterprise plan. [Contact us](mailto:sales@getdbt.com) to learn more and enroll. -## Auth0 Multi-tenant URIs +## Auth0 URIs @@ -43,7 +43,7 @@ Then, assign all of these (and only these) to the user license. This step will a ## SSO enforcement -* **SSO Enforcement:** If you have SSO turned on in your organization, dbt Cloud will enforce SSO-only logins for all non-admin users. If an Account Admin already has a password, they can continue logging in with a password. +* **SSO Enforcement:** If SSO is turned on in your organization, dbt Cloud will enforce SSO-only logins for all non-admin users. By default, if an Account Admin or Security Admin already has a password, they can continue logging in with a password. To restrict admins from using passwords, turn off **Allow password logins for account administrators** in the **Single sign-on** section of your organization's **Account settings**. * **SSO Re-Authentication:** dbt Cloud will prompt you to re-authenticate using your SSO provider every 24 hours to ensure high security. ### How should non-admin users log in? diff --git a/website/docs/docs/cloud/migration.md b/website/docs/docs/cloud/migration.md index 3aec1956297..2665b8f6a97 100644 --- a/website/docs/docs/cloud/migration.md +++ b/website/docs/docs/cloud/migration.md @@ -11,15 +11,22 @@ dbt Labs is in the process of rolling out a new cell-based architecture for dbt We're scheduling migrations by account. When we're ready to migrate your account, you will receive a banner or email communication with your migration date. If you have not received this communication, then you don't need to take action at this time. dbt Labs will share information about your migration with you, with appropriate advance notice, when applicable to your account. -Your account will be automatically migrated on its scheduled date. However, if you use certain features, you must take action before that date to avoid service disruptions. +Your account will be automatically migrated on or after its scheduled date. However, if you use certain features, you must take action before that date to avoid service disruptions. ## Recommended actions +:::info Rescheduling your migration + +If you're on the dbt Cloud Enterprise tier, you can postpone your account migration by up to 45 days. To reschedule your migration, navigate to **Account Settings** → **Migration guide**. + +For help, contact the dbt Support Team at [support@getdbt.com](mailto:support@getdbt.com). +::: + We highly recommended you take these actions: -- Ensure pending user invitations are accepted or note outstanding invitations. Pending user invitations will be voided during the migration and must be resent after it is complete. -- Commit unsaved changes in the [dbt Cloud IDE](/docs/cloud/dbt-cloud-ide/develop-in-the-cloud). Unsaved changes will be lost during migration. -- Export and download [audit logs](/docs/cloud/manage-access/audit-log) older than 90 days, as they will be lost during migration. If you lose critical logs older than 90 days during the migration, you will have to work with the dbt Labs Customer Support team to recover. +- Ensure pending user invitations are accepted or note outstanding invitations. Pending user invitations might be voided during the migration. You can resend user invitations after the migration is complete. +- Commit unsaved changes in the [dbt Cloud IDE](/docs/cloud/dbt-cloud-ide/develop-in-the-cloud). Unsaved changes might be lost during migration. +- Export and download [audit logs](/docs/cloud/manage-access/audit-log) older than 90 days, as they will be unavailable from dbt Cloud after the migration is complete. Logs older than 90 days while within the data retention period are not deleted, but you will have to work with the dbt Labs Customer Support team to recover. ## Required actions diff --git a/website/docs/docs/cloud/secure/about-privatelink.md b/website/docs/docs/cloud/secure/about-privatelink.md index 731cef3f019..f19790fd708 100644 --- a/website/docs/docs/cloud/secure/about-privatelink.md +++ b/website/docs/docs/cloud/secure/about-privatelink.md @@ -7,10 +7,13 @@ sidebar_label: "About PrivateLink" import SetUpPages from '/snippets/_available-tiers-privatelink.md'; import PrivateLinkHostnameWarning from '/snippets/_privatelink-hostname-restriction.md'; +import CloudProviders from '/snippets/_privatelink-across-providers.md'; -PrivateLink enables a private connection from any dbt Cloud Multi-Tenant environment to your data platform hosted on AWS using [AWS PrivateLink](https://aws.amazon.com/privatelink/) technology. PrivateLink allows dbt Cloud customers to meet security and compliance controls as it allows connectivity between dbt Cloud and your data platform without traversing the public internet. This feature is supported in most regions across NA, Europe, and Asia, but [contact us](https://www.getdbt.com/contact/) if you have questions about availability. +PrivateLink enables a private connection from any dbt Cloud Multi-Tenant environment to your data platform hosted on a cloud provider, such as [AWS](https://aws.amazon.com/privatelink/) or [Azure](https://azure.microsoft.com/en-us/products/private-link), using that provider’s PrivateLink technology. PrivateLink allows dbt Cloud customers to meet security and compliance controls as it allows connectivity between dbt Cloud and your data platform without traversing the public internet. This feature is supported in most regions across NA, Europe, and Asia, but [contact us](https://www.getdbt.com/contact/) if you have questions about availability. + + ### Cross-region PrivateLink diff --git a/website/docs/docs/cloud/secure/databricks-privatelink.md b/website/docs/docs/cloud/secure/databricks-privatelink.md index a02683e1269..aaa6e0c6eb7 100644 --- a/website/docs/docs/cloud/secure/databricks-privatelink.md +++ b/website/docs/docs/cloud/secure/databricks-privatelink.md @@ -8,11 +8,14 @@ pagination_next: null import SetUpPages from '/snippets/_available-tiers-privatelink.md'; import PrivateLinkSLA from '/snippets/_PrivateLink-SLA.md'; +import CloudProviders from '/snippets/_privatelink-across-providers.md'; The following steps will walk you through the setup of a Databricks AWS PrivateLink or Azure Private Link endpoint in the dbt Cloud multi-tenant environment. + + ## Configure AWS PrivateLink 1. Locate your [Databricks instance name](https://docs.databricks.com/en/workspace/workspace-details.html#workspace-instance-names-urls-and-ids) @@ -31,7 +34,7 @@ The following steps will walk you through the setup of a Databricks AWS PrivateL 1. Once dbt Cloud support has notified you that setup is complete, [register the VPC endpoint in Databricks](https://docs.databricks.com/administration-guide/cloud-configurations/aws/privatelink.html#step-3-register-privatelink-objects-and-attach-them-to-a-workspace) and attach it to the workspace: - [Register your VPC endpoint](https://docs.databricks.com/en/security/network/classic/vpc-endpoints.html) — Register the VPC endpoint using the VPC endpoint ID provided by dbt Support. - [Create a Private Access Settings object](https://docs.databricks.com/en/security/network/classic/private-access-settings.html) — Create a Private Access Settings (PAS) object with your desired public access settings, and setting Private Access Level to **Endpoint**. Choose the registered endpoint created in the previous step. - - [Create or update your workspace](https://docs.databricks.com/en/security/network/classic/privatelink.html#step-3d-create-or-update-the-workspace-front-end-back-end-or-both) — Create a workspace, or update your an existing workspace. Under **Advanced configurations → Private Link** choose the private access settings object created in the previous step. + - [Create or update your workspace](https://docs.databricks.com/en/security/network/classic/privatelink.html#step-3d-create-or-update-the-workspace-front-end-back-end-or-both) — Create a workspace, or update an existing workspace. Under **Advanced configurations → Private Link** choose the private access settings object created in the previous step. :::warning If using an existing Databricks workspace, all workloads running in the workspace need to be stopped to enable Private Link. Workloads also can't be started for another 20 minutes after making changes. From the [Databricks documentation](https://docs.databricks.com/en/security/network/classic/privatelink.html#step-3d-create-or-update-the-workspace-front-end-back-end-or-both): diff --git a/website/docs/docs/cloud/secure/postgres-privatelink.md b/website/docs/docs/cloud/secure/postgres-privatelink.md index 864cfe4acba..4d670354686 100644 --- a/website/docs/docs/cloud/secure/postgres-privatelink.md +++ b/website/docs/docs/cloud/secure/postgres-privatelink.md @@ -6,11 +6,15 @@ sidebar_label: "PrivateLink for Postgres" --- import SetUpPages from '/snippets/_available-tiers-privatelink.md'; import PrivateLinkTroubleshooting from '/snippets/_privatelink-troubleshooting.md'; +import PrivateLinkCrossZone from '/snippets/_privatelink-cross-zone-load-balancing.md'; +import CloudProviders from '/snippets/_privatelink-across-providers.md'; A Postgres database, hosted either in AWS or in a properly connected on-prem data center, can be accessed through a private network connection using AWS Interface-type PrivateLink. The type of Target Group connected to the Network Load Balancer (NLB) may vary based on the location and type of Postgres instance being connected, as explained in the following steps. + + ## Configuring Postgres interface-type PrivateLink ### 1. Provision AWS resources @@ -41,9 +45,16 @@ Creating an Interface VPC PrivateLink connection requires creating multiple AWS - Target Group protocol: **TCP** - **Network Load Balancer (NLB)** — Requires creating a Listener that attaches to the newly created Target Group for port `5432` + - **Scheme:** Internal + - **IP address type:** IPv4 + - **Network mapping:** Choose the VPC that the VPC Endpoint Service and NLB are being deployed in, and choose subnets from at least two Availability Zones. + - **Security Groups:** The Network Load Balancer (NLB) associated with the VPC endpoint service must either not have an associated security group, or the security group must have a rule that allows requests from the appropriate dbt Cloud **private CIDR(s)**. Note that _this is different_ than the static public IPs listed on the dbt Cloud [Access, Regions, & IP addresses](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses) page. dbt Support can provide the correct private CIDR(s) upon request. If necessary, until you can refine the rule to the smaller CIDR provided by dbt, allow connectivity by temporarily adding an allow rule of `10.0.0.0/8`. + - **Listeners:** Create one listener per target group that maps the appropriate incoming port to the corresponding target group ([details](https://docs.aws.amazon.com/elasticloadbalancing/latest/network/load-balancer-listeners.html)). - **VPC Endpoint Service** — Attach to the newly created NLB. - Acceptance required (optional) — Requires you to [accept our connection request](https://docs.aws.amazon.com/vpc/latest/privatelink/configure-endpoint-service.html#accept-reject-connection-requests) after dbt creates the endpoint. + + ### 2. Grant dbt AWS account access to the VPC Endpoint Service On the provisioned VPC endpoint service, click the **Allow principals** tab. Click **Allow principals** to grant access. Enter the ARN of the root user in the appropriate production AWS account and save your changes. @@ -88,4 +99,4 @@ Once dbt Cloud support completes the configuration, you can start creating new c 4. Configure the remaining data platform details. 5. Test your connection and save it. - \ No newline at end of file + diff --git a/website/docs/docs/cloud/secure/redshift-privatelink.md b/website/docs/docs/cloud/secure/redshift-privatelink.md index a9d4332918b..75924cf76a9 100644 --- a/website/docs/docs/cloud/secure/redshift-privatelink.md +++ b/website/docs/docs/cloud/secure/redshift-privatelink.md @@ -7,6 +7,8 @@ sidebar_label: "PrivateLink for Redshift" import SetUpPages from '/snippets/_available-tiers-privatelink.md'; import PrivateLinkTroubleshooting from '/snippets/_privatelink-troubleshooting.md'; +import PrivateLinkCrossZone from '/snippets/_privatelink-cross-zone-load-balancing.md'; +import CloudProviders from '/snippets/_privatelink-across-providers.md'; @@ -16,6 +18,8 @@ AWS provides two different ways to create a PrivateLink VPC endpoint for a Redsh dbt Cloud supports both types of endpoints, but there are a number of [considerations](https://docs.aws.amazon.com/redshift/latest/mgmt/managing-cluster-cross-vpc.html#managing-cluster-cross-vpc-considerations) to take into account when deciding which endpoint type to use. Redshift-managed provides a far simpler setup with no additional cost, which might make it the preferred option for many, but may not be an option in all environments. Based on these criteria, you will need to determine which is the right type for your system. Follow the instructions from the section below that corresponds to your chosen endpoint type. + + :::note Redshift Serverless While Redshift Serverless does support Redshift-managed type VPC endpoints, this functionality is not currently available across AWS accounts. Due to this limitation, an Interface-type VPC endpoint service must be used for Redshift Serverless cluster PrivateLink connectivity from dbt Cloud. ::: @@ -79,9 +83,16 @@ Creating an Interface VPC PrivateLink connection requires creating multiple AWS - Target Group protocol: **TCP** - **Network Load Balancer (NLB)** — Requires creating a Listener that attaches to the newly created Target Group for port `5439` + - **Scheme:** Internal + - **IP address type:** IPv4 + - **Network mapping:** Choose the VPC that the VPC Endpoint Service and NLB are being deployed in, and choose subnets from at least two Availability Zones. + - **Security Groups:** The Network Load Balancer (NLB) associated with the VPC endpoint service must either not have an associated security group, or the security group must have a rule that allows requests from the appropriate dbt Cloud **private CIDR(s)**. Note that _this is different_ than the static public IPs listed on the dbt Cloud [Access, Regions, & IP addresses](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses) page. dbt Support can provide the correct private CIDR(s) upon request. If necessary, until you can refine the rule to the smaller CIDR provided by dbt, allow connectivity by temporarily adding an allow rule of `10.0.0.0/8`. + - **Listeners:** Create one listener per target group that maps the appropriate incoming port to the corresponding target group ([details](https://docs.aws.amazon.com/elasticloadbalancing/latest/network/load-balancer-listeners.html)). - **VPC Endpoint Service** — Attach to the newly created NLB. - Acceptance required (optional) — Requires you to [accept our connection request](https://docs.aws.amazon.com/vpc/latest/privatelink/configure-endpoint-service.html#accept-reject-connection-requests) after dbt creates the endpoint. + + ### 2. Grant dbt AWS Account access to the VPC Endpoint Service On the provisioned VPC endpoint service, click the **Allow principals** tab. Click **Allow principals** to grant access. Enter the ARN of the root user in the appropriate production AWS account and save your changes. @@ -117,4 +128,4 @@ Once dbt Cloud support completes the configuration, you can start creating new c 4. Configure the remaining data platform details. 5. Test your connection and save it. - \ No newline at end of file + diff --git a/website/docs/docs/cloud/secure/snowflake-privatelink.md b/website/docs/docs/cloud/secure/snowflake-privatelink.md index c6775be2444..dc0cb64ba31 100644 --- a/website/docs/docs/cloud/secure/snowflake-privatelink.md +++ b/website/docs/docs/cloud/secure/snowflake-privatelink.md @@ -6,11 +6,14 @@ sidebar_label: "PrivateLink for Snowflake" --- import SetUpPages from '/snippets/_available-tiers-privatelink.md'; +import CloudProviders from '/snippets/_privatelink-across-providers.md'; The following steps walk you through the setup of a Snowflake AWS PrivateLink or Azure Private Link endpoint in a dbt Cloud multi-tenant environment. + + :::note Snowflake SSO with PrivateLink Users connecting to Snowflake using SSO over a PrivateLink connection from dbt Cloud will also require access to a PrivateLink endpoint from their local workstation. @@ -94,12 +97,18 @@ Once dbt Cloud support completes the configuration, you can start creating new c 4. Configure the remaining data platform details. 5. Test your connection and save it. -## Enable the connection in Snowflake +### Enable the connection in Snowflake hosted on Azure + +:::note + +AWS private internal stages are not currently supported. + +::: To complete the setup, follow the remaining steps from the Snowflake setup guides. The instructions vary based on the platform: -- [Snowflake AWS PrivateLink](https://docs.snowflake.com/en/user-guide/admin-security-privatelink) - [Snowflake Azure Private Link](https://docs.snowflake.com/en/user-guide/privatelink-azure) +- [Azure private endpoints for internal stages](https://docs.snowflake.com/en/user-guide/private-internal-stages-azure) There are some nuances for each connection and you will need a Snowflake administrator. As the Snowflake administrator, call the `SYSTEM$AUTHORIZE_STAGE_PRIVATELINK_ACCESS` function using the privateEndpointResourceID value as the function argument. This authorizes access to the Snowflake internal stage through the private endpoint. @@ -107,14 +116,12 @@ There are some nuances for each connection and you will need a Snowflake adminis USE ROLE ACCOUNTADMIN; --- AWS PrivateLink -SELECT SYSTEMS$AUTHORIZE_STATE_PRIVATELINK_ACCESS ( `AWS VPC ID` ); - -- Azure Private Link -SELECT SYSTEMS$AUTHORIZE_STATE_PRIVATELINK_ACCESS ( `AZURE PRIVATE ENDPOINT RESOURCE ID` ); +SELECT SYSTEMS$AUTHORIZE_STAGE_PRIVATELINK_ACCESS ( `AZURE PRIVATE ENDPOINT RESOURCE ID` ); ``` + ## Configuring Network Policies If your organization uses [Snowflake Network Policies](https://docs.snowflake.com/en/user-guide/network-policies) to restrict access to your Snowflake account, you will need to add a network rule for dbt Cloud. diff --git a/website/docs/docs/cloud/secure/vcs-privatelink.md b/website/docs/docs/cloud/secure/vcs-privatelink.md index 6041b1cb4ed..28b4df8f706 100644 --- a/website/docs/docs/cloud/secure/vcs-privatelink.md +++ b/website/docs/docs/cloud/secure/vcs-privatelink.md @@ -7,6 +7,7 @@ sidebar_label: "PrivateLink for VCS" import SetUpPages from '/snippets/_available-tiers-privatelink.md'; import PrivateLinkTroubleshooting from '/snippets/_privatelink-troubleshooting.md'; +import PrivateLinkCrossZone from '/snippets/_privatelink-cross-zone-load-balancing.md'; @@ -44,12 +45,15 @@ Creating an Interface VPC PrivateLink connection requires creating multiple AWS - **Scheme:** Internal - **IP address type:** IPv4 - **Network mapping:** Choose the VPC that the VPC Endpoint Service and NLB are being deployed in, and choose subnets from at least two Availability Zones. + - **Security Groups:** The Network Load Balancer (NLB) associated with the VPC Endpoint Service must either not have an associated Security Group, or the Security Group must have a rule that allows requests from the appropriate dbt Cloud **private CIDR(s)**. Note that **this is different** than the static public IPs listed on the dbt Cloud [Access, Regions, & IP addresses](https://docs.getdbt.com/docs/cloud/about-cloud/access-regions-ip-addresses) page. The correct private CIDR(s) can be provided by dbt Support upon request. If necessary, temporarily adding an allow rule of `10.0.0.0/8` should allow connectivity until the rule can be refined to the smaller dbt provided CIDR. - **Listeners:** Create one Listener per Target Group that maps the appropriate incoming port to the corresponding Target Group ([details](https://docs.aws.amazon.com/elasticloadbalancing/latest/network/load-balancer-listeners.html)). - **Endpoint Service** - The VPC Endpoint Service is what allows for the VPC to VPC connection, routing incoming requests to the configured load balancer. - **Load balancer type:** Network. - **Load balancer:** Attach the NLB created in the previous step. - **Acceptance required (recommended)**: When enabled, requires a new connection request to the VPC Endpoint Service to be accepted by the customer before connectivity is allowed ([details](https://docs.aws.amazon.com/vpc/latest/privatelink/configure-endpoint-service.html#accept-reject-connection-requests)). + + ### 2. Grant dbt AWS account access to the VPC Endpoint Service Once these resources have been provisioned, access needs to be granted for the dbt Labs AWS account to create a VPC Endpoint in our VPC. On the provisioned VPC endpoint service, click the **Allow principals** tab. Click **Allow principals** to grant access. Enter the ARN of the following IAM role in the appropriate production AWS account and save your changes ([details](https://docs.aws.amazon.com/vpc/latest/privatelink/configure-endpoint-service.html#add-remove-permissions)). diff --git a/website/docs/docs/cloud/use-dbt-assist.md b/website/docs/docs/cloud/use-dbt-assist.md deleted file mode 100644 index 4eef6c87329..00000000000 --- a/website/docs/docs/cloud/use-dbt-assist.md +++ /dev/null @@ -1,22 +0,0 @@ ---- -title: "Use dbt Assist" -sidebar_label: "Use dbt Assist" -description: "Use dbt Assist to generate documentation and tests from scratch, giving you the flexibility to modify or fix generated code." ---- - -# Use dbt Assist - -Use dbt Assist to generate documentation and tests from scratch, giving you the flexibility to modify or fix generated code. To access and use dbt Assist: - -1. Navigate to the dbt Cloud IDE and select a SQL model file under the **File Explorer**. - -2. In the **Console** section (under the **File Editor**), select the **dbt Assist** to view the available AI options. - -3. Select the available options: **Documentation** or **Tests** to generate the YAML config. - - To generate both for the same model, click each option separately. dbt Assist intelligently saves the YAML config in the same file. - -4. Verify the AI-generated code. You can update or fix the code as needed. - -5. Click **Save** to save the code. You should see the file changes under the **Version control** section. - - diff --git a/website/docs/docs/cloud/use-dbt-copilot.md b/website/docs/docs/cloud/use-dbt-copilot.md new file mode 100644 index 00000000000..48e5ffa6fa7 --- /dev/null +++ b/website/docs/docs/cloud/use-dbt-copilot.md @@ -0,0 +1,73 @@ +--- +title: "Use dbt Copilot" +sidebar_label: "Use dbt Copilot" +description: "Use dbt Copilot to generate documentation, tests, semantic models, and sql code from scratch, giving you the flexibility to modify or fix generated code." +--- + +# Use dbt Copilot + +Use dbt Copilot to generate documentation, tests, semantic models, and code from scratch, giving you the flexibility to modify or fix generated code. + +This page explains how to use dbt Copilot to: + +- [Generate resources](#generate-resources) — Save time by using dbt Copilot’s generation button to generate documentation, tests, and semantic model files during your development. +- [Generate and edit code](#generate-and-edit-code) — Use natural language prompts to generate SQL code from scratch or to edit existing SQL file by using keyboard shortcuts or highlighting code. + +## Generate resources + +Generate documentation, tests, and semantic models resources with the click-of-a-button using dbt Copilot, saving you time. To access and use this AI feature: + +1. Navigate to the dbt Cloud IDE and select a SQL model file under the **File Explorer**. +2. In the **Console** section (under the **File Editor**), click **dbt Copilot** to view the available AI options. +3. Select the available options to generate the YAML config: **Generate Documentation**, **Generate Tests**, or **Generate Semantic Model**. + - To generate multiple YAML configs for the same model, click each option separately. dbt Copilot intelligently saves the YAML config in the same file. +4. Verify the AI-generated code. You can update or fix the code as needed. +5. Click **Save As**. You should see the file changes under the **Version control** section. + + + +## Generate and edit code + +dbt Copilot also allows you to generate SQL code directly within the SQL file in the dbt Cloud IDE, using natural language prompts. This means you can rewrite or add specific portions of the SQL file without needing to edit the entire file. + +This intelligent AI tool streamlines SQL development by reducing errors, scaling effortlessly with complexity, and saving valuable time. dbt Copilot's [prompt window](#use-the-prompt-window), accessible by keyboard shortcut, handles repetitive or complex SQL generation effortlessly so you can focus on high-level tasks. + +Use Copilot's prompt window for use cases like: + +- Writing advanced transformations +- Performing bulk edits efficiently +- Crafting complex patterns like regex + +### Use the prompt window + +Access dbt Copilot's AI prompt window using the keyboard shortcut Cmd+B (Mac) or Ctrl+B (Windows) to: + +#### 1. Generate SQL from scratch +- Use the keyboard shortcuts Cmd+B (Mac) or Ctrl+B (Windows) to generate SQL from scratch. +- Enter your instructions to generate SQL code tailored to your needs using natural language. +- Ask dbt Copilot to fix the code or add a specific portion of the SQL file. + + + +#### 2. Edit existing SQL code +- Highlight a section of SQL code and press Cmd+B (Mac) or Ctrl+B (Windows) to open the prompt window for editing. +- Use this to refine or modify specific code snippets based on your needs. +- Ask dbt Copilot to fix the code or add a specific portion of the SQL file. + +#### 3. Review changes with the diff view to quickly assess the impact of the changes before making changes +- When a suggestion is generated, Copilot displays a visual "diff" view to help you compare the proposed changes with your existing code: + - **Green**: Means new code that will be added if you accept the suggestion. + - **Red**: Highlights existing code that will be removed or replaced by the suggested changes. + +#### 4. Accept or reject suggestions +- **Accept**: If the generated SQL meets your requirements, click the **Accept** button to apply the changes directly to your `.sql` file directly in the IDE. +- **Reject**: If the suggestion don’t align with your request/prompt, click **Reject** to discard the generated SQL without making changes and start again. + +#### 5. Regenerate code +- To regenerate, press the **Escape** button on your keyboard (or click the Reject button in the popup). This will remove the generated code and puts your cursor back into the prompt text area. +- Update your prompt and press **Enter** to try another generation. Press **Escape** again to close the popover entirely. + +Once you've accepted a suggestion, you can continue to use the prompt window to generate additional SQL code and commit your changes to the branch. + + + diff --git a/website/docs/docs/cloud/use-visual-editor.md b/website/docs/docs/cloud/use-visual-editor.md new file mode 100644 index 00000000000..2ab6a5b82d1 --- /dev/null +++ b/website/docs/docs/cloud/use-visual-editor.md @@ -0,0 +1,83 @@ +--- +title: "Edit and create dbt models" +id: use-visual-editor +sidebar_label: "Edit and create dbt models" +description: "Access and use the visual editor to create or edit dbt models through a visual, drag-and-drop experience inside of dbt Cloud." +pagination_prev: "docs/cloud/visual-editor-interface" +--- + +# Edit and create dbt models + +

+Access and use the dbt Cloud visual editor to create or edit dbt models through a visual, drag-and-drop experience. Use the built-in AI for custom code generation in your development experience. +

+ +:::tip Beta feature +The visual editor provides users with a seamless and drag-and-drop experience inside of dbt Cloud. It's available in private beta for [dbt Cloud Enterprise accounts](https://www.getdbt.com/pricing). + +To join the private beta, [register your interest](https://docs.google.com/forms/d/e/1FAIpQLScPjRGyrtgfmdY919Pf3kgqI5E95xxPXz-8JoVruw-L9jVtxg/viewform) or reach out to your account team to begin this process. +::: + +## Prerequisites +- You have a [dbt Cloud Enterprise](https://www.getdbt.com/pricing) account +- You have a [developer license](/docs/cloud/manage-access/seats-and-users) with developer credentials set up +- You have an existing dbt Cloud project already created +- Your Development environment is on a supported [release track](/docs/dbt-versions/cloud-release-tracks) to receive ongoing updates. +- Have AI-powered features toggle enabled + +## Access visual editor + +Before accessing the editor, you should have a dbt Cloud project already set up. This includes a Git repository, data platform connection, environments, and developer credentials. If you don't have this set up, please contact your dbt Cloud Admin. + +To access the visual editor: +- Type in the following URL, replacing the ACCOUNT_ID and ENVIRONMENT_ID with your own account and environment ID: `https://ACCESS_URL/visual-editor/ACCOUNT_ID/env/ENVIRONMENT_ID/` + - The environment ID must have had runs that generated catalogs in it. + +- For example, if my region is North America multi-tenant, account ID is 10, environment ID with a generated catalog run is 100, my URL should be: + + - `https://cloud.getdbt.com/visual-editor/10/env/100/` + + + +## Create a model +To create a dbt SQL model, click on **Create a new model** and perform the following steps. Note that you can't create source models in the visual editor. This is because you need to have production run with sources already created. + +1. Drag an operator from the operator toolbar and drop it onto the canvas. +2. Click on the operator to open its configuration panel: + - **Model**: Select the model and columns you want to use. + - **Join**: Define the join conditions and choose columns from both tables. + - **Select**: Pick the columns you need from the model. + - **Aggregate**: Specify the aggregation functions and the columns they apply to. + - **Formula**: Add the formula to create a new column. Use the built-AI code generator to help generate SQL code by clicking on the question mark (?) icon. Enter your prompt and wait to see the results. + - **Filter**: Set the conditions to filter data. + - **Order**: Select the columns to sort by and the sort order. + - **Limit**: Set the maximum number of rows you want to return. +3. View the **Output** and **SQL Code** tabs. + - Each operator has an Output tab that allows you to preview the data from that configured node. + - The Code tab displays the SQL code generated by the node's configuration. Use this to see the SQL for your visual model config. +4. Connect the operators by using the connector by dragging your cursor between the operator's "+" start point and linking it to the other operators you want to connect to. This should create a connector line. + - Doing this allows the data to flow from the source table through various transformations you configured, to the final output. +5. Keep building your dbt model and ensure you confirm the out through the **Output** tab. + + + +## Edit an existing model +To edit an existing model, navigate to the Visual Editor, click on the **Get Started** button on the upper right, and click **Edit existing model**. This will allow you to select the model you'd like to edit. + + + +## Version control + +Testing and documenting your models is an important part of the development process. + +Stay tuned! Coming very soon, you'll be able to version control your dbt modes in the visual editor. This ensures you can track changes and revert to previous versions if needed. + + diff --git a/website/docs/docs/cloud/visual-editor-interface.md b/website/docs/docs/cloud/visual-editor-interface.md new file mode 100644 index 00000000000..16e5a038d0e --- /dev/null +++ b/website/docs/docs/cloud/visual-editor-interface.md @@ -0,0 +1,84 @@ +--- +title: "Navigate the interface" +id: visual-editor-interface +sidebar_label: "Navigate the interface" +description: "The visual editor interface contains an operator toolbar, operators, and a canvas to help you create dbt models through a seamless drag-and-drop experience in dbt Cloud." +pagination_next: "docs/cloud/use-visual-editor" +pagination_prev: "docs/cloud/visual-editor" + +--- + +# Navigate the interface + +

+The visual editor interface contains an operator toolbar, operators, canvas, built-in AI, and more to help you create dbt models through a seamless drag-and-drop experience in dbt Cloud. +

+ +:::tip Beta feature +The visual editor provides users with a seamless and visual, drag-and-drop experience inside dbt Cloud. It's available in private beta for [dbt Cloud Enterprise accounts](https://www.getdbt.com/pricing). + +To join the private beta, [register your interest](https://docs.google.com/forms/d/e/1FAIpQLScPjRGyrtgfmdY919Pf3kgqI5E95xxPXz-8JoVruw-L9jVtxg/viewform) or reach out to your account team to begin this process. +::: + +This page offers comprehensive definitions and terminology of user interface elements, allowing you to navigate the dbt Cloud visual editor landscape with ease. + +The visual editor interface is composed of: + +- **Operator toolbar** — Located at the top of the interface, the toolbar displays all the nodes available. Use the toggle on the left of the toolbar to display or hide it. +- **Operators** — perform specific transformations or configurations (such as model, join, aggregate, filter, and so on). Use connectors to link the operators and build a complete data transformation pipeline. +- **Canvas** — The main whiteboard space below the node toolbar. The canvas allows you to create or modify models through a sleek drag-and-drop experience. +- **Configuration panel** — Each operator has a configuration panel that opens when you click on it. The configuration panel allows you to configure the operator, review the current model, preview changes to the table, view the SQL code for the node, and delete the operator. + +## Operators + +The operator toolbar above the canvas contains the different transformation operators available to use. Use each operator to configure or perform specific tasks, like adding filters or joining models by dragging an operator onto the canvas. You can connect operators using the connector line, which allows you to form a complete dbt model for your data transformation. + + + +Here the following operators are available: +- **Model**: This represents a data model. Use this to select the source table and the columns you want to include. There are no limits to the number of models you can have in a session. +- **Join**: Join two models and configure the join conditions by selecting which columns to include from each table. Requires two inputs. For example, you might want to join both tables using the 'ID' column found in both tables. +- **Select**: Use this to 'select' specific columns from a table. +- **Aggregate**: Allows you to perform aggregations like GROUP, SUM, AVG, COUNT, and so on. +- **Formula**: Create new columns using custom SQL formulas. Use a built-in AI code generator to generate SQL by clicking the ? icon. For example, you can use the formula node to only extract the email domain and ask the AI code generator to help you write the SQL for that code extraction. +- **Filter**: Filter data based on conditions you set. +- **Order**: Sort data by specific columns. +- **Limit**: Limits the number of rows returned back. + +When you click on each operator, it opens a configuration panel. The configuration panel allows you to configure the operator, review the current model, preview changes to the model, view the SQL code for the node, and delete the operator. + + + +If you have any feedback on additional operators that you might need, we'd love to hear it! Please contact your dbt Labs account team and share your thoughts. + +## Canvas + +The visual editor has a sleek drag-and-drop canvas interface that allows you to create or modify dbt SQL models. It's like a digital whiteboard space that allows analysts to deliver trustworthy data. Use the canvas to: + +- Drag-and-drop operators to create and configure your model(s) +- Generate SQL code using the built-in AI generator +- Zoom in or out for better visualization +- Version-control your dbt models +- [Coming soon] Test and document your created models + + + +### Connector + +Connectors allow you to connect your operators to create dbt models. Once you've added operators to the canvas: +- Hover over the "+" sign next to the operator and click. +- Drag your cursor between the operator's "+" start point to the other node you want to connect to. This should create a connector line. +- As an example, to create a join, connect one operator to the "L" (Left) and the other to the "R" (Right). The endpoints are located to the left of the operator so you can easily drag the connectors to the endpoint. + + + +## Configuration panel +Each operator has a configuration side panel that opens when you click on it. The configuration panel allows you to configure the operator, review the current model, preview changes, view the SQL code for the operator, and delete the operator. + +The configuration side panel has the following: +- Configure tab — This section allows you to configure the operator to your specified requirements, such as using the built-in AI code generator to generate SQL. +- Input tab — This section allows you to view the data for the current source table. Not available for model operators. +- Output tab — This section allows you to preview the data for the modified source model. +- Code — This section allows you to view the underlying SQL code for the data transformation. + + diff --git a/website/docs/docs/cloud/visual-editor.md b/website/docs/docs/cloud/visual-editor.md new file mode 100644 index 00000000000..8dc9dfa2863 --- /dev/null +++ b/website/docs/docs/cloud/visual-editor.md @@ -0,0 +1,37 @@ +--- +title: "About the visual editor" +id: visual-editor +sidebar_label: "About the visual editor" +description: "The visual editor enables analysts to quickly create and visualize dbt models through a visual, drag-and-drop experience inside of dbt Cloud." +pagination_next: "docs/cloud/visual-editor-interface" +pagination_prev: null +--- + +# About the visual editor + +

+The dbt Cloud visual editor helps analysts quickly create, edit, and visualize dbt models through a visual, drag-and-drop experience and with a built-in AI for custom code generation. +

+ +:::tip Beta feature +The visual editor in dbt Cloud provides users with a seamless and visual, drag-and-drop experience inside dbt Cloud. It's available in private beta for [dbt Cloud Enterprise accounts](https://www.getdbt.com/pricing). + +To join the private beta, [register your interest](https://docs.google.com/forms/d/e/1FAIpQLScPjRGyrtgfmdY919Pf3kgqI5E95xxPXz-8JoVruw-L9jVtxg/viewform) or reach out to your account team to begin this process. +::: + +The visual editor allows organizations to enjoy the many benefits of code-driven development—such as increased precision, ease of debugging, and ease of validation — while retaining the flexibility to have different contributors develop wherever they are most comfortable. Users can also take advantage of built-in AI for custom code generation, making it an end-to-end frictionless experience. + +These models compile directly to SQL and are indistinguishable from other dbt models in your projects: +- Visual models are version-controlled in your backing Git provider. +- All models are accessible across projects in [dbt Mesh](/best-practices/how-we-mesh/mesh-1-intro). +- Models can be materialized into production through [dbt Cloud orchestration](/docs/deploy/deployments), or be built directly into a user's development schema. +- Integrate with [dbt Explorer](/docs/collaborate/explore-projects) and the [dbt Cloud IDE](/docs/cloud/dbt-cloud-ide/develop-in-the-cloud). + + + +## Feedback + +Please note, always review AI-generated code and content as it may produce incorrect results. The visual editor features and/or functionality may be added or eliminated as part of the beta trial. + +To give feedback, please reach out to your dbt Labs account team. We appreciate your feedback and suggestions as we improve the visual editor. + diff --git a/website/docs/docs/collaborate/auto-exposures.md b/website/docs/docs/collaborate/auto-exposures.md index 371f6e80248..a333df19831 100644 --- a/website/docs/docs/collaborate/auto-exposures.md +++ b/website/docs/docs/collaborate/auto-exposures.md @@ -7,16 +7,18 @@ pagination_next: "docs/collaborate/data-tile" image: /img/docs/cloud-integrations/auto-exposures/explorer-lineage.jpg --- -# Auto-exposures +# Auto-exposures -As a data team, it’s critical that you have context into the downstream use cases and users of your data products. Auto-exposures integrates natively with Tableau (Power BI coming soon) and auto-generates downstream lineage in dbt Explorer for a richer experience. -:::info Available in beta -Auto-exposures are currently available in beta to a limited group of users and are gradually being rolled out. If you're interested in gaining access or learning more, stay tuned for updates! -::: +As a data team, it’s critical that you have context into the downstream use cases and users of your data products. Auto-exposures integrate natively with Tableau (Power BI coming soon) and auto-generate downstream lineage in dbt Explorer for a richer experience. + +Auto-exposures help users understand how their models are used in downstream analytics tools to inform investments and reduce incidents — ultimately building trust and confidence in data products. It imports and auto-generates exposures based on Tableau dashboards, with user-defined curation. -Auto-exposures helps users understand how their models are used in downstream analytics tools to inform investments and reduce incidents — ultimately building trust and confidence in data products. It imports and auto-generates exposures based on Tableau dashboards, with user-defined curation. +## Supported plans +Auto-exposures is available on the [dbt Cloud Enterprise](https://www.getdbt.com/pricing/) plan. Currently, you can only connect to a single Tableau site on the same server. -Auto-exposures is available on [Versionless](/docs/dbt-versions/versionless-cloud) and on [dbt Cloud Enterprise](https://www.getdbt.com/pricing/) plans. +:::info Tableau Server +If you're using Tableau Server, you need to [allowlist dbt Cloud's IP addresses](/docs/cloud/about-cloud/access-regions-ip-addresses) for your dbt Cloud region. +::: For more information on how to set up auto-exposures, prerequisites, and more — refer to [configure auto-exposures in Tableau and dbt Cloud](/docs/cloud-integrations/configure-auto-exposures). diff --git a/website/docs/docs/collaborate/build-and-view-your-docs.md b/website/docs/docs/collaborate/build-and-view-your-docs.md index 06716a67674..1a16f034eff 100644 --- a/website/docs/docs/collaborate/build-and-view-your-docs.md +++ b/website/docs/docs/collaborate/build-and-view-your-docs.md @@ -24,7 +24,7 @@ To set up a job to generate docs: 1. In the top left, click **Deploy** and select **Jobs**. 2. Create a new job or select an existing job and click **Settings**. 3. Under **Execution Settings**, select **Generate docs on run** and click **Save**. - + *Note, for dbt Docs users you need to configure the job to generate docs when it runs, then manually link that job to your project. Proceed to [configure project documentation](#configure-project-documentation) so your project generates the documentation when this job runs.* @@ -51,12 +51,11 @@ dbt Docs, available on developer plans or dbt Core users, generates a website fr You configure project documentation to generate documentation when the job you set up in the previous section runs. In the project settings, specify the job that generates documentation artifacts for that project. Once you configure this setting, subsequent runs of the job will automatically include a step to generate documentation. -1. Click the gear icon in the top right. -2. Select **Account Settings**. -3. Navigate to **Projects** and select the project that needs documentation. -4. Click **Edit**. -5. Under **Artifacts**, select the job that should generate docs when it runs and click **Save**. - +1. From dbt Cloud, click on your account name in the left side menu and select **Account settings**. +2. Navigate to **Projects** and select the project that needs documentation. +3. Click **Edit**. +4. Under **Artifacts**, select the job that should generate docs when it runs and click **Save**. + :::tip Use dbt Explorer for a richer documentation experience For a richer and more interactive experience, try out [dbt Explorer](/docs/collaborate/explore-projects), available on [Team or Enterprise plans](https://www.getdbt.com/pricing/). It includes map layers of your DAG, keyword search, interacts with the IDE, model performance, project recommendations, and more. diff --git a/website/docs/docs/collaborate/data-health-signals.md b/website/docs/docs/collaborate/data-health-signals.md new file mode 100644 index 00000000000..756b43e3583 --- /dev/null +++ b/website/docs/docs/collaborate/data-health-signals.md @@ -0,0 +1,88 @@ +--- +title: "Data health signals" +sidebar_label: "Data health signals" +id: data-health-signals +description: "Learn how data health signals offer a quick, at-a-glance view of data health when browsing your resources in dbt Explorer." +image: /img/docs/collaborate/dbt-explorer/data-health-signal.jpg +--- + +# Data health signals +Data health signals offer a quick, at-a-glance view of data health when browsing your resources in dbt Explorer. They keep you informed on the status of your resource's health using the indicators **Healthy**, **Caution**, **Degraded**, or **Unknown**. + +- Supported resources are [models](/docs/build/models), [sources](/docs/build/sources), and [exposures](/docs/build/exposures). +- For accurate health data, ensure the resource is up-to-date and had a recent job run. +- Each data health signal reflects key data health components, such as test success status, missing resource descriptions, missing tests, absence of builds in 30-day windows, [and more](#data-health-signal-criteria) + + + +## Access data health signals + +Access data health signals in the following places: +- In the [search function](/docs/collaborate/explore-projects#search-resources) or under **Models**, **Sources**, or **Exposures** in the **Resource** tab. + - For sources, the data health signal also indicates the [source freshness](/docs/deploy/source-freshness) status. +- In the **Health** column on [each resource's details page](/docs/collaborate/explore-projects#view-resource-details). Hover over or click the signal to view detailed information. +- In the **Health** column of public models tables. +- In the [DAG lineage graph](/docs/collaborate/explore-projects#project-lineage). Click any node to open the node details panel where you can view it and its details. +- In [Data health tiles](/docs/collaborate/data-tile) through an embeddable iFrame and visible in your BI dashboard. + + + +## Data health signal criteria + +Each resource has a health state that is determined by specific set of criteria. Select the following tabs to view the criteria for that resource type. + + + +The health state of a model is determined by the following criteria: + +| **Health state** | **Criteria** | +|-------------------|---------------| +| ✅ **Healthy** | All of the following must be true:

- Built successfully in the last run
- Built in the last 30 days
- Model has tests configured
- All tests passed
- All upstream [sources are fresh](/docs/build/sources#source-data-freshness) or freshness is not applicable (set to `null`)
- Has a description | +| 🟡 **Caution** | One of the following must be true:

- Not built in the last 30 days
- Tests are not configured
- Tests return warnings
- One or more upstream sources are stale:
    - Has a freshness check configured
    - Freshness check ran in the past 30 days
    - Freshness check returned a warning
- Missing a description | +| 🔴 **Degraded** | One of the following must be true:

- Model failed to build
- Model has failing tests
- One or more upstream sources are stale:
    - Freshness check hasn’t run in the past 30 days
    - Freshness check returned an error | +| ⚪ **Unknown** | - Unable to determine health of resource; no job runs have processed the resource. | + +
+ + + +The health state of a source is determined by the following criteria: + +| **Health state** | **Criteria** | +|-------------------|---------------| +| ✅ Healthy | All of the following must be true:

- Freshness check configured
- Freshness check passed
- Freshness check ran in the past 30 days
- Has a description | +| 🟡 Caution | One of the following must be true:

- Freshness check returned a warning
- Freshness check not configured
- Freshness check not run in the past 30 days
- Missing a description | +| 🔴 Degraded | - Freshness check returned an error | +| ⚪ Unknown | Unable to determine health of resource; no job runs have processed the resource. | + +
+ + + +The health state of an exposure is determined by the following criteria: + +| **Health state** | **Criteria** | +|-------------------|---------------| +| ✅ Healthy | All of the following must be true:

- Underlying sources are fresh
- Underlying models built successfully
- Underlying models’ tests passing
| +| 🟡 Caution | One of the following must be true:

- At least one underlying source’s freshness checks returned a warning
- At least one underlying model was skipped
- At least one underlying model’s tests returned a warning
| +| 🔴 Degraded | One of the following must be true:

- At least one underlying source’s freshness checks returned an error
- At least one underlying model did not build successfully
- At least one model’s tests returned an error | + +
+ + + +
diff --git a/website/docs/docs/collaborate/data-tile.md b/website/docs/docs/collaborate/data-tile.md index efd6a0d59aa..077a4f5a740 100644 --- a/website/docs/docs/collaborate/data-tile.md +++ b/website/docs/docs/collaborate/data-tile.md @@ -2,30 +2,24 @@ title: "Data health tile" id: "data-tile" sidebar_label: "Data health tile" -description: "Embed data health tiles in your dashboards to distill trust signals for data consumers." +description: "Embed data health tiles in your dashboards to distill data health signals for data consumers." image: /img/docs/collaborate/dbt-explorer/data-tile-pass.jpg --- -# Embed data health tile in dashboards - -With data health tiles, stakeholders will get an at-a-glance confirmation on whether the data they’re looking at is stale or degraded. This trust signal allows teams to immediately go back into Explorer to see more details and investigate issues. - -:::info Available in beta -Data health tile is currently available in open beta. -::: +With data health tiles, stakeholders will get an at-a-glance confirmation on whether the data they’re looking at is stale or degraded. It allows teams to immediately go back into Explorer to see more details and investigate issues. The data health tile: -- Distills trust signals for data consumers. +- Distills [data health signals](/docs/collaborate/data-health-signals) for data consumers. - Deep links you into dbt Explorer where you can further dive into upstream data issues. - Provides richer information and makes it easier to debug. - Revamps the existing, [job-based tiles](#job-based-data-health). -Data health tiles rely on [exposures](/docs/build/exposures) to surface trust signals in your dashboards. When you configure exposures in your dbt project, you are explicitly defining how specific outputs—like dashboards or reports—depend on your data models. +Data health tiles rely on [exposures](/docs/build/exposures) to surface data health signals in your dashboards. When you configure exposures in your dbt project, you are explicitly defining how specific outputs—like dashboards or reports—depend on your data models. - + ## Prerequisites @@ -69,7 +63,7 @@ Follow these steps to set up your data health tile: 6. Navigate back to dbt Explorer and select an exposure. 7. Below the **Data health** section, expand on the toggle for instructions on how to embed the exposure tile (if you're an account admin with develop permissions). 8. In the expanded toggle, you'll see a text field where you can paste your **Metadata Only token**. - + 9. Once you’ve pasted your token, you can select either **URL** or **iFrame** depending on which you need to add to your dashboard. @@ -98,7 +92,7 @@ Follow these steps to embed the data health tile in PowerBI: ```html Website = - "" + "" ``` @@ -122,19 +116,38 @@ Follow these steps to embed the data health tile in Tableau: 1. Create a dashboard in Tableau and connect to your database to pull in the data. -2. Ensure you've copied the iFrame snippet available in dbt Explorer's **Data health** section, under the **Embed data health into your dashboard** toggle. -3. Embed the snippet in your dashboard. +2. Ensure you've copied the URL or iFrame snippet available in dbt Explorer's **Data health** section, under the **Embed data health into your dashboard** toggle. +3. Insert a **Web Page** object. +4. Insert the URL and click **Ok**. - `