From e9aa36db63548ceb3480a26a57748b690dcb9e47 Mon Sep 17 00:00:00 2001 From: winnie <91998347+gwenwindflower@users.noreply.github.com> Date: Thu, 21 Sep 2023 15:17:44 -0500 Subject: [PATCH] Add example copypastable style guide starter to style guide (#4104) ## What are you changing in this pull request and why? Update to the style guide with a copy and pasteable starter 'snippet'/template on the last page. ## Checklist Just adding to an existing page! --- .../how-we-style/6-how-we-style-conclusion.md | 95 +++++++++++++++++++ 1 file changed, 95 insertions(+) diff --git a/website/docs/guides/best-practices/how-we-style/6-how-we-style-conclusion.md b/website/docs/guides/best-practices/how-we-style/6-how-we-style-conclusion.md index 22f8e36190a..a6402e46870 100644 --- a/website/docs/guides/best-practices/how-we-style/6-how-we-style-conclusion.md +++ b/website/docs/guides/best-practices/how-we-style/6-how-we-style-conclusion.md @@ -10,3 +10,98 @@ Now that you've seen how we style our dbt projects, it's time to build your own. ## Pre-commit hooks Lastly, to ensure your style guide's automated rules are being followed without additional mental overhead to your team, you can use [pre-commit hooks](https://pre-commit.com/) to automatically check your code for style violations (and often fix them automagically) before it's committed. This is a great way to make sure your style guide is followed by all contributors. We recommend implementing this once you've settled on and published your style guide, and your codebase is conforming to it. This will ensure that all future commits follow the style guide. You can find an excellent set of open source pre-commit hooks for dbt from the community [here in the dbt-checkpoint project](https://github.com/dbt-checkpoint/dbt-checkpoint). + +## Style guide template + +```markdown +# dbt Example Style Guide + +## SQL Style + +- Use lowercase keywords. +- Use trailing commas. + +## Model Organization + +Our models (typically) fit into two main categories:\ + +- Staging — Contains models that clean and standardize data. +- Marts — Contains models which combine or heavily transform data. + +Things to note: + +- There are different types of models that typically exist in each of the above categories. See [Model Layers](#model-layers) for more information. +- Read [How we structure our dbt projects](https://docs.getdbt.com/guides/best-practices/how-we-structure/1-guide-overview) for an example and more details around organization. + +## Model Layers + +- Only models in `staging` should select from [sources](https://docs.getdbt.com/docs/building-a-dbt-project/using-sources). +- Models not in the `staging` folder should select from [refs](https://docs.getdbt.com/reference/dbt-jinja-functions/ref). + +## Model File Naming and Coding + +- All objects should be plural. + Example: `stg_stripe__invoices.sql` vs. `stg_stripe__invoice.sql` + +- All models should use the naming convention `___`. See [this article](https://docs.getdbt.com/blog/stakeholder-friendly-model-names) for more information. + + - Models in the **staging** folder should use the source's name as the `` and the entity name as the `additional_context`. + + Examples: + + - seed_snowflake_spend.csv + - base_stripe\_\_invoices.sql + - stg_stripe\_\_customers.sql + - stg_salesforce\_\_customers.sql + - int_customers\_\_unioned.sql + - fct_orders.sql + +- Schema, table, and column names should be in `snake_case`. + +- Limit the use of abbreviations that are related to domain knowledge. An onboarding employee will understand `current_order_status` better than `current_os`. + +- Use names based on the _business_ rather than the source terminology. + +- Each model should have a primary key to identify the unique row and should be named `_id`. For example, `account_id`. This makes it easier to know what `id` is referenced in downstream joined models. + +- For `base` or `staging` models, columns should be ordered in categories, where identifiers are first and date/time fields are at the end. +- Date/time columns should be named according to these conventions: + + - Timestamps: `_at` + Format: UTC + Example: `created_at` + + - Dates: `_date` + Format: Date + Example: `created_date` + +- Booleans should be prefixed with `is_` or `has_`. + Example: `is_active_customer` and `has_admin_access` + +- Price/revenue fields should be in decimal currency (for example, `19.99` for $19.99; many app databases store prices as integers in cents). If a non-decimal currency is used, indicate this with suffixes. For example, `price_in_cents`. + +- Avoid using reserved words (such as [these](https://docs.snowflake.com/en/sql-reference/reserved-keywords.html) for Snowflake) as column names. + +- Consistency is key! Use the same field names across models where possible. For example, a key to the `customers` table should be named `customer_id` rather than `user_id`. + +## Model Configurations + +- Model configurations at the [folder level](https://docs.getdbt.com/reference/model-configs#configuring-directories-of-models-in-dbt_projectyml) should be considered (and if applicable, applied) first. +- More specific configurations should be applied at the model level [using one of these methods](https://docs.getdbt.com/reference/model-configs#apply-configurations-to-one-model-only). +- Models within the `marts` folder should be materialized as `table` or `incremental`. + - By default, `marts` should be materialized as `table` within `dbt_project.yml`. + - If switching to `incremental`, this should be specified in the model's configuration. + +## Testing + +- At a minimum, `unique` and `not_null` tests should be applied to the expected primary key of each model. + +## CTEs + +For more information about why we use so many CTEs, read [this glossary entry](https://docs.getdbt.com/terms/cte). + +- Where performance permits, CTEs should perform a single, logical unit of work. +- CTE names should be as verbose as needed to convey what they do. +- CTEs with confusing or noteable logic should be commented with SQL comments as you would with any complex functions and should be located above the CTE. +- CTEs duplicated across models should be pulled out and created as their own models. +```