add microbatch to data platform configs (#6588)
mirnawong1 authored Dec 5, 2024
2 parents 9d4107a + d87c7c2 commit 12fcd50
Showing 5 changed files with 20 additions and 9 deletions.
7 changes: 4 additions & 3 deletions website/docs/reference/resource-configs/bigquery-configs.md
@@ -425,9 +425,10 @@ Please note that in order for policy tags to take effect, [column-level `persist

The [`incremental_strategy` config](/docs/build/incremental-strategy) controls how dbt builds incremental models. dbt uses a [merge statement](https://cloud.google.com/bigquery/docs/reference/standard-sql/dml-syntax) on BigQuery to refresh incremental tables.

The `incremental_strategy` config can be set to one of two values:
- `merge` (default)
- `insert_overwrite`
The `incremental_strategy` config can be set to one of the following values:
- `merge` (default)
- `insert_overwrite`
- [`microbatch`](/docs/build/incremental-microbatch)
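
As a rough illustration of the new list entry, a BigQuery model using `microbatch` might look like the sketch below. The model and column names (`stg_events`, `event_occurred_at`) and the `begin` date are hypothetical. The `partition_by` block assumes that dbt-bigquery expects a partition whose granularity matches `batch_size`; check this against the microbatch documentation for your adapter version.

```sql
-- Hypothetical model file, for example models/events_microbatch.sql
{{ config(
    materialized='incremental',
    incremental_strategy='microbatch',
    event_time='event_occurred_at',   -- hypothetical timestamp column
    batch_size='day',                 -- one batch per day
    begin='2024-01-01',               -- hypothetical earliest batch to build
    lookback=1,                       -- also reprocess the previous batch
    partition_by={
      'field': 'event_occurred_at',
      'data_type': 'timestamp',
      'granularity': 'day'            -- assumed to need to match batch_size
    }
) }}

select * from {{ ref('stg_events') }}  -- hypothetical upstream model
```

For dbt to filter the upstream model per batch, `stg_events` would also need its own `event_time` config.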

### Performance and cost

@@ -11,6 +11,7 @@ In dbt-postgres, the following incremental materialization strategies are suppor
- `append` (default when `unique_key` is not defined)
- `merge`
- `delete+insert` (default when `unique_key` is defined)
- [`microbatch`](/docs/build/incremental-microbatch)

## Performance optimizations

@@ -17,6 +17,7 @@ In dbt-redshift, the following incremental materialization strategies are suppor
- `append` (default when `unique_key` is not defined)
- `merge`
- `delete+insert` (default when `unique_key` is defined)
- [`microbatch`](/docs/build/incremental-microbatch)

All of these strategies are inherited from dbt-postgres.

17 changes: 12 additions & 5 deletions website/docs/reference/resource-configs/snowflake-configs.md
@@ -38,11 +38,11 @@ flags:
The following configurations are supported.
For more information, check out the Snowflake reference for [`CREATE ICEBERG TABLE` (Snowflake as the catalog)](https://docs.snowflake.com/en/sql-reference/sql/create-iceberg-table-snowflake).

| Field | Type | Required | Description | Sample input | Note |
| --------------------- | ------ | -------- | -------------------------------------------------------------------------------------------------------------------------- | ------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Table Format | String | Yes | Configures the objects table format. | `iceberg` | `iceberg` is the only accepted value. |
| Field | Type | Required | Description | Sample input | Note |
| ------ | ----- | -------- | ------------- | ------------ | ------ |
| Table Format | String | Yes | Configures the objects table format. | `iceberg` | `iceberg` is the only accepted value. |
| External volume | String | Yes(*) | Specifies the identifier (name) of the external volume where Snowflake writes the Iceberg table's metadata and data files. | `my_s3_bucket` | *You don't need to specify this if the account, database, or schema already has an associated external volume. [More info](https://docs.snowflake.com/en/sql-reference/sql/create-iceberg-table-snowflake#:~:text=Snowflake%20Table%20Structures.-,external_volume) |
| Base location Subpath | String | No | An optional suffix to add to the `base_location` path that dbt automatically specifies. | `jaffle_marketing_folder` | We recommend that you do not specify this. Modifying this parameter results in a new Iceberg table. See [Base Location](#base-location) for more info. |
| Base location Subpath | String | No | An optional suffix to add to the `base_location` path that dbt automatically specifies. | `jaffle_marketing_folder` | We recommend that you do not specify this. Modifying this parameter results in a new Iceberg table. See [Base Location](#base-location) for more info. |

### Example configuration

@@ -470,8 +470,15 @@ In this example, you can set up a query tag to be applied to every query with th

The [`incremental_strategy` config](/docs/build/incremental-strategy) controls how dbt builds incremental models. By default, dbt will use a [merge statement](https://docs.snowflake.net/manuals/sql-reference/sql/merge.html) on Snowflake to refresh incremental tables.

Snowflake supports the following incremental strategies:
- Merge (default)
- Append
- Delete+insert
- [`microbatch`](/docs/build/incremental-microbatch)

Snowflake's `merge` statement fails with a "nondeterministic merge" error if the `unique_key` specified in your model config is not actually unique. If you encounter this error, you can instruct dbt to use a two-step incremental approach by setting the `incremental_strategy` config for your model to `delete+insert`.
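
A minimal sketch of that workaround, assuming a hypothetical `stg_orders` model with `order_id` and `updated_at` columns:

```sql
-- Hypothetical model file; table and column names are illustrative only.
{{ config(
    materialized='incremental',
    unique_key='order_id',
    incremental_strategy='delete+insert'  -- two-step delete then insert instead of merge
) }}

select *
from {{ ref('stg_orders') }}

{% if is_incremental() %}
-- On incremental runs, only pick up rows newer than what is already loaded.
where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```

With this strategy, dbt deletes the existing rows that match the incoming `unique_key` values and then inserts the new rows, so duplicate keys in the new data do not trigger the nondeterministic merge error. The duplicates themselves still land in the table, so deduplicating upstream may still be worthwhile.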


## Configuring table clustering

dbt supports [table clustering](https://docs.snowflake.net/manuals/user-guide/tables-clustering-keys.html) on Snowflake. To control clustering for a <Term id="table" /> or incremental model, use the `cluster_by` config. When this configuration is applied, dbt will do two things:
@@ -701,4 +708,4 @@ flags:
```

</VersionBlock>
</VersionBlock>
3 changes: 2 additions & 1 deletion website/docs/reference/resource-configs/spark-configs.md
@@ -37,7 +37,8 @@ For that reason, the dbt-spark plugin leans heavily on the [`incremental_strateg
- **`append`** (default): Insert new records without updating or overwriting any existing data.
- **`insert_overwrite`**: If `partition_by` is specified, overwrite partitions in the <Term id="table" /> with new data. If no `partition_by` is specified, overwrite the entire table with new data.
- **`merge`** (Delta, Iceberg and Hudi file format only): Match records based on a `unique_key`; update old records, insert new ones. (If no `unique_key` is specified, all new data is inserted, similar to `append`.)

- `microbatch` Implements the [microbatch strategy](/docs/build/incremental-microbatch) using `event_time` to define time-based ranges for filtering data.

Each of these strategies has its pros and cons, which we'll discuss below. As with any model config, `incremental_strategy` may be specified in `dbt_project.yml` or within a model file's `config()` block.

### The `append` strategy