Commit 4f5ebe2: Update quality

jochenchrist committed Jul 20, 2024 (1 parent: b9a976e)

Changed file: README.md (13 additions, 8 deletions)
Quality attributes can be:
- Text: A human-readable text that describes the quality of the data.
- SQL: An individual SQL query that returns a single value that can be compared.
- Engine-specific types: Predefined quality checks provided by data quality libraries. Currently, the engines `soda` and `great-expectations` are supported.

A quality object can be specified at the field level or at the model level.
The top-level quality object is deprecated.
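To illustrate field-level versus model-level placement, here is a minimal Python sketch that walks a contract already parsed into plain dictionaries (for example with a YAML loader) and reports where each quality object sits and which kind it is. The helper names `quality_kind` and `iter_quality` are hypothetical. The rule used to tell the kinds apart (an `engine` key for engine-specific checks, otherwise the `type` value such as `text` or `sql`) is an assumption read off the examples in this section, not the specification's own validation logic. The deprecated top-level quality object is deliberately ignored.

```python
from typing import Any, Dict, Iterator, Optional, Tuple


def quality_kind(quality: Dict[str, Any]) -> str:
    """Classify a quality object (hypothetical rule inferred from the examples)."""
    # Engine-specific checks name their engine, e.g. `engine: soda`.
    if "engine" in quality:
        return "engine:" + str(quality["engine"])
    # Otherwise fall back to the `type` key, e.g. `type: text` or `type: sql`.
    return str(quality.get("type", "unknown"))


def iter_quality(contract: Dict[str, Any]) -> Iterator[Tuple[str, Optional[str], str]]:
    """Yield (model, field or None, kind) for every model- and field-level quality object."""
    # The deprecated top-level `quality` object is intentionally not read here.
    for model_name, model in contract.get("models", {}).items():
        # Model-level quality objects: the field slot is None.
        for quality in model.get("quality", []):
            yield model_name, None, quality_kind(quality)
        # Field-level quality objects.
        for field_name, field in model.get("fields", {}).items():
            for quality in field.get("quality", []):
                yield model_name, field_name, quality_kind(quality)


contract = {
    "models": {
        "my_table": {
            "quality": [{"type": "sql", "query": "SELECT 1"}],
            "fields": {
                "account_iban": {
                    "quality": [{"type": "text", "description": "Must be a valid IBAN."}]
                }
            },
        }
    }
}

for entry in iter_quality(contract):
    print(entry)
```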
…

Example:

```yaml
models:
  my_table:
    fields:
      account_iban:
        quality:
          - type: text
            name: Valid IBAN
            description: Must be a valid IBAN. Must not be empty.
```
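A `text` quality attribute is documentation only; nothing executes it. If you wanted to automate the rule above, a minimal Python sketch could apply the standard ISO 13616 mod-97 check: move the first four characters to the end, replace each letter with two digits (A = 10 through Z = 35), and require the resulting integer modulo 97 to equal 1. The function name `is_valid_iban` is hypothetical, and the sketch checks only the general shape and the checksum, not the country-specific length of each IBAN.

```python
import re
from typing import Optional

# Overall shape: 2-letter country code, 2 check digits, 11 to 30 alphanumerics (15 to 34 total).
IBAN_SHAPE = re.compile(r"[A-Z]{2}[0-9]{2}[A-Z0-9]{11,30}")


def is_valid_iban(value: Optional[str]) -> bool:
    """Return True if `value` is a non-empty, well-shaped IBAN with a correct mod-97 checksum."""
    # "Must not be empty": None and "" both fail.
    if not value:
        return False
    iban = value.replace(" ", "").upper()
    if not IBAN_SHAPE.fullmatch(iban):
        return False
    # Move the country code and check digits to the end.
    rearranged = iban[4:] + iban[:4]
    # int(ch, 36) maps '0'-'9' to 0-9 and 'A'-'Z' to 10-35.
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


print(is_valid_iban("GB82 WEST 1234 5698 7654 32"))  # True
print(is_valid_iban("GB83 WEST 1234 5698 7654 32"))  # False
print(is_valid_iban(""))  # False
```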


…

```yaml
models:
  my_table:
    quality:
      - type: sql
        name: Maximum duration between two orders
        description: The maximum duration between two orders should be less than 3600 seconds
        query: |
          SELECT MAX(EXTRACT(EPOCH FROM (order_timestamp - LAG(order_timestamp) OVER (ORDER BY order_timestamp)))) AS max_duration
          …
```
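The SQL check above measures the largest gap between consecutive orders. The same computation as a minimal Python sketch (sort the timestamps, take differences between neighbours, keep the maximum) makes the semantics easy to test; the function name `max_gap_seconds` and the sample timestamps are hypothetical. As in SQL, where `LAG` yields NULL for the first row and `MAX` ignores NULLs, fewer than two timestamps produce no value. Note that PostgreSQL does not allow a window function such as `LAG` to be nested directly inside an aggregate such as `MAX`, so on that engine the gaps would typically be computed in a subquery first.

```python
from datetime import datetime
from typing import Iterable, Optional


def max_gap_seconds(timestamps: Iterable[datetime]) -> Optional[float]:
    """Largest gap in seconds between consecutive timestamps, or None if there are fewer than two."""
    ordered = sorted(timestamps)
    # Pair each timestamp with its predecessor, like LAG(...) OVER (ORDER BY ...).
    gaps = [(current - previous).total_seconds() for previous, current in zip(ordered, ordered[1:])]
    return max(gaps) if gaps else None


orders = [
    datetime(2024, 7, 20, 10, 0),
    datetime(2024, 7, 20, 11, 15),
    datetime(2024, 7, 20, 10, 30),
]
gap = max_gap_seconds(orders)
print(gap)  # 2700.0
print(gap is not None and gap < 3600)  # True
```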


#### Engine: Soda

Applicable on: [x] model, [x] field


Quality attributes can be defined with the engine `soda` using the [Soda data contract check reference](https://docs.soda.io/soda/data-contracts-checks.html).

Note: The Soda data contract check reference is experimental and may change in the future.

Note: Currently supported only for Postgres, Snowflake, and Spark (Databricks).

##### Duplicate

- `no_duplicate_values` (equivalent to the property `unique: true`, but also supports multiple fields)
…

```yaml
models:
  …
        type: string
        quality:
          - engine: soda
            type: duplicate_percent
            name: A shipment number is unique for one carrier
            columns:
              - carrier
              - shipment_number
            must_be_less_than: 1.0
```
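The `duplicate_percent` check above compares a percentage of duplicates across the listed columns against `must_be_less_than`. As a rough Python sketch of that idea, assuming duplicates are counted as rows whose combination of column values occurs more than once (Soda's exact counting rule may differ, so treat this as an illustration only), here is a hypothetical `duplicate_percent` function applied to made-up sample rows:

```python
from collections import Counter
from typing import Dict, List, Sequence


def duplicate_percent(rows: List[Dict[str, str]], columns: Sequence[str]) -> float:
    """Percentage of rows whose combination of `columns` values appears more than once (assumed definition)."""
    # Assumption: an empty table has no duplicates.
    if not rows:
        return 0.0
    keys = [tuple(row[col] for col in columns) for row in rows]
    counts = Counter(keys)
    duplicated_rows = sum(1 for key in keys if counts[key] > 1)
    return 100.0 * duplicated_rows / len(rows)


shipments = [
    {"carrier": "DHL", "shipment_number": "1001"},
    {"carrier": "DHL", "shipment_number": "1002"},
    {"carrier": "UPS", "shipment_number": "1001"},  # same number, different carrier: not a duplicate
    {"carrier": "DHL", "shipment_number": "1001"},  # repeats the first row's key
]
percent = duplicate_percent(shipments, ["carrier", "shipment_number"])
print(percent)  # 50.0
print(percent < 1.0)  # False, so the check would fail
```

Grouping by both columns is what lets the same shipment number appear once per carrier without counting as a duplicate.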

##### Freshness

…

```yaml
models:
  …
            valid_sql_regex: '^[A-Z]{2}[0-9]{3}$'
```
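The `valid_sql_regex` value above is a regular expression describing valid values: two uppercase letters followed by three digits. A minimal Python sketch of the same rule follows; the helper `invalid_values` is hypothetical, and it uses Python's `re` module rather than the database's own regex dialect. That substitution is an assumption that holds for a simple pattern like this one but not necessarily for engine-specific syntax.

```python
import re
from typing import Iterable, List

PATTERN = re.compile(r"^[A-Z]{2}[0-9]{3}$")


def invalid_values(values: Iterable[str], pattern: "re.Pattern[str]" = PATTERN) -> List[str]:
    """Return the values that do not match `pattern` in full."""
    # fullmatch rejects a trailing newline, which `$` alone would tolerate with re.match.
    return [value for value in values if not pattern.fullmatch(value)]


print(invalid_values(["AB123", "ab123", "ABC12", "XY999"]))  # ['ab123', 'ABC12']
```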

#### Engine: Great Expectations

Applicable on: [x] model, [ ] field
