Commit

Clean up roadmap, change file available to file exists for validation wait condition
pflooky committed Nov 29, 2023
1 parent d047410 commit 998bfab
Showing 6 changed files with 15 additions and 15 deletions.
2 changes: 1 addition & 1 deletion docs/setup/validation.md
@@ -213,7 +213,7 @@ validations. This can be via:
url: "http://localhost:8080/finished"
```

-### File available
+### File exists

=== "Java"

10 changes: 5 additions & 5 deletions docs/use-case/roadmap.md
@@ -5,13 +5,13 @@ Items below summarise the roadmap of Data Caterer. As each task gets completed,
| Feature | Description | Sub Tasks |
|----------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Data source support | Batch or real time data sources that can be added to Data Caterer. Support data sources that users want | - AWS, GCP and Azure related data services (:white_check_mark: [cloud storage](../setup/advanced.md#cloud-storage))<br>- Deltalake<br>- RabbitMQ<br>- ActiveMQ<br>- MongoDB<br>- Elasticsearch<br>- Snowflake<br>- Databricks<br>- Pulsar |
-| Metadata discovery | Allow for schema and data profiling from external metadata sources | - :white_check_mark: [HTTP (OpenAPI spec)](../setup/guide/data-source/http.md)<br>- JMS<br>- Read from samples- :white_check_mark: [OpenLineage metadata (Marquez)](../setup/guide/data-source/marquez-metadata-source.md)<br>- :white_check_mark: [OpenMetadata](../setup/guide/data-source/open-metadata-source.md)<br>- ODCS (Open Data Contract Standard)<br>- Amundsen<br>- Datahub<br>- Solace Event Portal<br>- Airflow<br>- DBT |
-| Developer API | Scala/Java interface for developers/testers to create data generation and validation tasks | - :white_check_mark: Scala<br>- :white_check_mark: Java |
-| Report generation | Generate a report that summarises the data generation or validation results | - [Report for data generated and validation rules](../sample/report/html/index.html) |
+| Metadata discovery | Allow for schema and data profiling from external metadata sources | - :white_check_mark: [HTTP (OpenAPI spec)](../setup/guide/data-source/http.md)<br>- JMS<br>- Read from samples<br>- :white_check_mark: [OpenLineage metadata (Marquez)](../setup/guide/data-source/marquez-metadata-source.md)<br>- :white_check_mark: [OpenMetadata](../setup/guide/data-source/open-metadata-source.md)<br>- ODCS (Open Data Contract Standard)<br>- Amundsen<br>- Datahub<br>- Solace Event Portal<br>- Airflow<br>- DBT |
+| Developer API | Scala/Java interface for developers/testers to create data generation and validation tasks | - :white_check_mark: [Scala](https://github.com/data-catering/data-caterer-example)<br>- :white_check_mark: [Java](https://github.com/data-catering/data-caterer-example) |
+| Report generation | Generate a report that summarises the data generation or validation results | - :white_check_mark: [Report for data generated and validation rules](../sample/report/html/index.html) |
| UI portal | Allow users to access a UI to input data generation or validation tasks. Also be able to view report results | - Metadata stored in database<br>- Store data generation/validation run information in file/database |
| Integration with data validation tools | Derive data validation rules from existing data validation tools | - [Great Expectation](https://greatexpectations.io/)<br>- [DBT constraints](https://docs.getdbt.com/reference/resource-properties/constraints)<br>- [SodaCL](https://docs.soda.io/soda-cl/soda-cl-overview.html)<br>- [MonteCarlo](https://docs.getmontecarlo.com/docs/monitors-as-code) |
-| Data validation rule suggestions | Based on metadata, generate data validation rules appropriate for the dataset | - :white_check_mark: Suggest basic data validations |
-| Wait conditions before data validation | Define certain conditions to be met before starting data validations | - :white_check_mark: Webhook<br>- :white_check_mark: File exists<br>- :white_check_mark: Data exists via SQL expression<br>- :white_check_mark: Pause |
+| Data validation rule suggestions | Based on metadata, generate data validation rules appropriate for the dataset | - :white_check_mark: Suggest basic data validations (yet to document) |
+| Wait conditions before data validation | Define certain conditions to be met before starting data validations | - :white_check_mark: [Webhook](../setup/validation.md#webhook)<br>- :white_check_mark: [File exists](../setup/validation.md#file-exists)<br>- :white_check_mark: [Data exists via SQL expression](../setup/validation.md#data-exists)<br>- :white_check_mark: [Pause](../setup/validation.md#pause) |
| Validation types | Ability to define simple/complex data validations | - :white_check_mark: [Basic validations](../setup/validation/basic-validation.md)<br>- :white_check_mark: [Aggregates](../setup/validation/group-by-validation.md) (sum of amount per account is > 500)<br>- Ordering (transactions are ordered by date)<br>- :white_check_mark: [Relationship](../setup/validation/upstream-data-source-validation.md) (at least one account entry in history table per account in accounts table)<br>- Data profile (how close the generated data profile is compared to the expected data profile) |
| Data generation record count | Generate scenarios where there are one to many, many to many situations relating to record count. Also ability to cover all edge cases or scenarios | - Cover all possible cases (i.e. record for each combination of oneOf values, positive/negative values etc.)<br>- Ability to override edge cases |
| Alerting | When tasks have completed, ability to define alerts based on certain conditions | - Slack<br>- Email |
2 changes: 1 addition & 1 deletion site/search/search_index.json

Large diffs are not rendered by default.

6 changes: 3 additions & 3 deletions site/setup/validation/index.html
@@ -1452,10 +1452,10 @@
</li>

<li class="md-nav__item">
-<a href="#file-available" class="md-nav__link">
+<a href="#file-exists" class="md-nav__link">
<span class="md-ellipsis">

-File available
+File exists

</span>
</a>
@@ -2336,7 +2336,7 @@ <h3 id="webhook">Webhook</h3>
</div>
</div>
</div>
-<h3 id="file-available">File available</h3>
+<h3 id="file-exists">File exists</h3>
<div class="tabbed-set tabbed-alternate" data-tabs="5:3"><input checked="checked" id="__tabbed_5_1" name="__tabbed_5" type="radio" /><input id="__tabbed_5_2" name="__tabbed_5" type="radio" /><input id="__tabbed_5_3" name="__tabbed_5" type="radio" /><div class="tabbed-labels"><label for="__tabbed_5_1">Java</label><label for="__tabbed_5_2">Scala</label><label for="__tabbed_5_3">YAML</label></div>
<div class="tabbed-content">
<div class="tabbed-block">
Binary file modified site/sitemap.xml.gz
Binary file not shown.
10 changes: 5 additions & 5 deletions site/use-case/roadmap/index.html
@@ -2023,17 +2023,17 @@ <h1 id="roadmap">Roadmap</h1>
<tr>
<td>Metadata discovery</td>
<td>Allow for schema and data profiling from external metadata sources</td>
-<td>- <img alt="" class="twemoji" src="https://cdn.jsdelivr.net/gh/jdecked/[email protected]/assets/svg/2705.svg" title=":white_check_mark:" /> <a href="../../setup/guide/data-source/http/">HTTP (OpenAPI spec)</a><br>- JMS<br>- Read from samples- <img alt="" class="twemoji" src="https://cdn.jsdelivr.net/gh/jdecked/[email protected]/assets/svg/2705.svg" title=":white_check_mark:" /> <a href="../../setup/guide/data-source/marquez-metadata-source/">OpenLineage metadata (Marquez)</a><br>- <img alt="" class="twemoji" src="https://cdn.jsdelivr.net/gh/jdecked/[email protected]/assets/svg/2705.svg" title=":white_check_mark:" /> <a href="../../setup/guide/data-source/open-metadata-source/">OpenMetadata</a><br>- ODCS (Open Data Contract Standard)<br>- Amundsen<br>- Datahub<br>- Solace Event Portal<br>- Airflow<br>- DBT</td>
+<td>- <img alt="" class="twemoji" src="https://cdn.jsdelivr.net/gh/jdecked/[email protected]/assets/svg/2705.svg" title=":white_check_mark:" /> <a href="../../setup/guide/data-source/http/">HTTP (OpenAPI spec)</a><br>- JMS<br>- Read from samples<br>- <img alt="" class="twemoji" src="https://cdn.jsdelivr.net/gh/jdecked/[email protected]/assets/svg/2705.svg" title=":white_check_mark:" /> <a href="../../setup/guide/data-source/marquez-metadata-source/">OpenLineage metadata (Marquez)</a><br>- <img alt="" class="twemoji" src="https://cdn.jsdelivr.net/gh/jdecked/[email protected]/assets/svg/2705.svg" title=":white_check_mark:" /> <a href="../../setup/guide/data-source/open-metadata-source/">OpenMetadata</a><br>- ODCS (Open Data Contract Standard)<br>- Amundsen<br>- Datahub<br>- Solace Event Portal<br>- Airflow<br>- DBT</td>
</tr>
<tr>
<td>Developer API</td>
<td>Scala/Java interface for developers/testers to create data generation and validation tasks</td>
-<td>- <img alt="" class="twemoji" src="https://cdn.jsdelivr.net/gh/jdecked/[email protected]/assets/svg/2705.svg" title=":white_check_mark:" /> Scala<br>- <img alt="" class="twemoji" src="https://cdn.jsdelivr.net/gh/jdecked/[email protected]/assets/svg/2705.svg" title=":white_check_mark:" /> Java</td>
+<td>- <img alt="" class="twemoji" src="https://cdn.jsdelivr.net/gh/jdecked/[email protected]/assets/svg/2705.svg" title=":white_check_mark:" /> <a href="https://github.com/data-catering/data-caterer-example">Scala</a><br>- <img alt="" class="twemoji" src="https://cdn.jsdelivr.net/gh/jdecked/[email protected]/assets/svg/2705.svg" title=":white_check_mark:" /> <a href="https://github.com/data-catering/data-caterer-example">Java</a></td>
</tr>
<tr>
<td>Report generation</td>
<td>Generate a report that summarises the data generation or validation results</td>
-<td>- <a href="../../sample/report/html/">Report for data generated and validation rules</a></td>
+<td>- <img alt="" class="twemoji" src="https://cdn.jsdelivr.net/gh/jdecked/[email protected]/assets/svg/2705.svg" title=":white_check_mark:" /> <a href="../../sample/report/html/">Report for data generated and validation rules</a></td>
</tr>
<tr>
<td>UI portal</td>
@@ -2048,12 +2048,12 @@ <h1 id="roadmap">Roadmap</h1>
<tr>
<td>Data validation rule suggestions</td>
<td>Based on metadata, generate data validation rules appropriate for the dataset</td>
-<td>- <img alt="" class="twemoji" src="https://cdn.jsdelivr.net/gh/jdecked/[email protected]/assets/svg/2705.svg" title=":white_check_mark:" /> Suggest basic data validations</td>
+<td>- <img alt="" class="twemoji" src="https://cdn.jsdelivr.net/gh/jdecked/[email protected]/assets/svg/2705.svg" title=":white_check_mark:" /> Suggest basic data validations (yet to document)</td>
</tr>
<tr>
<td>Wait conditions before data validation</td>
<td>Define certain conditions to be met before starting data validations</td>
-<td>- <img alt="" class="twemoji" src="https://cdn.jsdelivr.net/gh/jdecked/[email protected]/assets/svg/2705.svg" title=":white_check_mark:" /> Webhook<br>- <img alt="" class="twemoji" src="https://cdn.jsdelivr.net/gh/jdecked/[email protected]/assets/svg/2705.svg" title=":white_check_mark:" /> File exists<br>- <img alt="" class="twemoji" src="https://cdn.jsdelivr.net/gh/jdecked/[email protected]/assets/svg/2705.svg" title=":white_check_mark:" /> Data exists via SQL expression<br>- <img alt="" class="twemoji" src="https://cdn.jsdelivr.net/gh/jdecked/[email protected]/assets/svg/2705.svg" title=":white_check_mark:" /> Pause</td>
+<td>- <img alt="" class="twemoji" src="https://cdn.jsdelivr.net/gh/jdecked/[email protected]/assets/svg/2705.svg" title=":white_check_mark:" /> <a href="../../setup/validation/#webhook">Webhook</a><br>- <img alt="" class="twemoji" src="https://cdn.jsdelivr.net/gh/jdecked/[email protected]/assets/svg/2705.svg" title=":white_check_mark:" /> <a href="../../setup/validation/#file-exists">File exists</a><br>- <img alt="" class="twemoji" src="https://cdn.jsdelivr.net/gh/jdecked/[email protected]/assets/svg/2705.svg" title=":white_check_mark:" /> <a href="../../setup/validation/#data-exists">Data exists via SQL expression</a><br>- <img alt="" class="twemoji" src="https://cdn.jsdelivr.net/gh/jdecked/[email protected]/assets/svg/2705.svg" title=":white_check_mark:" /> <a href="../../setup/validation/#pause">Pause</a></td>
</tr>
<tr>
<td>Validation types</td>
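
For readers unfamiliar with the renamed "File exists" wait condition, the general idea (hold off on validations until a file is present) can be sketched as a simple polling loop. This is a language-neutral illustration only and is not Data Caterer's API: the function name `wait_for_file` and its parameters `max_retries` and `interval_seconds` are hypothetical and chosen for this sketch.

```python
import os
import time


def wait_for_file(path, max_retries=10, interval_seconds=1.0):
    """Poll until `path` exists.

    Hypothetical helper, not part of Data Caterer. Returns True if the
    path was found within the allowed attempts, False otherwise.
    """
    for _ in range(max_retries):
        if os.path.exists(path):
            return True
        # Path not there yet: wait before the next attempt.
        time.sleep(interval_seconds)
    # One final check after the last sleep.
    return os.path.exists(path)
```

A caller would run validations only when the function returns `True`, and otherwise fail or skip them, depending on how strict the pipeline should be.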
