Add in new section for report, add in alert configuration and external validation source pages, clean up package names to use updated path of io.github.datacatering, ensure all links point to repos in data-catering
pflooky committed Jan 4, 2024
1 parent cc08230 commit 9b8c607
Showing 78 changed files with 10,906 additions and 1,388 deletions.
Binary file added .cache/plugin/optimize/diagrams/slack_alert.png
3 changes: 2 additions & 1 deletion .cache/plugin/optimize/manifest.json
@@ -19,5 +19,6 @@
"diagrams/solace_messages_queued.png": "1ee16a1829114c52ef05234d46f0c487669fd6a3",
"diagrams/data_validation_report.png": "6b78f44b31983b53eea1bd3d7ae55cbebe7415b9",
"diagrams/high_level_flow-basic-flow.png": "a71a9aac071aa5dbcdd112c76daa5f8dc321c3a7",
"diagrams/upstream_validation_report.png": "188f61be3a2b39f03019ef7f6bbd57bc3d3892c8"
"diagrams/upstream_validation_report.png": "188f61be3a2b39f03019ef7f6bbd57bc3d3892c8",
"diagrams/slack_alert.png": "7b02a01c40fcbeeb953fd9753913e2fd3ab49a86"
}
Binary file added docs/diagrams/basic_data_caterer_flow_medium.gif
3 changes: 3 additions & 0 deletions docs/diagrams/high_level_flow-external-source-validation.svg
Binary file added docs/diagrams/slack_alert.png
4 changes: 2 additions & 2 deletions docs/get-started/docker.md
@@ -5,7 +5,7 @@
Ensure you have `docker` installed and running.

```shell
-git clone git@github.com:pflooky/data-caterer-example.git
+git clone git@github.com:data-catering/data-caterer-example.git
cd data-caterer-example && ./run.sh
#check results under docker/sample/report/index.html folder
```
@@ -23,7 +23,7 @@ Sample report can also be seen [**here**](../sample/report/html/index.html)
1. Join the [Data Catering Slack group here](https://join.slack.com/t/data-catering/shared_invite/zt-2664ylbpi-w3n7lWAO~PHeOG9Ujpm~~w)
2. Get an API_KEY by using slash command `/token` in the Slack group (will only be visible to you)
3.
-git clone git@github.com:pflooky/data-caterer-example.git
+git clone git@github.com:data-catering/data-caterer-example.git
cd data-caterer-example && export DATA_CATERING_API_KEY=<insert api key>
./run.sh

2 changes: 1 addition & 1 deletion docs/index.md
@@ -8,7 +8,7 @@ testing tool that aids in creating production-like data across both batch and ev
to ensure your systems have ingested it as expected, then clean up the data afterwards.</h1>

<figure markdown>
-![Data Caterer generate and validate data flows](diagrams/high_level_flow-basic-flow.svg)
+![Data Caterer generate and validate data flows](diagrams/basic_data_caterer_flow_medium.gif)
</figure>

<h1 align="center">Simplify your data testing</h1>
6 changes: 3 additions & 3 deletions docs/setup/advanced.md
@@ -65,9 +65,9 @@ You can alter the `status` column in the account data to only generate `open` accounts
and define a foreign key between Postgres and parquet to ensure the same `account_id` is being used.
Then in the parquet task, define 1 to 10 transactions per `account_id` to be generated.

-[Postgres account generation example task](https://github.com/pflooky/data-caterer-example/blob/main/docker/data/custom/task/jdbc/postgres/postgres-account-task.yaml)
-[Parquet transaction generation example task](https://github.com/pflooky/data-caterer-example/blob/main/docker/data/custom/task/file/parquet/parquet-transaction-task.yaml)
-[Plan](https://github.com/pflooky/data-caterer-example/blob/main/docker/data/custom/plan/scenario-based.yaml)
+[Postgres account generation example task](https://github.com/data-catering/data-caterer-example/blob/main/docker/data/custom/task/jdbc/postgres/postgres-account-task.yaml)
+[Parquet transaction generation example task](https://github.com/data-catering/data-caterer-example/blob/main/docker/data/custom/task/file/parquet/parquet-transaction-task.yaml)
+[Plan](https://github.com/data-catering/data-caterer-example/blob/main/docker/data/custom/plan/scenario-based.yaml)
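
(Not part of this commit's diff.) To make the scenario above concrete, here is a minimal Scala sketch of such a plan under the renamed package. Treat it as an illustration only: the builder names (`postgres`, `parquet`, `count.recordsPerColumnGenerator`, `plan.addForeignKeyRelationship`) and connection details are assumed from the linked example tasks and may not match the current API exactly.

```scala
import io.github.datacatering.datacaterer.api.PlanRun

class ScenarioBasedPlanRun extends PlanRun {
  // Accounts in Postgres, restricted to "open" status only
  val accountTask = postgres("customer_postgres", "jdbc:postgresql://host.docker.internal:5432/customer")
    .table("account", "accounts")
    .schema(
      field.name("account_id").regex("ACC[0-9]{8}"),
      field.name("status").oneOf("open")
    )

  // 1 to 10 transactions generated per account_id, written as Parquet
  val transactionTask = parquet("transactions", "/opt/app/data/customer/transaction")
    .schema(field.name("account_id"), field.name("amount"))
    .count(count.recordsPerColumnGenerator(generator.min(1).max(10), "account_id"))

  // Foreign key so the same account_id values flow from Postgres into Parquet
  val myPlan = plan.addForeignKeyRelationship(
    accountTask, List("account_id"),
    List(transactionTask -> List("account_id"))
  )

  execute(myPlan, configuration, accountTask, transactionTask)
}
```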

## Cloud storage

26 changes: 13 additions & 13 deletions docs/setup/configuration.md
@@ -5,24 +5,24 @@ metadata gets saved.

These configurations are defined from within your Java or Scala class via `configuration` or for YAML file setup,
`application.conf` file as seen
-[**here**](https://github.com/pflooky/data-caterer-example/blob/main/docker/data/custom/application.conf).
+[**here**](https://github.com/data-catering/data-caterer-example/blob/main/docker/data/custom/application.conf).

## Flags

Flags are used to control which processes are executed when you run Data Caterer.

-| Config | Default | Paid | Description |
-|--------|---------|------|-------------|
-| `enableGenerateData` | true | N | Enable/disable data generation |
-| `enableCount` | true | N | Count the number of records generated. Can be disabled to improve performance |
-| `enableFailOnError` | true | N | Whilst saving generated data, if there is an error, it will stop any further data from being generated |
-| `enableSaveReports` | true | N | Enable/disable HTML reports summarising data generated, metadata of data generated (if `enableSinkMetadata` is enabled) and validation results (if `enableValidation` is enabled). Sample [**here**](generator/report.md) |
-| `enableSinkMetadata` | true | N | Run data profiling for the generated data. Shown in HTML reports if `enableSaveSinkMetadata` is enabled |
-| `enableValidation` | false | N | Run validations as described in plan. Results can be viewed from logs or from HTML report if `enableSaveSinkMetadata` is enabled. Sample [**here**](validation.md) |
-| `enableGeneratePlanAndTasks` | false | Y | Enable/disable plan and task auto generation based off data source connections |
-| `enableRecordTracking` | false | Y | Enable/disable which data records have been generated for any data source |
-| `enableDeleteGeneratedRecords` | false | Y | Delete all generated records based off record tracking (if `enableRecordTracking` has been set to true) |
-| `enableGenerateValidations` | false | Y | If enabled, it will generate validations based on the data sources defined. |
+| Config | Default | Paid | Description |
+|--------|---------|------|-------------|
+| `enableGenerateData` | true | N | Enable/disable data generation |
+| `enableCount` | true | N | Count the number of records generated. Can be disabled to improve performance |
+| `enableFailOnError` | true | N | Whilst saving generated data, if there is an error, it will stop any further data from being generated |
+| `enableSaveReports` | true | N | Enable/disable HTML reports summarising data generated, metadata of data generated (if `enableSinkMetadata` is enabled) and validation results (if `enableValidation` is enabled). Sample [**here**](report/html-report.md) |
+| `enableSinkMetadata` | true | N | Run data profiling for the generated data. Shown in HTML reports if `enableSaveSinkMetadata` is enabled |
+| `enableValidation` | false | N | Run validations as described in plan. Results can be viewed from logs or from HTML report if `enableSaveSinkMetadata` is enabled. Sample [**here**](validation.md) |
+| `enableGeneratePlanAndTasks` | false | Y | Enable/disable plan and task auto generation based off data source connections |
+| `enableRecordTracking` | false | Y | Enable/disable which data records have been generated for any data source |
+| `enableDeleteGeneratedRecords` | false | Y | Delete all generated records based off record tracking (if `enableRecordTracking` has been set to true) |
+| `enableGenerateValidations` | false | Y | If enabled, it will generate validations based on the data sources defined. |

=== "Java"

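(The Java/Scala tab bodies are collapsed in this diff view.) As a rough Scala sketch of how the flags above are set in code, assuming the builder methods mirror the flag names, which may not hold for the current API:

```scala
import io.github.datacatering.datacaterer.api.PlanRun

class MyConfiguredPlanRun extends PlanRun {
  // Each builder call corresponds to a flag in the table above
  val conf = configuration
    .enableCount(true)
    .enableSaveReports(true)
    .enableValidation(true)
    .enableGeneratePlanAndTasks(false)                  // paid feature, off by default
    .generatedReportsFolderPath("/opt/app/data/report") // assumed report path option

  // Hypothetical JSON task so the sketch is self-contained
  val jsonTask = json("my_json", "/opt/app/data/json")
    .schema(field.name("account_id").regex("ACC[0-9]{8}"))

  execute(conf, jsonTask)
}
```
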
2 changes: 1 addition & 1 deletion docs/setup/connection.md
@@ -410,7 +410,7 @@ found [**here**](https://spark.apache.org/docs/latest/structured-streaming-kafka

When defining your schema for pushing data to Kafka, it follows a specific top level schema.
An example can be
-found [**here**](https://github.com/pflooky/data-caterer-example/blob/main/docker/data/custom/task/kafka/kafka-account-task.yaml)
+found [**here**](https://github.com/data-catering/data-caterer-example/blob/main/docker/data/custom/task/kafka/kafka-account-task.yaml)
. You can define the key, value, headers, partition or topic by following the linked schema.
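
(Not part of this diff.) For illustration, a Scala sketch of that top-level Kafka schema. The field names `key`, `value` and `content`, the `sql` expressions, and the nested schema builder are assumptions based on the linked kafka-account-task.yaml, not a confirmed API:

```scala
import io.github.datacatering.datacaterer.api.PlanRun

class MyKafkaPlanRun extends PlanRun {
  val kafkaTask = kafka("my_kafka", "localhost:9092")
    .topic("accounts")
    .schema(
      field.name("key").sql("content.account_id"),  // Kafka message key
      field.name("value").sql("TO_JSON(content)"),  // Kafka message body
      field.name("content")                         // fields the value is built from
        .schema(field.name("account_id").regex("ACC[0-9]{8}"))
    )

  execute(kafkaTask)
}
```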

### JMS
8 changes: 4 additions & 4 deletions docs/setup/deployment.md
@@ -8,7 +8,7 @@ Two main ways to deploy and run Data Caterer:
## Docker

To package up your class along with the Data Caterer base image, you can follow
-the [Dockerfile that is created for you here](https://github.com/pflooky/data-caterer-example/blob/main/Dockerfile).
+the [Dockerfile that is created for you here](https://github.com/data-catering/data-caterer-example/blob/main/Dockerfile).

Then you can run the following:

@@ -19,13 +19,13 @@ docker build -t <my_image_name>:<my_image_tag> .

## Helm

-[Link to sample helm on GitHub here](https://github.com/pflooky/data-caterer-example/tree/main/helm/data-caterer)
+[Link to sample helm on GitHub here](https://github.com/data-catering/data-caterer-example/tree/main/helm/data-caterer)

Update
-the [configuration](https://github.com/pflooky/data-caterer-example/blob/main/helm/data-caterer/templates/configuration.yaml)
+the [configuration](https://github.com/data-catering/data-caterer-example/blob/main/helm/data-caterer/templates/configuration.yaml)
to your own data connections and configuration or own image created from above.

```shell
-git clone git@github.com:pflooky/data-caterer-example.git
+git clone git@github.com:data-catering/data-caterer-example.git
helm install data-caterer ./data-caterer-example/helm/data-caterer
```
12 changes: 6 additions & 6 deletions docs/setup/guide/data-source/cassandra.md
@@ -20,7 +20,7 @@ for the tables you configure.
First, we will clone the data-caterer-example repo which will already have the base project setup required.

```shell
-git clone git@github.com:pflooky/data-caterer-example.git
+git clone git@github.com:data-catering/data-caterer-example.git
```

If you already have a Cassandra instance running, you can skip to [this step](#plan-setup).
@@ -59,15 +59,15 @@ metadata information about tables and columns from the below tables.

Create a new Java or Scala class.

-- Java: `src/main/java/com/github/pflooky/plan/MyAdvancedCassandraJavaPlan.java`
-- Scala: `src/main/scala/com/github/pflooky/plan/MyAdvancedCassandraPlan.scala`
+- Java: `src/main/java/io/github/datacatering/plan/MyAdvancedCassandraJavaPlan.java`
+- Scala: `src/main/scala/io/github/datacatering/plan/MyAdvancedCassandraPlan.scala`

Make sure your class extends `PlanRun`.

=== "Java"

```java
-import com.github.pflooky.datacaterer.java.api.PlanRun;
+import io.github.datacatering.datacaterer.java.api.PlanRun;

public class MyAdvancedCassandraJavaPlan extends PlanRun {
}
```

@@ -76,7 +76,7 @@
=== "Scala"

```scala
-import com.github.pflooky.datacaterer.api.PlanRun
+import io.github.datacatering.datacaterer.api.PlanRun

class MyAdvancedCassandraPlan extends PlanRun {
}
```

@@ -124,7 +124,7 @@ defined under `docker/data/cql/customer.cql`. This table should already be setup
[step](#cassandra-setup). We can check if the table is setup already via the following command:

```shell
-docker exec host.docker.internal cqlsh -e 'describe account.accounts; describe account.account_status_history;'
+docker exec docker-cassandraserver-1 cqlsh -e 'describe account.accounts; describe account.account_status_history;'
```

Here we should see some output that looks like the below. This tells us what schema we need to follow when generating
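
(The sample `describe` output is collapsed in this diff view.) For context, a minimal Scala sketch of the Cassandra task the guide builds towards, under the renamed package. The `cassandra(name, url, user, password)` signature and field builders are assumed from the guide and may differ:

```scala
import io.github.datacatering.datacaterer.api.PlanRun

class MyAdvancedCassandraPlan extends PlanRun {
  // Connection details match the docker-compose Cassandra instance above
  val accountTask = cassandra("customer_cassandra", "host.docker.internal:9042", "cassandra", "cassandra")
    .table("account", "accounts")
    .schema(
      field.name("account_id").regex("ACC[0-9]{8}"),
      field.name("account_status").oneOf("open", "closed")
    )

  execute(accountTask)
}
```
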
14 changes: 7 additions & 7 deletions docs/setup/guide/data-source/http.md
@@ -25,7 +25,7 @@ Creating a data generator based on an [OpenAPI/Swagger](https://spec.openapis.or
First, we will clone the data-caterer-example repo which will already have the base project setup required.

```shell
-git clone git@github.com:pflooky/data-caterer-example.git
+git clone git@github.com:data-catering/data-caterer-example.git
```

### HTTP Setup
@@ -44,15 +44,15 @@ docker ps

Create a new Java or Scala class.

-- Java: `src/main/java/com/github/pflooky/plan/MyAdvancedHttpJavaPlanRun.java`
-- Scala: `src/main/scala/com/github/pflooky/plan/MyAdvancedHttpPlanRun.scala`
+- Java: `src/main/java/io/github/datacatering/plan/MyAdvancedHttpJavaPlanRun.java`
+- Scala: `src/main/scala/io/github/datacatering/plan/MyAdvancedHttpPlanRun.scala`

Make sure your class extends `PlanRun`.

=== "Java"

```java
-import com.github.pflooky.datacaterer.java.api.PlanRun;
+import io.github.datacatering.datacaterer.java.api.PlanRun;
...

public class MyAdvancedHttpJavaPlanRun extends PlanRun {
```

@@ -66,7 +66,7 @@ Make sure your class extends `PlanRun`.
=== "Scala"

```scala
-import com.github.pflooky.datacaterer.api.PlanRun
+import io.github.datacatering.datacaterer.api.PlanRun
...

class MyAdvancedHttpPlanRun extends PlanRun {
```

@@ -266,7 +266,7 @@ want to alter this value, you can do so via the below configuration. The lowest
=== "Java"

```java
-import com.github.pflooky.datacaterer.api.model.Constants;
+import io.github.datacatering.datacaterer.api.model.Constants;

...
var httpTask = http("my_http", Map.of(Constants.ROWS_PER_SECOND(), "1"))
```

@@ -276,7 +276,7 @@ want to alter this value, you can do so via the below configuration. The lowest
=== "Scala"

```scala
-import com.github.pflooky.datacaterer.api.model.Constants.ROWS_PER_SECOND
+import io.github.datacatering.datacaterer.api.model.Constants.ROWS_PER_SECOND

...
val httpTask = http("my_http", options = Map(ROWS_PER_SECOND -> "1"))
```
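
(Not part of this diff.) A Scala sketch of where this guide ends up: generating HTTP requests from the OpenAPI document, throttled to one row per second. The `metadataSource.openApi` call and the mounted spec path are assumed from the guide and may differ:

```scala
import io.github.datacatering.datacaterer.api.PlanRun
import io.github.datacatering.datacaterer.api.model.Constants.ROWS_PER_SECOND

class MyAdvancedHttpPlanRun extends PlanRun {
  // Schema is derived from the OpenAPI/Swagger document mounted into the container
  val httpTask = http("my_http", options = Map(ROWS_PER_SECOND -> "1"))
    .schema(metadataSource.openApi("/opt/app/mount/http/petstore.json"))
    .count(count.records(2))

  execute(httpTask)
}
```
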
10 changes: 5 additions & 5 deletions docs/setup/guide/data-source/kafka.md
@@ -20,7 +20,7 @@ for the topics you configure.
First, we will clone the data-caterer-example repo which will already have the base project setup required.

```shell
-git clone git@github.com:pflooky/data-caterer-example.git
+git clone git@github.com:data-catering/data-caterer-example.git
```

If you already have a Kafka instance running, you can skip to [this step](#plan-setup).
@@ -39,15 +39,15 @@ docker-compose up -d kafka

Create a new Java or Scala class.

-- Java: `src/main/java/com/github/pflooky/plan/MyAdvancedKafkaJavaPlan.java`
-- Scala: `src/main/scala/com/github/pflooky/plan/MyAdvancedKafkaPlan.scala`
+- Java: `src/main/java/io/github/datacatering/plan/MyAdvancedKafkaJavaPlan.java`
+- Scala: `src/main/scala/io/github/datacatering/plan/MyAdvancedKafkaPlan.scala`

Make sure your class extends `PlanRun`.

=== "Java"

```java
-import com.github.pflooky.datacaterer.java.api.PlanRun;
+import io.github.datacatering.datacaterer.java.api.PlanRun;

public class MyAdvancedKafkaJavaPlan extends PlanRun {
}
```

@@ -56,7 +56,7 @@
=== "Scala"

```scala
-import com.github.pflooky.datacaterer.api.PlanRun
+import io.github.datacatering.datacaterer.api.PlanRun

class MyAdvancedKafkaPlan extends PlanRun {
}
```
10 changes: 5 additions & 5 deletions docs/setup/guide/data-source/marquez-metadata-source.md
@@ -19,7 +19,7 @@ follows [OpenLineage API](https://openlineage.io/)).
First, we will clone the data-caterer-example repo which will already have the base project setup required.

```shell
-git clone git@github.com:pflooky/data-caterer-example.git
+git clone git@github.com:data-catering/data-caterer-example.git
```

### Marquez Setup
@@ -48,15 +48,15 @@ docker exec marquez-db psql -Upostgres -c 'CREATE DATABASE food_delivery'

Create a new Java or Scala class.

-- Java: `src/main/java/com/github/pflooky/plan/MyAdvancedMetadataSourceJavaPlanRun.java`
-- Scala: `src/main/scala/com/github/pflooky/plan/MyAdvancedMetadataSourcePlanRun.scala`
+- Java: `src/main/java/io/github/datacatering/plan/MyAdvancedMetadataSourceJavaPlanRun.java`
+- Scala: `src/main/scala/io/github/datacatering/plan/MyAdvancedMetadataSourcePlanRun.scala`

Make sure your class extends `PlanRun`.

=== "Java"

```java
-import com.github.pflooky.datacaterer.java.api.PlanRun;
+import io.github.datacatering.datacaterer.java.api.PlanRun;
...

public class MyAdvancedMetadataSourceJavaPlanRun extends PlanRun {
```

@@ -70,7 +70,7 @@ Make sure your class extends `PlanRun`.
=== "Scala"

```scala
-import com.github.pflooky.datacaterer.api.PlanRun
+import io.github.datacatering.datacaterer.api.PlanRun
...

class MyAdvancedMetadataSourcePlanRun extends PlanRun {
```
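
(Not part of this diff.) A Scala sketch of the guide's end state: Marquez as the metadata source for a Postgres connection. The `metadataSource.marquez` call, URL and namespace are assumed from the guide and may differ:

```scala
import io.github.datacatering.datacaterer.api.PlanRun

class MyAdvancedMetadataSourcePlanRun extends PlanRun {
  // Table metadata is pulled from the "food_delivery" namespace in Marquez
  val postgresTask = postgres("my_postgres", "jdbc:postgresql://host.docker.internal:5432/food_delivery")
    .schema(metadataSource.marquez("http://localhost:5001", "food_delivery"))
    .count(count.records(10))

  execute(postgresTask)
}
```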
