Add in new section for report, add in alert configuration and external validation source pages, clean up package names to use updated path of io.github.datacatering, ensure all links point to repos in data-catering
pflooky committed Jan 4, 2024
1 parent cc08230 commit 9b8c607
Showing 78 changed files with 10,906 additions and 1,388 deletions.
Binary file added .cache/plugin/optimize/diagrams/slack_alert.png
3 changes: 2 additions & 1 deletion .cache/plugin/optimize/manifest.json
@@ -19,5 +19,6 @@
"diagrams/solace_messages_queued.png": "1ee16a1829114c52ef05234d46f0c487669fd6a3",
"diagrams/data_validation_report.png": "6b78f44b31983b53eea1bd3d7ae55cbebe7415b9",
"diagrams/high_level_flow-basic-flow.png": "a71a9aac071aa5dbcdd112c76daa5f8dc321c3a7",
"diagrams/upstream_validation_report.png": "188f61be3a2b39f03019ef7f6bbd57bc3d3892c8"
"diagrams/upstream_validation_report.png": "188f61be3a2b39f03019ef7f6bbd57bc3d3892c8",
"diagrams/slack_alert.png": "7b02a01c40fcbeeb953fd9753913e2fd3ab49a86"
}
Binary file added docs/diagrams/basic_data_caterer_flow_medium.gif
3 changes: 3 additions & 0 deletions docs/diagrams/high_level_flow-external-source-validation.svg
Binary file added docs/diagrams/slack_alert.png
4 changes: 2 additions & 2 deletions docs/get-started/docker.md
@@ -5,7 +5,7 @@
Ensure you have `docker` installed and running.

```shell
-git clone git@github.com:pflooky/data-caterer-example.git
+git clone git@github.com:data-catering/data-caterer-example.git
cd data-caterer-example && ./run.sh
#check results under docker/sample/report/index.html folder
```
@@ -23,7 +23,7 @@ Sample report can also be seen [**here**](../sample/report/html/index.html)
1. Join the [Data Catering Slack group here](https://join.slack.com/t/data-catering/shared_invite/zt-2664ylbpi-w3n7lWAO~PHeOG9Ujpm~~w)
2. Get an API_KEY by using slash command `/token` in the Slack group (will only be visible to you)
3.
-git clone git@github.com:pflooky/data-caterer-example.git
+git clone git@github.com:data-catering/data-caterer-example.git
cd data-caterer-example && export DATA_CATERING_API_KEY=<insert api key>
./run.sh

2 changes: 1 addition & 1 deletion docs/index.md
@@ -8,7 +8,7 @@ testing tool that aids in creating production-like data across both batch and ev
to ensure your systems have ingested it as expected, then clean up the data afterwards.</h1>

<figure markdown>
-![Data Caterer generate and validate data flows](diagrams/high_level_flow-basic-flow.svg)
+![Data Caterer generate and validate data flows](diagrams/basic_data_caterer_flow_medium.gif)
</figure>

<h1 align="center">Simplify your data testing</h1>
6 changes: 3 additions & 3 deletions docs/setup/advanced.md
@@ -65,9 +65,9 @@ You can alter the `status` column in the account data to only generate `open` accounts
and define a foreign key between Postgres and parquet to ensure the same `account_id` is being used.
Then in the parquet task, define 1 to 10 transactions per `account_id` to be generated.

-[Postgres account generation example task](https://github.com/pflooky/data-caterer-example/blob/main/docker/data/custom/task/jdbc/postgres/postgres-account-task.yaml)
-[Parquet transaction generation example task](https://github.com/pflooky/data-caterer-example/blob/main/docker/data/custom/task/file/parquet/parquet-transaction-task.yaml)
-[Plan](https://github.com/pflooky/data-caterer-example/blob/main/docker/data/custom/plan/scenario-based.yaml)
+[Postgres account generation example task](https://github.com/data-catering/data-caterer-example/blob/main/docker/data/custom/task/jdbc/postgres/postgres-account-task.yaml)
+[Parquet transaction generation example task](https://github.com/data-catering/data-caterer-example/blob/main/docker/data/custom/task/file/parquet/parquet-transaction-task.yaml)
+[Plan](https://github.com/data-catering/data-caterer-example/blob/main/docker/data/custom/plan/scenario-based.yaml)
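
(Not part of this commit's diff.) To make the scenario above concrete, here is a minimal Scala sketch of such a plan under the renamed package. Treat it as an illustration only: the builder names (`postgres`, `parquet`, `count.recordsPerColumnGenerator`, `plan.addForeignKeyRelationship`) and connection details are assumed from the linked example tasks and may not match the current API exactly.

```scala
import io.github.datacatering.datacaterer.api.PlanRun

class ScenarioBasedPlanRun extends PlanRun {
  // Accounts in Postgres, restricted to "open" status only
  val accountTask = postgres("customer_postgres", "jdbc:postgresql://host.docker.internal:5432/customer")
    .table("account", "accounts")
    .schema(
      field.name("account_id").regex("ACC[0-9]{8}"),
      field.name("status").oneOf("open")
    )

  // 1 to 10 transactions generated per account_id, written as Parquet
  val transactionTask = parquet("transactions", "/opt/app/data/customer/transaction")
    .schema(field.name("account_id"), field.name("amount"))
    .count(count.recordsPerColumnGenerator(generator.min(1).max(10), "account_id"))

  // Foreign key so the same account_id values flow from Postgres into Parquet
  val myPlan = plan.addForeignKeyRelationship(
    accountTask, List("account_id"),
    List(transactionTask -> List("account_id"))
  )

  execute(myPlan, configuration, accountTask, transactionTask)
}
```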

## Cloud storage

26 changes: 13 additions & 13 deletions docs/setup/configuration.md
@@ -5,24 +5,24 @@ metadata gets saved.

These configurations are defined from within your Java or Scala class via `configuration` or for YAML file setup,
`application.conf` file as seen
-[**here**](https://github.com/pflooky/data-caterer-example/blob/main/docker/data/custom/application.conf).
+[**here**](https://github.com/data-catering/data-caterer-example/blob/main/docker/data/custom/application.conf).

## Flags

Flags are used to control which processes are executed when you run Data Caterer.

-| Config | Default | Paid | Description |
-|--------|---------|------|-------------|
-| `enableGenerateData` | true | N | Enable/disable data generation |
-| `enableCount` | true | N | Count the number of records generated. Can be disabled to improve performance |
-| `enableFailOnError` | true | N | Whilst saving generated data, if there is an error, it will stop any further data from being generated |
-| `enableSaveReports` | true | N | Enable/disable HTML reports summarising data generated, metadata of data generated (if `enableSinkMetadata` is enabled) and validation results (if `enableValidation` is enabled). Sample [**here**](generator/report.md) |
-| `enableSinkMetadata` | true | N | Run data profiling for the generated data. Shown in HTML reports if `enableSaveSinkMetadata` is enabled |
-| `enableValidation` | false | N | Run validations as described in plan. Results can be viewed from logs or from HTML report if `enableSaveSinkMetadata` is enabled. Sample [**here**](validation.md) |
-| `enableGeneratePlanAndTasks` | false | Y | Enable/disable plan and task auto generation based off data source connections |
-| `enableRecordTracking` | false | Y | Enable/disable which data records have been generated for any data source |
-| `enableDeleteGeneratedRecords` | false | Y | Delete all generated records based off record tracking (if `enableRecordTracking` has been set to true) |
-| `enableGenerateValidations` | false | Y | If enabled, it will generate validations based on the data sources defined. |
+| Config | Default | Paid | Description |
+|--------|---------|------|-------------|
+| `enableGenerateData` | true | N | Enable/disable data generation |
+| `enableCount` | true | N | Count the number of records generated. Can be disabled to improve performance |
+| `enableFailOnError` | true | N | Whilst saving generated data, if there is an error, it will stop any further data from being generated |
+| `enableSaveReports` | true | N | Enable/disable HTML reports summarising data generated, metadata of data generated (if `enableSinkMetadata` is enabled) and validation results (if `enableValidation` is enabled). Sample [**here**](report/html-report.md) |
+| `enableSinkMetadata` | true | N | Run data profiling for the generated data. Shown in HTML reports if `enableSaveSinkMetadata` is enabled |
+| `enableValidation` | false | N | Run validations as described in plan. Results can be viewed from logs or from HTML report if `enableSaveSinkMetadata` is enabled. Sample [**here**](validation.md) |
+| `enableGeneratePlanAndTasks` | false | Y | Enable/disable plan and task auto generation based off data source connections |
+| `enableRecordTracking` | false | Y | Enable/disable which data records have been generated for any data source |
+| `enableDeleteGeneratedRecords` | false | Y | Delete all generated records based off record tracking (if `enableRecordTracking` has been set to true) |
+| `enableGenerateValidations` | false | Y | If enabled, it will generate validations based on the data sources defined. |

=== "Java"

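(The Java/Scala tab bodies are collapsed in this diff view.) As a rough Scala sketch of how the flags above are set in code, assuming the builder methods mirror the flag names, which may not hold for the current API:

```scala
import io.github.datacatering.datacaterer.api.PlanRun

class MyConfiguredPlanRun extends PlanRun {
  // Each builder call corresponds to a flag in the table above
  val conf = configuration
    .enableCount(true)
    .enableSaveReports(true)
    .enableValidation(true)
    .enableGeneratePlanAndTasks(false)                  // paid feature, off by default
    .generatedReportsFolderPath("/opt/app/data/report") // assumed report path option

  // Hypothetical JSON task so the sketch is self-contained
  val jsonTask = json("my_json", "/opt/app/data/json")
    .schema(field.name("account_id").regex("ACC[0-9]{8}"))

  execute(conf, jsonTask)
}
```
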
2 changes: 1 addition & 1 deletion docs/setup/connection.md
@@ -410,7 +410,7 @@ found [**here**](https://spark.apache.org/docs/latest/structured-streaming-kafka

When defining your schema for pushing data to Kafka, it follows a specific top level schema.
An example can be
-found [**here**](https://github.com/pflooky/data-caterer-example/blob/main/docker/data/custom/task/kafka/kafka-account-task.yaml)
+found [**here**](https://github.com/data-catering/data-caterer-example/blob/main/docker/data/custom/task/kafka/kafka-account-task.yaml)
. You can define the key, value, headers, partition or topic by following the linked schema.
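
(Not part of this diff.) For illustration, a Scala sketch of that top-level Kafka schema. The field names `key`, `value` and `content`, the `sql` expressions, and the nested schema builder are assumptions based on the linked kafka-account-task.yaml, not a confirmed API:

```scala
import io.github.datacatering.datacaterer.api.PlanRun

class MyKafkaPlanRun extends PlanRun {
  val kafkaTask = kafka("my_kafka", "localhost:9092")
    .topic("accounts")
    .schema(
      field.name("key").sql("content.account_id"),  // Kafka message key
      field.name("value").sql("TO_JSON(content)"),  // Kafka message body
      field.name("content")                         // fields the value is built from
        .schema(field.name("account_id").regex("ACC[0-9]{8}"))
    )

  execute(kafkaTask)
}
```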

### JMS
8 changes: 4 additions & 4 deletions docs/setup/deployment.md
@@ -8,7 +8,7 @@ Two main ways to deploy and run Data Caterer:
## Docker

To package up your class along with the Data Caterer base image, you can follow
-the [Dockerfile that is created for you here](https://github.com/pflooky/data-caterer-example/blob/main/Dockerfile).
+the [Dockerfile that is created for you here](https://github.com/data-catering/data-caterer-example/blob/main/Dockerfile).

Then you can run the following:

@@ -19,13 +19,13 @@ docker build -t <my_image_name>:<my_image_tag> .

## Helm

-[Link to sample helm on GitHub here](https://github.com/pflooky/data-caterer-example/tree/main/helm/data-caterer)
+[Link to sample helm on GitHub here](https://github.com/data-catering/data-caterer-example/tree/main/helm/data-caterer)

Update
-the [configuration](https://github.com/pflooky/data-caterer-example/blob/main/helm/data-caterer/templates/configuration.yaml)
+the [configuration](https://github.com/data-catering/data-caterer-example/blob/main/helm/data-caterer/templates/configuration.yaml)
to your own data connections and configuration or own image created from above.

```shell
-git clone git@github.com:pflooky/data-caterer-example.git
+git clone git@github.com:data-catering/data-caterer-example.git
helm install data-caterer ./data-caterer-example/helm/data-caterer
```
12 changes: 6 additions & 6 deletions docs/setup/guide/data-source/cassandra.md
@@ -20,7 +20,7 @@ for the tables you configure.
First, we will clone the data-caterer-example repo which will already have the base project setup required.

```shell
-git clone git@github.com:pflooky/data-caterer-example.git
+git clone git@github.com:data-catering/data-caterer-example.git
```

If you already have a Cassandra instance running, you can skip to [this step](#plan-setup).
@@ -59,15 +59,15 @@ metadata information about tables and columns from the below tables.

Create a new Java or Scala class.

-- Java: `src/main/java/com/github/pflooky/plan/MyAdvancedCassandraJavaPlan.java`
-- Scala: `src/main/scala/com/github/pflooky/plan/MyAdvancedCassandraPlan.scala`
+- Java: `src/main/java/io/github/datacatering/plan/MyAdvancedCassandraJavaPlan.java`
+- Scala: `src/main/scala/io/github/datacatering/plan/MyAdvancedCassandraPlan.scala`

Make sure your class extends `PlanRun`.

=== "Java"

```java
-import com.github.pflooky.datacaterer.java.api.PlanRun;
+import io.github.datacatering.datacaterer.java.api.PlanRun;

public class MyAdvancedCassandraJavaPlan extends PlanRun {
}
```

@@ -76,7 +76,7 @@
=== "Scala"

```scala
-import com.github.pflooky.datacaterer.api.PlanRun
+import io.github.datacatering.datacaterer.api.PlanRun

class MyAdvancedCassandraPlan extends PlanRun {
}
```

@@ -124,7 +124,7 @@ defined under `docker/data/cql/customer.cql`. This table should already be setup
[step](#cassandra-setup). We can check if the table is setup already via the following command:

```shell
-docker exec host.docker.internal cqlsh -e 'describe account.accounts; describe account.account_status_history;'
+docker exec docker-cassandraserver-1 cqlsh -e 'describe account.accounts; describe account.account_status_history;'
```

Here we should see some output that looks like the below. This tells us what schema we need to follow when generating
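
(The sample `describe` output is collapsed in this diff view.) For context, a minimal Scala sketch of the Cassandra task the guide builds towards, under the renamed package. The `cassandra(name, url, user, password)` signature and field builders are assumed from the guide and may differ:

```scala
import io.github.datacatering.datacaterer.api.PlanRun

class MyAdvancedCassandraPlan extends PlanRun {
  // Connection details match the docker-compose Cassandra instance above
  val accountTask = cassandra("customer_cassandra", "host.docker.internal:9042", "cassandra", "cassandra")
    .table("account", "accounts")
    .schema(
      field.name("account_id").regex("ACC[0-9]{8}"),
      field.name("account_status").oneOf("open", "closed")
    )

  execute(accountTask)
}
```
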
14 changes: 7 additions & 7 deletions docs/setup/guide/data-source/http.md
@@ -25,7 +25,7 @@ Creating a data generator based on an [OpenAPI/Swagger](https://spec.openapis.or
First, we will clone the data-caterer-example repo which will already have the base project setup required.

```shell
-git clone git@github.com:pflooky/data-caterer-example.git
+git clone git@github.com:data-catering/data-caterer-example.git
```

### HTTP Setup
@@ -44,15 +44,15 @@ docker ps

Create a new Java or Scala class.

-- Java: `src/main/java/com/github/pflooky/plan/MyAdvancedHttpJavaPlanRun.java`
-- Scala: `src/main/scala/com/github/pflooky/plan/MyAdvancedHttpPlanRun.scala`
+- Java: `src/main/java/io/github/datacatering/plan/MyAdvancedHttpJavaPlanRun.java`
+- Scala: `src/main/scala/io/github/datacatering/plan/MyAdvancedHttpPlanRun.scala`

Make sure your class extends `PlanRun`.

=== "Java"

```java
-import com.github.pflooky.datacaterer.java.api.PlanRun;
+import io.github.datacatering.datacaterer.java.api.PlanRun;
...

public class MyAdvancedHttpJavaPlanRun extends PlanRun {
```

@@ -66,7 +66,7 @@ Make sure your class extends `PlanRun`.
=== "Scala"

```scala
-import com.github.pflooky.datacaterer.api.PlanRun
+import io.github.datacatering.datacaterer.api.PlanRun
...

class MyAdvancedHttpPlanRun extends PlanRun {
```

@@ -266,7 +266,7 @@ want to alter this value, you can do so via the below configuration. The lowest
=== "Java"

```java
-import com.github.pflooky.datacaterer.api.model.Constants;
+import io.github.datacatering.datacaterer.api.model.Constants;

...
var httpTask = http("my_http", Map.of(Constants.ROWS_PER_SECOND(), "1"))
```

@@ -276,7 +276,7 @@ want to alter this value, you can do so via the below configuration. The lowest
=== "Scala"

```scala
-import com.github.pflooky.datacaterer.api.model.Constants.ROWS_PER_SECOND
+import io.github.datacatering.datacaterer.api.model.Constants.ROWS_PER_SECOND

...
val httpTask = http("my_http", options = Map(ROWS_PER_SECOND -> "1"))
```
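
(Not part of this diff.) A Scala sketch of where this guide ends up: generating HTTP requests from the OpenAPI document, throttled to one row per second. The `metadataSource.openApi` call and the mounted spec path are assumed from the guide and may differ:

```scala
import io.github.datacatering.datacaterer.api.PlanRun
import io.github.datacatering.datacaterer.api.model.Constants.ROWS_PER_SECOND

class MyAdvancedHttpPlanRun extends PlanRun {
  // Schema is derived from the OpenAPI/Swagger document mounted into the container
  val httpTask = http("my_http", options = Map(ROWS_PER_SECOND -> "1"))
    .schema(metadataSource.openApi("/opt/app/mount/http/petstore.json"))
    .count(count.records(2))

  execute(httpTask)
}
```
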
10 changes: 5 additions & 5 deletions docs/setup/guide/data-source/kafka.md
@@ -20,7 +20,7 @@ for the topics you configure.
First, we will clone the data-caterer-example repo which will already have the base project setup required.

```shell
-git clone git@github.com:pflooky/data-caterer-example.git
+git clone git@github.com:data-catering/data-caterer-example.git
```

If you already have a Kafka instance running, you can skip to [this step](#plan-setup).
@@ -39,15 +39,15 @@ docker-compose up -d kafka

Create a new Java or Scala class.

-- Java: `src/main/java/com/github/pflooky/plan/MyAdvancedKafkaJavaPlan.java`
-- Scala: `src/main/scala/com/github/pflooky/plan/MyAdvancedKafkaPlan.scala`
+- Java: `src/main/java/io/github/datacatering/plan/MyAdvancedKafkaJavaPlan.java`
+- Scala: `src/main/scala/io/github/datacatering/plan/MyAdvancedKafkaPlan.scala`

Make sure your class extends `PlanRun`.

=== "Java"

```java
-import com.github.pflooky.datacaterer.java.api.PlanRun;
+import io.github.datacatering.datacaterer.java.api.PlanRun;

public class MyAdvancedKafkaJavaPlan extends PlanRun {
}
```

@@ -56,7 +56,7 @@
=== "Scala"

```scala
-import com.github.pflooky.datacaterer.api.PlanRun
+import io.github.datacatering.datacaterer.api.PlanRun

class MyAdvancedKafkaPlan extends PlanRun {
}
```
10 changes: 5 additions & 5 deletions docs/setup/guide/data-source/marquez-metadata-source.md
@@ -19,7 +19,7 @@ follows [OpenLineage API](https://openlineage.io/)).
First, we will clone the data-caterer-example repo which will already have the base project setup required.

```shell
-git clone git@github.com:pflooky/data-caterer-example.git
+git clone git@github.com:data-catering/data-caterer-example.git
```

### Marquez Setup
@@ -48,15 +48,15 @@ docker exec marquez-db psql -Upostgres -c 'CREATE DATABASE food_delivery'

Create a new Java or Scala class.

-- Java: `src/main/java/com/github/pflooky/plan/MyAdvancedMetadataSourceJavaPlanRun.java`
-- Scala: `src/main/scala/com/github/pflooky/plan/MyAdvancedMetadataSourcePlanRun.scala`
+- Java: `src/main/java/io/github/datacatering/plan/MyAdvancedMetadataSourceJavaPlanRun.java`
+- Scala: `src/main/scala/io/github/datacatering/plan/MyAdvancedMetadataSourcePlanRun.scala`

Make sure your class extends `PlanRun`.

=== "Java"

```java
-import com.github.pflooky.datacaterer.java.api.PlanRun;
+import io.github.datacatering.datacaterer.java.api.PlanRun;
...

public class MyAdvancedMetadataSourceJavaPlanRun extends PlanRun {
```

@@ -70,7 +70,7 @@ Make sure your class extends `PlanRun`.
=== "Scala"

```scala
-import com.github.pflooky.datacaterer.api.PlanRun
+import io.github.datacatering.datacaterer.api.PlanRun
...

class MyAdvancedMetadataSourcePlanRun extends PlanRun {
```
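
(Not part of this diff.) A Scala sketch of the guide's end state: Marquez as the metadata source for a Postgres connection. The `metadataSource.marquez` call, URL and namespace are assumed from the guide and may differ:

```scala
import io.github.datacatering.datacaterer.api.PlanRun

class MyAdvancedMetadataSourcePlanRun extends PlanRun {
  // Table metadata is pulled from the "food_delivery" namespace in Marquez
  val postgresTask = postgres("my_postgres", "jdbc:postgresql://host.docker.internal:5432/food_delivery")
    .schema(metadataSource.marquez("http://localhost:5001", "food_delivery"))
    .count(count.records(10))

  execute(postgresTask)
}
```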
