Commit

Add in YAML and UI for ODCS and Iceberg

pflooky committed Jun 11, 2024
1 parent 484362e commit 911c82b
Showing 4 changed files with 754 additions and 219 deletions.
Binary file modified docs/sample/report/report_screenshot.png
84 changes: 80 additions & 4 deletions docs/setup/guide/data-source/file/iceberg.md
@@ -47,6 +47,7 @@ Create a new Java or Scala class.

- Java: `src/main/java/io/github/datacatering/plan/MyIcebergJavaPlan.java`
- Scala: `src/main/scala/io/github/datacatering/plan/MyIcebergPlan.scala`
- YAML: `docker/data/custom/plan/my-iceberg.yaml`

Make sure your class extends `PlanRun`.

@@ -68,6 +69,22 @@ Make sure your class extends `PlanRun`.
}
```

=== "YAML"

In `docker/data/custom/plan/my-iceberg.yaml`:
```yaml
name: "my_iceberg_plan"
description: "Create account data in Iceberg table"
tasks:
- name: "iceberg_account_table"
dataSourceName: "customer_accounts"
enabled: true
```

=== "UI"

See the next section.

This class is where we define all of our configurations for generating data. There are helper variables and
methods available to make it simple and easy to use.
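
For orientation, here is a minimal sketch of such a class in Scala. This is a sketch only: the package name follows the file paths listed earlier, and the `PlanRun` import location is assumed from the project's published examples.

```scala
package io.github.datacatering.plan

import io.github.datacatering.datacaterer.api.PlanRun

// Minimal skeleton: the connection, schema and configuration from the
// following sections are defined inside this class.
class MyIcebergPlan extends PlanRun {
}
```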

@@ -105,6 +122,30 @@ Within our class, we can start by defining the connection properties to read/write.

Additional options can be found [**here**](https://iceberg.apache.org/docs/1.5.0/spark-configuration/#catalog-configuration).

=== "YAML"

In `application.conf`:
```
iceberg {
customer_accounts {
path = "/opt/app/data/customer/iceberg"
path = ${?ICEBERG_WAREHOUSE_PATH}
catalogType = "hadoop"
catalogType = ${?ICEBERG_CATALOG_TYPE}
catalogUri = ""
catalogUri = ${?ICEBERG_CATALOG_URI}
}
}
```

=== "UI"

1. Go to the `Connection` tab in the top bar
2. Select data source as `Iceberg`
    1. Enter data source name `customer_accounts`
    2. Select catalog type `hadoop`
    3. Enter warehouse path as `/opt/app/data/customer/iceberg`
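
For comparison with the YAML and UI settings above, a hedged Scala sketch of the same connection inside the plan class might look like the following. The `iceberg` helper and the table name `account.accounts` are assumptions based on the project's other guides, not part of this diff.

```scala
// Inside the PlanRun class: mirrors the data source name, catalog type
// and warehouse path entered in the YAML/UI steps above.
val accountTask = iceberg(
  "customer_accounts",               // data source name
  "account.accounts",                // namespace.table (illustrative)
  "/opt/app/data/customer/iceberg"   // hadoop catalog warehouse path
)
```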

#### Schema

Depending on how you want to define the schema, follow the steps below (a rough Scala sketch follows):
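
As one possible shape, fields could be attached to the Iceberg task via the `.schema` helper. The field names, types and generator options here are illustrative assumptions, not taken from this commit.

```scala
import io.github.datacatering.datacaterer.api.model.DoubleType

// Illustrative schema: an account id matching a pattern and a bounded balance.
val accountTaskWithSchema = iceberg("customer_accounts", "account.accounts", "/opt/app/data/customer/iceberg")
  .schema(
    field.name("account_id").regex("ACC[0-9]{8}"),
    field.name("balance").`type`(DoubleType).min(1).max(1000)
  )
```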
@@ -139,15 +180,50 @@ have unique values generated.
execute(myPlan, config, accountTask, transactionTask)
```

=== "YAML"

In `application.conf`:
```
flags {
enableUniqueCheck = true
}
folders {
generatedReportsFolderPath = "/opt/app/data/report"
}
```

=== "UI"

1. Click on `Advanced Configuration` towards the bottom of the screen
2. Click on `Flag` and click on `Unique Check`
3. Click on `Folder` and enter `/tmp/data-caterer/report` for `Generated Reports Folder Path`
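
The Java/Scala equivalent of the flags and folders configuration above would be along these lines, assuming the `configuration` helper seen in the project's other guides:

```scala
// Enable unique value checks and set where the HTML report is written,
// mirroring the application.conf values above.
val config = configuration
  .enableUniqueCheck(true)
  .generatedReportsFolderPath("/opt/app/data/report")
```

The resulting `config` is then passed to `execute` together with the tasks, as in the `execute(myPlan, config, accountTask, transactionTask)` call shown above.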

### Run

Now we can run the plan we just created via the script `./run.sh`, located in the top-level directory of
`data-caterer-example`.

=== "Java"

```shell
./run.sh MyIcebergJavaPlan
```

=== "Scala"

```shell
./run.sh MyIcebergPlan
```

=== "YAML"

```shell
./run.sh my-iceberg.yaml
```

=== "UI"

1. Click on `Execute` at the top

Congratulations! You have now made a data generator that simulates a real-world data scenario. You can also compare
against the `IcebergJavaPlan.java` or `IcebergPlan.scala` files to verify that your plan matches.
