In this tutorial, we will see how to transform a relational model into a data vault model using Kolle, without writing a single line of code.
The purpose and details of the data vault model are not covered in this tutorial; for background, ask ChatGPT or look it up on Wikipedia.
Domain: Insurance policy
Source data: JSON document datasets
Pipeline: Relational model -> data contract -> data vault model
1. Import source models from the semi-structured document datasets
2. Flatten the semi-structured model (steps 1-3 are illustrated in the flattening sketch after this list)
3. Remove duplicate data from the source data
4. Profile the raw data
5. Apply a data contract for data quality, i.e. selection, typecasting, enrichment, reference data integration, etc. (see the data contract sketch below)
   5a. Good data moves to the refined model
   5b. Bad data moves to the refined error model
6. Profile the refined data
7. Apply a pattern to automatically convert the refined model to a data vault model (see the data vault sketch below)
8. View the data as a data vault model
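
A minimal sketch of steps 1-3 in plain Python, assuming a hypothetical `policies.json` file of nested insurance-policy documents; Kolle drives these steps from metadata without code, so this only illustrates what flattening and deduplication mean here, not how Kolle implements them.

```python
import json


def flatten(doc, parent_key="", sep="_"):
    """Recursively flatten a nested document into a single-level dict."""
    flat = {}
    for key, value in doc.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, new_key, sep))
        else:
            flat[new_key] = value
    return flat


def deduplicate(rows):
    """Drop exact duplicate rows, keeping the first occurrence of each."""
    seen, unique = set(), []
    for row in rows:
        key = json.dumps(row, sort_keys=True)  # canonical form for comparison
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


with open("policies.json") as f:   # hypothetical source file
    raw_docs = json.load(f)        # expected: a JSON array of documents

rows = deduplicate([flatten(doc) for doc in raw_docs])
print(f"{len(raw_docs)} raw documents -> {len(rows)} unique flat rows")
```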
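
Step 5 checks each row against the contract and splits the result. The sketch below assumes hypothetical field names (`policy_id`, `premium`, `start_date`, `region`) and a made-up reference set; in Kolle the contract is declared as metadata, so this code only mimics the good/bad split it produces.

```python
from datetime import date

REGION_CODES = {"EU", "US", "APAC"}  # assumed reference data


def apply_contract(row):
    """Return (refined_row, None) on success or (None, error) on violation."""
    try:
        refined = {
            "policy_id": str(row["policy_id"]),                   # selection
            "premium": float(row["premium"]),                     # typecast
            "start_date": date.fromisoformat(row["start_date"]),  # typecast
            "region": row["region"].strip().upper(),              # enrichment
        }
    except (KeyError, TypeError, ValueError, AttributeError) as exc:
        return None, {"row": row, "error": repr(exc)}
    if refined["region"] not in REGION_CODES:                     # reference check
        return None, {"row": row, "error": f"unknown region {refined['region']}"}
    return refined, None


refined_model, refined_error_model = [], []
for row in rows:                          # rows from the flattening sketch
    good, bad = apply_contract(row)
    if good is not None:
        refined_model.append(good)        # 5a: good data -> refined model
    else:
        refined_error_model.append(bad)   # 5b: bad data -> refined error model
```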
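
Step 7 follows the usual data vault pattern: the business key goes to a hub, descriptive attributes go to a satellite, and relationships between business keys go to links (omitted here because this example carries a single entity). The MD5 hash key below is a common but not mandatory convention; Kolle derives the mapping automatically, so this is only an illustration of the pattern.

```python
import hashlib
from datetime import datetime, timezone


def hash_key(*parts):
    """Deterministic surrogate key over the business key parts."""
    return hashlib.md5("|".join(map(str, parts)).encode()).hexdigest()


load_ts = datetime.now(timezone.utc).isoformat()
hub_policy, sat_policy = [], []
for r in refined_model:                   # rows from the data contract sketch
    hk = hash_key(r["policy_id"])
    hub_policy.append({                   # hub: business key + audit columns
        "hk_policy": hk,
        "policy_id": r["policy_id"],
        "load_ts": load_ts,
        "record_source": "policies.json",
    })
    sat_policy.append({                   # satellite: descriptive attributes
        "hk_policy": hk,
        "premium": r["premium"],
        "start_date": r["start_date"],
        "region": r["region"],
        "load_ts": load_ts,
    })
```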
- JSON as the source for document data
- Kafka for event streaming, to ingest and process data in real time (wired up in the sketch after this list)
- Postgres as the target for data vault storage
- Kolle as the metadata repository and automation layer
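
As a rough idea of how this stack fits together, the sketch below consumes policy events from Kafka and inserts hub rows into Postgres. It assumes the `kafka-python` and `psycopg2` packages, a hypothetical `policies` topic, a local connection string, and a `hub_policy` table matching the previous sketch; Kolle generates the actual pipeline from metadata, so treat this purely as an illustration of the data flow.

```python
import hashlib
import json

import psycopg2
from kafka import KafkaConsumer


def hash_key(*parts):
    return hashlib.md5("|".join(map(str, parts)).encode()).hexdigest()


consumer = KafkaConsumer(
    "policies",                                # assumed topic name
    bootstrap_servers="localhost:9092",        # assumed broker address
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
conn = psycopg2.connect("dbname=vault user=postgres")  # assumed DSN

for message in consumer:                       # one flattened policy per event
    row = message.value
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO hub_policy (hk_policy, policy_id, load_ts, record_source)"
            " VALUES (%s, %s, now(), %s) ON CONFLICT DO NOTHING",
            (hash_key(row["policy_id"]), row["policy_id"], "kafka:policies"),
        )
    conn.commit()
```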