This project provides a streamlined process for synchronizing schema and data from PostgreSQL to Kinetica using GitHub Actions. The synchronization process is performed in three steps: Schema Migration using RosettaDB, Data Load using KiSQL, and Data Quality Check using RosettaDB.
- Schema Migration: This step identifies and applies schema changes (deltas) from PostgreSQL to Kinetica.
- Data Load: The data from PostgreSQL is loaded into Kinetica using KiSQL.
- Data Quality Check: After loading, RosettaDB is used to perform data quality checks to ensure the integrity and accuracy of the data.
First, fork this repository to your GitHub account.
To use this project in your environment, update the repository secrets and variables in your GitHub repository settings. The following secrets and variables need to be set:
POSTGRES_HOST
: The hostname of your PostgreSQL server.POSTGRES_DB
: The name of your PostgreSQL database.POSTGRES_USER
: The username for accessing your PostgreSQL database.POSTGRES_PASSWORD
: The password for accessing your PostgreSQL database.SOURCE_SCHEMA
: The schema for the PostgresDBKINETICA_URL
: The URL of your Kinetica server.KINETICA_USER
: The username for accessing your Kinetica database.KINETICA_PASSWORD
: The password for accessing your Kinetica database.KINETICA_DB
: The name of your Kinetica database.TARGET_SCHEMA
: The schema for the KineticaGITHUBTOKEN
: The github token that has permission to your repo.
Once the repository secrets and variables are set, you can run the pipeline through GitHub Actions:
- Go to the Actions tab in your GitHub repository.
- Select the workflow you want to run.
- Click on Run workflow to start the synchronization process.
The project contains the following components:
- rosetta: Contains the RosettaDB scripts and configuration.
- jdbc-drivers: Directory with the required JDBC drivers for PostgreSQL and Kinetica.
- kisql: Directory with KiSQL.
- custom-translation-file: Custom translation file used during the schema translation.
- tests: CSV file that contains tests for performing data quality checks.
- GitHub account
- PostgreSQL database
- Kinetica database
- RosettaDB
- Kinetica, KiSQL
- Basic knowledge of GitHub Actions