FrostyBridge is a Python tool that exports entire PostgreSQL databases to local or cloud storage in open data formats, so the data can be migrated for analytical workloads.
FrostyBridge can be used to extract an entire Postgres database to an Iceberg data lake.
- Full Database Export: Seamlessly export all tables from a PostgreSQL database to supported storage systems, including local, S3, ADL, and GCS.
- Multiple Output Formats: Supports Parquet, CSV, Iceberg, Feather, ORC, and IPC.
- Arrow Power: Uses Apache Arrow's columnar in-memory format for fast, memory-efficient data processing.
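The "full database export" idea can be sketched as a loop that discovers every table and writes each one to its own file. The sketch below is only an illustration of that pattern, not FrostyBridge's implementation: it swaps in the standard-library sqlite3 module for PostgreSQL and CSV for the Arrow-backed formats so that it runs without extra packages, and the function name export_all_tables is hypothetical.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path
from typing import List


def export_all_tables(conn: sqlite3.Connection, out_dir: Path) -> List[Path]:
    """Write every user table in the database to its own CSV file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    # Discover tables from the catalog (sqlite_master here; PostgreSQL
    # exposes the same idea through information_schema.tables).
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
            "ORDER BY name"
        )
    ]
    written = []
    for table in tables:
        cursor = conn.execute(f'SELECT * FROM "{table}"')
        path = out_dir / f"{table}.csv"
        with path.open("w", newline="") as handle:
            writer = csv.writer(handle)
            # Header row comes from the cursor's column metadata.
            writer.writerow(col[0] for col in cursor.description)
            writer.writerows(cursor)
        written.append(path)
    return written


# Stand-in database with two small tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER, user_id INTEGER)")
conn.execute("INSERT INTO users VALUES (1, 'ada')")

out = Path(tempfile.mkdtemp())
files = export_all_tables(conn, out)
print([p.name for p in files])
```

One file per table keeps the export restartable and lets each table be loaded independently on the analytics side.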
- Python 3.7 or later
- A PostgreSQL database
- Required Python packages: Install them with
pip install -r requirements.txt
- Clone the repository:
git clone https://github.com/TFMV/FrostyBridge.git
- Install required packages:
pip install -r requirements.txt
- Configure config.yaml:
Edit the config/config.yaml file with your PostgreSQL and output details.
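The PostgreSQL details in the config usually boil down to a host, port, database name, and credentials. As a rough sketch of how such values combine into a standard PostgreSQL connection URI, consider the following. The field names and the PostgresSettings class are assumptions made for illustration; the actual keys in config/config.yaml may differ, so check the file shipped with the repository.

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass
class PostgresSettings:
    # Hypothetical field names; the real config.yaml keys may differ.
    host: str
    dbname: str
    user: str
    password: str
    port: int = 5432  # PostgreSQL's default port


def connection_uri(s: PostgresSettings) -> str:
    """Build a postgresql:// URI, percent-encoding the credentials."""
    user = quote(s.user, safe="")
    password = quote(s.password, safe="")
    return f"postgresql://{user}:{password}@{s.host}:{s.port}/{s.dbname}"


settings = PostgresSettings(
    host="localhost", dbname="sales", user="etl", password="p@ss/word"
)
print(connection_uri(settings))
```

Percent-encoding matters when a password contains characters such as @ or /, which would otherwise be misread as URI delimiters.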
Run the following command to export the PostgreSQL database to GCS:
./cli.sh
- Start the API server:
./api.sh
- Send an API request to trigger the export.
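This README does not document the API's port, path, or request body, so the following is only a sketch of what triggering the export from Python might look like. The URL http://localhost:8000/export and the {"format": "parquet"} payload are placeholders, not the project's documented interface; replace them with the values the API server actually exposes.

```python
import json
import urllib.request

# Placeholder endpoint: port and path are assumptions, not documented values.
API_URL = "http://localhost:8000/export"


def build_export_request(url: str, payload: dict) -> urllib.request.Request:
    """Prepare a JSON POST request without sending it."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_export_request(API_URL, {"format": "parquet"})
print(request.get_method(), request.full_url)

# To send it once the server started by ./api.sh is running:
# with urllib.request.urlopen(request, timeout=30) as response:
#     print(response.status, response.read().decode("utf-8"))
```

Building the request separately from sending it makes it easy to inspect the URL and body before pointing it at a live server.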
- Build the Docker image:
docker build -t frostybridge .
- Run the Docker container:
docker run -v /path/to/config:/app/config -v /path/to/local/parquet/files:/app/parquet-files frostybridge
This project is licensed under the MIT License. See the LICENSE file for details.
Thomas F McGeehan V