This repository contains Apache Airflow DAG files, created and maintained by the NIHR Imperial BRC Genomics Facility, for the processing of genomics sequencing datasets.
How to set up Airflow on HPC?
To set up Apache Airflow on an HPC cluster, refer to this blog post.
- Docker
- Python 3.9 (for HPC workers)
- Clone the repository
git clone https://github.com/imperial-genomics-facility/igf-airflow-hpc.git
- Build custom docker image
cd igf-airflow-hpc/docker/airflow
docker build -t airflow:v2.6.2 .
- Build PGBouncer image
cd igf-airflow-hpc/docker/pgbouncer
docker build -t pgbouncer:v1.19.0 .
- Set up environment files
Copy the following template files:
- igf-airflow-hpc/docker/env_template/airflow_db_env_template
- igf-airflow-hpc/docker/env_template/airflow_env_template
- Set up PGBouncer connections and start Airflow
- Follow these instructions: README
- Get the Python core library
git clone https://github.com/imperial-genomics-facility/data-management-python.git
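The environment-file step above only names the two templates, so the following is a minimal sketch of one way to copy them before editing. The function name `copy_env_templates`, the destination directory, and the choice to drop the `_template` suffix from the copied file names are all assumptions for illustration; check the PGBouncer and Airflow instructions for the file names the compose setup actually expects.

```shell
# Hypothetical helper (not part of this repository): copy the two env
# templates into a destination directory, dropping the "_template" suffix.
copy_env_templates() {
  src_dir="$1"    # e.g. igf-airflow-hpc/docker/env_template
  dest_dir="$2"   # assumed location for the edited copies

  mkdir -p "$dest_dir" || return 1

  for name in airflow_db_env_template airflow_env_template; do
    if [ ! -f "$src_dir/$name" ]; then
      echo "missing template: $src_dir/$name" >&2
      return 1
    fi
    target="$dest_dir/${name%_template}"
    # Keep an existing copy so earlier edits are not overwritten.
    if [ ! -e "$target" ]; then
      cp "$src_dir/$name" "$target"
    fi
  done
}
```

It could be called as `copy_env_templates igf-airflow-hpc/docker/env_template ./env` (the `./env` destination is again only an example), after which the copies can be filled in with site-specific values.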
- Clone the repositories and install the Python environment
git clone https://github.com/imperial-genomics-facility/igf-airflow-hpc.git
git clone https://github.com/imperial-genomics-facility/data-management-python.git
pip install -r requirements_2.6.2.txt # For compatibility with Apache Airflow v2.6.2
export PYTHONPATH=/PATH/data-management-python
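After the steps above, it can help to confirm that the worker environment matches what the README asks for. The sketch below is an assumption-laden sanity check, not something shipped with this repository: `python_is_39` and `pythonpath_contains` are hypothetical helper names, and the interpreter name `python3` is only a default.

```shell
# Hypothetical helper: succeed only if the given interpreter reports Python 3.9.
python_is_39() {
  py="${1:-python3}"
  version=$("$py" -c 'import sys; print("%d.%d" % sys.version_info[:2])') || return 1
  [ "$version" = "3.9" ]
}

# Hypothetical helper: succeed only if directory "$1" exists on disk
# and is listed as an entry in PYTHONPATH.
pythonpath_contains() {
  target="$1"
  [ -d "$target" ] || return 1
  case ":${PYTHONPATH:-}:" in
    *":$target:"*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `python_is_39 && pythonpath_contains /PATH/data-management-python` returns success only when both conditions hold, where `/PATH` is the same placeholder used in the `export` line.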
This project is licensed under the Apache-2.0 License. See the LICENSE file for details.