# Bigdata Pipeline

This repository deploys a simple data pipeline for processing, storing, and visualizing data: a hands-on Big Data solution with both batch and stream processing.

*Architecture diagram*

## 📖 Overview

- Capture: RESTful API and flat files of stock data
- Ingest: Kafka (see the producer sketch below)
- Store: Hadoop, Hive
- Compute: Spark, Flink, Trino
- Visualize: Superset
- Workflow: Airflow
- Container Orchestration: Docker
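
As an illustration of the capture-and-ingest path, here is a minimal Python sketch that reads stock quotes from a REST endpoint and publishes them to Kafka. The endpoint URL, the topic name `stock-quotes`, and the broker address are placeholder assumptions, not values taken from this repository.

```python
# Hypothetical sketch: fetch stock quotes from a REST API and publish to Kafka.
# The API URL, topic name, and broker address below are assumptions.
import json

import requests
from kafka import KafkaProducer  # pip install kafka-python

API_URL = "https://example.com/api/quotes"  # placeholder endpoint

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

for quote in requests.get(API_URL, timeout=10).json():
    # Key by ticker symbol so each symbol's events stay ordered in one partition.
    producer.send("stock-quotes", key=quote["symbol"].encode(), value=quote)

producer.flush()
```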

## 🛠️ Quick Start

To deploy the cluster, run:

```bash
docker compose up
bash script/setup.sh
```

Spark computes several return metrics over the stock prices: simple return, log return, and cumulative return.
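
For reference, the simple return is `P_t / P_{t-1} - 1`, the log return is `ln(P_t / P_{t-1})`, and the cumulative return compounds the simple returns over time. Below is a minimal PySpark sketch of these calculations; the table name `stock_prices` and the columns `symbol`, `date`, and `close` are assumptions about the schema, not taken from this repository.

```python
# Hypothetical sketch of the return calculations; table and column names are assumptions.
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("stock-returns").getOrCreate()
prices = spark.table("stock_prices")  # assumed Hive table of daily closes

w = Window.partitionBy("symbol").orderBy("date")
returns = (
    prices
    .withColumn("prev_close", F.lag("close").over(w))
    # The first row of each symbol has no previous close, so its returns are null.
    .withColumn("simple_return", F.col("close") / F.col("prev_close") - 1)
    .withColumn("log_return", F.log(F.col("close") / F.col("prev_close")))
    # Cumulative return: compound the simple returns from the start of the series.
    .withColumn(
        "cumulative_return",
        F.exp(
            F.sum(F.log1p(F.col("simple_return"))).over(
                w.rowsBetween(Window.unboundedPreceding, Window.currentRow)
            )
        ) - 1,
    )
)
```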

Use Superset to visualize the reports for business insight. To query the data, register Trino as a database in Superset.
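
Superset connects to Trino through a SQLAlchemy URI of the following form; the user, hostname, and catalog below are placeholders, and 8080 assumes Trino's default port:

```
trino://<user>@<trino-host>:8080/<catalog>
```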

## Update

- Build stream processing
- Build batch processing
- Integrate Airflow
- Integrate multiple data sources