This repository contains a simple data pipeline for processing, storing, and visualizing stock data: a hands-on Big Data solution covering both batch and stream processing.
- Capture: RESTful API and flat files of stock data
- Ingest: Kafka (a producer sketch follows this list)
- Store: Hive on Hadoop
- Compute: Spark, Flink, Trino
- Visualize: Superset
- Workflow: Airflow
- Container orchestration: Docker
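To illustrate the Capture and Ingest stages, here is a minimal sketch of a producer that polls a REST API and publishes stock quotes to Kafka. The endpoint URL, broker address, and topic name are placeholders, not this repository's actual configuration.

```python
import json
import time

import requests
from kafka import KafkaProducer  # kafka-python client

# Placeholder broker address; adjust to the compose service name/port.
producer = KafkaProducer(
    bootstrap_servers="kafka:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
QUOTE_URL = "https://example.com/api/quotes"  # hypothetical stock-quote API

while True:
    # Pull the latest quotes and publish each record to the Kafka topic.
    for quote in requests.get(QUOTE_URL, timeout=10).json():
        producer.send("stock-quotes", quote)
    producer.flush()
    time.sleep(60)  # poll once per minute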
To deploy a cluster, run:

```bash
docker compose up
bash script/setup.sh
```
Spark computes stock return metrics such as the simple return, log return, and cumulative return.
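As a rough sketch of how these metrics can be derived, the PySpark snippet below computes all three per ticker. The input path and column names (`ticker`, `trade_date`, `close`) are assumptions, not the repository's actual job.

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("stock-returns").getOrCreate()

# Assumed schema: one row per ticker per trading day with a closing price.
prices = spark.read.parquet("hdfs:///data/stock/prices")  # placeholder path

w = Window.partitionBy("ticker").orderBy("trade_date")

returns = (
    prices.withColumn("prev_close", F.lag("close").over(w))
    # Simple return: P_t / P_{t-1} - 1
    .withColumn("simple_return", F.col("close") / F.col("prev_close") - 1)
    # Log return: ln(P_t / P_{t-1})
    .withColumn("log_return", F.log(F.col("close") / F.col("prev_close")))
    # Cumulative return: product of (1 + simple return) up to t, minus 1
    .withColumn(
        "cumulative_return",
        F.exp(
            F.sum(F.log1p("simple_return")).over(
                w.rowsBetween(Window.unboundedPreceding, Window.currentRow)
            )
        ) - 1,
    )
)
returns.show()
```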
Use Superset to visualize the reports for business insight. Connect Superset to Trino via a SQLAlchemy URI.
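For example, with the Trino SQLAlchemy dialect the URI looks like the line below; the username, hostname, port, and catalog here are assumptions, so adjust them to match the compose service names:

```
trino://admin@trino:8080/hive
```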
Planned work:
- Build stream processing
- Build batch processing
- Integrate Airflow
- Integrate multiple data sources