You can download the dataset here.
A digital wallet company has a large amount of online transaction data. The company wants to acknowledge data limitations and uncertainties, such as inaccurate or missing crucial information. At the same time, the company also wants to use the online transaction data to detect online payment fraud that harms its business.
The goal is to create a data pipeline that can be used for analysis and reporting, to determine whether the online transaction data has good data quality and can be used to detect fraud in online transactions.
- Create an automated pipeline that facilitates batch and stream data processing from various data sources to the data warehouse and data marts.
- Create a visualization dashboard to obtain meaningful insights from the data, enabling informed business decisions.
Image 1. Pipeline Architecture
- Orchestration: Airflow
- Transformation: Spark, dbt
- Streaming: Kafka
- Container: Docker
- Storage: Google Cloud Storage
- Warehouse: BigQuery
- Data Visualization: Looker
git clone https://github.com/graceyudhaaa/final-project-fraud-transaction-pipeline.git && cd final-project-fraud-transaction-pipeline
Create a folder named service-account.
Create a GCP project. Then, create a service account with the Editor role. Download the JSON credential, rename it to service-account.json, and store it in the service-account folder.
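As an optional sanity check, you can verify that the credential file loads correctly before continuing. This is a minimal sketch, assuming the google-auth package is installed and the file sits at service-account/service-account.json.

```python
# Optional sanity check: confirm the service account JSON loads correctly.
# Assumes google-auth is installed (pip install google-auth) and the
# credential is stored at service-account/service-account.json.
from google.oauth2 import service_account

credentials = service_account.Credentials.from_service_account_file(
    "service-account/service-account.json"
)
print("Loaded credentials for project:", credentials.project_id)
```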
- Install Terraform CLI
- Change directory to terraform by executing
cd terraform
- Initialize Terraform (set up environment and install Google provider)
terraform init
- Create new infrastructure by applying Terraform plan
terraform apply
- Check your GCP project for newly created resources (GCS Bucket and BigQuery Datasets)
Alternatively, you can create the resources manually:
- Create a GCS bucket named final-project-lake and set the region to asia-southeast2
- Create two datasets in BigQuery named onlinetransaction_wh and onlinetransaction_stream
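If you prefer to script this manual setup instead of using the console, a minimal sketch with the Google Cloud client libraries is shown below. It assumes google-cloud-storage and google-cloud-bigquery are installed and that GOOGLE_APPLICATION_CREDENTIALS points at the service account JSON; the dataset location is an assumption, since only the bucket region is specified above.

```python
# Sketch: create the GCS bucket and BigQuery datasets programmatically.
# Assumes: pip install google-cloud-storage google-cloud-bigquery
# and GOOGLE_APPLICATION_CREDENTIALS=service-account/service-account.json.
from google.cloud import bigquery, storage

storage_client = storage.Client()
storage_client.create_bucket("final-project-lake", location="asia-southeast2")

bq_client = bigquery.Client()
for dataset_name in ["onlinetransaction_wh", "onlinetransaction_stream"]:
    dataset = bigquery.Dataset(f"{bq_client.project}.{dataset_name}")
    dataset.location = "asia-southeast2"  # assumed; adjust to your region
    bq_client.create_dataset(dataset, exists_ok=True)
```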
cd kafka
docker-compose up
pip install -r requirements.txt
- Copy the env.example file and rename it to .env
- Fill in the required information for the sender and receiver email
python producer.py
python consumer.py
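Conceptually, the producer does something like the sketch below: it reads the transaction data and publishes each record to a Kafka topic. This is not the repository's producer.py; the topic name, CSV path, and throttling are assumptions for illustration.

```python
# Rough sketch of a transaction producer (not the repo's producer.py).
# Assumes kafka-python and pandas are installed, a broker on localhost:9092,
# a topic named "online_transactions", and a local CSV of the dataset;
# all of these names and paths are assumptions.
import json
import time

import pandas as pd
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

df = pd.read_csv("data/online_transactions.csv")  # hypothetical path
for record in df.to_dict(orient="records"):
    producer.send("online_transactions", value=record)
    time.sleep(0.1)  # throttle to simulate a live stream

producer.flush()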
All transactions will be loaded into the record table in BigQuery. If a transaction is detected as fraud, it will also be recorded in the detected_fraud table, and an automatic email notification indicating fraud will be sent.
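The consumer side can be pictured along the following lines. This is a sketch, not the repository's consumer.py: the topic name, the dataset the tables live in, the fraud flag, the SMTP server, and the environment variable names are all assumptions.

```python
# Rough sketch of the consumer side (not the repo's consumer.py).
# Assumes kafka-python and google-cloud-bigquery are installed, a topic
# named "online_transactions", tables in the onlinetransaction_stream
# dataset, an "isFraud" flag in each record, and email settings in .env;
# every one of these names is an assumption.
import json
import os
import smtplib
from email.message import EmailMessage

from google.cloud import bigquery
from kafka import KafkaConsumer

bq = bigquery.Client()
RECORD_TABLE = f"{bq.project}.onlinetransaction_stream.record"
FRAUD_TABLE = f"{bq.project}.onlinetransaction_stream.detected_fraud"


def send_fraud_email(transaction: dict) -> None:
    """Send a simple fraud alert using the sender/receiver set in .env."""
    msg = EmailMessage()
    msg["Subject"] = "Fraudulent transaction detected"
    msg["From"] = os.environ["SENDER_EMAIL"]      # hypothetical variable names
    msg["To"] = os.environ["RECEIVER_EMAIL"]
    msg.set_content(json.dumps(transaction, indent=2))
    with smtplib.SMTP_SSL("smtp.gmail.com", 465) as smtp:
        smtp.login(os.environ["SENDER_EMAIL"], os.environ["SENDER_PASSWORD"])
        smtp.send_message(msg)


consumer = KafkaConsumer(
    "online_transactions",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for message in consumer:
    transaction = message.value
    bq.insert_rows_json(RECORD_TABLE, [transaction])   # every transaction
    if transaction.get("isFraud") == 1:                # assumed fraud flag
        bq.insert_rows_json(FRAUD_TABLE, [transaction])
        send_fraud_email(transaction)
```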
Image 2. Streaming Process
Image 4. Email Notification for Data Detected as Fraud
If you run into a problem where the schema registry container exits with the message
INFO io.confluent.admin.utils.ClusterStatus - Expected 1 brokers but found only 0. Trying to query Kafka for metadata again
you might want to reset your firewall by running this on your command line with administrator permissions:
iisreset
In this project, we use a star schema to design the data warehouse. The warehouse contains several tables, namely:
a. Dim Type
b. Dim Origin
c. Dim Dest
d. Fact Transaction
Here is the data warehouse schema that we developed.
Image 5. Data Warehouse Schema
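To illustrate how the star schema supports analysis, here is a hedged example query run through the BigQuery client. The exact table and column names (fact_transaction, dim_type, type_id, is_fraud) are assumptions about the dbt models, not the project's actual definitions.

```python
# Example analysis query over the star schema (table and column names are
# assumptions about the dbt models, not the actual warehouse definitions).
from google.cloud import bigquery

client = bigquery.Client()
query = """
    SELECT
        t.type AS transaction_type,
        COUNT(*) AS total_transactions,
        COUNTIF(f.is_fraud = 1) AS fraud_transactions
    FROM `onlinetransaction_wh.fact_transaction` AS f
    JOIN `onlinetransaction_wh.dim_type` AS t
        ON f.type_id = t.type_id
    GROUP BY transaction_type
    ORDER BY fraud_transactions DESC
"""
for row in client.query(query).result():
    print(row.transaction_type, row.total_transactions, row.fraud_transactions)
```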
The outcome of this comprehensive data pipeline project is a dashboard that allows users to gain insight into fraudulent transactions.
Our dashboard is available through the following link: Online Transaction Fraud Dashboard