
Fraud detection in Bitcoin transactions using GCNs. Uses C++ for data preprocessing and Python for training. Features include data processing, graph creation, and visualization with PyTorch Geometric.

vdrvar/bitcoin_fraud_detection

Fraud Detection in Bitcoin Transactions using Graph Convolutional Networks (GCNs)


Project Description

bitcoin_fraud_detection detects fraudulent Bitcoin transactions using Graph Convolutional Networks (GCNs). The project uses the Elliptic dataset, with C++ handling data preprocessing and Python implementing and training the GCN model. This split is intended to keep CSV parsing efficient while leaving model development to the Python machine learning ecosystem.


Features

  • Data Preprocessing in C++: Efficient parsing and cleaning of transaction data.
  • Graph Construction: A transaction graph built with NetworkX.
  • Graph Neural Network (GNN): A GCN implemented with PyTorch Geometric for fraud detection.
  • Visualization: Plots of transaction graphs and model performance metrics using Plotly.
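
To make the graph construction step more concrete, here is a minimal, dependency-free sketch of turning an edge list into the index-based form a GNN consumes. It is an illustration only, not the project's own code (which uses NetworkX). The function name build_edge_index and the sample rows are hypothetical, and the txId1/txId2 header is assumed to match the Elliptic edge-list layout; the filtered file may differ.

```python
import csv
import io

# Hypothetical sample rows; the header is assumed to follow the
# Elliptic edge-list layout (txId1,txId2).
sample_edgelist = io.StringIO(
    "txId1,txId2\n"
    "101,102\n"
    "103,104\n"
    "101,104\n"
)

def build_edge_index(handle):
    """Map raw transaction IDs to contiguous node indices and collect directed edges."""
    node_index = {}            # raw txId -> 0-based node index
    sources, targets = [], []  # parallel lists: edge i goes sources[i] -> targets[i]
    for row in csv.DictReader(handle):
        for tx_id in (row["txId1"], row["txId2"]):
            # Assign the next free index the first time an ID is seen.
            node_index.setdefault(tx_id, len(node_index))
        sources.append(node_index[row["txId1"]])
        targets.append(node_index[row["txId2"]])
    return node_index, [sources, targets]

node_index, edge_index = build_edge_index(sample_edgelist)
print(edge_index)
```

The two-row [sources, targets] layout mirrors the 2 x num_edges edge_index shape that PyTorch Geometric expects, so the result could be wrapped in a long tensor before being passed to a model.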

Project Structure

bitcoin_fraud_detection/
│
├── data/
│   ├── filtered/
│   │   ├── filtered_classes.csv
│   │   ├── filtered_edgelist.csv
│   │   └── filtered_features.csv
│   └── unfiltered/
│       ├── elliptic_txs_classes.csv
│       ├── elliptic_txs_edgelist.csv
│       └── elliptic_txs_features.csv
│
├── src/
│   ├── data_preprocessing.cpp
│   └── CMakeLists.txt
│
├── training/
│   ├── data_preparation.ipynb
│   ├── gcn_model_weights.pth
│   ├── graph_data.pt
│   └── training.ipynb
│
├── visualization/
│   ├── data_plot.png
│   ├── data_predictions_plot.png
│   └── data_visualization.ipynb
│
├── README.md
└── LICENSE

Setup Instructions

C++ Environment Setup

  1. Compile the C++ code (the last command also runs the preprocessing executable):
    cd src
    mkdir -p build
    cd build
    cmake ..
    make
    ./data_preprocessing

Python Environment Setup

  1. Install Python Dependencies:

    cd training
    pip install -r requirements.txt
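  2. Required Libraries (install these directly with pip if training/requirements.txt is not present; it is not listed in the project structure above):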

    • torch
    • torch-geometric
    • pandas
    • matplotlib
    • scipy
    • networkx
    • plotly

Running the Project

1. Data Preprocessing (C++):

Navigate to the src/build directory and run the compiled preprocessing executable.

cd src/build
./data_preprocessing

This reads the raw files in data/unfiltered/ and writes the filtered datasets to data/filtered/. The raw Elliptic CSV files may not be included in the repository because of GitHub file size limits; if data/unfiltered/ is empty, download the dataset from Kaggle and copy the three elliptic_txs_*.csv files into it.
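
Before running the preprocessing step, it can help to confirm that the raw inputs are in place. The following is a small sketch, not part of the repository: the helper name missing_inputs is hypothetical, the file names are taken from the project structure above, and the relative path assumes the script is run from the repository root.

```python
from pathlib import Path

# File names taken from the project structure above.
EXPECTED_INPUTS = (
    "elliptic_txs_classes.csv",
    "elliptic_txs_edgelist.csv",
    "elliptic_txs_features.csv",
)

def missing_inputs(unfiltered_dir):
    """Return the expected CSV names that are absent from unfiltered_dir."""
    directory = Path(unfiltered_dir)
    return [name for name in EXPECTED_INPUTS if not (directory / name).is_file()]

if __name__ == "__main__":
    # Assumption: the current working directory is the repository root.
    missing = missing_inputs("data/unfiltered")
    if missing:
        print("Missing input files:", ", ".join(missing))
    else:
        print("All input files found.")
```

If any file is reported missing, copy it from the Kaggle download into data/unfiltered/ and rerun the check.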

2. Training the GNN Model (Python):

Navigate to the training directory and run the training.ipynb notebook.

cd training
jupyter notebook training.ipynb

This will train the GNN model and save the model weights to gcn_model_weights.pth.

3. Visualizing the Results (Python):

Navigate to the visualization directory and run the data_visualization.ipynb notebook.

cd visualization
jupyter notebook data_visualization.ipynb

Visualizations

Data Plot

[Image: Data Plot]

Data Predictions Plot

[Image: Data Predictions Plot]

Usage

  • Training: The GNN model can be trained using the training.ipynb notebook. Adjust hyperparameters as needed within the notebook.
  • Visualization: Use the data_visualization.ipynb notebook to generate visualizations of the transaction graph and model performance metrics.

Conclusion

In this model, each of the two GCN layers aggregates information only from a node's first-order (directly connected) neighbors. Because the second layer operates on features that the first layer has already mixed with each neighbor's own neighbors, every node's final representation also incorporates second-order neighbor information indirectly. The model's effective receptive field is therefore two hops, even though direct aggregation in each layer is limited to first-order neighbors.
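
The two-hop effect can be seen on a toy example. The sketch below is a simplified stand-in for a GCN layer, not the project's model: it uses plain mean aggregation with a self-loop, omits learned weights and nonlinearities, and runs on a hypothetical four-node path graph. A signal placed on node 0 reaches node 2 only after the second aggregation step and never reaches node 3, which is three hops away.

```python
# Toy undirected path graph 0 - 1 - 2 - 3, stored as an adjacency list.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

def aggregate(values, neighbors):
    """One simplified GCN-style step: average a node's value with its direct neighbors."""
    out = {}
    for node, nbrs in neighbors.items():
        group = [node] + nbrs  # self-loop plus first-order neighbors
        out[node] = sum(values[n] for n in group) / len(group)
    return out

# Place a unit signal on node 0 only, then track where it spreads.
signal = {0: 1.0, 1: 0.0, 2: 0.0, 3: 0.0}
after_layer_1 = aggregate(signal, neighbors)
after_layer_2 = aggregate(after_layer_1, neighbors)

for node in sorted(neighbors):
    print(node, round(after_layer_1[node], 3), round(after_layer_2[node], 3))
```

Node 2 stays at zero after the first step and becomes nonzero after the second, while node 3 remains at zero throughout, which matches the two-hop reach described above.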

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributing

Contributions are welcome! Please fork the repository and submit a pull request for any enhancements or bug fixes.

Acknowledgments

Contact

For any questions or suggestions, please open an issue or contact the project maintainers.
