bitcoin_fraud_detection is a project for detecting fraudulent Bitcoin transactions using Graph Convolutional Networks (GCNs). It uses the Elliptic dataset, with C++ for data preprocessing and Python for implementing and training the GCN model. The C++ stage parses and cleans the raw transaction data, and the Python stage handles graph construction, model training, and visualization.
- Data Preprocessing in C++: Efficient parsing and cleaning of transaction data.
- Graph Construction: Creation of a transaction graph using NetworkX.
- Graph Neural Network (GNN): Implementation of a GNN using PyTorch Geometric for fraud detection.
- Visualization: Visualization of transaction graphs and model performance metrics using Plotly.
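As a rough illustration of the graph construction step, the sketch below turns a list of transaction-to-transaction edges into contiguous node indices and a 2 x E edge list, which is the layout PyTorch Geometric expects for `edge_index`. It uses only the standard library rather than NetworkX, and the function name `build_edge_index` and the transaction IDs are hypothetical, not taken from the project's notebooks.

```python
from typing import Dict, List, Tuple


def build_edge_index(
    edges: List[Tuple[int, int]]
) -> Tuple[Dict[int, int], List[List[int]]]:
    """Map raw transaction IDs to 0..N-1 and return a 2 x E edge index."""
    node_index: Dict[int, int] = {}
    sources: List[int] = []
    targets: List[int] = []
    for src, dst in edges:
        for tx_id in (src, dst):
            # Assign the next free index the first time an ID is seen.
            if tx_id not in node_index:
                node_index[tx_id] = len(node_index)
        sources.append(node_index[src])
        targets.append(node_index[dst])
    return node_index, [sources, targets]


# Made-up transaction IDs, for illustration only.
example_edges = [(101, 202), (303, 404), (101, 303)]
node_index, edge_index = build_edge_index(example_edges)
print(node_index)   # raw ID -> contiguous index
print(edge_index)   # [source indices, target indices]
```

The resulting nested list could be wrapped in a tensor before being handed to a graph library; that step is omitted here to keep the sketch dependency-free.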
bitcoin_fraud_detection/
│
├── data/
│   ├── filtered/
│   │   ├── filtered_classes.csv
│   │   ├── filtered_edgelist.csv
│   │   └── filtered_features.csv
│   └── unfiltered/
│       ├── elliptic_txs_classes.csv
│       ├── elliptic_txs_edgelist.csv
│       └── elliptic_txs_features.csv
│
├── src/
│   ├── data_preprocessing.cpp
│   └── CMakeLists.txt
│
├── training/
│   ├── data_preparation.ipynb
│   ├── gcn_model_weights.pth
│   ├── graph_data.pt
│   └── training.ipynb
│
├── visualization/
│   ├── data_plot.png
│   ├── data_predictions_plot.png
│   └── data_visualization.ipynb
│
├── README.md
└── LICENSE
- Compile the C++ Code:
  cd src
  mkdir build
  cd build
  cmake ..
  make
  ./data_preprocessing
- Install Python Dependencies:
  cd training
  pip install -r requirements.txt
- Required Libraries:
  - torch
  - torch-geometric
  - pandas
  - matplotlib
  - scipy
  - networkx
  - plotly
Navigate to the src/build directory and run the compiled data preprocessing executable.
cd src/build
./data_preprocessing
This generates the filtered datasets in data/filtered/ from the raw files in data/unfiltered/. Because of size limitations on GitHub, you might need to download the data from Kaggle and place it in data/unfiltered/ manually.
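The exact filtering rule lives in data_preprocessing.cpp and is not described here. As a minimal sketch of one plausible rule, the Python below assumes that filtering means dropping transactions whose class is "unknown" and keeping only edges whose two endpoints both remain. The column names (`txId`, `class`, `txId1`, `txId2`) are assumed to follow the Kaggle release of the Elliptic dataset, and `filter_labeled` is a hypothetical helper, not the project's implementation.

```python
import csv
import io


def filter_labeled(classes_csv: str, edges_csv: str):
    """Keep labeled transactions and edges whose endpoints are both labeled."""
    class_rows = csv.DictReader(io.StringIO(classes_csv))
    # Assumption: the value "unknown" marks an unlabeled transaction.
    labeled = {
        row["txId"]: row["class"]
        for row in class_rows
        if row["class"] != "unknown"
    }
    edge_rows = csv.DictReader(io.StringIO(edges_csv))
    kept_edges = [
        (row["txId1"], row["txId2"])
        for row in edge_rows
        if row["txId1"] in labeled and row["txId2"] in labeled
    ]
    return labeled, kept_edges


# Tiny in-memory stand-ins for the real CSV files (made-up IDs).
classes_csv = "txId,class\n1,unknown\n2,2\n3,1\n4,2\n"
edges_csv = "txId1,txId2\n1,2\n2,3\n3,4\n"

labeled, kept_edges = filter_labeled(classes_csv, edges_csv)
print(labeled)      # transactions that survive the filter
print(kept_edges)   # edges between surviving transactions
```

In this toy input, transaction 1 is unlabeled, so both it and the edge touching it are dropped. Comparing row counts before and after in the same way is a quick sanity check on the real files.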
Navigate to the training directory and run the training.ipynb notebook.
cd training
jupyter notebook training.ipynb
This trains the GNN model and saves the model weights to gcn_model_weights.pth.
Navigate to the visualization directory and run the data_visualization.ipynb notebook.
cd visualization
jupyter notebook data_visualization.ipynb
- Training: The GNN model can be trained using the training.ipynb notebook. Adjust hyperparameters as needed within the notebook.
- Visualization: Use the data_visualization.ipynb notebook to generate visualizations of the transaction graph and model performance metrics.
In this model, each of the two GCN layers aggregates information directly from a node's first-order neighbors only. Because the features those neighbors pass to the second layer were already updated by their own neighbors in the first layer, each node's final representation also incorporates second-order neighbor information indirectly.
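The two-hop effect can be checked on a toy graph. The sketch below is a deliberately simplified stand-in for a GCN layer: it averages each node's value with those of its direct neighbors and leaves out the learned weights, nonlinearity, and symmetric degree normalization that a real GCN layer uses. The `propagate` function and the four-node path graph are illustrative assumptions, not the project's model.

```python
from typing import Dict, List


def propagate(adj: Dict[int, List[int]], x: Dict[int, float]) -> Dict[int, float]:
    """One simplified layer: average a node's value with its direct neighbors."""
    out: Dict[int, float] = {}
    for node, neighbors in adj.items():
        group = [node] + neighbors
        out[node] = sum(x[n] for n in group) / len(group)
    return out


# Path graph 0 - 1 - 2 - 3; only node 3 carries a signal at the start.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
x0 = {0: 0.0, 1: 0.0, 2: 0.0, 3: 1.0}

x1 = propagate(adj, x0)  # after the first layer
x2 = propagate(adj, x1)  # after the second layer

print(x1[1], x2[1])  # node 1 is two hops from node 3
print(x2[0])         # node 0 is three hops from node 3
```

After one layer, node 1 is still at zero because none of its direct neighbors held the signal. After the second layer it becomes nonzero, since node 2 had already absorbed node 3's value. Node 0, three hops away, stays at zero, which matches the two-layer receptive field described above.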
This project is licensed under the MIT License - see the LICENSE file for details.
Contributions are welcome! Please fork the repository and submit a pull request for any enhancements or bug fixes.
- The Elliptic dataset: Kaggle
- PyTorch Geometric: PyTorch Geometric
For any questions or suggestions, please open an issue or contact the project maintainers.