This project implements a Naive Bayes classifier from scratch for categorical datasets in three variants: single-threaded, multi-threaded, and distributed. The classifier predicts class labels from input features, and the multi-threaded and distributed variants parallelize the computation.
The multi-threaded version uses Python's multiprocessing and threading libraries to spread computation across multiple processes and threads. It estimates the model parameters with counters and logical operations.
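As a rough illustration of counter-based parameter estimation run in parallel, the sketch below splits the data into chunks, counts each chunk in a worker, and merges the partial counts. It is not the repository's code: the names `count_chunk`, `split`, and `parallel_counts` are hypothetical, and it uses `concurrent.futures.ThreadPoolExecutor` for brevity. On standard CPython the GIL limits the speedup of pure-Python counting in threads, so treat this as showing the structure only; a process pool follows the same map-then-merge shape.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_chunk(chunk):
    """Count class labels and (feature_index, value, label) triples in one chunk."""
    rows, labels = chunk
    counts = Counter()
    for row, label in zip(rows, labels):
        counts[("class", label)] += 1
        for i, value in enumerate(row):
            counts[(i, value, label)] += 1
    return counts


def split(rows, labels, n_chunks):
    """Split rows and labels into at most n_chunks contiguous chunks."""
    size = max(1, -(-len(rows) // n_chunks))  # ceiling division, at least 1
    return [(rows[s:s + size], labels[s:s + size])
            for s in range(0, len(rows), size)]


def parallel_counts(rows, labels, workers=2):
    """Count every chunk in a worker thread, then merge the partial Counters."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_chunk, split(rows, labels, workers)):
            total.update(partial)  # Counter.update adds counts together
    return total
```

Because counts are additive, the merged result is the same as counting the whole dataset in one pass, whatever the chunk boundaries are.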
The distributed version uses mpi4py. It splits the data across multiple MPI processes, computes the probabilities for each split independently, and aggregates the results through MPI communication.
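The split, compute, and aggregate pattern described above might look roughly like the following with mpi4py's pickle-based `scatter` and `gather`. This is a sketch under assumptions rather than the repository's implementation: `count_rows`, `make_chunks`, and `run_distributed` are hypothetical names, and it assumes the root rank (rank 0) holds the full dataset and that counts, not probabilities, are exchanged and normalized afterwards.

```python
from collections import Counter


def count_rows(rows, labels):
    """Count class labels and (feature_index, value, label) triples."""
    counts = Counter()
    for row, label in zip(rows, labels):
        counts[("class", label)] += 1
        for i, value in enumerate(row):
            counts[(i, value, label)] += 1
    return counts


def make_chunks(rows, labels, n):
    """Return exactly n chunks (some may be empty), one per MPI rank."""
    return [(rows[r::n], labels[r::n]) for r in range(n)]


def run_distributed(rows=None, labels=None):
    """Scatter chunks from rank 0, count locally, gather and merge on rank 0."""
    # Imported here so the helpers above stay usable without MPI installed.
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    # Only rank 0 needs the data; scatter expects one item per rank.
    chunks = make_chunks(rows, labels, size) if rank == 0 else None
    local_rows, local_labels = comm.scatter(chunks, root=0)

    partial = count_rows(local_rows, local_labels)
    gathered = comm.gather(partial, root=0)  # list on rank 0, None elsewhere

    if rank == 0:
        total = Counter()
        for counts in gathered:
            total.update(counts)
        return total
    return None
```

To try it, a script would call `run_distributed(rows, labels)` and be launched under MPI, for example with `mpiexec -n 4 /venv/bin/python script.py`, where `script.py` is a placeholder name.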
The single-threaded version is a basic implementation of the Naive Bayes algorithm. It provides a clean and simple approach to training on categorical datasets and predicting class labels.
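For readers new to the algorithm, here is a minimal sketch of categorical Naive Bayes training and prediction. It is illustrative only: the function names `train` and `predict` are hypothetical, and the add-one (Laplace) smoothing controlled by `alpha` is an assumption, since the smoothing used by this project is not stated here.

```python
import math
from collections import Counter, defaultdict


def train(rows, labels):
    """Collect the counts that define a categorical Naive Bayes model."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature_index, label) -> value counts
    feature_values = defaultdict(set)    # feature_index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            feature_values[i].add(value)
    return class_counts, value_counts, feature_values


def predict(model, row, alpha=1.0):
    """Return the label with the highest smoothed log-posterior (alpha > 0)."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_label in class_counts.items():
        score = math.log(n_label / total)  # log prior
        for i, value in enumerate(row):
            count = value_counts[(i, label)][value]
            k = len(feature_values[i])     # number of distinct values of feature i
            score += math.log((count + alpha) / (n_label + alpha * k))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Log-probabilities are summed instead of multiplying raw probabilities, which avoids numerical underflow when there are many features.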
- Install necessary packages
    sudo apt-get -y update && sudo apt-get -y install python3 git python3-pip python3-venv libmpich-dev
- Create virtual environment
    python3 -m venv /venv
- Install pip packages
    /venv/bin/pip install numpy pandas mpi4py
- Clone the repository
    git clone https://github.com/AvinashSubhash/Distributed-Naive-Bayes-Classifier.git /NBC
- Run the bash script
    ./Output.sh