Naive Bayes Classifier from scratch in Python 3
What is the Naive Bayes algorithm?
Naive Bayes is a classification technique based on Bayes' theorem (from probability theory) with the assumption that all the features that predict the target value are independent of each other. In simple terms, a Naive Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature when determining the target value.
A Naive Bayes model is easy to build and is particularly useful for very large data sets. Despite its simplicity, Naive Bayes is known to outperform even highly sophisticated classification methods.
A custom implementation of a Naive Bayes Classifier written from scratch in Python 3.
From Wikipedia:
In machine learning, naive Bayes classifiers are a family of simple probabilistic classifiers based on applying Bayes' theorem with strong (naive) independence assumptions between the features.
Bayes' theorem provides a way of calculating the posterior probability P(c|x) (read as the probability of c given x) from P(c), P(x) and P(x|c). Look at the equation below:
$\mathbf{P} \left({c \mid x} \right) = \frac{\mathbf{P} \left ({x \mid c} \right) \mathbf{P} \left({c} \right)}{\mathbf{P} \left( {x} \right)}$
where,
- x is the set of features (predictors)
- c is a class (target value)
- P(c|x) is the posterior probability of class (c, target) given predictor (x, attributes).
- P(c) is the prior probability of class c.
- P(x|c) is the observation density or likelihood, i.e. the probability of the predictor (the query x) given the class.
- P(x) is the prior probability of the predictor x, also called the evidence.
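As a tiny numeric illustration of how these quantities combine, here is Bayes' theorem applied directly; the class, feature, and probabilities below are made up purely for demonstration:

```python
# Toy application of Bayes' theorem with made-up numbers:
# suppose 30% of tumours are malignant (prior), and a "large radius"
# feature is observed in 80% of malignant and 10% of benign tumours.
p_c = 0.30                 # P(c)    - prior probability of class "malignant"
p_x_given_c = 0.80         # P(x|c)  - likelihood of the feature given "malignant"
p_x_given_not_c = 0.10     # P(x|~c) - likelihood of the feature given "benign"

# P(x) - evidence, obtained by summing over both classes
p_x = p_x_given_c * p_c + p_x_given_not_c * (1 - p_c)

# P(c|x) - posterior probability of "malignant" given the observed feature
p_c_given_x = p_x_given_c * p_c / p_x
print(round(p_c_given_x, 3))   # 0.774
```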
Why should we use Naive Bayes?
- As stated above, it is easy to build and is particularly useful for very large data sets.
- It is extremely fast for both training and prediction.
- It provides straightforward probabilistic predictions.
- It is often very easily interpretable.
- It has very few (if any) tunable parameters.
- It performs well with categorical input variables compared to numerical ones. For numerical variables, a normal distribution is assumed (a bell curve, which is a strong assumption); see the sketch after this list.
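For numerical features, that Gaussian assumption amounts to fitting a normal density to each feature within each class. A minimal sketch, where the feature values are illustrative rather than taken from the dataset:

```python
import numpy as np
from scipy.stats import norm

# Illustrative values of one numerical feature for a single class
feature_values = np.array([14.2, 13.8, 15.1, 14.7, 13.9])

# Fit a normal (bell-curve) distribution to the feature within this class
mu, sigma = feature_values.mean(), feature_values.std()

# Class-conditional likelihood P(x|c) of a new observation under that assumption
x_new = 14.5
print(norm.pdf(x_new, loc=mu, scale=sigma))
```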
PCA is generally used for dimensionality reduction. On this dataset we use PCA to find the important features, i.e. those with the most effect/weight on the posterior probability.
Applying PCA to our dataset shows that the important features are at indices 2 + 0 and 2 + 1, i.e. indices 2 and 3 of our data, which are radius_mean and texture_mean.
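One way to reproduce this step is with scikit-learn's PCA. The file name data.csv, the scaling, and the loading-based selection below are assumptions about the workflow, not the original code, and the top-loading features may come out in a different order:

```python
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Load the Wisconsin breast cancer CSV (file name assumed);
# column 0 is id, column 1 is diagnosis, the features start at index 2.
df = pd.read_csv("data.csv")
features = df.iloc[:, 2:].dropna(axis=1)          # drop any empty trailing column
X_scaled = StandardScaler().fit_transform(features.values)

# Fit PCA and look at which original features weigh most on the first component
pca = PCA(n_components=2)
pca.fit(X_scaled)
loadings = abs(pca.components_[0])
top_two = loadings.argsort()[::-1][:2]
print(features.columns[top_two])
```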
Now we split our data into training and testing sets.
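A minimal sketch of that split, assuming scikit-learn's train_test_split and the two features selected above; the 75/25 ratio and random seed are arbitrary choices here:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("data.csv")                       # file name assumed, as above
X = df[["radius_mean", "texture_mean"]].values     # the two PCA-selected features
y = (df["diagnosis"] == "M").astype(int).values    # 1 = malignant, 0 = benign

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)
```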
The posterior conditional probability of a class given a query point $x = (R_i, T_i)$ (its radius_mean and texture_mean values) is

$\mathbf{P} \left({c \mid x} \right) = \frac{\mathbf{P} \left ({x \mid c} \right) \mathbf{P} \left({c} \right)}{\mathbf{P} \left( {x} \right)}$

and the mathematical formula of the observation distribution (a bivariate normal density) is

$\mathbf{P} \left({x \mid c} \right) = \frac{1}{2\pi \left| \Sigma \right|^{1/2}} \exp\!\left( -\frac{1}{2} A^{\top} \Sigma^{-1} A \right)$

where,
- $\Sigma$ is the covariance matrix of the two features within class $c$
- $A$ is the vector $ A = \left[ {\begin{array}{c} R_i - \mathrm{Mean}(\text{radius\_mean}) \\ T_i - \mathrm{Mean}(\text{texture\_mean}) \end{array} } \right] $, i.e. the query point minus the class means
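Putting these pieces together, a minimal sketch of the classifier is below, assuming the training split from the previous step. The per-class prior, mean, and covariance are estimated from the training data; the helper names are hypothetical rather than from the original repository:

```python
import numpy as np

def fit_class_stats(X_train, y_train):
    """Per class c: prior P(c), mean vector, and covariance matrix Sigma."""
    stats = {}
    for c in np.unique(y_train):
        Xc = X_train[y_train == c]
        stats[c] = {
            "prior": len(Xc) / len(X_train),
            "mean": Xc.mean(axis=0),
            "cov": np.cov(Xc, rowvar=False),
        }
    return stats

def observation_density(x, mean, cov):
    """Bivariate normal density P(x|c), using the vector A and matrix Sigma above."""
    A = x - mean                      # [R_i - Mean(radius_mean), T_i - Mean(texture_mean)]
    d = len(x)
    norm_const = 1.0 / np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return norm_const * np.exp(-0.5 * A @ np.linalg.inv(cov) @ A)

def predict(x, stats):
    """Pick the class with the largest unnormalised posterior P(x|c) * P(c)."""
    scores = {c: observation_density(x, s["mean"], s["cov"]) * s["prior"]
              for c, s in stats.items()}
    return max(scores, key=scores.get)

# Usage with the split from the previous step:
# stats = fit_class_stats(X_train, y_train)
# predictions = [predict(x, stats) for x in X_test]
```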