# Naive Bayes Classification
Probability of A if B:
# Basic Probability Examples
Probability of snow if winter:
$$P(\text{snow} | \text{winter}) = \frac{\text{Days with snow in winter}}{\text{Days in winter}} =
\frac{15}{90} \approx
16\%$$
Probability of snow if summer:
$$P(\text{snow} | \text{summer}) = \frac{\text{Days with snow in summer}}{\text{Days in summer}} =
\frac{0}{90} =
0\%$$
Probability of winter if it snows:
$$P(\text{winter} | \text{snow}) = \frac{\text{Days with snow in winter}}{\text{Days with snow all year}} =
\frac{15}{17} \approx
88\%$$
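These conditional probabilities are just ratios of counts. A minimal Python sketch, using the example day counts from the slides (illustrative numbers, not real weather data):

```python
# Example counts from the slides (illustrative, not real weather data).
days_in_winter = 90
snow_days_in_winter = 15
snow_days_per_year = 17

# P(snow | winter): restrict attention to winter days.
p_snow_given_winter = snow_days_in_winter / days_in_winter  # ~0.17

# P(winter | snow): restrict attention to snowy days.
p_winter_given_snow = snow_days_in_winter / snow_days_per_year  # ~0.88
```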
Notes:
Audience questions
# Basic Probability Examples
Probability that it will snow:
$$P(\text{snow}) = \frac{\text{Days with snow per year}}{\text{Days in the year}} = \frac{17}{365} \approx
5\%$$
Probability of winter:
$$P(\text{winter}) = \frac{\text{Days in winter per year}}{\text{Days in the year}} = \frac{90}{365} \approx
25\%$$
Notes:
Audience questions
$$\begin{aligned}
P(A | B) &= \frac{P(A) \times P(B | A)}{P(B)}\\
\\
P(\text{winter} | \text{snow}) &= \frac{P(\text{winter}) \times P(\text{snow} | \text{winter})}{P(\text{snow})}\\
\\
&= \frac{0.25 \times 0.16}{0.05} = 0.8\\
\\
&\approx \frac{\text{Days with snow in winter}}{\text{Days with snow all year}} = \frac{15}{17} = 0.88
\end{aligned}$$
(The gap between 0.8 and 0.88 comes from rounding the input probabilities.)
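The derivation can be checked with exact fractions, which makes the rounding gap disappear. A small Python sketch using the slide's counts:

```python
from fractions import Fraction

# Exact counts from the example, so no rounding errors.
p_winter = Fraction(90, 365)
p_snow = Fraction(17, 365)
p_snow_given_winter = Fraction(15, 90)

# Bayes' theorem: P(winter | snow) = P(winter) * P(snow | winter) / P(snow)
p_winter_given_snow = p_winter * p_snow_given_winter / p_snow

print(p_winter_given_snow)  # 15/17
```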
# Bayes for text classification
Probability that a document belongs to $class$ if it contains
$term$:
$$P(class | term)$$
Probability of spam if mail contains $\text{viagra}$ :
$$P(\text{spam} | \text{viagra})$$
# Interactive Bayes Classifier
Probability of Spam if document
I have this document, is it Spam?
(Using the extended form of Bayes theorem because spam / ham is a binary variable)
$$P(\text{spam} | doc) = \frac{P(\text{spam}) \times P(doc | \text{spam})}{P(\text{spam}) \times P(doc | \text{spam}) + P(\text{ham}) \times P(doc | \text{ham})}$$
Spam vs. ham is a one-of (binary) class membership, so
$$P(\text{spam} | doc) + P(\text{ham} | doc) = 100\%$$
What percentage of all mails is Spam?
$$P(\text{spam}) = \frac{\text{Number of spam mails}}{\text{Total number of mails}}$$
$$P(\text{spam}) + P(\text{ham}) = 100\%$$
# Probability of document if Spam
Looking at Spam, what are the chances to see this document?
$$\begin{aligned}
P(doc | \text{spam}) &= \prod_{term \in doc} P(term | \text{spam}) = P(t_1 | \text{spam}) \times P(t_2 | \text{spam}) \times \dots
\end{aligned}$$
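The product over terms can be sketched directly; the per-term probabilities below are made-up values for illustration, not estimates from a real corpus:

```python
import math

# Hypothetical P(term | spam) values for the terms in one document.
p_term_given_spam = {"viagra": 0.8, "rolex": 0.3, "meeting": 0.05}

# Naive independence assumption: multiply the per-term probabilities.
p_doc_given_spam = math.prod(p_term_given_spam.values())
```

In practice the product of many small probabilities underflows floating point, so real implementations sum logarithms instead of multiplying.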
---
# Probability of term if Spam
Looking at Spam, what are the chances to see this term?
$$\begin{aligned}
P(term | \text{spam}) &= \frac{\text{Document frequency of }term\text{ in Spam}}{\text{Number of Spam mails}}
\end{aligned}$$
Notes:
---
# Probability for unknown words
$P(\text{unknown spam word} | \text{spam}) = \frac{0}{\text{Number of Spam mails}} = 0$
so
$P(doc | \text{spam}) = P(\text{viagra} | \text{spam}) \times P(\text{unknown spam word} | \text{spam}) \times
\text{…} = 0$
But the probability should be $> 0\%$.
$$\begin{aligned}
P(term | \text{spam}) &= \text{Probability of mail containing }term\text{ if it is Spam}\\
&= \frac{\text{Document frequency of }term\text{ in Spam} + 1}{\text{Number of Spam mails} + 2}
\end{aligned}$$
Constants are added (Laplace smoothing) so that $0\% < \text{probability} < 100\%$.
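A minimal sketch of the smoothed estimate (the function name and the counts are my own, for illustration):

```python
def p_term_given_spam(df_in_spam: int, n_spam: int) -> float:
    """Laplace-smoothed P(term | spam): never exactly 0 or 1."""
    return (df_in_spam + 1) / (n_spam + 2)

p_term_given_spam(0, 100)    # unseen term: ~0.0098 instead of 0
p_term_given_spam(100, 100)  # term in every spam mail: ~0.99 instead of 1
```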
What is the percentage of mails containing this term?
$$P(term) = \frac{\text{Document frequency of }term + 1}{\text{Total number of mails} + 2}$$
# Probability of spam if term
I have this term, is it Spam?
$$P(\text{spam} | term) = \frac{P(\text{spam}) \times P(term | \text{spam})}{P(\text{spam}) \times P(term |
\text{spam}) + P(\text{ham}) \times P(term | \text{ham})}$$
$$P(\text{spam} | term) + P(\text{ham} | term) = 100\%$$
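Prior and likelihood combine as sketched below; the numbers are assumed for illustration (40% of mail is spam, the term appears in 50% of spam and 1% of ham):

```python
def p_spam_given_term(p_spam: float, p_term_spam: float, p_term_ham: float) -> float:
    """Extended form of Bayes' theorem for the binary spam/ham case."""
    p_ham = 1.0 - p_spam
    numerator = p_spam * p_term_spam
    return numerator / (numerator + p_ham * p_term_ham)

p_spam_given_term(0.4, 0.5, 0.01)  # ~0.97
```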
Notes:
How to calculate these probabilities?
If $P(\text{spam} | content) > P(\text{ham} | content)$, then the mail is spam, so:
$$\text{Spam}: \frac{P(\text{spam} | content)}{P(\text{ham} | content)} > 1$$
Naive = assumes features are independent
Features ("rolex", "replica") are not actually independent
Predicts a probability for each class, not just a class label
Robust to concept drift ("viagra" → "cialis")
Robust to noise ("unusual" documents)
Efficient and effective
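All the pieces from the previous slides fit together into a toy classifier. A sketch with made-up training mails (document frequencies, Laplace smoothing, priors from mail counts, and the naive product):

```python
def train(docs):
    """Document frequency of every term, plus the document count."""
    df = {}
    for doc in docs:
        for term in set(doc.split()):
            df[term] = df.get(term, 0) + 1
    return df, len(docs)

def p_term(term, df, n):
    # Laplace-smoothed P(term | class), as on the earlier slide.
    return (df.get(term, 0) + 1) / (n + 2)

def classify(doc, spam, ham):
    (df_s, n_s), (df_h, n_h) = spam, ham
    # Priors from mail counts.
    score_s = n_s / (n_s + n_h)
    score_h = n_h / (n_s + n_h)
    # Naive product over the document's terms.
    for term in set(doc.split()):
        score_s *= p_term(term, df_s, n_s)
        score_h *= p_term(term, df_h, n_h)
    return "spam" if score_s > score_h else "ham"

spam = train(["buy viagra now", "cheap viagra replica rolex"])
ham = train(["meeting at noon", "lunch meeting tomorrow"])
classify("viagra replica", spam, ham)  # → "spam"
```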
Notes:
Example for dependent features?