Introduction to probability theory

Erika Duan 2/26/23

Introduction to probability
- Scenario 1
- Scenario 2
Set notations
- The power set
Set operations
General rules of probability
Acknowledgements

Introduction to probability

Probability is a numerical measure of how likely an event is to occur. The frequentist approach considers probability as the long term outcome of a large sampling experiment.
Probability can also be thought of as the size of a mathematical set , which can also be visualised in 2D as the occupation of a rectangle .

In probability theory:

The random experiment is a process that results in several possible outcomes, none of which are certain to occur. For example, a fair dice is rolled and the number on its top face is observed.
The sample point is one possible distinct outcome of the experiment. For example, a sample point is 1.
The sample space is the set of all possible outcomes. For example, $S = \{1, 2, 3, 4, 5, 6 \}$ as the events are mutually exclusive.
A simple event is a subset which only contains one sample point. For example, a simple event is A = {1}.
An event is any possible subset of the sample space. For example, an event is any number greater than 3 i.e. B = {4, 5, 6}.

To calculate the probability of an event, we can count all the ways that an event could have occurred out of all possibilities generated.

Scenario 1

Imagine that we simultaneously rolled two fair dice. What is the probability that the sum of two dice equals 5?

We know that the outcome of one dice throw is independent of the other.
We need to calculate all possible combinations of two independent dice rolls i.e. the sample space. As there are 6 faces on one dice and we are rolling two dice, the sample space is possible outcomes.
We then calculate all possible dice roll events which sum to 5. This is the total number of times event could have occurred.
The probability of event occurring is therefore the proportion of the number of times could have occurred over the sample space i.e. $P(E) = \frac{4}{36} \approx 0.11$ .

Scenario 2

Imagine that we simultaneously rolled two fair dice. What is the probability that the sum of two dice is less than 5 and an odd number?

The sample space is still the same, as the total number of possible dice roll combinations is fixed.
The event subset has changed as we are interested in the intersection of $E_1\subset\{2, 3, 4\}$ and $E_2\subset\{2, 4, 6\}$ i.e. $E=E_1\cap E_2=\{2, 4\}$ .
In this scenario, the probability that the sum of two dice is $\{2,4\}$ is $\frac{4}{36}$ or approximately 0.11.

Set notations

Sets are used to denote object belonging under a specific condition. The statement “the set of elements in the space such that condition holds” is represented by the notation $S = \{x\in X\vert f(x)>0\}$ .

Examples of sets include:

The set $A_1 = \{x\in\mathbb{N}\vert 1\leq x\leq 5\}$ is a finite set with a finite closed interval on the set of all natural numbers.
The set $A_2 = \{x\in\mathbb{R}\vert -5\leq x\leq 1\}$ is an infinite set with a finite closed interval on the set of all real numbers.
The set $F = \{f:\mathbb{R}\to\mathbb{R}\vert f(x)=ax+b,\;a,b\in\mathbb{R}\}$ is an infinite set of all straight lines in 2D as takes the specific form .

In probability theory, the event can be viewed as a subset within the set of the sample space, where the total number of possible event types (or total possible event combinations) is represented by the power set of the sample space.

The power set

When the elements inside a set are finite and countable, we can calculate the total number of possible event types in two elegant ways.

Consider the graphical approach drawn below. We can map all possible element combinations for every possible subset size. Doing so highlights the existence of graph symmetry where, for example, the smallest and largest subsets contain the same number of element combinations i.e. 1. The graphical approach, however, is cumbersome for large sets.

We can then consider the numerical approach. For each subset size, we must calculate how many unique element combinations exist. We do not care about element order and element repetition also cannot occur.

When a set has 3 elements, the combinations of a subset containing 2 elements is $\frac{3!}{(3-2)!\times 2!}$ or ${3\choose2}$ .
For the same set of 3 elements, the combinations of a subset containing 1 element is $\frac{3!}{(3-1)!\times 1!}$ or ${3\choose1}$ .

Note: The graph symmetry of a power set can be explained by the observation that $\frac{3!}{(3-2)!\times 2!}= \frac{3!}{(3-1)!\times 1!} = \frac{3!}{2!}$ .

The power set, or total number of possible event types, is therefore the sum of all possible subset combinations. A quick mathematical proof using the binomial theorem shows how the power set can be calculated as , where n is the size of the set.

Set operations

Set operations are methods for manipulating sets and are useful tools for describing the properties of the probability space.

The set complement is defined as all the elements that do not belong in the specified set. The set complement can be used to describe the probability that an event does not occur.
The union of two sets is defined as the set of elements that are included in either set. The union of two sets can be used to describe the probability of either event A or event B occurring.
The intersection of two sets is defined as the set of elements that are included in both sets. The intersection of two sets can be used to describe the probability that event A and event B both occurs.

# Perform set operations in R --------------------------------------------------
a <- c(1, 2, 3)
b <- c(2, 4) 

union(a, b)
#> [1] 1 2 3 4

intersect(a, b)  
#> [1] 2 

setdiff(a, b)
#> [1] 1 3 

setequal(a, b) 
#> [1] FALSE

# Perform set operations in Python ---------------------------------------------
# Variables in the R environment can be accessed in Python via R.variable
# Atomic vectors in R are automatically converted into Python lists

a = set(r.a)
b = set(r.b)

a.union(b) # Can also be evaluated as a | b
#> {1.0, 2.0, 3.0, 4.0}

a.intersection(b) # Can also be evaluated as a & b
#> {2.0}

a.difference(b)
#> {1.0, 3.0}

a.symmetric_difference(b) # Can also be evaluated as a ^ b
#> {1.0, 3.0, 4.0}

Revisiting scenario 2, the probability that the sum of two dice is less than 5 and an odd number is the intersection of $E_1\subset\{2,3,4\}$ and $E_2\subset\{2,4,6\}$ or $E=E_1\cap E_2=\{2,4\}$ .

If we were asked to find the probability that the sum of two dice is less than 5 or an odd number, this would be $E=E_1\cup E_2=\{2,3,4,6\}$ , which is a very different subset of the sample space.

General rules of probability

We can think of a function as a relation that associates a set of elements in the input space (subset A in X) to a set of elements in the output space (subset B in Y). A function induces this mapping of $A\to B$ by applying itself to each individual element in the input space. The act of being a relation implies that an inverse function also exists, which maps the set of elements in the output space back to the set of elements in the input space i.e. $A = f^{-1}(B)$ .

A probability distribution can be thought of as the function which maps a set of events to their set of probabilities.

Consider the set as a subset of the sample space X, where $P_\pi(A_1)$ is the probability assigned to by the probability distribution $\pi$ :

The probability of X is the probability of all events (or all possible subsets) occurring i.e. $P_{\pi}(X)=1$ . Since probability is the ratio of the event to the sample space, we can define $P_{\pi}(X)=1$ for simplicity.
The complement of $P_{\pi}(X)=1$ is the probability of an impossible event i.e. $P_{\pi}(X^C)=0$ .
The range of possible probabilities for is therefore $0\leq P_{\pi}(A_1)\leq1$ and $P_{\pi}(A_1)=1-P_{\pi}(A_1^C)$ .

Consider the set as a different subset of the sample space X:

If and are mutually exclusive (elements in and do not overlap), the intersection of and is 0 i.e. $P_\pi(A_1\cap A_2) = 0$ and the probability of or occurring is $P_\pi(A_1\cup A_2) = P_\pi(A_1) + P_\pi(A_2)$ .
If and are not mutually exclusive (elements in and overlap), the probability of or occurring is $P_\pi(A_1\cup A_2) = P_\pi(A_1) + P_\pi(A_2) - P_\pi(A_1\cap A_2)$ .

Note: The term mutually exclusive refers to the property of whether elements in two or more subsets overlap with each other.

Acknowledgements

The source materials for this tutorial are:

The Probability for Data Science textbook by Stanley H Chan, specifically Chapter 2 on probability
Introduction to probability theory GitHub resource by Michael Betancourt
Introduction to probability theory Youtube series from MIT
General probability rules from STAT800 from Penn State Eberly College of Science

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

probability-introduction.md

probability-introduction.md

Introduction to probability theory

Introduction to probability

Scenario 1

Scenario 2

Set notations

The power set

Set operations

General rules of probability

Acknowledgements

Files

probability-introduction.md

Latest commit

History

probability-introduction.md

File metadata and controls

Introduction to probability theory

Introduction to probability

Scenario 1

Scenario 2

Set notations

The power set

Set operations

General rules of probability

Acknowledgements