Introduction to probability theory

Erika Duan 2/26/23

Introduction to probability

Probability is a numerical measure of how likely an event is to occur. The frequentist approach treats the probability of an event as its long-run relative frequency over a large number of repeated sampling experiments.
Probability can also be thought of as the relative size of a subset within a mathematical set S, which can be visualised in 2D as the proportion of a rectangle S that the subset occupies.

In probability theory:

  • The random experiment is a process that results in several possible outcomes, none of which are certain to occur. For example, a fair dice is rolled and the number on its top face is observed.
  • The sample point is one possible distinct outcome of the experiment. For example, a sample point is 1.
  • The sample space is the set of all possible outcomes. For example, S = \{1, 2, 3, 4, 5, 6 \}, where the outcomes are mutually exclusive because only one face can land on top.
  • A simple event is a subset which only contains one sample point. For example, a simple event is A = {1}.
  • An event is any possible subset of the sample space. For example, an event is any number greater than 3 i.e. B = {4, 5, 6}.

To calculate the probability of an event, we can count all the ways that an event could have occurred out of all possibilities generated.

Scenario 1

Imagine that we simultaneously rolled two fair dice. What is the probability that the sum of two dice equals 5?

  • We know that the outcome of one dice throw is independent of the other.
  • We need to count all possible combinations of two independent dice rolls i.e. the sample space. As there are 6 faces on one dice and we are rolling two dice, the sample space contains 6^2 = 36 equally likely outcomes.
  • We then count all dice roll outcomes which sum to 5 i.e. (1, 4), (2, 3), (3, 2) and (4, 1). This is the number of ways that event E could have occurred.
  • The probability of event E occurring is therefore the number of ways E could have occurred divided by the size of the sample space i.e. P(E) = \frac{4}{36} \approx 0.11.
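
The counting argument above can be checked by brute force. The following is a minimal sketch using only the Python standard library; the variable names are illustrative and not part of the original tutorial.

```python
from fractions import Fraction
from itertools import product

# Sample space: every ordered pair (first dice, second dice)
sample_space = list(product(range(1, 7), repeat=2))

# Event E: the two faces sum to 5
event = [roll for roll in sample_space if sum(roll) == 5]

# Probability = ways E can occur / size of the sample space
prob = Fraction(len(event), len(sample_space))

print(len(sample_space))       # 36
print(event)                   # [(1, 4), (2, 3), (3, 2), (4, 1)]
print(prob)                    # 1/9
print(round(float(prob), 2))   # 0.11
```

Using `Fraction` keeps the result exact, so 4/36 is reported in its reduced form 1/9.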

Scenario 2

Imagine that we simultaneously rolled two fair dice. What is the probability that the sum of two dice is less than 5 and an even number?

  • The sample space is still the same, as the total number of possible dice roll combinations is fixed.
  • The event subset has changed as we are interested in the intersection of the sums less than 5, E_1=\{2, 3, 4\}, and the even sums, E_2=\{2, 4, 6, 8, 10, 12\} i.e. E=E_1\cap E_2=\{2, 4\}.
  • In this scenario, the probability that the sum of two dice is in \{2,4\} is \frac{4}{36} or approximately 0.11, as one outcome sums to 2 and three outcomes sum to 4.
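
As a quick check of the 4/36 figure, the sketch below counts the dice roll outcomes whose sum falls in \{2, 4\}. It assumes only the Python standard library, and `target_sums` is an illustrative name.

```python
from fractions import Fraction
from itertools import product

sample_space = list(product(range(1, 7), repeat=2))

# Sums that belong to the event E = {2, 4}
target_sums = {2, 4}
event = [roll for roll in sample_space if sum(roll) in target_sums]

prob = Fraction(len(event), len(sample_space))

print(event)  # [(1, 1), (1, 3), (2, 2), (3, 1)]
print(prob)   # 1/9
```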

Set notations

Sets are used to denote the collection of objects that satisfy a specific condition. The statement “the set of elements x in the space X such that condition f(x)>0 holds” is represented by the notation S = \{x\in X\vert f(x)>0\}.

Examples of sets include:

  • The set A_1 = \{x\in\mathbb{N}\vert 1\leq x\leq 5\} is a finite set with a finite closed interval on the set of all natural numbers.
  • The set A_2 = \{x\in\mathbb{R}\vert -5\leq x\leq 1\} is an infinite set with a finite closed interval on the set of all real numbers.
  • The set F = \{f:\mathbb{R}\to\mathbb{R}\vert f(x)=ax+b,\;a,b\in\mathbb{R}\} is an infinite set of all non-vertical straight lines in 2D as f takes the specific form f(x)=ax+b.
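
Set-builder notation maps closely onto Python set comprehensions. In the rough sketch below, the finite set A_1 is enumerated directly, whereas the infinite set A_2 can only be represented by its membership condition. The names `a_1` and `in_a_2` are illustrative assumptions.

```python
# A_1 = {x in N | 1 <= x <= 5} is finite, so it can be enumerated
a_1 = {x for x in range(1, 6)}

# A_2 = {x in R | -5 <= x <= 1} is infinite, so we test membership instead
def in_a_2(x: float) -> bool:
    return -5 <= x <= 1

print(sorted(a_1))   # [1, 2, 3, 4, 5]
print(in_a_2(0.5))   # True
print(in_a_2(1.5))   # False
```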

In probability theory, the event can be viewed as a subset within the set of the sample space, where the total number of possible event types (or total possible event combinations) is the size of the power set of the sample space.

The power set

When a set contains a finite number of elements, we can calculate the total number of possible event types in two elegant ways.

Consider the graphical approach drawn below. We can map all possible element combinations for every possible subset size. Doing so highlights the existence of graph symmetry where, for example, the smallest and largest subsets contain the same number of element combinations i.e. 1. The graphical approach, however, is cumbersome for large sets.

We can then consider the numerical approach. For each subset size, we must calculate how many unique element combinations exist. We do not care about element order and element repetition also cannot occur.

When a set has 3 elements, the number of subsets containing 2 elements is \frac{3!}{(3-2)!\times 2!} or {3\choose2}.
For the same set of 3 elements, the number of subsets containing 1 element is \frac{3!}{(3-1)!\times 1!} or {3\choose1}.

Note: The graph symmetry of a power set can be explained by the observation that \frac{3!}{(3-2)!\times 2!}= \frac{3!}{(3-1)!\times 1!} = \frac{3!}{2!} = 3, or more generally {n\choose k} = {n\choose n-k}.
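
The subset counts can be verified by enumeration. The sketch below assumes a hypothetical 3-element set labelled "a", "b" and "c", and compares the enumerated subsets of each size against the binomial coefficient returned by `math.comb`.

```python
from itertools import combinations
from math import comb

elements = ["a", "b", "c"]  # hypothetical labels for a 3-element set
n = len(elements)

# Enumerate the subsets of each size k
subsets_by_size = {k: list(combinations(elements, k)) for k in range(n + 1)}

# The number of subsets of size k matches n choose k
counts = [len(subsets_by_size[k]) for k in range(n + 1)]
assert counts == [comb(n, k) for k in range(n + 1)]

print(subsets_by_size[2])     # [('a', 'b'), ('a', 'c'), ('b', 'c')]
print(counts)                 # [1, 3, 3, 1]
print(counts == counts[::-1]) # True, the counts are symmetric
```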

The size of the power set, or total number of possible event types, is therefore the sum of the number of subsets of every possible size. A quick mathematical proof using the binomial theorem shows that this sum equals 2^n, where n is the size of the set.
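
One way to write out the binomial theorem argument is sketched below, where x and y are the generic terms of the binomial expansion and are then both set to 1.

```latex
\begin{aligned}
(x + y)^n &= \sum_{k=0}^{n} \binom{n}{k} x^{k} y^{n-k} \\
(1 + 1)^n &= \sum_{k=0}^{n} \binom{n}{k} 1^{k} 1^{n-k} \\
2^n &= \sum_{k=0}^{n} \binom{n}{k}
\end{aligned}
```

For a set of 3 elements, this gives 1 + 3 + 3 + 1 = 8 = 2^3 possible subsets.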

Set operations

Set operations are methods for manipulating sets and are useful tools for describing the properties of the probability space.

  • The set complement is defined as all the elements that do not belong in the specified set. The set complement can be used to describe the probability that an event does not occur.
  • The union of two sets is defined as the set of elements that are included in at least one of the two sets. The union of two sets can be used to describe the probability of event A or event B (or both) occurring.
  • The intersection of two sets is defined as the set of elements that are included in both sets. The intersection of two sets can be used to describe the probability that event A and event B both occur.

# Perform set operations in R --------------------------------------------------
a <- c(1, 2, 3)
b <- c(2, 4) 

union(a, b)
#> [1] 1 2 3 4

intersect(a, b)  
#> [1] 2 

setdiff(a, b)
#> [1] 1 3 

setequal(a, b) 
#> [1] FALSE

# Perform set operations in Python ---------------------------------------------
# Variables in the R environment can be accessed in Python via r.variable
# Atomic vectors in R are automatically converted into Python lists

a = set(r.a)
b = set(r.b)

a.union(b) # Can also be evaluated as a | b
#> {1.0, 2.0, 3.0, 4.0} 
a.intersection(b) # Can also be evaluated as a & b
#> {2.0} 
a.difference(b)
#> {1.0, 3.0}
a.symmetric_difference(b) # Can also be evaluated as a ^ b
#> {1.0, 3.0, 4.0}

Revisiting scenario 2, the probability that the sum of two dice is less than 5 and an even number is given by the intersection of E_1=\{2,3,4\} and E_2=\{2,4,6,8,10,12\} or E=E_1\cap E_2=\{2,4\}.

If we were asked to find the probability that the sum of two dice is less than 5 or an even number, this would be E=E_1\cup E_2=\{2,3,4,6,8,10,12\}, which is a very different subset of the sample space.
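
The difference between the two questions can be made concrete with Python set operators. This sketch assumes that E_1 collects the rolls with a sum less than 5 and that E_2 collects the rolls with an even sum, as its listed elements suggest. The helper `prob` is an illustrative name.

```python
from fractions import Fraction
from itertools import product

sample_space = set(product(range(1, 7), repeat=2))

e_1 = {roll for roll in sample_space if sum(roll) < 5}       # sum less than 5
e_2 = {roll for roll in sample_space if sum(roll) % 2 == 0}  # even sum (assumed)

def prob(event):
    """Proportion of equally likely outcomes that belong to the event."""
    return Fraction(len(event), len(sample_space))

print(sorted({sum(roll) for roll in e_1 & e_2}))  # [2, 4]
print(prob(e_1 & e_2))                            # 1/9
print(sorted({sum(roll) for roll in e_1 | e_2}))  # [2, 3, 4, 6, 8, 10, 12]
print(prob(e_1 | e_2))                            # 5/9
```

The intersection covers 4 of the 36 outcomes, while the union covers 20 of them.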

General rules of probability

We can think of a function as a relation that associates each element in the input space X with an element in the output space Y, and therefore maps a set of elements in the input space (subset A in X) to a set of elements in the output space (subset B in Y). A function induces this mapping of A\to B by applying itself to each individual element of A. Even when f has no inverse function, the preimage f^{-1}(B) is always defined as the set of all elements in the input space that f maps into B. When B = f(A), the preimage f^{-1}(B) contains A and equals A if f is one-to-one.
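
As an illustration, consider the function that maps a pair of dice rolls to their sum. The sketch below is an assumed example rather than part of the original tutorial, and the helper names `image` and `preimage` are hypothetical. It shows the image of a subset A and the preimage f^{-1}(B), which collects every input that maps into B.

```python
from itertools import product

sample_space = list(product(range(1, 7), repeat=2))

def f(roll):
    """Map a pair of dice rolls to their sum."""
    return sum(roll)

def image(subset):
    """Set of outputs reached by applying f to each element of the subset."""
    return {f(x) for x in subset}

def preimage(values):
    """Set of all inputs in the sample space that f maps into values."""
    return {x for x in sample_space if f(x) in values}

a = {(1, 4), (2, 3)}
b = image(a)

print(b)                    # {5}
print(sorted(preimage(b)))  # [(1, 4), (2, 3), (3, 2), (4, 1)]
```

Because several rolls share the same sum, the preimage of B here is larger than the subset A that produced B.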

A probability distribution can be thought of as the function which maps a set of events to their set of probabilities.

Consider the set A_1 as a subset of the sample space X, where P_\pi(A_1) is the probability assigned to A_1 by the probability distribution \pi:

  • The probability of X is the probability that any outcome in the sample space occurs i.e. P_{\pi}(X)=1. Since probability is the ratio of the event to the sample space, this ratio is 1 when the event is the entire sample space.
  • The complement of X is the empty set, which represents an impossible event i.e. P_{\pi}(X^C)=P_{\pi}(\emptyset)=0.
  • The range of possible probabilities for A_1 is therefore 0\leq P_{\pi}(A_1)\leq1 and P_{\pi}(A_1)=1-P_{\pi}(A_1^C).

Consider the set A_2 as a different subset of the sample space X:

  • If A_1 and A_2 are mutually exclusive (elements in A_1 and A_2 do not overlap), the intersection of A_1 and A_2 is the empty set i.e. A_1\cap A_2=\emptyset and P_\pi(A_1\cap A_2) = 0, so the probability of A_1 or A_2 occurring is P_\pi(A_1\cup A_2) = P_\pi(A_1) + P_\pi(A_2).
  • If A_1 and A_2 are not mutually exclusive (elements in A_1 and A_2 overlap), the probability of A_1 or A_2 occurring is P_\pi(A_1\cup A_2) = P_\pi(A_1) + P_\pi(A_2) - P_\pi(A_1\cap A_2).

Note: Two or more subsets are mutually exclusive when no element belongs to more than one of them.
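
These rules can be checked numerically on the two dice sample space. The sketch below assumes equally likely outcomes and uses three illustrative events that are not from the original text: `a_1` (sum less than 5), `a_2` (sum greater than 9) and `a_3` (even sum).

```python
from fractions import Fraction
from itertools import product

sample_space = set(product(range(1, 7), repeat=2))

def prob(event):
    """Proportion of equally likely outcomes that belong to the event."""
    return Fraction(len(event), len(sample_space))

a_1 = {roll for roll in sample_space if sum(roll) < 5}       # sum less than 5
a_2 = {roll for roll in sample_space if sum(roll) > 9}       # sum greater than 9
a_3 = {roll for roll in sample_space if sum(roll) % 2 == 0}  # even sum

# The sample space is certain and its complement is impossible
assert prob(sample_space) == 1
assert prob(sample_space - sample_space) == 0

# Complement rule
assert prob(a_1) == 1 - prob(sample_space - a_1)

# Mutually exclusive events: probabilities add
assert a_1 & a_2 == set()
assert prob(a_1 | a_2) == prob(a_1) + prob(a_2)

# Overlapping events: subtract the intersection to avoid double counting
assert a_1 & a_3 != set()
assert prob(a_1 | a_3) == prob(a_1) + prob(a_3) - prob(a_1 & a_3)

print(prob(a_1 | a_2))  # 1/3
print(prob(a_1 | a_3))  # 5/9
```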

Acknowledgements

The source materials for this tutorial are: