Erika Duan 2/26/23
- Introduction to probability
- Set notations
- Set operations
- General rules of probability
- Acknowledgements
Probability is a numerical measure of how likely an event is to occur.
The frequentist approach considers probability as the long term outcome
of a large sampling experiment.
Probability can also be thought of as the size of a mathematical set,
which can also be visualised in 2D as the area occupied within a rectangle.
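The frequentist view can be illustrated with a small simulation. This is only a sketch: the function name `relative_frequency`, the seed and the roll counts are assumptions chosen for illustration. As the number of rolls grows, the share of rolls showing a given face should settle near 1/6.

```python
import random


def relative_frequency(n_rolls, face=1, seed=42):
    """Estimate P(face) as the share of n_rolls fair dice rolls showing `face`."""
    rng = random.Random(seed)  # seeded so that the sketch is reproducible
    hits = sum(rng.randint(1, 6) == face for _ in range(n_rolls))
    return hits / n_rolls


# The estimate should drift towards 1/6 as the experiment gets longer.
for n in (100, 10_000, 100_000):
    print(n, relative_frequency(n))
```

The exact values printed depend on the seed, so only the long run behaviour is meaningful here.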
In probability theory:
- The random experiment is a process that results in several possible outcomes, none of which are certain to occur. For example, a fair dice is rolled and the number on its top face is observed.
- The sample point is one possible distinct outcome of the experiment. For example, a sample point is 1.
- The sample space is the set of all possible outcomes. For
example, the sample space of a single dice roll is $\{1, 2, 3, 4, 5, 6\}$,
as the outcomes are mutually exclusive.
- A simple event is a subset which only contains one sample point. For example, a simple event is A = {1}.
- An event is any possible subset of the sample space. For example, an event is any number greater than 3 i.e. B = {4, 5, 6}.
To calculate the probability of an event when all outcomes are equally likely, we can count all the ways that the event could have occurred out of all possible outcomes.
Scenario 1: Imagine that we simultaneously rolled two fair dice. What is the probability that the sum of two dice equals 5?
- We know that the outcome of one dice throw is independent of the other.
- We need to calculate all possible combinations of two independent
dice rolls i.e. the sample space. As there are 6 faces on one dice
and we are rolling two dice, the sample space contains
$6 \times 6 = 36$ possible outcomes.
- We then count all possible dice roll outcomes which sum to 5. This
is the total number of ways that event
$A = \{(1, 4), (2, 3), (3, 2), (4, 1)\}$ could have occurred.
- The probability of event $A$ occurring is therefore the number of
ways that $A$ could have occurred as a proportion of the sample space i.e.
$P(A) = \frac{4}{36} = \frac{1}{9}$, or approximately 0.11.
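The counting steps above can be sketched in Python by enumerating every ordered pair of faces. The variable names are illustrative, and `Fraction` is used only to keep the ratio exact.

```python
from fractions import Fraction
from itertools import product

# Sample space: every ordered pair of faces from two fair six-sided dice.
sample_space = list(product(range(1, 7), repeat=2))

# Event: the two faces sum to 5.
event = [roll for roll in sample_space if sum(roll) == 5]

# Probability: favourable outcomes as a proportion of the sample space.
probability = Fraction(len(event), len(sample_space))

print(len(sample_space))  # 36
print(event)              # [(1, 4), (2, 3), (3, 2), (4, 1)]
print(probability)        # 1/9
```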
Scenario 2: Imagine that we simultaneously rolled two fair dice. What is the probability that the sum of two dice is less than 5 and an odd number?
- The sample space is still the same, as the total number of possible dice roll combinations is fixed.
- The event subset has changed as we are interested in the
intersection of $A$ (the sum is less than 5) and $B$ (the sum is an
odd number) i.e. $A \cap B = \{(1, 2), (2, 1)\}$, as 3 is the only
odd sum below 5.
- In this scenario, the probability that the sum of two dice is
less than 5 and odd is $P(A \cap B) = \frac{2}{36} = \frac{1}{18}$,
or approximately 0.06.
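As a check on this scenario, the two conditions can be written as Python sets and intersected. This is a minimal sketch with illustrative names. Only the rolls (1, 2) and (2, 1) satisfy both conditions, which gives 2 of the 36 outcomes.

```python
from fractions import Fraction
from itertools import product

sample_space = set(product(range(1, 7), repeat=2))

# A: the sum is less than 5. B: the sum is an odd number.
less_than_5 = {roll for roll in sample_space if sum(roll) < 5}
odd_sum = {roll for roll in sample_space if sum(roll) % 2 == 1}

# The event of interest is the intersection of the two subsets.
both = less_than_5 & odd_sum
p_both = Fraction(len(both), len(sample_space))

print(sorted(both))   # [(1, 2), (2, 1)]
print(p_both)         # 1/18
print(float(p_both))  # roughly 0.056
```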
Sets are used to denote objects which belong together under a specific
condition. The statement “the set of elements $x$ in the space $X$
such that condition $C$ holds” is represented by the notation
$\{x \in X \mid C(x)\}$.
Examples of sets include:
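- A set such as $\{x \in \mathbb{N} \mid 1 \leq x \leq 6\}$
is a finite set defined by a finite closed interval on the set of all natural numbers.
- A set such as $\{x \in \mathbb{R} \mid 0 \leq x \leq 1\}$
is an infinite set defined by a finite closed interval on the set of all real numbers.
- The set $\{f: \mathbb{R} \to \mathbb{R} \mid f(x) = ax + b,\ a, b \in \mathbb{R}\}$
is an infinite set of all straight lines in 2D as
$f$ takes the specific form
$f(x) = ax + b$.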
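Set-builder notation maps closely onto Python comprehensions and membership tests. The sketch below is an illustration only: the bounds (1 to 6, and 0 to 1) and the helper names `in_unit_interval` and `make_line` are assumptions, not part of the source material. A finite set can be enumerated in full, whereas an infinite set can only be described by its condition.

```python
# {x in N | 1 <= x <= 6}: finite, so every element can be listed.
finite_set = {x for x in range(1, 7)}


# {x in R | 0 <= x <= 1}: infinite, so we keep the condition rather than the elements.
def in_unit_interval(x):
    """Return True when x satisfies the set's membership condition."""
    return 0 <= x <= 1


# {f | f(x) = a*x + b}: each choice of (a, b) picks one straight line from the set.
def make_line(a, b):
    """Return the function f(x) = a*x + b."""
    return lambda x: a * x + b


line = make_line(2, 1)

print(len(finite_set))        # 6
print(in_unit_interval(0.5))  # True
print(in_unit_interval(1.5))  # False
print(line(3))                # 7
```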
In probability theory, an event can be viewed as a subset of the sample space, and the set of all possible events is the power set of the sample space. The total number of possible event types (or total possible event combinations) is therefore the size of the power set.
When the elements inside a set are finite and countable, we can calculate the total number of possible event types in two elegant ways.
Consider the graphical approach drawn below. We can map all possible element combinations for every possible subset size. Doing so highlights the existence of graph symmetry where, for example, the smallest and largest subsets contain the same number of element combinations i.e. 1. The graphical approach, however, is cumbersome for large sets.
We can then consider the numerical approach. For each subset size, we must calculate how many unique element combinations exist. We do not care about element order and element repetition also cannot occur.
When a set has 3 elements, the number of combinations for a subset
containing 2 elements is $\binom{3}{2}$
or $\frac{3!}{2!(3-2)!} = 3$.
For the same set of 3 elements, the number of combinations for a subset
containing 1 element is $\binom{3}{1}$
or $\frac{3!}{1!(3-1)!} = 3$.
Note: The graph symmetry of a power set can be explained by the
observation that $\binom{n}{k} = \binom{n}{n-k}$.
The size of the power set, or total number of possible event types, is
therefore the sum of all possible subset combinations. A quick
mathematical proof using the binomial theorem shows how the size of the
power set can be calculated as
$\sum_{k=0}^{n} \binom{n}{k} = (1 + 1)^n = 2^n$, where $n$ is the
size of the set.
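The numerical approach can be checked by building the power set directly. This is a minimal sketch: `power_set` is a hypothetical helper written for this example, and `math.comb` requires Python 3.8 or later.

```python
from itertools import combinations
from math import comb


def power_set(elements):
    """Return every subset of `elements`, from the empty set up to the full set."""
    items = list(elements)
    return [set(c) for k in range(len(items) + 1) for c in combinations(items, k)]


subsets = power_set({"a", "b", "c"})

# Number of subsets of each size k = 0, 1, 2, 3.
sizes = [comb(3, k) for k in range(4)]

print(len(subsets))  # 8
print(sizes)         # [1, 3, 3, 1]
print(sum(sizes))    # 8, which equals 2 ** 3
```

The list of subset sizes reads the same forwards and backwards, which is the symmetry noted above.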
Set operations are methods for manipulating sets and are useful tools for describing the properties of the probability space.
- The set complement is defined as all the elements that do not belong in the specified set. The set complement can be used to describe the probability that an event does not occur.
- The union of two sets is defined as the set of elements that are included in either set (or in both). The union of two sets can be used to describe the probability of either event A or event B occurring.
- The intersection of two sets is defined as the set of elements that are included in both sets. The intersection of two sets can be used to describe the probability that event A and event B both occur.
# Perform set operations in R --------------------------------------------------
a <- c(1, 2, 3)
b <- c(2, 4)

union(a, b)
#> [1] 1 2 3 4

intersect(a, b)
#> [1] 2

setdiff(a, b)
#> [1] 1 3

setequal(a, b)
#> [1] FALSE

# Perform set operations in Python ---------------------------------------------
# Variables in the R environment can be accessed in Python via r.variable
# Atomic vectors in R are automatically converted into Python lists
a = set(r.a)
b = set(r.b)

a.union(b) # Can also be evaluated as a | b
#> {1.0, 2.0, 3.0, 4.0}

a.intersection(b) # Can also be evaluated as a & b
#> {2.0}

a.difference(b) # Can also be evaluated as a - b
#> {1.0, 3.0}

a.symmetric_difference(b) # Can also be evaluated as a ^ b
#> {1.0, 3.0, 4.0}

Revisiting scenario 2, the probability that the sum of
two dice is less than 5 and an odd number is the probability of the
intersection of events $A$ (the sum is less than 5)
and $B$ (the sum is an odd number),
or $P(A \cap B)$.
If we were asked to find the probability that the sum of two dice is
less than 5 or an odd number, this would be the probability of the
union of the same two events, $P(A \cup B)$,
which covers a very different subset of the sample space.
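To see how different the two subsets are, we can enumerate both for the two dice example. This is a sketch with illustrative variable names. The intersection contains only 2 outcomes, whereas the union contains 22.

```python
from fractions import Fraction
from itertools import product

sample_space = set(product(range(1, 7), repeat=2))

less_than_5 = {roll for roll in sample_space if sum(roll) < 5}
odd_sum = {roll for roll in sample_space if sum(roll) % 2 == 1}

intersection = less_than_5 & odd_sum  # sum is less than 5 AND odd
union = less_than_5 | odd_sum         # sum is less than 5 OR odd

p_intersection = Fraction(len(intersection), len(sample_space))
p_union = Fraction(len(union), len(sample_space))

print(len(intersection), p_intersection)  # 2 1/18
print(len(union), p_union)                # 22 11/18
```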
We can think of a function as a relation that associates a set of
elements in the input space (subset A in X) to a set of elements in the
output space (subset B in Y). A function $f: X \to Y$ induces this
mapping of $A$ to $B$ by
applying itself to each individual element in the input space i.e.
$B = f(A) = \{f(x) \mid x \in A\}$. Being a relation also implies that
an inverse mapping exists, even when $f$ itself is not invertible, which
maps a set of elements in the output space back to the set of elements
in the input space that produce them
i.e. $f^{-1}(B) = \{x \in X \mid f(x) \in B\}$.
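The forward and inverse mappings between sets can be sketched in a few lines. The helper names `image` and `preimage` and the example function `square` are assumptions chosen for illustration. Squaring is deliberately not invertible, yet the inverse mapping on sets is still well defined.

```python
def image(f, subset):
    """Forward mapping: apply f to every element of a subset of the input space."""
    return {f(x) for x in subset}


def preimage(f, input_space, targets):
    """Inverse mapping: every input whose output lands inside `targets`."""
    return {x for x in input_space if f(x) in targets}


def square(x):
    return x * x


X = {-2, -1, 0, 1, 2}  # input space
A = {1, 2}             # subset of the input space

B = image(square, A)

print(sorted(B))                       # [1, 4]
print(sorted(preimage(square, X, B)))  # [-2, -1, 1, 2]
```

The inverse mapping returns more elements than we started with, because -1 and -2 also map into B.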
A probability distribution can be thought of as the function which maps a set of events to their set of probabilities.
Consider the set $A$
as a subset of the sample space X, where $P(A)$
is the probability assigned to $A$ by the
probability distribution $P$:
- The probability of X is the probability that any outcome in the sample
space (or any possible subset) occurs i.e. $P(X)$.
Since probability is the ratio of the event to the sample space, we can define
$P(X) = 1$ for simplicity.
- The complement of $X$ is the empty set, which represents an
impossible event i.e. $P(X^c) = P(\emptyset) = 0$.
- The range of possible probabilities for $A$
is therefore $0 \leq P(A) \leq 1$
and $P(A^c) = 1 - P(A)$.
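These rules can be checked on a single fair dice roll by treating the probability distribution as a function from events to numbers. This is a sketch under the assumption that all outcomes are equally likely. The names `SAMPLE_SPACE` and `prob` are hypothetical.

```python
from fractions import Fraction

SAMPLE_SPACE = frozenset(range(1, 7))  # one roll of a fair six-sided dice


def prob(event):
    """Map an event (a subset of the sample space) to its probability."""
    event = frozenset(event)
    if not event <= SAMPLE_SPACE:
        raise ValueError("an event must be a subset of the sample space")
    return Fraction(len(event), len(SAMPLE_SPACE))


A = {4, 5, 6}
complement_of_A = SAMPLE_SPACE - A

print(prob(SAMPLE_SPACE))      # 1
print(prob(set()))             # 0
print(prob(A))                 # 1/2
print(prob(complement_of_A))   # 1/2, which equals 1 - prob(A)
```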
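Consider the set $B$
as a different subset of the sample space X:
- If $A$ and $B$ are mutually exclusive (elements in $A$ and $B$
do not overlap), the intersection of $A$ and $B$
is empty i.e. $A \cap B = \emptyset$ with $P(A \cap B) = 0$,
and the probability of $A$ or $B$ occurring is
$P(A \cup B) = P(A) + P(B)$.
- If $A$ and $B$ are not mutually exclusive (elements in $A$ and $B$
overlap), the probability of $A$ or $B$ occurring is
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$.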
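Both addition rules can be verified on a single fair dice roll. The events below are arbitrary examples picked for illustration, and `prob` is a hypothetical helper that assumes equally likely outcomes.

```python
from fractions import Fraction

SAMPLE_SPACE = frozenset(range(1, 7))


def prob(event):
    """Probability of an event under equally likely outcomes."""
    return Fraction(len(frozenset(event) & SAMPLE_SPACE), len(SAMPLE_SPACE))


A = {1, 2}
B = {5, 6}      # shares no elements with A, so A and B are mutually exclusive
C = {2, 3, 4}   # shares the element 2 with A

# Mutually exclusive: the probabilities simply add.
print(prob(A | B), prob(A) + prob(B))                # 2/3 2/3

# Overlapping: the shared part is subtracted so it is not counted twice.
print(prob(A | C), prob(A) + prob(C) - prob(A & C))  # 2/3 2/3
```

Adding the probabilities of A and C without the correction would give 5/6, which counts the outcome 2 twice.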
Note: The term mutually exclusive refers to the property of whether elements in two or more subsets overlap with each other.
The source materials for this tutorial are:
- The Probability for Data Science textbook by Stanley H Chan, specifically Chapter 2 on probability
- Introduction to probability theory GitHub resource by Michael Betancourt
- Introduction to probability theory YouTube series from MIT
- General probability rules from STAT800 from Penn State Eberly College of Science