You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
A Markov chain is a stochastic model describing how a random variable
changes through time. Markov chains are memoryless, and future states
depend only on the current state of the chain and not on any past
states. This is known as the Markov property.
A discrete-time Markov chain (DTMC) is a sequence of random variables
generated by a stochastic process in which the value of the next
variable depends only on the value of the current variable. For example,
consider a simple Markovian weather model in which days are either dry
or wet. If today is dry, then tomorrow has a 90% chance of being dry and
a 10% chance of being wet. If today is wet, then tomorrow has a 30%
chance of being wet and a 70% chance of being dry. Whatever happens
tomorrow is conditionally independent of what happened yesterday. This
can be represented by a graphical model with two states and four edges.
An example graphical model. Each circle
represents a state. Each edge represents a state transitions with
associated weights.
This information can also be represented using a matrix describing the
transition from today to tomorrow:
Assuming time-homogeneity, the behavior of this chain can be encoded
into a single transition matrix, $P$, such that
$p_{ij} = \Pr(X_{t+1} = j | X_t = i) = \cdots = \Pr(X_{1} = j | X_0 = i)$.
Let $\pi^{(t)}$ be a (row) vector describing the probability of states
of the chain at time $t$, i.e. $\pi^{(t)}_i = \Pr(X_{t} = i)$, where
$(t)$ is an index and not a power. This vector can be calculated from
$\pi^{(t-1)}$ using matrix multiplication:
$$\pi^{(t)} = \pi^{(t-1)} \times P = \pi^{(t-2)} \times P \times P = \pi^{(0)} \times P^t$$
where $\pi^{(0)}$ represents the initial distribution of states at
time $t = 0$.
Stationary Probabilities
Markov chains can have a unique stationary probability vector that is
also the limiting probability vector. (This is true for the Markov
models we use.) Let $\pi$ be the stationary probability (row) vector.
Then the following properties are true.
This has two implications. One, if the chain’s current marginal
probability is $\pi$, then the marginal probability will not change.
Two, if it is not $\pi$, then it will approach $\pi$ asymptotically.
$\pi$ can be found by (1) raising $P$ to a large power, (2) through
spectral decomposition, or (3) solving a system of linear equations.
Time Reversibility
A Markov chain is reversible if going forwards in time is the same as
going backwards in time. Specifically,
Most phylogenetic models are reversible models because they allow
provide the same probabilities regardless to whether we are going up or
going down a tree.
Spectral decomposition
If $P$ is a $n \times n$ matrix with $n$ linearly independent
eigenvectors $u_i$, then $P$ can be factored as
$P = U \Lambda U^{-1}$, where $U$ is a $n \times n$ matrix whose
i-th column is $u_i$ and $\Lambda$ is a $n \times n$ diagonal
matrix where $\Lambda_{ii} = \lambda_i$ is the i-th eigenvalue of
$P$.
Spectral decomposition is useful for simplifying the calculation of
matrix powers.
$$P^t = (U \Lambda U^{-1})^t = U \Lambda^t U^{-1}$$
Since $\Lambda$ is diagonal, $\Lambda^t$ is easier to calculate than
$P^t$.
Application to Mutation
Consider a haploid DNA locus for four states $X \in \{1, 2, 3, 4\}$.
These states could represent different nucleotides or different alleles.
The transmission of this locus from one generation to the next can be
represented by a transition matrix
where the off diagonals represent mutation events and the diagonals
represent no mutation. If this locus is transmitted clonally
(e.g. mitochondrial DNA), then $\Pr(X_t = j | X_0 = i) = P^{(t)}_{ij}$
where $P^{(t)} = P^t$.
Probability of No Mutations
$\Pr(X_t = i | X_0 = i)$ does not measure the probability that no
mutation occurred, only that the state is the same after $t$
generations. To calculate
$\Pr(X_t = j \text{ and zero mutations} | X_0 = i)$, we will first
first split $P$ into two matrices, $P = Z + J$, such that $Z$
contains the diagonal elements of $P$ and $J$ contains the
off-diagonals. From here, it is straightforward to calculate
$$\left[\Pr(X_t = j \text{ and zero mutations} | X_0 = i)\right] = Z^t$$
Probability of One Mutation
If one and only one mutation occurred across $t$ generations, then
there are $t$ possible time points when the mutation occurred. The
probability of one and only one mutation occurring across $t$
generations can be calculated by summing over all possible time points.
$$\Pr(X_t = j \text{ and one mutation} | X_0 = i) =
\left[\sum_{k=1}^t Z^{k-1} J^1 Z^{t-k}\right]_{ij}$$
The expected number of mutations given $X_t = j$ and $X_0 = i$ can
be calculated using the property that every step is independent. Let
$K_s$ be the number of mutations that occurred at time $s$ and
$K = \sum_{s=0}^t K_s$ be the total number of mutations. Then
$$\begin{align*}
E\left[K | X_t, X_0 \right] = & \frac{\sum_{s=1}^t P^{s-1} J P^{t-s}}{P^t}\\\
= & \frac{t P^t - \sum_{s=1}^t P^{s-1} Z P^{t-s}}{P^t}\\\
= & t - \frac{\sum_{s=1}^t P^{s-1} Z P^{t-s}}{P^t}
\end{align*}$$
using element-wise division.
Calculating the numerator can be simplified if we decompose $P$ into
its diagonal form. Let $P = U \Lambda U^{-1} = U \Lambda U^{T}$ where
$\Lambda_{ii} = \lambda_i$ is the i-th eigenvalue of $P$ and column
$u_{\cdot j}$ is the j-th eigenvector of $P$. Therefore,