Skip to content

Latest commit

 

History

History
275 lines (215 loc) · 8.6 KB

File metadata and controls

275 lines (215 loc) · 8.6 KB

Discrete-Time Markov Chains

Introduction

A Markov chain is a stochastic model describing how a random variable changes through time. Markov chains are memoryless, and future states depend only on the current state of the chain and not on any past states. This is known as the Markov property.

A discrete-time Markov chain (DTMC) is a sequence of random variables generated by a stochastic process in which the value of the next variable depends only on the value of the current variable. For example, consider a simple Markovian weather model in which days are either dry or wet. If today is dry, then tomorrow has a 90% chance of being dry and a 10% chance of being wet. If today is wet, then tomorrow has a 30% chance of being wet and a 70% chance of being dry. Whatever happens tomorrow is conditionally independent of what happened yesterday. This can be represented by a graphical model with two states and four edges.

An example graphical model. Each circle represents a state. Each edge represents a state transitions with associated weights.

An example graphical model. Each circle represents a state. Each edge represents a state transitions with associated weights.

This information can also be represented using a matrix describing the transition from today to tomorrow:

$$P = \left[ \begin{matrix} 0.9 & 0.1\\\ 0.7 & 0.3\\\ \end{matrix} \right]$$

Here each row represents the current day and each column represents the next day.

Definition

A discrete-time Markov chain is a sequence of random variables $X_0$, $X_1$, $X_2$, etc. with the Markov property:

$$\Pr(X_{t+1} = x | X_0 = x_0, X_1 = x_1, \ldots , X_t = x_t) = \Pr(X_{t+1} = x | X_t = x_t)$$

Assuming time-homogeneity, the behavior of this chain can be encoded into a single transition matrix, $P$, such that $p_{ij} = \Pr(X_{t+1} = j | X_t = i) = \cdots = \Pr(X_{1} = j | X_0 = i)$.

Let $\pi^{(t)}$ be a (row) vector describing the probability of states of the chain at time $t$, i.e. $\pi^{(t)}_i = \Pr(X_{t} = i)$, where $(t)$ is an index and not a power. This vector can be calculated from $\pi^{(t-1)}$ using matrix multiplication:

$$\pi^{(t)} = \pi^{(t-1)} \times P = \pi^{(t-2)} \times P \times P = \pi^{(0)} \times P^t$$

where $\pi^{(0)}$ represents the initial distribution of states at time $t = 0$.

Stationary Probabilities

Markov chains can have a unique stationary probability vector that is also the limiting probability vector. (This is true for the Markov models we use.) Let $\pi$ be the stationary probability (row) vector. Then the following properties are true.

$$\pi P = \pi P^t = \pi \quad\text{and}\quad \sum_i \pi_i = 1$$ $$\lim_{t \to \infty} p^t_{ij} = \pi_j$$

This has two implications. One, if the chain’s current marginal probability is $\pi$, then the marginal probability will not change. Two, if it is not $\pi$, then it will approach $\pi$ asymptotically.

$\pi$ can be found by (1) raising $P$ to a large power, (2) through spectral decomposition, or (3) solving a system of linear equations.

Time Reversibility

A Markov chain is reversible if going forwards in time is the same as going backwards in time. Specifically,

$$\Pr(X_{t+1} = j , X_t = i) = \Pr(X_{t+1} = i , X_t = j)$$ $$\pi_i \Pr(X_{t+1} = j | X_t = i) = \pi_j \Pr(X_{t+1} = i | X_n = j)$$ $$\pi_i p_{ij} = \pi_j p_{ji}$$

Most phylogenetic models are reversible models because they allow provide the same probabilities regardless to whether we are going up or going down a tree.

Spectral decomposition

If $P$ is a $n \times n$ matrix with $n$ linearly independent eigenvectors $u_i$, then $P$ can be factored as $P = U \Lambda U^{-1}$, where $U$ is a $n \times n$ matrix whose i-th column is $u_i$ and $\Lambda$ is a $n \times n$ diagonal matrix where $\Lambda_{ii} = \lambda_i$ is the i-th eigenvalue of $P$.

Spectral decomposition is useful for simplifying the calculation of matrix powers.

$$P^t = (U \Lambda U^{-1})^t = U \Lambda^t U^{-1}$$

Since $\Lambda$ is diagonal, $\Lambda^t$ is easier to calculate than $P^t$.

Application to Mutation

Consider a haploid DNA locus for four states $X \in \{1, 2, 3, 4\}$. These states could represent different nucleotides or different alleles. The transmission of this locus from one generation to the next can be represented by a transition matrix

$$P = \begin{bmatrix} 1 - (u_{12} + u_{13} + u_{14}) & u_{12} & u_{13} & u_{14}\\\ u_{21} & 1 - (u_{21} + u_{23} + u_{24}) & u_{23} & u_{24}\\\ u_{31} & u_{32} & 1 - (u_{31} + u_{32} + u_{34}) & u_{34}\\\ u_{41} & u_{42} & u_{43} & 1 - (u_{41} + u_{42} + u_{43})\\\ \end{bmatrix}$$

where the off diagonals represent mutation events and the diagonals represent no mutation. If this locus is transmitted clonally (e.g. mitochondrial DNA), then $\Pr(X_t = j | X_0 = i) = P^{(t)}_{ij}$ where $P^{(t)} = P^t$.

Probability of No Mutations

$\Pr(X_t = i | X_0 = i)$ does not measure the probability that no mutation occurred, only that the state is the same after $t$ generations. To calculate $\Pr(X_t = j \text{ and zero mutations} | X_0 = i)$, we will first first split $P$ into two matrices, $P = Z + J$, such that $Z$ contains the diagonal elements of $P$ and $J$ contains the off-diagonals. From here, it is straightforward to calculate

$$\left[\Pr(X_t = j \text{ and zero mutations} | X_0 = i)\right] = Z^t$$

Probability of One Mutation

If one and only one mutation occurred across $t$ generations, then there are $t$ possible time points when the mutation occurred. The probability of one and only one mutation occurring across $t$ generations can be calculated by summing over all possible time points.

$$\Pr(X_t = j \text{ and one mutation} | X_0 = i) = \left[\sum_{k=1}^t Z^{k-1} J^1 Z^{t-k}\right]_{ij}$$

If $i = j$, this value is 0. Otherwise,

$$\begin{align} \Pr(X_t = j \text{ and one mutation} | X_0 = i) = & \sum_{k=1}^t p_{ii}^{k-1} p_{ij} p_{jj}^{t-k}\\\ = & \begin{cases}p_{ij} \frac{p_{ii}^t - p_{jj}^t}{p_{ii}-p_{jj}} : p_{ii} \ne p_{jj}\\\ t p_{ij} p_{ii}^{t-1} : p_{ii} = p_{jj} \end{cases} \end{align}$$

Expected number of mutations

The expected number of mutations given $X_t = j$ and $X_0 = i$ can be calculated using the property that every step is independent. Let $K_s$ be the number of mutations that occurred at time $s$ and $K = \sum_{s=0}^t K_s$ be the total number of mutations. Then

$$E\left[K | X_t = j, X_0 = i\right] = \sum_{s=1}^t E\left[K_s | X_n = j, X_0 = i\right]$$

Furthermore,

$$\begin{align*} E\left[K_s | X_n = j, X_0 = i\right] = & \frac{E\left[K_s | X_n = j, X_0 = i\right] \times \Pr\left( X_t = j | X_0 = i\right)}{\Pr\left( X_t = j | X_0 = i\right)}\\\ = & \frac{E\left[K_s \times [X_t = j] | X_0 = i\right]}{\Pr\left( X_t = j | X_0 = i\right)} \end{align*}$$

where $[X_t = j]$ uses Iverson bracket notation for an indicator function.

The denominator can be obtained from $P^t$ while the numerator is

$$\begin{align*} E\left[K_s \times [X_t = j] | X_0 = i\right] = & \sum_{k \in \{0,1\}} k \Pr\left(K_s = k, X_t = j | X_0 = i \right) \\\ = & \Pr\left(K_s = 1, X_t = j | X_0 = i \right)\\\ = & \left[ P^{s-1} J P^{t-s} \right]_{ij} \end{align*}$$

Therefore,

$$\begin{align*} E\left[K | X_t, X_0 \right] = & \frac{\sum_{s=1}^t P^{s-1} J P^{t-s}}{P^t}\\\ = & \frac{t P^t - \sum_{s=1}^t P^{s-1} Z P^{t-s}}{P^t}\\\ = & t - \frac{\sum_{s=1}^t P^{s-1} Z P^{t-s}}{P^t} \end{align*}$$

using element-wise division.

Calculating the numerator can be simplified if we decompose $P$ into its diagonal form. Let $P = U \Lambda U^{-1} = U \Lambda U^{T}$ where $\Lambda_{ii} = \lambda_i$ is the i-th eigenvalue of $P$ and column $u_{\cdot j}$ is the j-th eigenvector of $P$. Therefore,

$$p^t_{ij} = \sum_{k = 1}^n u_{i k} u_{j k} \lambda_k^t$$

and

$$\begin{align*}\\\ \left[\sum_{s=0}^t P^{s-1} Z P^{t-s}\right]_{ij} = & \sum_{s=0}^t \sum_{a=1}^n \sum_{b=1}^n \sum_{c=1}^n u_{ia} \lambda^{s-1}_{a} u^T_{ac} \cdot z_{cc} \cdot u_{cb} \lambda^{t-s}_{b} u^T_{bj}\\\ = & \sum_{a=1}^n \sum_{b=1}^n \sum_{c=1}^n z_{cc} u_{ia} u_{ca} u_{cb} u_{jb} \sum_{s=0}^t \lambda^{s-1}_{a} \lambda^{t-s}_{b}\\\ = & \sum_{a=1}^n \sum_{b=1}^n \sum_{c=1}^nz_{cc} u_{ia} u_{ca} u_{cb} u_{jb} \mathcal{J}_{ab}(t) \end{align*}$$

where

$$\mathcal{J}_{ab}(t) = \begin{cases} t \lambda^{t-1}_a & \text{if} & \lambda_a = \lambda_b\\\ \frac{\lambda^t_a - \lambda^t_b}{\lambda_a - \lambda_b} & \text{if} & \lambda_a \ne \lambda_b \end{cases}$$