A first look at applications of Markov chains

This post presents several interesting examples of Markov chain models. Previous posts on Markov chains are: an introduction, how to work with transition probabilities (here and here).

Examples

The following gives several examples of Markov chains, discussed at an introductory level. They will be further developed in subsequent posts.

Example 1 – The Ehrenfest Chain

The Ehrenfest chain, named for the physicist Paul Ehrenfest, is a simple, discrete model for the exchange of gas molecules contained in a volume divided into two regions A and B by a permeable membrane. Suppose that the gas has $N$ molecules. At each period of time, a molecule is chosen at random from the set of $N$ molecules and moved from the region that it is in to the other region.

Figure 1 – The Efrenfest Chain

The model has a simple mathematical description using balls and urns. Suppose that two urns, labeled A and B, contain $N$ balls, labeled $1, 2, 3, \cdots, N$. At each step (i.e. at each discrete time unit), an integer is selected at random from $1, 2, 3, \cdots, N$ independent of the past selections. Then the ball labeled by the selected integer is removed from its urn and placed in the other urn. The procedure is repeated indefinitely.

The state of the system at time $n=0,1,2,\cdots$ is the number of balls in urn A and is denoted by $X_n$. Then $\left\{X_n: n=0,1,2,\cdots \right\}$ is a Markov chain on the state space $\left\{0,1,\cdots,N \right\}$.

When the process is in state 0 (urn A is empty), the next state is 1. When the process is in state $N$ (urn A is full), the next state is $N-1$. When the process is in state $i$ where $0, the next state of the process is either $i-1$ or $i+1$. The following gives the one-step transition probabilities:

$\displaystyle P_{ij} = \left\{ \begin{array}{ll} \displaystyle 1 &\ \ \ \ \ \ i=0 \text{ and } j=1 \\ \text{ } & \text{ } \\ \displaystyle 1 &\ \ \ \ \ \ i=N \text{ and } j=N-1 \\ \text{ } & \text{ } \\ \displaystyle \frac{i}{N} &\ \ \ \ \ \ 0

When urn A has $i$ balls, there is a probability of $\frac{i}{N}$ such that the randomly selected ball is from urn A and thus urn A loses a ball. On the other hand, there is a probability of $1-\frac{i}{N}$ such that the randomly selected ball is from urn B and thus urn A gains a ball. As an illustration, the following gives the transition probability matrix for $N=5$ balls.

$\mathbf{P} = \bordermatrix{ & 0 & 1 & 2 & 3 & 4 & 5 \cr 0 & 0 & 1 & 0 & 0 & 0 & 0 \cr 1 & \frac{1}{5} & 0 & \frac{4}{5} & 0 & 0 & 0 \cr 2 & 0 & \frac{2}{5} & 0 & \frac{3}{5} & 0 & 0 \cr 3 & 0 & 0 & \frac{3}{5} & 0 & \frac{2}{5} & 0 \cr 4 & 0 & 0 & 0 & \frac{4}{5} & 0 & \frac{1}{5} \cr 5 & 0 & 0 & 0 & 0 & 1 & 0 \cr } \qquad$

An important problem in the Ehrenfest chain is the long-term, or equilibrium, distribution of $X_n$, the number of balls in urn A. Another interesting problem concerns the changes in the composition of the two regions over time. For example, if all molecules are in one region at the beginning, how long on average will it be before each region has half of the molecules? The theory of Markov chains provide a good method for answering such questions.

Example 2 – Random Walk

The simplest random walk is a Markov chain $\left\{X_n: n=0,1,2,\cdots \right\}$ such that each state is the result of a random one-unit up or down move from the previous state.

Figure 2 – Five Simulations of a Random Walk

In the random walk in Figure 1, each state is one unit above or below the preceding state with equal probability. Instead of a random one-unit up or down move, let’s give a more general description of a random walk. The moves in the random walk are determined by a predetermined discrete distribution. Let $Y_1, Y_2, Y_3 \cdots$ be integer-valued random variables that are independent and identically distributed. Let $P[Y=y]$ be the common probability function. Let $X_0$ be an integer-valued random variable that is independent of the random variables $Y_n$. The random variable $X_0$ will be the initial position of the random walk. The random variables $Y_n$ are the increments (they are the amounts added to the stochastic process as time increases).

For $n \ge 1$, define $X_n=X_0+Y_1+\cdots+Y_{n-1}+Y_n$. Then $\left\{X_n: n=0,1,2,\cdots \right\}$ is called a random walk. It is a Markov chain with the state space the set of all integers. Wherever the process is at any given time, the next step in the process is always a walk in a distance and direction that is dictated by the independent random variables $Y_n$. If the random variables $Y_n$ are independent Bernoulli variables with $P(Y_n=1)=p$ and $P(Y_n=-1)=1-p$, then we revert back to the simple random walks described in Figure 1.

To transition from state $i$ to state $j$, the increment $Y_n$ must be $j-i$. Thus the one-step transition probabilities $P_{ij}$ are defined by $P_{ij}=P[Y=j-i]$. Recall that $P[Y=y]$ is the common probability function for the sequence $Y_1, Y_2, Y_3 \cdots$. For a more detailed description, see this previous post.

Example 3 – Gambler’s Ruin

A gambler’s ruin is a simple random walk that can take on only finitely many states. At each time period, the state is one-unit higher or lower than the preceding state as a result of a random bet. More specifically, suppose that a gambler starts out with $d$ units in capital (in dollars or other monetary units) and makes a series of one-unit bets against the house. For the gambler, the probabilities of winning and losing each bet are $p$ and $1-p$, respectively. Whenever the capital reaches zero, the gambler is in ruin and his capital remains zero thereafter. On the other hand, if the capital of the gambler increases to $m$ where $d, then the gambler quits playing. Let $X_n$ be the capital of the gambler after the $n$th bet. Then $\left\{X_n: n=0,1,2,3,\cdots \right\}$ is a random walk. The starting state is $X_0=d$. The states 0 and $m$ are absorbing states. The following gives the transition probabilities.

$\displaystyle P_{ij} = \left\{ \begin{array}{ll} \displaystyle p &\ \ \ \ \ \ j=i+1,i \ne 0, i \ne m \\ \text{ } & \text{ } \\ \displaystyle 1-p &\ \ \ \ \ \ j=i-1,i \ne 0, i \ne m \\ \text{ } & \text{ } \\ \displaystyle 1 &\ \ \ \ \ \ i=0,j=0 \\ \text{ } & \text{ } \\ \displaystyle 1 &\ \ \ \ \ \ i=m,j=m \\ \text{ } & \text{ } \\ 0 &\ \ \ \ \ \ \text{otherwise} \end{array} \right.$

Whenever the gambler reaches the absorbing state of 0, the gambler is in ruin. If the process reaches the absorbing state of $m$, the gambler strikes gold. If the initial capital of the gambler is small relative to the casino, there is a virtually 100% chance that the gambler will be in ruin even if each bet has even odds (see here for the calculation). One interesting questions: on average how long does it take for the gambler to be in ruin? Such questions will be answered after necessary tools are built in subsequent posts.

An alternative interpretation of the gamblerâ€™s ruin random walk is this. Assume that two players make a series of one-unit bets against each other and that the total capital between the two players is $m$ units with the first player having $d$ units in capital. Further suppose that the first player has probability $p$ of winning a bet and the second player has probability $1-p$ of winning a bet. Then the two gamblers play until one of them goes broke (is in ruin). Let $X_n$ be the capital of the first player after the $n$th play. Then $\left\{X_n: n=0,1,2,3,\cdots \right\}$ is a Markov chain that is a gambler’s ruin. The following graph shows five simulations of a gambler’s ruin from this previous post.

Figure 3 – Five Simulations of a Gambler’s Ruin

Example 4 – Discrete Birth and Death Chain

In a birth and death chain, the current state $i$ in the process is transitioned to $i+1$ (a birth), $i-1$ (a death) or $i$. The state space is either $\left\{0,1,2,3,\cdots \right\}$ or $\left\{0,1,2,3,\cdots, m \right\}$. The following gives the one-step transition probabilities.

$\displaystyle P_{ij} = \left\{ \begin{array}{ll} \displaystyle q_i &\ \ \ \ \ \ j=i-1 \\ \text{ } & \text{ } \\ \displaystyle r_i &\ \ \ \ \ \ j=i \\ \text{ } & \text{ } \\ \displaystyle p_i &\ \ \ \ \ \ j=i+1 \\ \text{ } & \text{ } \\ 0 &\ \ \ \ \ \ \text{otherwise} \end{array} \right.$

where for each state $i$, $q_i$ (the probability of down), $r_i$ (the probability of same) and $p_i$ (the probability of up) are non-negative real numbers such that $q_i+r_i+p_i=1$. The following gives the transition probability matrix for an illustrative case of $m=5$.

$\mathbf{P} = \bordermatrix{ & 0 & 1 & 2 & 3 & 4 & 5 \cr 0 & r_0 & p_0 & 0 & 0 & 0 & 0 \cr 1 & q_1 & r_1 & p_1 & 0 & 0 & 0 \cr 2 & 0 & q_2 & r_2 & p_2 & 0 & 0 \cr 3 & 0 & 0 & q_3 & r_3 & p_3 & 0 \cr 4 & 0 & 0 & 0 & q_4 & r_4 & p_4 \cr 5 & 0 & 0 & 0 & 0 & q_5 & r_5 \cr } \qquad$

In the above matrix, $q_0=0$ and $p_5=0$. If $r_0=1$, then state 0 is absorbing. If $p_0=1$, then state 0 is reflecting. On the other hand, if $r_5=1$, then state 5 is absorbing. If $q_5=1$, then state 5 is reflecting.

The Ehrenfest chain is an example of a birth and death chain (with the boundary states being reflecting). When the probability of down $q_i$, the probability of same $r_i$ and the probability of up $p_i$ do not vary according to the current state (i.e. they are constant), the resulting birth and death chain is a random walk. Thus the gambler’s ruin chain is also an example of a birth and death chain. The birth and death chain is suitable model for applications in which the state of the process is the population of a living system.

Example 5 – Discrete Queueing Markov Chain

Though continuous-time queueing models are much more realistic, a simple discrete-time queueing model is introduced here in order to illustrate applications of discrete Markov chains. Suppose that time is measured in a convenient time interval such as one minute or some appropriate group of minutes. The model assumes that during one period of time, only one customer is served if at least one customer is present. The model also assumes that a random number of new customers arrive at any given time period and that the numbers of customers arriving in the time periods form an independent identically distributed sequence of random variables.

More specifically, let $Y_1,Y_2,Y_3,\cdots$ be independent and identically distributed random variables that take on non-negative integers. Let $P[Y=k]$, $k=0,1,2,\cdots$, be the common probability function. The random variable $Y_n$ is the number of customers that arrive into the system during the $n$th period. Let $X_0$ be the number of customers present initially in the system. For $n \ge 1$, let $X_n$ be the number of customers in the system during the $n$th period. If the current state is $X_n$, then the next state is:

$\displaystyle X_{n+1} = \left\{ \begin{array}{ll} \displaystyle Y_{n+1} &\ \ \ \ \ \ X_n=0 \\ \text{ } & \text{ } \\ \displaystyle (X_n-1)+Y_{n+1} &\ \ \ \ \ \ X_n>0 \\ \end{array} \right.$

where $n \ge 0$. In essence, the number of customers in the system at any given time period is the number of new customers arriving plus the customers already in the system less one (if the system is not empty). If the system is empty, the number of customers is simply the number of new customers. It follows that $\left\{X_n: n \ge 0 \right\}$ is a Markov chain with the state space $\left\{0,1,2,\cdots \right\}$, the set of all non-negative integers. The one-step transition probabilities $P_{ij}$ are given by:

$\displaystyle P_{ij} = \left\{ \begin{array}{ll} \displaystyle P[Y=j] &\ \ \ \ \ \ i=0 \text{ and } j \in \left\{0,1,2,\cdots \right\} \\ \text{ } & \text{ } \\ \displaystyle P[Y=j-(i-1)] &\ \ \ \ \ \ i>0 \text{ and } j \in \left\{i-1,i,i+1,\cdots \right\} \\ \end{array} \right.$

where $P[Y=k]$ is the common probability function for the independent numbers of arrivals of customers $Y_1,Y_2,Y_3,\cdots$. If the $Y_n$ have a common binomial distributions with parameters 3 and $p=0.5$. Then the following gives the transition probability matrix.

$\displaystyle \mathbf{P} = \bordermatrix{ & 0 & \text{ } & 1 & \text{ } & 2 & \text{ } & 3 & \text{ } & 4 & \text{ } & 5 & \text{ } & 6 & \text{ } & 7 & \text{ } & 8 & \cdots \cr 0 & 0.125 & \text{ } & 0.375 & \text{ } & 0.375 & \text{ } & 0.125 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0 & \cdots \cr 1 & 0.125 & \text{ } & 0.375 & \text{ } & 0.375 & \text{ } & 0.125 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0 & \cdots \cr 2 & 0 & \text{ } & 0.125 & \text{ } & 0.375 & \text{ } & 0.375 & \text{ } & 0.125 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0 & \cdots \cr 3 & 0 & \text{ } & 0 & \text{ } & 0.125 & \text{ } & 0.375 & \text{ } & 0.375 & \text{ } & 0.125 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0 & \cdots \cr 4 & 0 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0.125 & \text{ } & 0.375 & \text{ } & 0.375 & \text{ } & 0.125 & \text{ } & 0 & \text{ } & 0 & \cdots \cr 5 & 0 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0.125 & \text{ } & 0.375 & \text{ } & 0.375 & \text{ } & 0.125 & \text{ } & 0 & \cdots \cr 6 & 0 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0.125 & \text{ } & 0.375 & \text{ } & 0.375 & \text{ } & 0.125 & \cdots \cr 7 & 0 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0.125 & \text{ } & 0.375 & \text{ } & 0.375 & \cdots \cr 8 & 0 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0.125 & \text{ } & 0.375 & \cdots \cr \vdots & \vdots & \text{ } & \vdots & \text{ } & \vdots & \text{ } & \vdots & \text{ } & \vdots & \text{ } & \vdots & \text{ } & \vdots & \text{ } & \vdots & \text{ } & \vdots & \text{ } \cr } \qquad$

If the customer arrivals have a common Poisson distribution with parameter $\lambda=0.2$ (per period), then the following gives the transition probabilities.

$\displaystyle P_{ij} = \left\{ \begin{array}{ll} \displaystyle \frac{e^{-0.2} \ 0.2^j}{j!} &\ \ \ \ \ \ i=0 \text{ and } j \in \left\{0,1,2,\cdots \right\} \\ \text{ } & \text{ } \\ \displaystyle \frac{e^{-0.2} \ 0.2^{j-(i-1)}}{[j-(i-1)]!} &\ \ \ \ \ \ i>0 \text{ and } j \in \left\{i-1,i,i+1,\cdots \right\} \\ \end{array} \right.$

Example 6 – Discrete Branching Chain

Suppose that we have a system of particles that can generate new particles of the same type, e.g. particles such as neutrons and bacteria.

In such a system, we assume that a particle is replaced by a ransom number of new particles (regarded as offspring of the original particle). Furthermore, we assume that each particle generates offspring independently and offspring generation follows the same distribution. Let $Y$ be the random variable that is the common distribution for offspring generation across all particles in the system. Let $P[Y=k]$ be the common probability function of the number of offspring of a particle.

For $n \ge 0$, let $X_n$ be the number of particles in the $n$th generation in the system. It follows from the assumptions that with $X_n=i$

$\displaystyle X_{n+1}=\sum \limits_{k=1}^{i} Y_k$

where $Y_1,Y_2, \cdots, Y_i$ are independent random variables with common probability function $P[Y=k]$. Then $\left\{X_n: n \ge 0 \right\}$ is a Markov chain with the state space the set of all non-negative integers. The one-step transition probabilities $P_{ij}$ are

$\displaystyle P_{ij}=P \biggl[\sum \limits_{k=1}^{i} Y_k = j \biggr]$

In other words, the transition probability $P_{ij}$ is defined by the probability that the independent sum $Y_1+\cdots+Y_i$ taking on the next state $j$.

Note that state 0 is an absorbing state, which means extinction. An interesting problems in branching chains is the computation of the probability of extinction. One approach is to compute the probability of extinction for a branching chain with a single particle, i.e. the probability $\rho$ of starting in state 1 and being absorbed into state 0. Then the probability of extinction for a branching chain starting with $k$ particles is then $\rho^k$ since the particles are assumed to generate new particles independently.

$\text{ }$

$\text{ }$

$\text{ }$

$\copyright$ 2017 – Dan Ma

Chapman-Kolmogorov Equations

Stochastic processes and Markov chains are introduced in this previous post. Transition probabilities are an integral part of the theory of Markov chains. The post preceding this one is a beginning look at transition probabilities. This post shows how to calculate the $n$-step transition probabilities. The Chapman-Kolmogorov equations are also discussed and derived.

Introductory Example

Suppose $\left\{X_n: n=0,1,2,\cdots \right\}$ is a Markov chain with the transition probability matrix $\mathbf{P}$. The entries of the matrix are the one-step transition probabilities $P_{ij}$. The number $P_{ij}$ is the probability that the Markov chain will move to state $j$ at time $m+1$ given that it is in state $i$ at time $m$, independent of where the chain was prior to time $m$. Thus $P_{ij}$ can be expressed as the following conditional probability.

$(1) \ \ \ \ \ P_{ij}=P[X_{m+1}=j \lvert X_m=i]$

Thus the future state only depends on the period immediately preceding it. This is called the Markov property. Also note that $P_{ij}$ is independent of the time period $m$. Any Markov chain with this property is called a time-homogeneous Markov chain or stationary Markov chain.

The focus of this post is to show how to calculate the probability that the Markov chain will move to state $j$ at time $n+m$ given that it is in state $i$ at time $m$. This probability is denoted by $P_{ij}^n$. In other words, $P_{ij}^n$ is the probability that the chain will be in state $j$ after the Markov chain goes through $n$ more periods given that it is in state $i$ in the current period. The probability $P_{ij}^n$, as a conditional probability, can be notated as follows:

$(2) \ \ \ \ \ P_{ij}^n=P[X_{m+n}=j \lvert X_m=i]$

In (2), the $n$ in the $n$-step transition probabilities satisfies $n \ge 0$. Note that when $n=0$, $P_{ij}^0=1$ for $i=j$ and $P_{ij}^0=0$ for $i \ne j$. Including the case for $n=0$ will make the Chapman-Kolmogorov equations work better.

Before discussing the general method, we use examples to illustrate how to compute 2-step and 3-step transition probabilities. Consider a Markov chain with the following transition probability matrix.

$\mathbf{P} = \bordermatrix{ & 0 & 1 & 2 \cr 0 & 0.6 & 0.2 & 0.2 \cr 1 & 0.3 & 0.5 & 0.2 \cr 2 & 0.4 & 0.3 & 0.3 \cr } \qquad$

Calculate the two-step transition probabilities $P_{02}^2$, $P_{12}^2$ and $P_{22}^2$. Then calculate the three-step transition probability $P_{02}^3$ using the two-step transition probabilities.

First, let’s handle $P_{02}^2$. We can condition on the first steps. To go from state 0 to state 2 in two steps, the chain must first go to an interim state and then from that state go to state 2.

\displaystyle \begin{aligned}P_{02}^2&=\sum \limits_{k=0}^2 P_{0k} \ P_{k2} \\&=P_{00} \ P_{02}+P_{01} \ P_{12}+P_{02} \ P_{22} \\&=0.6 \times 0.2+0.2 \times 0.2+0.2 \times 0.3 \\&=0.22 \end{aligned}

Note that the above calculation lists out the three possible paths to go from state 0 to state 2 in two steps – from state 0 to state 0 and then from state 0 to state 2, from state 0 to state 1 and then from state 1 to state 2 and from state 0 to state 2 and then from state 2 to state 2. Looking at the calculation more closely, the above calculation is basically the first row of $\mathbf{P}$ (the row corresponding to state 0) multiplying the third column of $\mathbf{P}$ (the column corresponding to state 2).

$P_{02}^2= \left[\begin{array}{ccc} 0.6 & 0.2 & 0.2 \\ \end{array}\right] \left[\begin{array}{c} 0.2 \\ 0.2 \\ 0.3 \end{array}\right]= \left[\begin{array}{c} 0.22 \end{array}\right]$

By the same idea, the following gives the other two 2-step transition probabilities.

\displaystyle \begin{aligned}P_{12}^2&=\sum \limits_{k=0}^2 P_{1k} \ P_{k2} \\&=P_{10} \ P_{02}+P_{11} \ P_{12}+P_{12} \ P_{22} \\&=0.3 \times 0.2+0.5 \times 0.2+0.2 \times 0.3 \\&=0.22 \end{aligned}

\displaystyle \begin{aligned}P_{22}^2&=\sum \limits_{k=0}^2 P_{2k} \ P_{k2} \\&=P_{20} \ P_{02}+P_{21} \ P_{12}+P_{22} \ P_{22} \\&=0.4 \times 0.2+0.3 \times 0.2+0.3 \times 0.3 \\&=0.23 \end{aligned}

As discussed, the above two calculations can be viewed as the sum of all the possible paths to go from the beginning state to the end state (conditioning on the interim state in the middle) or as a row in the transition probability $\mathbf{P}$ multiplying a column in $\mathbf{P}$. The following shows all three calculations in terms of matrix calculation.

$P_{02}^2= \left[\begin{array}{ccc} 0.6 & 0.2 & 0.2 \\ \end{array}\right] \left[\begin{array}{c} 0.2 \\ 0.2 \\ 0.3 \end{array}\right]= \left[\begin{array}{c} 0.22 \end{array}\right]$

$P_{12}^2= \left[\begin{array}{ccc} 0.3 & 0.5 & 0.2 \\ \end{array}\right] \left[\begin{array}{c} 0.2 \\ 0.2 \\ 0.3 \end{array}\right]= \left[\begin{array}{c} 0.22 \end{array}\right]$

$P_{22}^2= \left[\begin{array}{ccc} 0.4 & 0.3 & 0.3 \\ \end{array}\right] \left[\begin{array}{c} 0.2 \\ 0.2 \\ 0.3 \end{array}\right]= \left[\begin{array}{c} 0.23 \end{array}\right]$

The view of matrix calculation will be crucial in understanding Chpan-Kolmogorov equations discussed below. To conclude the example, consider the 3-step transition probability $P_{02}^3$. We can also condition on the first step. The chain goes from state 0 to the next step (3 possibilities) and then goes from that state to state 2 in two steps.

\displaystyle \begin{aligned} P_{02}^3&=\sum \limits_{k=0}^2 P_{0k} \ P_{k2}^2 \\&=P_{00} \ P_{02}^2+P_{01} \ P_{12}^2+P_{02} \ P_{22}^2 \\&=0.6 \times 0.22+0.2 \times 0.22+0.2 \times 0.23 \\&=0.222 \end{aligned}

The example shows that the calculation of a 3-step transition probability is based 2-step probabilities. Thus longer length transition probabilities can be built up from shorter length transition probabilities. However, it pays to focus on a more general framework before carrying out more calculation. The view of matrix calculation demonstrated above will help in understanding the general frame work.

Chapman-Kolmogorov Equations

The examples indicate that finding $n$-step transition probabilities involve matrix calculation. Let $\mathbf{Q}_n$ be the $n$-step transition probability matrix. The goal now is to have a systematic way to compute the entries in the matrix $\mathbf{Q}_n$. The computation is based on the Chapman-Kolmogorov equations. These equations are:

$\displaystyle (3) \ \ \ \ \ P_{ij}^{n+m}=\sum \limits_{k=0}^\infty P_{ik}^n \times P_{kj}^m$

for $n, m \ge 0$ and for all $i,j$. For a finite-state Markov chain, the summation in (3) does not go up to infinity and would have an upper limit. The number $P_{ij}^{n+m}$ is the probability that the chain will be in state $j$ after taking $n+m$ steps if it is currently in state $i$. The following gives the derivation of (3).

\displaystyle \begin{aligned} P_{ij}^{n+m}&=P[X_{n+m}=j \lvert X_0=i] \\&=\sum \limits_{k=0}^\infty P[X_{n+m}=j, X_n=k \lvert X_0=i]\\&=\sum \limits_{k=0}^\infty \frac{P[X_{n+m}=j, X_n=k, X_0=i]}{P[X_0=i]} \\&=\sum \limits_{k=0}^\infty \frac{P[X_n=k,X_0=i]}{P[X_0=i]} \times \frac{P[X_{n+m}=j, X_n=k, X_0=i]}{P[X_n=k,X_0=i]} \\&=\sum \limits_{k=0}^\infty P[X_n=k \lvert X_0=i] \times P[X_{n+m}=j \lvert X_n=k, X_0=i] \\&=\sum \limits_{k=0}^\infty P[X_n=k \lvert X_0=i] \times P[X_{n+m}=j \lvert X_n=k] \ * \\&=\sum \limits_{k=0}^\infty P_{ik}^n \times P_{kj}^m \end{aligned}

Here’s the idea behind the derivation. The path from state $i$ to state $j$ in $n+m$ steps can be broken down into two paths, one from state $i$ to an intermediate state $k$ in the first $n$ transitions, and the other from state $k$ to state $j$ in the remaining $m$ transitions. Summing over all intermediate states $k$ yields the probability that the process will go from state $i$ to state $j$ in $n+m$ transitions.

The entries in the matrix $\mathbf{Q}_n$ can be then computed by (3). There is an interesting an useful interpretation of (3). The following is another way to state the Chapman-Kolmogorov equations:

$(4) \ \ \ \ \ \mathbf{Q}_{n+m}=\mathbf{Q}_n \times \mathbf{Q}_m$.

A typical element of the matrix $\mathbf{Q}_n$ is $P_{ik}^n$ and a typical element of the matrix $\mathbf{Q}_m$ is $P_{kj}^m$. Note that $P_{ik}^n$ with $k$ varying is a row in $\mathbf{Q}_n$ (the row corresponding to the state $i$) and that $P_{kj}^n$ with $k$ varying is a column in $\mathbf{Q}_m$ (the column corresponding to the state $j$).

$\left[\begin{array}{cccccc} P_{i0}^n & P_{i1}^n & \cdots & P_{ik}^n & \cdots & P_{iw}^n \\ \end{array}\right] \left[\begin{array}{c} P_{0j}^m \\ \text{ } \\ P_{1j}^m \\ \vdots \\ P_{kj}^m \\ \vdots \\ P_{wj}^m \end{array}\right]$

The product of the above row and column is the transition probability $P_{ij}^{n+m}$.

Powers of the One-Step Transition Probability Matrix

Let $\mathbf{P}$ be the one-step transition probability matrix of a Markov chain. Let $\mathbf{Q}_n$ be the $n$-step transition probability matrix, which can be computed by using the Chapman-Kolmogorov equations. We now derive another important fact.

The $n$-step transition probability matrix $\mathbf{Q}_n$ is obtained by multiplying the one-step transition probability matrix $\mathbf{P}$ by itself $n$ times, i.e. $\mathbf{Q}_n$ is the $n$th power of $\mathbf{P}$.

This fact is important in terms of calculation of transition probabilities. Compute $\mathbf{P}^n$, the $n$th power of $\mathbf{P}$ (in terms of matrix calculation). Then $P_{ij}^n$ is simply an entry in $\mathbf{P}^n$ (in the row that corresponds to state $i$ and in the column that corresponds to state $j$). If the matrix size is large and/or $n$ is large, the matrix multiplication can be done using software or by using an online matrix calculator (here’s one matrix calculator).

Of course, the above fact is also important Markov chain theory in general. Almost all mathematical properties of Markov chains are at root based on this basic fact.

We can establish this basic fact using an induction argument. Clearly $\mathbf{Q}_1=\mathbf{P}^1=\mathbf{P}$. Suppose that the fact is true for $n-1$. Based on (4), $\mathbf{Q}_{n}=\mathbf{Q}_{n-1} \times \mathbf{Q}_1$. Continuing the derivation: $\mathbf{Q}_{n}=\mathbf{Q}_{n-1} \times \mathbf{Q}_1=\mathbf{P}^{n-1} \times \mathbf{P}=\mathbf{P}^n$.

The Row Vector and the Column Vector

As demonstrated in the preceding section, $\mathbf{P}^n$, the $n$th power of the $\mathbf{P}$, is precisely the $n$-step transition probability matrix. Let’s examine the matrix $\mathbf{P}^n$ more closely. Suppose that the Markov chain has a state space $\left\{0,1,2,\cdots,w \right\}$. The following shows the matrix $\mathbf{P}^n$ with special focus on a generic row and a generic column.

$\mathbf{P}^n= \left[\begin{array}{ccccccc} P_{0,0}^n & P_{0,1}^n & \cdots & P_{0,k}^n & \cdots & P_{0,w}^n \\ P_{1,0}^n & P_{1,1}^n & \cdots & P_{1,k}^n & \cdots & P_{1,w}^n \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ P_{i,0}^n & P_{i,1}^n & \cdots & P_{i,k}^n & \cdots & P_{i,w}^n \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ P_{w,0}^n & P_{w,1}^n & \cdots & P_{w,k}^n & \cdots & P_{w,w}^n \\ \end{array}\right]$

Now look at the row corresponding to the state $i$ and call it $R_i$. Also look at the column corresponding to the state $k$ and call it $C_k$. They are separated from the matrix $\mathbf{P}^n$ below.

$R_i=\left[\begin{array}{cccccc} P_{i,0}^n & P_{i,1}^n & \cdots & P_{i,k}^n & \cdots & P_{i,w}^n \\ \end{array}\right] \ \ \ \ \ \text{row vector}$

$C_k=\left[\begin{array}{c} P_{0,k}^n \\ \text{ } \\ P_{1,k}^n \\ \vdots \\ P_{i,k}^n \\ \vdots \\ P_{w,k}^n \end{array}\right] \ \ \ \ \ \text{column vector}$

The row vector $R_i$ is a conditional distribution. It gives the probabilities of the process transitioning (after $n$ transitions) into one of the states given the process starts in state $i$. If it is certain that the process begins in state $i$, then $R_i$ gives the probability function for the random variable $X_n$ since $P[X_n=k]=P_{i,k}^n$.

On the other hand, if the initial state is not certain, then the column vector $C_k$ gives the probabilities of the process transitioning (after $n$ transitions) into state $k$ from any one of the possible initial states. Thus a column from the matrix $\mathbf{P}^n$ gives the probability of the process in a given state at the $n$th period from all possible initial states. The following gives the probability distribution for $X_n$ (the state of the Markov chain at time $n$).

$\displaystyle (5) \ \ \ \ \ P[X_n=k]=\sum \limits_{i=0}^\infty P_{i,k}^n \times P[X_0=i]$

The calculation in (5) can be viewed as matrix calculation. If we put the probabilities $P[X_0=i]$ in a row vector, then the product of this row vector with the column vector $C_k$ would be $P[X_n=k]$. The following is (5) in matrix multiplication.

$(6) \ \ \ \ \ P[X_n=k]=\left[\begin{array}{cccccc} P(X_0=0) & P(X_0=1) & \cdots & P(X_0=j) & \cdots & P(X_0=w) \\ \end{array}\right] \left[\begin{array}{c} P_{0k}^n \\ \text{ } \\ P_{1k}^n \\ \vdots \\ P_{jk}^n \\ \vdots \\ P_{wk}^n \end{array}\right]$

Even though the matrix calculation for $P_{ij}^n$ should be done using software, it pays to understand the orientation of the matrix $\mathbf{P}^n$. In the preceding section, we learn that a row of $\mathbf{P}^n$ is the conditional distribution $X_n \lvert X_0$ (the state of the process at time $n$ given the initial state). In the preceding section, we also learn that a column in the matrix $\mathbf{P}^n$ gives the probabilities of the process landing in a fixed state $k$ from any one of the initial states.

The Chapman-Kolmogorov equations in (3) tells us that an entry in the matrix $\mathbf{P}^{n+m}$ is simply the product of a row in $\mathbf{P}^n$ and a column in $\mathbf{P}^m$. This observation makes it possible to focus just on the transition probability that is asked in a given problem rather than calculating the entire matrix. For example, to calculate an entry $P_{ij}^2$ of the matrix $\mathbf{P}^2$, multiply a row of $\mathbf{P}$ (the row corresponding to the state $i$) and a column of $\mathbf{P}$ (the row corresponding to state $j$). There is no need to calculate the entire matrix $\mathbf{P}^2$ if the goal is just one entry of $\mathbf{P}^2$.

To calculate an entry of $\mathbf{P}^n$, there is a considerable amount of variation in multiplication approaches. For example, multiple a row of $\mathbf{P}$ and a column of $\mathbf{P}^{n-1}$ or multiple a row of $\mathbf{P}^{n-1}$ and a column of $\mathbf{P}$. Or multiple a row of $\mathbf{P}^2$ and a column of $\mathbf{P}^{n-2}$.

If a row of transition probabilities multiplies a transition matrix, the result is a row in a higher transition probability matrix. For example, if a row in $\mathbf{P}$ multiplies the matrix $\mathbf{P}^n$, the result is a row in $\mathbf{P}^{n+1}$. More specifically, the first row in $\mathbf{P}$ multiplying the matrix $\mathbf{P}$ gives the first row of $\mathbf{P}^2$. The first row of $\mathbf{P}$ multiplying the matrix $\mathbf{P}^2$ gives the first row in $\mathbf{P}^3$, etc.

On the other hand, when a transition probability matrix multiplies a column of transition probabilities, the result is a column in a higher transition probability matrix. For example, the matrix $\mathbf{P}^n$ multiplies a column in $\mathbf{P}$, the result is a column in $\mathbf{P}^{n+1}$. More specifically, when the matrix $\mathbf{P}$ multiples the first column in the matrix $\mathbf{P}$, the result is the first column of $\mathbf{P}^2$. When the matrix $\mathbf{P}^2$ multiples a column in the matrix $\mathbf{P}$, the result is the first column in $\mathbf{P}^3$, etc.

Examples

We now give some examples on how to calculate transition probabilities.

Example 1
Consider a Markov chain with the following transition probability matrix.

$\mathbf{P} = \bordermatrix{ & 0 & 1 & 2 \cr 0 & 0.6 & 0.2 & 0.2 \cr 1 & 0.3 & 0.5 & 0.2 \cr 2 & 0.4 & 0.3 & 0.3 \cr } \qquad$

Determine $P[X_3=1 \lvert X_0=0]$ and $P[X_5=2 \lvert X_2=0]$.

Of course, the most straightforward way would be to calculate $\mathbf{P}^3$. Then $P[X_3=1 \lvert X_0=0]=P_{01}^3$ would be the (0,1)th entry of $\mathbf{P}^3$ and $P[X_5=2 \lvert X_2=0]=P_{02}^3$ would be the (0,1)th entry of $\mathbf{P}^3$. In fact, it is a good practice to use an online matrix calculator for this task. Doing so produces the following marrices.

$\mathbf{P}^2 = \bordermatrix{ & 0 & 1 & 2 \cr 0 & 0.50 & 0.28 & 0.22 \cr 1 & 0.41 & 0.37 & 0.22 \cr 2 & 0.45 & 0.32 & 0.23 \cr } \qquad \ \ \ \ \ \ \mathbf{P}^3 = \bordermatrix{ & 0 & 1 & 2 \cr 0 & 0.472 & 0.306 & 0.222 \cr 1 & 0.445 & 0.333 & 0.222 \cr 2 & 0.458 & 0.319 & 0.223 \cr } \qquad$

From the matrix $\mathbf{P}^3$, we see that $P_{01}^3=0.306$ and $P_{02}^3=0.222$. Note that $P_{02}$ is calculated in the introductory example earlier.

Example 2
For the transition matrix $\mathbf{P}$ in Example 1, use the ideas discussed in the section Additional Comments on Chapman-Kolmogorov to compute the first row and the first column in the transition matrix $\mathbf{P}^4$.

$\left[\begin{array}{ccc} 0.6 & 0.2 & 0.2 \\ \end{array}\right] \left[\begin{array}{ccc} 0.472 & 0.306 & 0.222 \\ 0.445 & 0.333 & 0.222 \\ 0.458 & 0.319 & 0.223 \end{array}\right] = \left[\begin{array}{ccc} 0.4638 & 0.314 & 0.2222 \end{array}\right]= \left[\begin{array}{ccc} P_{00}^4 & P_{01}^4 & P_{02}^4 \end{array}\right]$

$\left[\begin{array}{ccc} 0.472 & 0.306 & 0.222 \\ 0.445 & 0.333 & 0.222 \\ 0.458 & 0.319 & 0.223 \end{array}\right] \left[\begin{array}{c} 0.6 \\ 0.3 \\ 0.4 \end{array}\right] = \left[\begin{array}{c} 0.4638 \\ 0.4557 \\ 0.4597 \end{array}\right]= \left[\begin{array}{c} P_{00}^4 \\ P_{10}^4 \\ P_{20}^4 \end{array}\right]$

As discussed in earlier, the first row of $\mathbf{P}^4$ is obtained by multiplying the first row of $\mathbf{P}$ with the matrix $\mathbf{P}^3$. The first column of $\mathbf{P}^4$ is obtained by multiplying the matrix $\mathbf{P}^3$ with the first column of the matrix $\mathbf{P}$.

Example 3
A particle moves among states 0, 1 and 2 according to a Markov process with the following transition probability matrix.

$\mathbf{P} = \bordermatrix{ & 0 & 1 & 2 \cr 0 & 0.6 & 0.3 & 0.1 \cr 1 & 0.3 & 0.3 & 0.4 \cr 2 & 0.3 & 0.2 & 0.5 \cr } \qquad$

Let $X_n$ be the position of the particle after the $n$th move. Suppose that at the beginning, the particle is in state 1. Determine the probability $P[X_2=k]$ where $k=0,1,2$.

Since the initial state of the process is certain, $P[X_2=k]=P[X_2=k \lvert X_0=1]=P_{1k}^2$. Thus the problem is to find the second row of $\mathbf{P}^2$.

$\left[\begin{array}{ccc} 0.3 & 0.3 & 0.4 \\ \end{array}\right] \left[\begin{array}{ccc} 0.6 & 0.3 & 0.1 \\ 0.3 & 0.3 & 0.4 \\ 0.3 & 0.2 & 0.5 \end{array}\right] = \left[\begin{array}{ccc} 0.39 & 0.26 & 0.35 \end{array}\right]= \left[\begin{array}{ccc} P_{10}^2 & P_{11}^2 & P_{12}^2 \end{array}\right]$

Example 4
Consider a Markov chain with the following transition probability matrix.

$\mathbf{P} = \bordermatrix{ & 0 & 1 & 2 \cr 0 & 0.3 & 0.2 & 0.5 \cr 1 & 0.1 & 0.3 & 0.6 \cr 2 & 0.4 & 0.3 & 0.3 \cr } \qquad$

Suppose that initially the process is equally likely to be in state 0 or state 1. Determine $P[X_3=0]$ and $P[X_3=1]$.

To find $P[X_3=0]$, we need the first column of $\mathbf{P}^3$. To find $P[X_3=1]$, we need the second column of $\mathbf{P}^3$. Using an online calculator produces $\mathbf{P}^3$.

$\mathbf{P}^3 = \bordermatrix{ & 0 & 1 & 2 \cr 0 & 0.288 & 0.269 & 0.443 \cr 1 & 0.283 & 0.270 & 0.447 \cr 2 & 0.295 & 0.273 & 0.432 \cr } \qquad$

The following calculation gives the answers.

$P[X_3=0]=\left[\begin{array}{ccc} 0.5 & 0.5 & 0 \\ \end{array}\right] \left[\begin{array}{c} 0.288 \\ 0.283 \\ 0.295 \end{array}\right]=0.2855$

$P[X_3=1]=\left[\begin{array}{ccc} 0.5 & 0.5 & 0 \\ \end{array}\right] \left[\begin{array}{c} 0.269 \\ 0.270 \\ 0.273 \end{array}\right]=0.2695$

$P[X_3=2]=1-P[X_3=0]-P[X_3=1]=0.445$

Example 5
Consider a Markov chain with the following transition probability matrix.

$\mathbf{P} = \bordermatrix{ & 0 & 1 & 2 \cr 0 & 0.6 & 0.2 & 0.2 \cr 1 & 0.2 & 0.2 & 0.6 \cr 2 & 0 & 0 & 1 \cr } \qquad$

At time 0, the Markov process is in state 0. Let $T=\text{min}\left\{n \ge 0:X_n=2 \right\}$. Note that state 2 is called an absorbing state. The Markov process will eventually reach and be absorbed in state 2 (it stays there forever whenever the process reaches state 2). Thus $T$ is the first time period in which the process reaches state 2. Suppose that the Markov process is being observed and that absorption has not taken place. Then we would be interested in this conditional probability: the probability that the process is in state 0 or state 1 given that the process has not been absorbed. Determine $P[X_4=0 \lvert T>4]$.

The key idea here is that for the event $T>4$ to happen, $X_4=0$ or $X_4=1$. Thus

$P[T>4]=P[X_4=0 \text{ or } X_4=1]=P[X_4=0]+P[X_4=1]=P_{00}^4+P_{01}^4$.

Note that since the initial state is certain to be state 0, $P[X_4=0]=P_{00}^4$ and $P[X_4=1]=P_{01}^4$. Using an online calculator gives the following matrix.

$\mathbf{P}^4 = \bordermatrix{ & 0 & 1 & 2 \cr 0 & 0.1856 & 0.0768 & 0.7376 \cr 1 & 0.0768 & 0.032 & 0.8912 \cr 2 & 0 & 0 & 1 \cr } \qquad$

The following gives the desired result.

$\displaystyle P[X_4=0 \lvert T>4]=\frac{P_{00}^4}{P_{00}^4+P_{01}^4}=\frac{0.1856}{0.1856+0.0768}=0.7073$

Thus if absorption has not taken place in the 4th period, the process is in state 0 about 71% of the time.

Example 6
Continue with the Markov chain in Example 5. Determine $P[T=4]$.

Note that $P[T=4]=P[T>3]-P[T>4]$. The probability $P[T>4]$ is already determined using $\mathbf{P}^4$. To determine $P[T>3]$, we need the top row of $\mathbf{P}^3$.

$\mathbf{P}^3 = \bordermatrix{ & 0 & 1 & 2 \cr 0 & 0.272 & 0.112 & 0.616 \cr 1 & 0.112 & 0.048 & 0.84 \cr 2 & 0 & 0 & 1 \cr } \qquad$

Thus $P[T>3]=P_{00}^3+P_{01}^3=0.272+0.112=0.384$. Thus $P[T=4]=0.384-0.2624=0.1216$. Thus about 12% of the time, absorption takes place in the 4th period about 12% of the time.

Exercises

We give a few practice problems to reinforce the concepts and calculation discussed here. Further practice problems are in a companion blog (here and here).

Exercise 1
Consider a Markov chain with the following transition probability matrix.

$\mathbf{P} = \bordermatrix{ & 0 & 1 & 2 \cr 0 & 0.1 & 0.2 & 0.7 \cr 1 & 0.2 & 0.2 & 0.6 \cr 2 & 0.6 & 0.1 & 0.3 \cr } \qquad$

Determine these conditional probabilities.

• $P[X_3=2 \lvert X_1=1]$
• $P[X_3=2 \lvert X_0=1]$
• $P[X_4=2 \lvert X_0=1]$

Exercise 2
A particle moves among states 1, 2 and 3 according to a Markov process with the following transition probability matrix.

$\mathbf{P} = \bordermatrix{ & 1 & 2 & 3 \cr 1 & 0 & 0.6 & 0.4 \cr 2 & 0.6 & 0 & 0.4 \cr 3 & 0.6 & 0.4 & 0 \cr } \qquad$

Let $X_n$ be the position of the particle after the $n$th move. Determine the probability $P[X_3=1 \lvert X_0=1]$ and $P[X_4=1 \lvert X_0=1]$.

Exercise 3
Consider a Markov chain with the following transition probability matrix.

$\mathbf{P} = \bordermatrix{ & 0 & 1 & 2 \cr 0 & 0.1 & 0.2 & 0.7 \cr 1 & 0.9 & 0.1 & 0 \cr 2 & 0.1 & 0.8 & 0.1 \cr } \qquad$

Suppose that initially the process is equally likely to be in state 0 or state 1 or state 2. Determine the distribution for $X_3$.

Exercise 4
Continue with Example 5 and Example 6. Work these two examples assuming that the Markov chain starts in state 1 initially.

$\text{ }$

$\text{ }$

$\text{ }$

Exercise 1

• $P[X_3=2 \lvert X_1=1]=P_{12}^2=0.13$
• $P[X_3=2 \lvert X_0=1]=P_{12}^3=0.16$
• $P[X_4=2 \lvert X_0=1]=P_{12}^4=0.1473$

Exercise 2

• $P[X_3=1 \lvert X_0=1]=P_{11}^3=0.24$
• $P[X_4=1 \lvert X_0=1]=P_{11}^4=0.456$

Exercise 3

• $\displaystyle P[X_3=0]=\frac{1.076}{3}=0.3587$
• $\displaystyle P[X_3=1]=\frac{1.013}{3}=0.3377$
• $\displaystyle P[X_3=2]=\frac{0.911}{3}=0.3037$

Exercise 4

• $\displaystyle P[X_4=0 \lvert T>4]=\frac{P_{10}^4}{P_{10}^4+P_{11}^4}=\frac{0.0768}{0.0768+0.0.032}=0.70588$
• $P[T=4]=P[T>3]-P[T>4]=0.16-0.1088=0.0512$

$\text{ }$

$\text{ }$

$\text{ }$

$\copyright$ 2017 – Dan Ma

A beginning look at transition probabilities

The previous post introduces the notion of Markov chains, more specifically discrete Markov chains. This and the next post focus on calculating transition probabilities.

Suppose that $\left\{X_n: n \ge 0 \right\}$ is a Markov chain with transition probability matrix $\mathbf{P}$. Elements of $\mathbf{P}$ are the one-step transition probabilities $P_{ij}$. As the Markov process moves through the states over time, the probabilities in the matrix $\mathbf{P}$ shows how likely the process will transition to state $j$ in the next time period if the process is currently in state $i$. Elements of $\mathbf{P}$ are conditional probabilities:

$(1) \ \ \ \ \ P_{ij}=P[X_{n+1}=j \lvert X_n=i]$

The transition probabilities described in (1) are independent of the time period $n$. These are called stationary transition probabilities. Markov chains with stationary transition probabilities are called stationary Markov chains or time-homogeneous Markov chains. One immediate goal is to develop the $n$-step transition probabilities $P_{ij}^n$ from the one-step transition probabilities $P_{ij}$. This is to be developed in the next post. This post focuses on basic calculations with $P_{ij}$.

Examples of Conditional Probabilities

We use examples to demonstrate conditional probabilities that can be calculated from the one-step transition probabilities $P_{ij}$. The calculation is based on the following transition probability matrix.

$\mathbf{P} = \bordermatrix{ & 0 & 1 & 2 \cr 0 & 0.6 & 0.2 & 0.2 \cr 1 & 0.3 & 0.5 & 0.2 \cr 2 & 0.4 & 0.3 & 0.3 \cr } \qquad$

Example 1
Determine the following conditional probabilities:

• $P[X_1=0, X_2=1 \lvert X_0=0]$
• $P[X_2=0, X_3=1 \lvert X_1=0]$
• $P[X_1=2, X_2=1,X_3=0 \lvert X_0=1]$

Note that the first two are identical since the transition probabilities are stationary. To calculate the first and third probabilities, it is a matter of linking up the one-step transition probabilities from one state to the next.

\displaystyle \begin{aligned} P[X_1=0, X_2=1 \lvert X_0=0]&=P[X_1=0 \lvert X_0=0] \times P[X_2=1 \lvert X_1=0] \\&=P_{00} \times P_{01} \\&=0.6 \times 0.2=0.12 \end{aligned}

$\displaystyle P[X_2=0, X_3=1 \lvert X_1=0]=P[X_1=0, X_2=1 \lvert X_0=0]=0.12$

\displaystyle \begin{aligned} P[X_1=2, X_2=1,X_3=0 \lvert X_0=1]&=P[X_1=2 \lvert X_0=1] \times P[X_2=1 \lvert X_1=2] \times P[X_3=0 \lvert X_2=1] \\&=P_{12} \times P_{21} \times P_{10} \\&=0.2 \times 0.3 \times 0.3=0.018 \end{aligned}

Example 2
Example 1 shows that the probability of a chain moving from along a path conditional on a starting state is simply the product of the transition probabilities on that path. We now calculate unconditional probability of a path. For this to be possible, we need to assume a distribution for the initial state.

Assume that $P[X_0=0]=0.4$, $P[X_0=1]=0.3$ and $P[X_0=2]=0.3$. Determine the following unconditional probabilities:

• $P[X_0=0,X_1=0,X_2=1]$
• $P[X_1=0,X_2=0,X_3=1]$
• $P[X_0=1,X_1=2, X_2=1,X_3=0]$

To calculate these unconditional probabilities, we multiply the transition probabilities on the path along with the probability of the first state in the path.

\displaystyle \begin{aligned} P[X_0=0,X_1=0,X_2=1]&=P[X_0=0] \times P[X_1=0 \lvert X_0=0] \times P[X_2=1 \lvert X_1=0] \\&=P[X_0=0] \times P_{00} \times P_{01} \\&=0.4 \times 0.6 \times 0.2=0.048 \end{aligned}

To calculate the second probability, we need to first compute $P[X_1=0]$. This can be done by conditioning on the initial state $X_0$.

\displaystyle \begin{aligned} P[X_1=0]&=P[X_0=0] \times P_{00}+ P[X_0=1] \times P_{10} +P[X_0=2] \times P_{20} \\&=0.4 \times 0.6 +0.3 \times 0.3+0.3 \times 0.4=0.45 \end{aligned}

\displaystyle \begin{aligned} P[X_1=0,X_2=0,X_3=1]&=P[X_1=0] \times P[X_2=0 \lvert X_1=0] \times P[X_3=1 \lvert X_2=0]\\&=P[X_1=0] \times P_{00} \times P_{01} \\&=0.45 \times 0.6 \times 0.2 \\&=0.054 \end{aligned}

The following gives the third probability.

\displaystyle \begin{aligned} P[X_0=1,X_1=2, X_2=1,X_3=0]&=P[X_0=1] \times P[X_1=2 \lvert X_0=1] \times P[X_2=1 \lvert X_1=2] \times P[X_3=0 \lvert X_2=1] \\&=P[X_0=1] \times P_{12} \times P_{21} \times P_{10} \\&=0.4 \times 0.2 \times 0.3 \times 0.3 \\&=0.0072 \end{aligned}

Note that each unconditional probability in this example is obtained by multiplying the corresponding conditional probability in Example 1 with the probability of the conditioned event.

Example 3
Sometimes we know for certain that the Markov chain starts in a given state. Then the unconditional probability of a path is simply the conditional probability of that path (conditioned on the initial state). Determine the three probabilities in Example 2 given that the chain starts in state 0 in the first two probabilities and the chain starts in state 1 in the third probability. The probabilities are:

• $P[X_0=0,X_1=0,X_2=1]$
• $P[X_1=0,X_2=0,X_3=1]$
• $P[X_0=1,X_1=2, X_2=1,X_3=0]$

For the first probability, $P[X_0=0,X_1=0,X_2=1]=P[X_0=0] \times P[X_1=0,X_2=1 \lvert X_0=0]$. With $P[X_0=0]=1$, $P[X_0=0,X_1=0,X_2=1]=P[X_1=0,X_2=1 \lvert X_0=0]=0.12$, the answer from Example 1.

For the second probability, we need to first calculate $P[X_1=0]$. Since $P[X_0=0]=1$, $P[X_1=0]=P[X_0=0] \times P_{00}=0.6$. Thus $P[X_1=0,X_2=0,X_3=1]=P[X_1=0] \times P_{00} \times P_{01}=0.6 \times 0.6 \times 0.2=0.072$, which is 0.6 times the corresponding conditional probability in Example 1.

Similarly, the third probability is

\displaystyle \begin{aligned} P[X_0=1,X_1=2, X_2=1,X_3=0]&=P[X_1=2, X_2=1,X_3=0 \lvert X_0=1] \\& =0.018\end{aligned}

Conditional Probabilities of Paths in Markov Chains

The conditional probabilities in the above examples are not difficult to do as long as the basic idea of conditional probabilities and the Markov property are understood. First, the idea of conditional probability. The idea is that the probability of the intersection of two events can be obtained by conditioning on one of the events. For example, given two events $A$ and $B$, we have

$(2a) \ \ \ \ \ P[A \cap B]=P[A] \times P[B \lvert A]$

$\displaystyle (2b) \ \ \ \ \ P[B \lvert A]=\frac{P[A \cap B]}{P[A]}$

The probability of the intersection of two events is the product of a conditional probability of one event given another event times the probability of the event being conditioned. Relation (2b) is the probability of the event $B$ given that event $A$ has occurred. Of course, the conditioned event has to have positive probability, i.e. $P[A]>0$.

Suppose that the event $A$ is $X=x$, the discrete random variable $X$ taking on the value $x$ and that the event $B$ is $Y=y$. Then we have the following relations.

$(3a) \ \ \ \ \ P[X=x,Y=y]=P[X=x] \times P[Y=y \lvert X=x]$

$\displaystyle (3b) \ \ \ \ \ P[Y=y \lvert X=x]=\frac{P[X=x,Y=y]}{P[X=x]}$

We can also write the conditional probability of $X=x$ conditioning on the event $Y=y$. Suppose we have a path in a Markov chain, $X_0=x_0,X_1=x_1,X_2=x_2,\cdots,X_n=x_n$. This is the event that the chain starts in state $x_0$ initially, then move to state $x_1$ in time 1 and so on and at time $n$ the chain is in state $x_0$. Conditioning on the initial state, consider the following conditional probability.

$\displaystyle (4) \ \ \ \ \ P[X_1=x_1,X_2=x_2,\cdots,X_n=x_n \lvert X_0=x_0]=\frac{P[X_0=x_0,X_1=x_1,X_2=x_2,\cdots,X_n=x_n]}{P[X_0=x_0]}$

Relation (4) can be further simplified since the random variables $X_0=x_0,X_1=x_1,X_2=x_2,\cdots,X_n=x_n$ satisfy the Markov property.

\displaystyle \begin{aligned} (5) \ \ \ \ \ P[X_1=x_1,X_2=x_2,\cdots,X_n=x_n \lvert X_0=x_0]&=P[X_1=x_1 \lvert X_0=x_0] \times P[X_2=x_2 \lvert X_1=x_1] \\& \ \ \ \times \cdots \times P[X_n=x_n \lvert X_{n-1}=x_{n-1}] \\&=P_{x_0,x_1} \times P_{x_1,x_2} \times \cdots \times P_{x_{n-1},x_n} \end{aligned}

Relation (5) says that the probability of a path in a Markov chain conditioning on the initial state is simply the product of the transition probabilities along the path. This is derived using the Markov property. Relation (5) can be established by an induction argument on the subscript $n$, i.e. the length of a path.

Clearly $P[X_1=j \lvert X_0=i]=P_{ij}$. This is the fundamental assumption in Markov chains. Now consider $P[X_1=j,X_2=k \lvert X_0=i]$.

\displaystyle \begin{aligned} P[X_1=j,X_2=k \lvert X_0=i]&=\frac{P[X_0=i,X_1=j,X_2=k]}{P[X_0=i]} \\&=\frac{P[X_0=i,X_1=j,X_2=k]}{P[X_0=i,X_1=j]} \times \frac{P[X_0=i,X_1=j]}{P[X_0=i]} \\&=P[X_2=k \lvert X_0=i,X_1=j] \times P[X_1=j \lvert X_0=i] \\&=P[X_2=k \lvert X_1=j] \times P[X_1=j \lvert X_0=i] \\&=P_{jk} \times P_{ij} \\&=P_{ij} \times P_{jk} \end{aligned}

Note that $P[X_2=k \lvert X_0=i,X_1=j]=P[X_2=k \lvert X_1=j]$ in the above derivation. This is Markov property in action since the probability of the future state depends only on the state immediately preceding it. Now that relation (5) is true for a path of length 2, the following shows how it is derived for length 3.

\displaystyle \begin{aligned} P[X_1=j,X_2=k,X_3=l \lvert X_0=i]&=\frac{P[X_0=i,X_1=j,X_2=k,X_3=l]}{P[X_0=i]} \\&=\frac{P[X_0=i,X_1=j,X_2=k,X_3=l]}{P[X_0=i,X_1=j,X_2=k]} \times \frac{P[X_0=i,X_1=j,X_2=k]}{P[X_0=i]} \\&=P[X_3=l \lvert X_0=i,X_1=j,X_2=k] \times P[X_1=j,X_2=k \lvert X_0=i] \\&=P[X_3=l \lvert X_2=k] \times P[X_1=j,X_2=k \lvert X_0=i] \\&=P_{ij} \times P_{jk} \times P_{kl} \end{aligned}

Note that going from the third line to the fourth line the Markov property is used. Relation (5) for length 2 is used at the end. The inductive proof keeps going and relation (5) is true for any path of arbitrary length.

The conditional probability in (4) by itself may not be special. With the additional assumption of Markov property, the conditional probability can be made to be a “chain” of one-step transition probabilities.

Using the idea of (3a), the unconditional probability of a path is:

\displaystyle \begin{aligned} (6) \ \ \ \ \ P[X_0=x_0,X_1=x_1,X_2=x_2,\cdots,X_n=x_n]&=P[X_0=x_0] \times P_{x_0,x_1} \\& \ \ \times P_{x_1,x_2} \times \cdots \times P_{x_{n-1},x_n} \end{aligned}

Relation (6) says that the unconditional probability of a path in a Markov chain is simply the conditional probability of the path conditioning on the initial state times the probability of the initial state.

More Examples

The earlier examples demonstrate relation (5) and relation (6). We now work a few more examples.

Example 4
There is a slightly different way to look at one unconditional probability $P[X_1=0,X_2=0,X_3=1]$ in Example 2. The first state in this path is $X_1=0$. We can condition on the initial state $X_0$.

$\displaystyle P[X_1=0,X_2=0,X_3=1]=\sum \limits_{k=0}^2 P[X_0=k,X_1=0,X_2=0,X_3=1]$

Then every one of the unconditional probability within the summation sign can be computed according to the idea in (6). Note that Example 2 gives the distribution of the initial state.

\displaystyle \begin{aligned} P[X_1=0,X_2=0,X_3=1]&=P[X_0=0,X_1=0,X_2=0,X_3=1]+ \\& \ \ P[X_0=1,X_1=0,X_2=0,X_3=1]+P[X_0=2,X_1=0,X_2=0,X_3=1] \\&=P[X_0=0] \times P_{00} \times P_{00} \times P_{01} +\\&\ \ P[X_0=1] \times P_{10} \times P_{00} \times P_{01}+P[X_0=2] \times P_{20} \times P_{00} \times P_{01} \\&=0.4 \times 0.6 \times 0.6 \times 0.2+\\&\ \ 0.3 \times 0.3 \times 0.6 \times 0.2 + 0.3 \times 0.4 \times 0.6 \times 0.2\\&=0.054 \end{aligned}

In calculating probabilities in a Markov chain, conditioning on the first step is a useful technique that will be used often in the subsequent posts.

Example 5
Consider a Markov chain with the following transition probability matrix.

$\mathbf{P} = \bordermatrix{ & 0 & 1 & 2 & 3 \cr 0 & 0.5 & 0.2 & 0.2 & 0.1 \cr 1 & 0.3 & 0.3 & 0.2 & 0.2 \cr 2 & 0.3 & 0.2 & 0.2 & 0.3 \cr 3 & 0 & 0 & 0 & 1 } \qquad$

State 3 is called an absorbing state since $P_{33}=1$. In other words, when the chain reaches state 3, it will never leave. Suppose that initially the Markov chain is in state 0 50% of the time, in state 1 30% of the time and in state 2 20% of the time. Calculate the probability $P[X_1=0,X_2=1,X_3=0,X_4=2]$.

We can condition on the initial state as in Example 4.

\displaystyle \begin{aligned} P[X_1=0,X_2=1,X_3=0,X_4=2]&=\sum \limits_{k=0}^2 P[X_0=k,X_1=0,X_2=1,X_3=0,X_4=2] \\&=P[X_0=0,X_1=0,X_2=1,X_3=0,X_4=2]+ \\&\ \ P[X_0=1,X_1=0,X_2=1,X_3=0,X_4=2]+\\&\ \ P[X_0=2,X_1=0,X_2=1,X_3=0,X_4=2] \\&=P[X_0=0] \times P_{00} \times P_{01} \times P_{10} \times P_{02}+ \\& \ \ P[X_0=1] \times P_{10} \times P_{01} \times P_{10} \times P_{02}+\\&\ \ P[X_0=2] \times P_{20} \times P_{01} \times P_{10} \times P_{02} \\&=0.0048\end{aligned}

Example 6
This is Example 1 in this previous post.

When a binary digit, 0 or 1, is transmitted through a communication system, it passes through several stages. At each stage, there is a probability $p$ that the digit is transmitted in error. Let $X_n$ be the digit that is the output at the $n$th stage. Then $\left\{X_n: n \ge 0 \right\}$ is a two-state Markov chain with the following transition probability matrix.

$\mathbf{P} = \bordermatrix{ & 0 & \text{ } & 1 \cr 0 & 1-p & \text{ } & p \cr \text{ } & \text{ } & \text{ } &\text{ } \cr 1 & p & \text{ } & 1-p \cr }$

Determine the following:

• If the digit to be sent is 0, determine the probability that no error occurs up to the second stage.
• If the digit to be sent is 0, determine the probability that a correct message is received at stage 2 through stage 4.

The first probability is $P[X_1=0,X_2=0 \lvert X_0=0]$. This is $P_{00} \times P_{00}=(1-p)^2$. To calculate the second probability, we can condition on the state in time 1. We first calculate $P[X_1=0]$ and $P[X_1=1]$. Since we know for certain that the initial state is 0, $P[X_1=0]=P_{00}=1-p$ and $P[X_1=1]=P_{01}=p$.

\displaystyle \begin{aligned} P[X_2=0,X_3=0,X_4=0]&=P[X_1=0,X_2=0,X_3=0,X_4=0]+P[X_1=1,X_2=0,X_3=0,X_4=0] \\&=P[X_2=0,X_3=0,X_4=0 \lvert X_1=0] \times P[X_1=0]+ \\&\ \ P[X_2=0,X_3=0,X_4=0 \lvert X_1=1] \times P[X_1=1] \\&=P_{00} \times P_{00} \times P_{00} \times (1-p)+ \\&\ \ P_{10} \times P_{00} \times P_{00} \times p\\&=(1-p)^4+p^2 \times (1-p)^2 \end{aligned}

Remarks

The calculation in this post involves very basic calculation with one-step transition probabilities, i.e. the elements of the transition probability matrix given in a Markov chain. The $n$-step transition probabilities are key to the development of the theory of Markov chains. The next post begins the discussion of Chapman-Kolmogorov equations.

Practice problems are in this companion blog.

$\text{ }$

$\text{ }$

$\text{ }$

$\copyright$ 2017 – Dan Ma