# Last step in an absorbing Markov chain

The preceding three posts are devoted to a problem involving absorbing Markov chains (finding the mean time to absorption and the probability of absorption). The links to these three posts are here, here and here. One method in solving this problem is to use the fundamental matrix. The same method can be used to solve a problem involving the last step before being absorbed. Here are two examples.

Example 1
Six coins are tossed all at once. The coins randomly fall heads or tails. For the coins that fall tails, you pick up and toss again. The process continues in such a way that after each toss, you pick up the coins that fall tails and toss again. The process stops until all coins show heads. Let $Y$ be the number of coins in the last toss. Determine the probability distribution of the random variable $Y$.

Note that $Y$ can range from 6 down to 1. If the first toss results in 6 heads, there is no need to toss again, meaning that $Y=6$, which is not very likely. The problem is to determine how likely it is for $Y$ to be a given value.

Example 2
A maze has 7 areas with two of the areas being food sources (area 5 and area 7)

A maze with two food sources (areas 5 and 7)

A mouse is placed into this maze. When the mouse is in an area with food, it stays there and does not leave. When the mouse is in an area with no food, it moves into the adjacent areas at random. In other words, if the area has $k$ exits, the mouse moves to one of these $k$ areas with probability $1/k$.

If the mouse is in one of the non-food areas, then the mouse will be absorbed into one of the food areas. We are interested in identifying the non-food area before reaching a food area. The areas adjacent to a food area would be 2, 4 and 6. These would be the entry points for food. When the mouse finds food, how likely is the entry point to be area 2, area 4 or area 6? More specifically, if the mouse is in area $i$ where $i=1,2,3,4,6$, determine the probability that the mouse transitions into a food area from area $k$ where $k=2,4,6$.

A common idea for these two examples and others like them is that we are interested in locating the transient state from which the process enters an absorbing state. We would like to find the probabilities of the various transient states being the entry point to absorption given an initial state. We discuss these two examples after demonstrating the method using a smaller example.

Illustrative Example

A Markov chain has 4 states (0, 1, 2 and 3). The process cycles among these states according to the following transition probability matrix.

$\mathbf{P} = \bordermatrix{ & 0 & 1 & 2 & 3 \cr 0 & 1 & 0 & 0 & 0 \cr 1 & 0.3 & 0.4 & 0.2 & 0.1 \cr 2 & 0.1 & 0.2 & 0.3 & 0.4 \cr 3 & 0 & 0 & 0 & 1 \cr } \qquad$

There are two absorbing states (0 and 3) and two transient states (1 and 2). If the process starts at a transient state, the process will eventually be absorbed (be transitioned into an absorbing state). The question we want to answer is this. What is the transient state from which the process enters an absorbing state? In this example, the process can enter an absorbing state from state 1 or state 2. Given that the process is in state 1 initially, what is the probability that the process enters absorption from state 1? Likewise from state 2? The same question can be asked if the process starts at state 2 initially.

Let $\{ X_n,n=0,1,2,\cdots \}$ be the chain described by the above matrix. Let $T=\text{min}\{ n \ge 0: X_n=0 \text{ or } X_n=3 \}$. The quantity $T$ is the random time to absorption. The quantity $T-1$ is the time period before absorption. Thus $X_{T-1}$ is the last transient state before absorption. We calculate the following probabilities.

$P(X_{T-1}=1 \lvert X_0=1)$

$P(X_{T-1}=2 \lvert X_0=1)$

$P(X_{T-1}=1 \lvert X_0=2)$

$P(X_{T-1}=2 \lvert X_0=2)$

The first two probabilities form the conditional distribution of the last transient state given that the initial state is 1. The last two probabilities form the conditional distribution of the last transient state given that the initial state is 2. To find these probabilities, it will be helpful to perform calculations involving the fundamental matrix.

$Q=\left[\begin{array}{cc} 0.4 & 0.2 \\ 0.2 & 0.3 \\ \end{array}\right] \ \ \ \ \ \ \ \ I-Q=\left[\begin{array}{cc} 1 & 0 \\ 0 & 1 \\ \end{array}\right]- \left[\begin{array}{cc} 0.4 & 0.2 \\ 0.2 & 0.3 \\ \end{array}\right]= \left[\begin{array}{cc} 0.6 & -0.2 \\ - 0.2 & 0.7 \\ \end{array}\right]$

$W=(I-Q)^{-1}= \left[\begin{array}{cc} 0.6 & -0.2 \\ - 0.2 & 0.7 \\ \end{array}\right]^{-1}= \left[\begin{array}{cc} 35/19 & 10/19 \\ 10/19 & 30/19 \\ \end{array}\right]= \left[\begin{array}{cc} W_{11} & W_{12} \\ W_{21} & W_{22} \\ \end{array}\right]$

See here for more information on the calculation. The matrix $Q$ is the part of $\mathbf{P}$ consisting of the transient states. The matrix $I-Q$ is the difference between the identity matrix and $Q$. The matrix $W=(I-Q)^{-1}$, the inverse of $I-Q$, is the fundamental matrix for the absorbing Markov chain.

The matrix $W=(I-Q)^{-1}$ gives the mean times spent in transient states. For example, if the process starts in state 1, it spend 35/19 = 1.842 periods in state 1 and 10/19 = 0.526 periods in state 2. In total, it spends on average 45/19 = 2.368 periods in transient states before being absorbed (if the initial state is 1). The following shows the calculation for the case of initial state being 1.

$\displaystyle P(X_{T-1}=1 \lvert X_0=1)=W_{11} \times P_{10}+W_{11} \times P_{13}=\frac{35}{19} (0.3+0.1)=\frac{14}{19}=0.7368$

$\displaystyle P(X_{T-1}=2 \lvert X_0=1)=W_{12} \times P_{20}+W_{12} \times P_{23}=\frac{10}{19} (0.1+0.4)=\frac{5}{19}=0.2632$

The calculation shows that if the initial state is 1, it is much more likely for the process to enter an absorbing state from state 1 than from state 2 (almost 74% chance). Before giving an indication why the calculation works, here’s the calculation for the initial state being 2.

$\displaystyle P(X_{T-1}=1 \lvert X_0=2)=W_{21} \times P_{10}+W_{21} \times P_{13}=\frac{10}{19} (0.3+0.1)=\frac{4}{19}=0.2105$

$\displaystyle P(X_{T-1}=2 \lvert X_0=2)=W_{22} \times P_{20}+W_{22} \times P_{23}=\frac{30}{19} (0.1+0.4)=\frac{15}{19}=0.7895$

If the process starts from state 2 instead, it is more likely (almost 79% chance) that the process enters an absorbing state from state 2.

Here’s the thought process for obtaining these probabilities. Consider $P(X_{T-1}=1 \lvert X_0=1)$. The starting transient state is 1 and the ending transient state is also 1. Then we look at the mean time in transient state $W_{11}$. From the ending transient state 1, the process can go into the absorbing state 0 or 3. Thus the desired probability is $W_{11} \times P_{10}+W_{11} \times P_{13}=W_{11} (P_{10}+P_{13})$.

If the starting state is 1 and the ending transient state is 2, we would take the sum of mean time $W_{12}$ times $P_{2k}$ for $k=0,3$. Thus $P(X_{T-1}=2 \lvert X_0=2)$ would be $W_{12} \times P_{20}+W_{12} \times P_{23}=W_{12} (P_{20}+P_{23})$.

In general $P(X_{T-1}=j \lvert X_0=i)$ is the sum of $W_{ij} \times P_{jk}$ over all absorbing states $k$ from which it is possible to go from $j$ to $k$. To get a sense of why this idea works, let’s look at the the probabilities of absorption calculated from the fundamental matrix $W$.

$Q \times R= \left[\begin{array}{cc} W_{11} & W_{12} \\ W_{21} & W_{22} \\ \end{array}\right] \times \left[\begin{array}{cc} P_{10} & P_{13} \\ P_{20} & P_{23} \\ \end{array}\right]= \left[\begin{array}{ccc} W_{11} \cdot P_{10}+W_{12} \cdot P_{20} & \text{ } & W_{11} \cdot P_{13}+W_{12} \cdot P_{23} \\ W_{21} \cdot P_{10}+W_{22} \cdot P_{20} & \text{ } & W_{21} \cdot P_{13}+W_{22} \cdot P_{23} \\ \end{array}\right]$

To make it easier to see, we rewrite the last matrix on the right as follows.

$Q \times R = \bordermatrix{ & 0 & \text{ } & 3 \cr 1 & W_{11} \cdot P_{10}+W_{12} \cdot P_{20} & \text{ } & W_{11} \cdot P_{13}+W_{12} \cdot P_{23} \cr 2 & W_{21} \cdot P_{10}+W_{22} \cdot P_{20} & \text{ } & W_{21} \cdot P_{13}+W_{22} \cdot P_{23} \cr } \qquad$

This is a 2 x 2 matrix indicating the probability of absorption based on the initial states (the rows) and the absorbing states (the columns). The first column gives the probabilities of absorption into state 0 and the second column gives the probabilities of absorption into state 3. Each of the four elements is a sum of two quantities. We can rearrange these 8 quantities to get the desired probabilities. For example, $P(X_{T-1}=1 \lvert X_0=1)$ would be the quantities corresponding to the initial transient state 1 and ending transient state 1, which would be $W_{11} \cdot P_{10}+W_{11} \cdot P_{13}$. The probability $P(X_{T-1}=1 \lvert X_0=2)$ would be the sum of the quantities corresponding to the initial transient state 2 and ending transient state 1, which would be $W_{21} \cdot P_{10}+W_{21} \cdot P_{13}$.

Summary

Before tackling the examples stated earlier, we summarize the ideas in the preceding section. Suppose that $\mathbf{P}$ is the transition probability matrix for a given absorbing Markov chain. For conceptual clarity, we can rewrite $\mathbf{P}$ in the following form.

$\mathbf{P}=\left[\begin{array}{cc} Q & R \\ \mathbf{0} & I \\ \end{array}\right]$

In this formulation, $Q$ consists of the one-step transition probabilities from transient states into transient states, $R$ consists of the one-step transition probabilities from transient states into absorbing states. Furthermore, all entries of $\mathbf{0}$ are zero and $I$ is the identity matrix of an appropriate size.

The next step is to obtain the fundamental matrix $W=(I-Q)^{-1}$, which is the inverse matrix of $I-Q$. The entries of $W$ are mean times spent in transient states. For example, the entry $W_{ij}$ is the mean time spent in state $j$ given that the initial state is $i$. As mentioned, the entries of the matrix $R$ are one-step probabilities from a transient state into an absorbing state. A typical entry of $R$ is $P_{ik}$ where $i$ is a transient state and $k$ is an absorbing state. As the above example demonstrates, the entries of $W$ and $R$ are used in finding the probability of the transient state from which the process enters an absorbing state.

Let $T=\text{min} \{ n \ge 0: X_n \text{ is an absorbing state} \}$, which is the random time to absorption. The goal is to find the probability of the form

$P(X_{T-1}=j \lvert X_0=i)$

where $i$ is the initial transient state and $j$ is the ending transient state (the last transient state before absorption). This probability is obtained by combining the appropriate entries of the matrices $W$ and $R$ in the following way:

$\displaystyle P(X_{T-1}=j \lvert X_0=i)=\sum \limits_{k} \biggl( W_{ij} \cdot P_{jk} \biggr)=W_{ij} \sum \limits_{k} P_{jk}$

where $k$ is over all relevant absorbing states, i.e. all absorbing states $k$ for which it is possible to transition from $j$ to $k$.

Examples

We now solve the two examples indicated at the beginning.

Example 1
Let $X_0=6$, i.e. the 0th toss involves 6 coins. Let $X_n$ be the number of coins in the $(n+1)$st toss. This is indeed a Markov chain since the next state depends only on the current state. The states decrease as more and more tosses are made. Thus the states of the chain are 6, 5, 4, 3, 2, 1 and 0 with state 0 being the absorbing state.

In each step in the chain, we observe the number of tails in tossing a number of coins. Thus each row in the transition probability matrix consists of the probabilities from a binomial distribution. Thus the following gives the transition probability matrix $\mathbf{P}$.

$\mathbf{P} = \bordermatrix{ & 0 & 1 & 2 & 3 & 4 & 5 & 6 \cr 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \cr 1 & 1/2 & 1/2 & 0 & 0 & 0 & 0 & 0 \cr 2 & 1/4 & 2/4 & 1/4 & 0 & 0 & 0 & 0 \cr 3 & 1/8 & 3/8 & 3/8 & 1/8 & 0 & 0 & 0 \cr 4 & 1/16 & 4/16 & 6/16 & 4/16 & 1/16 & 0 & 0 \cr 5 & 1/32 & 5/32 & 10/32 & 10/32 & 5/32 & 1/32 & 0 \cr 6 & 1/64 & 6/64 & 15/64 & 20/64 & 15/64 & 6/64 & 1/64 \cr } \qquad$

The following shows the calculation to obtain the fundamental matrix $W$.

$Q= \left[\begin{array}{cccccc} 1/2 & 0 & 0 & 0 & 0 & 0 \\ 2/4 & 1/4 & 0 & 0 & 0 & 0 \\ 3/8 & 3/8 & 1/8 & 0 & 0 & 0 \\ 4/16 & 6/16 & 4/16 & 1/16 & 0 & 0 \\ 5/32 & 10/32 & 10/32 & 5/32 & 1/32 & 0 \\ 6/64 & 15/64 & 20/64 & 15/64 & 6/64 & 1/64 \end{array}\right] \ \ \ \ \ \ \ \ \ R = \bordermatrix{ & 0 \cr 1 & 1/2 \cr 2 & 1/4 \cr 3 & 1/8 \cr 4 & 1/16 \cr 5 & 1/32 \cr 6 & 1/64 \cr } \qquad$

$I-Q= \left[\begin{array}{cccccc} 1/2 & 0 & 0 & 0 & 0 & 0 \\ -2/4 & 3/4 & 0 & 0 & 0 & 0 \\ -3/8 & -3/8 & 7/8 & 0 & 0 & 0 \\ -4/16 & -6/16 & -4/16 & 15/16 & 0 & 0 \\ -5/32 & -10/32 & -10/32 & -5/32 & 31/32 & 0 \\ -6/64 & -15/64 & -20/64 & -15/64 & -6/64 & 63/64 \end{array}\right]$

$W=(I-Q)^{-1} = \bordermatrix{ & 1 & 2 & 3 & 4 & 5 & 6 \cr 1 & 2 & 0 & 0 & 0 & 0 & 0 \cr 2 & 4/3 & 4/3 & 0 & 0 & 0 & 0 \cr 3 & 10/7 & 4/7 & 8/7 & 0 & 0 & 0 \cr 4 & 152/105 & 72/105 & 32/105 & 112/105 & 0 & 0 \cr 5 & 942/651 & 472/651 & 272/651 & 112/651 & 672/651 & 0 \cr 6 & 2820/1953 & 1428/1953 & 928/1953 & 528/1953 & 192/1953 & 1984/1953 \cr } \qquad$

To solve the problem at hand, first define the time to absorption $T=\text{min} \{ n \ge 0: X_n=0 \}$. The following calculation gives the probabilities of the last step.

$\displaystyle P(Y=1)=P(X_{T-1}=1 \lvert X_0=6)=W_{61} \times P_{10}=\frac{2820}{1953} \times \frac{1}{2}=\frac{1410}{1953}=0.7220$

$\displaystyle P(Y=2)=P(X_{T-1}=2 \lvert X_0=6)=W_{62} \times P_{20}=\frac{1428}{1953} \times \frac{1}{4}=\frac{357}{1953}=0.1828$

$\displaystyle P(Y=3)=P(X_{T-1}=3 \lvert X_0=6)=W_{63} \times P_{30}=\frac{928}{1953} \times \frac{1}{8}=\frac{116}{1953}=0.0594$

$\displaystyle P(Y=4)=P(X_{T-1}=4 \lvert X_0=6)=W_{64} \times P_{40}=\frac{528}{1953} \times \frac{1}{16}=\frac{33}{1953}=0.0169$

\displaystyle \begin{aligned}P(Y=5)&=P(X_{T-1}=5 \lvert X_0=6)=W_{65} \times P_{50}=\frac{192}{1953} \times \frac{1}{32}=\frac{6}{1953}=0.003072 \end{aligned}

\displaystyle \begin{aligned}P(Y=6)&=P(X_{T-1}=6 \lvert X_0=6)=W_{66} \times P_{60}=\frac{1984}{1953} \times \frac{1}{64}=\frac{31}{1953}=0.01587 \end{aligned}

Starting a 6 coins, the reduction in coins in tosses is gradual. There is a 72% chance that the coins will be reduced to one coin in the last toss. The most likely way to end the game is from tossing the last single coin. However, the calculation does not indicate how long it will take to reach the last step of one coin.

Example 2
The diagram of the maze is repeated here.

A maze with two food sources (areas 5 and 7)

There are 7 states in this Markov chain, representing the areas in the maze. The following gives the transition probability matrix.

$\mathbf{P} = \bordermatrix{ & 1 & 2 & 3 & 4 & 5 & 6 & 7 \cr 1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \cr 2 & 0 & 0 & 1/2 & 0 & 0 & 0 & 1/2 \cr 3 & 0 & 1/3 & 0 & 1/3 & 0 & 1/3 & 0 \cr 4 & 0 & 0 & 1/2 & 0 & 1/2 & 0 & 0 \cr 5 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \cr 6 & 0 & 0 & 1/3 & 0 & 1/3 & 0 & 1/3 \cr 7 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \cr } \qquad$

The states 5 and 7 are the areas with food, thus are absorbing states. The next step is the calculation surrounding the fundamental matrix $W$.

$Q = \bordermatrix{ & 1 & 2 & 3 & 4 & 6 \cr 1 & 0 & 1 & 0 & 0 & 0 \cr 2 & 0 & 0 & 1/2 & 0 & 0 \cr 3 & 0 & 1/3 & 0 & 1/3 & 1/3 \cr 4 & 0 & 0 & 1/2 & 0 & 0 \cr 6 & 0 & 0 & 1/3 & 0 & 0 \cr } \qquad \ \ \ \ \ \ \ \ \ R = \bordermatrix{ & 5 & 7 \cr 1 & 0 & 0\cr 2 & 0 & 1/2 \cr 3 & 0 & 0 \cr 4 & 1/2 & 0 \cr 6 & 1/3 & 1/3 \cr } \qquad$

$I-Q= \left[\begin{array}{ccccc} 1 & -1 & 0 & 0 & 0 \\ 0 & 1 & -1/2 & 0 & 0 \\ 0 & -1/3 & 1 & -1/3 & -1/3 \\ 0 & 0 & -1/2 & 1 & 0 \\ 0 & 0 & -1/3 & 0 & 1 \\ \end{array}\right]$

$W=(I-Q)^{-1} = \bordermatrix{ & 1 & 2 & 3 & 4 & 6 \cr 1 & 1 & 13/10 & 9/10 & 3/10 & 3/10 \cr 2 &0 & 13/10 & 9/10 & 3/10 & 3/10 \cr 3 & 0 & 3/5 & 9/5 & 3/5 & 3/5 \cr 4 & 0 & 3/10 & 9/10 & 13/10 & 3/10 \cr 6 & 0 & 1/5 & 3/5 & 1/5 & 6/5 \cr } \qquad$

The matrices $W$ and $R$ are all we need to finish the problem. Suppose that we place the mouse initially at either area 2, area 3 and area 6. We find the probabilities of the last transient state being 2, 4 and 6. Once again $T$ is the random time to absorption and is defined by $T=\text{min} \{ n \ge 0: X_n=5 \text{ or } X_n=7 \}$.

First, the starting state is 2.

$\displaystyle P(X_{T-1}=2 \lvert X_0=2]=W_{22} \cdot P_{27}=\frac{13}{10} \cdot \frac{1}{2}=\frac{6.5}{10}=0.65$

$\displaystyle P(X_{T-1}=4 \lvert X_0=2]=W_{24} \cdot P_{45}=\frac{3}{10} \cdot \frac{1}{2}=\frac{1.5}{10}=0.15$

$\displaystyle P(X_{T-1}=6 \lvert X_0=2]=W_{26} \cdot (P_{65}+P_{67})=\frac{3}{10} \cdot \frac{2}{3}=\frac{2}{10}=0.2$

Next, the starting state is 3.

$\displaystyle P(X_{T-1}=2 \lvert X_0=3]=W_{32} \cdot P_{27}=\frac{3}{5} \cdot \frac{1}{2}=\frac{1.5}{5}=0.3$

$\displaystyle P(X_{T-1}=4 \lvert X_0=3]=W_{34} \cdot P_{45}=\frac{3}{5} \cdot \frac{1}{2}=\frac{1.5}{5}=0.3$

$\displaystyle P(X_{T-1}=6 \lvert X_0=3]=W_{36} \cdot (P_{65}+P_{67})=\frac{3}{5} \cdot \frac{2}{3}=\frac{2}{5}=0.4$

Lastly, the starting state is 6.

$\displaystyle P(X_{T-1}=2 \lvert X_0=6]=W_{62} \cdot P_{27}=\frac{1}{5} \cdot \frac{1}{2}=\frac{1}{10}=0.1$

$\displaystyle P(X_{T-1}=4 \lvert X_0=6]=W_{64} \cdot P_{45}=\frac{1}{5} \cdot \frac{1}{2}=\frac{1}{10}=0.1$

$\displaystyle P(X_{T-1}=6 \lvert X_0=6]=W_{66} \cdot (P_{65}+P_{67})=\frac{6}{5} \cdot \frac{2}{3}=\frac{4}{5}=0.8$

The results agree with the intuition. If the mouse is placed in area 2 initially, it is most likely that the mouse enters food area from area 2 (65% chance). Because the mouse makes the moves at random, there is a small but not negligible chance that the entry to food is from area 4 or area 6.

The initial state of area 3 is more neutral. If the mouse is placed in area 3 at first, the entry to food is roughly equally likely to be from areas 2, 4 and 6, with area 6 having a slight edge. Note that area 6 has food areas on two sides.

If the mouse is placed in area 6 initially, the entry to food is mostly likely to be from area 6 since there are foods on two sides of area 6.

Practice Problems

Several practice problems to reinforce the concepts discussed here are available in a companion blog. See the problems at the end of this problem set.

Dan Ma math

Daniel Ma mathematics

Dan Ma Markov chains

Daniel Ma Markov chains

$\copyright$ 2018 – Dan Ma

# Absorbing Markov chains

The preceding two posts are devoted to solving the problem of determining mean time to absorption and the problem of determining the probability of absorption (using first step analysis here and using fundamental matrix here). The Markov chains in these problems are called absorbing Markov chains. This post summarizes the properties of such chains.

What are Absorbing Markov Chains?

A state in a Markov chain is said to be an absorbing state if the process will never leave that state once it is entered. That is, if the state $i$ is an absorbing state, then $P_{ii}=1$. A Markov chain is said to be an absorbing Markov chain if it has at least one absorbing state and if any state in the chain, with a positive probability, can reach an absorbing state after a number of steps. The following transition probability matrix represents an absorbing Markov chain.

$\mathbf{P} = \bordermatrix{ & 0 & 1 & 2 & 3 & 4 & 5 \cr 0 & 1 & 0 & 0 & 0 & 0 & 0 \cr 1 & 0.5 & 0 & 0.5 & 0 & 0 & 0 \cr 2 & 0 & 0.5 & 0 & 0.5 & 0 & 0 \cr 3 & 0 & 0 & 0.5 & 0 & 0.5 & 0 \cr 4 & 0 & 0 & 0 & 0.5 & 0 & 0.5 \cr 5 & 0 & 0 & 0 & 0 & 0 & 1 \cr } \qquad$ …………………………………………………………. (1)

The above matrix can be interpreted as a small scaled random walk or a small scaled gambler’s ruin. It has two absorbing states, 0 and 5. Once the process reaches these states, it does not leave these states. The other states (1, 2, 3 and 4) can reach an absorbing state in a finite number of steps with a positive probability. For example, state 1 can reach state 0 in one step with probability 0.5. State 1 can reach state 5 in four steps with probability 1/16.

As we will see below, any non-absorbing state in an absorbing Markov chain will eventually reach an absorbing state will probability 1. Thus any non-absorbing state in an absorbing Markov chain is a transient state. That is, when starting in a non-absorbing state, the process will only spend finite amount of time in non-absorbing states.

In order to be an absorbing Markov chain, it is not sufficient for a Markov chain to have an absorbing state. It must also have the property that all non-absorbing states must eventually reach an absorbing state. The following is an example of a chain that is not an absorbing Markov chain.

$\mathbf{P} = \bordermatrix{ & 0 & 1 & 2 & 3 & 4 & 5 \cr 0 & 0.5 & 0.3 & 0.2 & 0 & 0 & 0 \cr 1 & 0.4 & 0.3 & 0.3 & 0 & 0 & 0 \cr 2 & 0.3 & 0.4 & 0.3 & 0 & 0 & 0 \cr 3 & 0 & 0 & 0.3 & 0.3 & 0.4 & 0 \cr 4 & 0 & 0 & 0 & 0.4 & 0.3 & 0.3 \cr 5 & 0 & 0 & 0 & 0 & 0 & 1 \cr } \qquad$ …………………………………………………………. (2)

Even though the above chain has one absorbing state (state 5), it is not an absorbing chain. Note that the states 0, 1 and 2 can never reach state 5.

Reaching Absorbing States with Certainty

Now we discuss several properties of such chains. The first one is that it does not matter what state the process is in initially, eventually the process reach an absorbing state, i.e. the process will be absorbed at some point in time. Suppose that the absorbing Markov chain has $t$ non-absorbing states and $r$ absorbing states. The transition probability matrix $\mathbf{P}$ can be written in the following format

$\mathbf{P}=\left[\begin{array}{rr} Q & R \\ \mathbf{0} & I \\ \end{array}\right]$ ……………………………………………………………………………………….. (3)

where $Q$ is a $t \times t$ matrix representing the transition probabilities from the non-absorbing states into the non-absorbing states and $R$ is a $t \times r$ matrix representing the transition probabilities from the non-absorbing states into the absorbing states. Furthermore, $\mathbf{0}$ is an $r \times t$ matrix consisting of zeros in its entries and $I$ is the $r \times r$ identity matrix. The following is the matrix in (1) arranged in the new format.

$\mathbf{P} = \bordermatrix{ & 1 & 2 & 3 & 4 & \text{ } & 0 & 5 \cr 1 & 0 & 0.5 & 0 & 0 & \text{ } & 0.5 & 0 \cr 2 & 0.5 & 0 & 0.5 & 0 & \text{ } & 0 & 0 \cr 3 & 0 & 0.5 & 0 & 0.5 & \text{ } & 0 & 0 \cr 4 & 0 & 0 & 0.5 & 0 & \text{ } & 0 & 0.5 \cr \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \cr 0 & 0 & 0 & 0 & 0 & \text{ } & 1 & 0 \cr 5 & 0 & 0 & 0 & 0 & \text{ } & 0 & 1 \cr } \qquad$ …………………………………………………….. (4)

Decomposing $\mathbf{P}$ into the form in (3) is a useful technique. It makes certain calculations simple to perform. We will get to that in a minute. First we prove an important fact showing that the non-absorbing states in an absorbing chain are indeed transient.

Theorem 1
In an absorbing Markov chain, the probability that the process will reach an absorbing state is 1. In other words, the probability of being in a non-absorbing state approaches zero as $n$ goes to infinity, i.e. $Q^n \rightarrow \mathbf{0}$ as $n \rightarrow \infty$.

Here’s a proof. For each non-absorbing state $k$, let $C_k$ be the minimum number of steps to transition from state $k$ to an absorbing state. For example, in the matrix (1), the minimum number of steps to go from state 1 to an absorbing state (state 0) is 1 with probability 0.5. In the same example, the minimum number of steps to go from state 3 to an absorbing state (state 5) is 2 with probability 0.25.

For each non-absorbing state $k$, let $A_k$ be the probability that, starting from state $k$, the process will take more than $C_k$ steps to reach an absorbing state. The probability $A_k$ must be less than 1. If $A_k=1$, the process for certain will not reach an absorbing state in $C_k$ steps, contradicting the property of $C_k$. Thus $A_k<1$.

Let $C$ be the maximum of all $C_k$ and let $A$ be the maximum of all $A_k$. Taking maximum is possible here since the Markov chain has a finite state space. The probability of the process taking more than $C$ steps to reach an absorbing state is less than or equal to $A$. This follows from the property of $C_k$ and $A_k$.

Furthermore, the probability of the process taking more than $2 \times C$ steps to reach an absorbing state is less than or equal to $A^2$. This follows from the fact that the Markov chain is memoryless. After taking $C$ steps and if the process still not in an absorbing state, the probability of the process taking additional $C$ or more steps to reach an absorbing state is less than or equal to $A$. Thus multiply the two probabilities gives $A^2$. By an inductive reasoning, the probability of the process taking more than $n \times C$ steps to reach an absorbing state is less than or equal to $A^n$. Since $A<1$, the numbers $A^n$ approach zero as $n$ goes to infinity.

It follows that as the integer $n$ increases without bound, the probability that the process will take more than $n$ steps to reach an absorbing state approaches zero. Turning it around, the probability that the process will reach an absorbing state in $n$ or fewer steps approaches 1. As a result, for any non-absorbing states $i$ and $j$, the transition probabilities $P_{ij}^n$ approaches zero as $n$ goes to infinity. Thus $Q^n \rightarrow \mathbf{0}$ as $n \rightarrow \infty$. $\square$

The above theorem shows that in an absorbing Markov chain, the process will eventually land in an absorbing state (it cannot cycle among the non-absorbing states indefinitely). Thus in an absorbing Markov chain, the time spent in the non-absorbing states is finite. As a result, non-absorbing states in an absorbing Markov chain are called transient states. Since the time spent in transient states is finite, one of the problems of interest is the mean time spent in transient states. Of course, the answer depends on the initial state.

More Properties of Absorbing Markov Chains

Another important property is the fundamental matrix that can be computed from the transition probability matrix of an absorbing chain. Given the transition probability matrix $\mathbf{P}$, decompose $\mathbf{P}$ according to (3). Recall that $Q$ consists of the transition probabilities from the transient states to the transient states. Derive $I-Q$ where $I$ is the identity matrix of the same dimension as $Q$. The following three theorems capture more important properties of absorbing Markov chains.

Theorem 2
In an absorbing Markov chain with transition probability matrix $\mathbf{P}$, the matrix $I-Q$ has an inverse $W$, i.e. $W=(I-Q)^{-1}$. The matrix $W$ has the following properties:

• $W=I+Q+Q^2+Q^3+\cdots$
• An entry $W_{ij}$ in the matrix $W$ is the mean time spent in the transient state $j$ given that the process starts at the transient state $i$. More specifically, $W_{ij}=E[\text{number of times state } j \text{ is visited} \lvert X_0=i]$.

For an absorbing Markov chain with transition probability matrix $\mathbf{P}$, the matrix $W=(I-Q)^{-1}$ is called the fundamental matrix of the absorbing Markov chain. A proof of Theorem 2 is sketched out in the preceding post. So we will not repeat it here.

Once the fundamental matrix is obtained by taking the inverse of $I-Q$, we can use it to solve two problems, stated in the following two theorems. The ideas of a proof are also sketched out in the preceding post.

Theorem 3 – mean time to absorption
In an absorbing Markov chain with transition probability matrix $\mathbf{P}$, consider the fundamental matrix $W=(I-Q)^{-1}$. The following calculation is of interest.

• Given that the process starts in the transient state $i$, consider the row of the matrix $W$ that corresponds to state $i$. The sum of all entries of $W$ on that row is the mean time spent in transient states given that the process start in state $i$. The sum is the mean time to absorption.
Theorem 4 – probabilities of absorption
In an absorbing Markov chain with transition probability matrix $\mathbf{P}$, consider the fundamental matrix $W=(I-Q)^{-1}$. The following calculation is of interest.

• Consider the matrix $R$ in the decomposition of $\mathbf{P}$. The matrix $W \times R$, the product of $W$ and $R$, gives the probabilities of absorption.
• More specifically, the row in $W$ corresponding to transient state $i$ multiplying the column in $R$ corresponding to the absorbing state $k$ results in the probability of being absorbed into state $k$ given that the process start in state $i$.
• Even more specifically, assuming that the transient states are $0,1,2,\cdots,t-1$, the following gives the probability of absorption into state $k$ given that the process starts in state $i$.
$\displaystyle \sum \limits_{j=0}^{t-1} \biggl( E[\text{number of times state } j \text{ is visited} \lvert X_0=i] \times P_{jk} \biggr)$

Examples

Examples are available in the preceding posts (here and here). We work three examples.

Example 1
An urn initially contains 7 blue balls and 5 red balls. One ball is selected at random one at a time from the urn. If the selected ball is not red, the ball is put back into the urn. If the selected ball is red, it is removed from the urn. The process continues until all red balls are removed. Determine the expected number of selections in order to have all red balls removed.

Let $X_n$ be the number of red balls in the urn after the $n$th selection. The states are 0, 1, 2, 3, 4 and 5. The initial state is $X_0=5$. The goal is to reach the absorbing state of 0 (all red balls are removed). The following is the transition probability matrix.

$\mathbf{P} = \bordermatrix{ & 0 & 1 & 2 & 3 & 4 & 5 \cr 0 & 1 & 0 & 0 & 0 & 0 & 0 \cr 1 & 1/8 & 7/8 & 0 & 0 & 0 & 0 \cr 2 & 0 & 2/9 & 7/9 & 0 & 0 & 0 \cr 3 & 0 & 0 & 3/10 & 7/10 & 0 & 0 \cr 4 & 0 & 0 & 0 & 4/11 & 7/11 & 0 \cr 5 & 0 & 0 & 0 & 0 & 5/12 & 7/12 \cr } \qquad$

There is only one absorbing state, the state 0. The states 1, 2, 3, 4 and 5 are transient states. Identify the matrix $Q$, which is the transition probabilities from the transient states into the transient states. Derive the matrix $I-Q$ where $I$ is a 5 x 5 identity matrix. The following is the inverse of $I-Q$.

$(I-Q)^{-1} = \bordermatrix{ & 1 & 2 & 3 & 4 & 5 \cr 1 & 8 & 0 & 0 & 0 & 0 \cr 2 & 8 & 9/2 & 0 & 0 & 0 \cr 3 & 8 & 9/2 & 10/3 & 0 & 0 \cr 4 & 8 & 9/2 & 10/3 & 11/4 & 0 \cr 5 & 8 & 9/2 & 10/3 & 11/4 & 12/5 \cr } \qquad$

The 5th row of $(I-Q)^{-1}$ gives the mean time spent in the transient states. The sum of the entries on the 5th row is 1259/60 = 20.9833. Thus it takes about 21 steps on average to remove all red balls from the urn.

Example 2
An urn initially contains 7 blue balls and 5 red balls. One ball is selected at random one at a time from the urn. If the selected ball is not red, the ball is put back into the urn. If the selected ball is red, it is removed from the urn and then a blue ball is put into the urn. The process continues until all red balls are removed. Determine the expected number of selections in order to have all red balls removed. Compare the expected value with the expected value in Example 1. Give an explanation for the difference.

As in Example 1, let $X_n$ be the number of red balls in the urn after the $n$th selection. The states are 0, 1, 2, 3, 4 and 5. The initial state is $X_0=5$. The goal is to reach the absorbing state of 0 (all red balls are removed). The following is the transition probability matrix.

$\mathbf{P} = \bordermatrix{ & 0 & 1 & 2 & 3 & 4 & 5 \cr 0 & 1 & 0 & 0 & 0 & 0 & 0 \cr 1 & 1/12 & 11/12 & 0 & 0 & 0 & 0 \cr 2 & 0 & 2/12 & 10/12 & 0 & 0 & 0 \cr 3 & 0 & 0 & 3/12 & 9/12 & 0 & 0 \cr 4 & 0 & 0 & 0 & 4/12 & 8/12 & 0 \cr 5 & 0 & 0 & 0 & 0 & 5/12 & 7/12 \cr } \qquad$

One big difference from Example 1 is that each red ball selected is replaced by a blue ball. Thus the urn always has 12 balls whereas in Example 1 the number of balls is dwindling. As before, identify $Q$ and derive $I-Q$. The following is the inverse of $I-Q$.

$(I-Q)^{-1} = \bordermatrix{ & 1 & 2 & 3 & 4 & 5 \cr 1 & 12 & 0 & 0 & 0 & 0 \cr 2 & 12 & 6 & 0 & 0 & 0 \cr 3 & 12 & 6 & 4 & 0 & 0 \cr 4 & 12 & 6 & 4 & 3 & 0 \cr 5 & 12 & 6 & 4 & 3 & 12/5 \cr } \qquad$

Since the initial state is 5, we focus on the 5th row of $(I-Q)^{-1}$. The sum of that row is 137/5 = 27.4. Thus it takes on average 27.4 selections to remove all red balls. Note that the process takes longer than in Example 1. By replacing each selected red ball with a blue ball, it becomes harder to remove red balls especially when the process is in the lower states.

Example 3
An urn initially contains 4 blue balls and 2 red balls. One ball is selected at random one at a time from the urn. Each selected ball is replaced by a ball of the opposite color. The process continues until all balls in the urn are of the same color.

• Determine the probability that the process ends with the urn containing only red balls.
• Determine the expected number of selections in order for the urn to consist of balls of the same color.

Let $X_n$ be the number of red balls in the urn after the $n$th selection. There are 7 states in this chain – 0, 1, 2, 3, 4, 5 and 6. The initial state is $X_0=2$. The process will eventually reach either state 0 or state 6. We wish to find the probability of being absorbed into state 6 first. States 0 and 6 are the absorbing states. States 1, 2, 3, 4, and 5 are transient states. The following is the transition probability matrix.

$\mathbf{P} = \bordermatrix{ & 0 & 1 & 2 & 3 & 4 & 5 & 6 \cr 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \cr 1 & 1/6 & 0 & 5/6 & 0 & 0 & 0 & 0 \cr 2 & 0 & 2/6 & 0 & 4/6 & 0 & 0 & 0 \cr 3 & 0 & 0 & 3/6 & 0 & 3/6 & 0 & 0 \cr 4 & 0 & 0 & 0 & 4/6 & 0 & 2/6 & 0 \cr 5 & 0 & 0 & 0 & 0 & 5/6 & 0 & 1/6 \cr 6 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \cr } \qquad$

As before, identify $Q$ and derive $I-Q$. The following is the inverse of $I-Q$.

$W=(I-Q)^{-1} = \bordermatrix{ & 1 & 2 & 3 & 4 & 5 \cr 1 & 48/13 & 105/13 & 10 & 90/13 & 30/13 \cr 2 & 42/13 & 126/13 & 12 & 108/13 & 36/13 \cr 3 & 3 & 9 & 13 & 9 & 3 \cr 4 & 36/13 & 108/13 & 12 & 126/13 & 42/13 \cr 5 & 30/13 & 90/13 & 10 & 105/13 & 48/13 \cr } \qquad$

To answer the probability question, multiply the matrices $W$ and $R$. The following is the result.

$W \times R=\left[\begin{array}{rrrrr} 48/13 & 105/13 & 10 & 90/13 & 30/13 \\ 42/13 & 126/13 & 12 & 108/13 & 36/13 \\ 3 & 9 & 13 & 9 & 3 \\ 36/13 & 108/13 & 12 & 126/13 & 42/13 \\ 30/13 & 90/13 & 10 & 105/13 & 48/13 \end{array}\right] \times \left[\begin{array}{rr} 1/6 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 0 \\ 0 & 1/6 \end{array}\right]=\left[\begin{array}{rr} 8/13 & 5/13 \\ 7/13 & 6/13 \\ 1/2 & 1/2 \\ 6/13 & 7/13 \\ 5/13 & 8/13 \end{array}\right]$

With the initial state being 2, the probability of being absorbed into state 6 (all red balls) is 6/13 = 0.4615. The expected time to absorption (all balls of the same color) is the sum of the second row of $(I-Q)^{-1}$, which is 36. Thus on average it will take 36 selections of balls for the urn to consist of balls of the same color.

Practice Problems

Practice problems to reinforce the concepts discussed here are available in a companion blog. There are two problem sets. The first one is here and the second one is here.

Dan Ma math

Daniel Ma mathematics

Dan Ma Markov chains

Daniel Ma Markov chains

$\copyright$ 2018 – Dan Ma

# First step analysis and fundamental matrix

A great number of problems involving Markov chains can be evaluated by a technique called first step analysis. The general idea of the method is to break down the possibilities resulting from the first step (first transition) in the Markov chain. Then use the law of total probability and Markov property to derive a set of relationship among the unknown variables. This technique is demonstrated in this previous post in several examples. In this post, the technique is further discussed. An alternative approach based on the fundamental matrix is also emphasized.

This previous post shows how to use first step analysis to solve two problems: to determine the probability of absorption from transient states and to determine the expected time spent in transient states. This post shows that first step analysis leads to an effective approach of using fundamental matrix for solving the same two problems. We illustrate using an example.

Illustrative Example

We start with Example 3 discussed in the previous post. It involves the following transition probability matrix.

$\mathbf{P} = \bordermatrix{ & 0 & 1 & 2 & 3 & 4 \cr 0 & 1 & 0 & 0 & 0 & 0 \cr 1 & 0.4 & 0.3 & 0.1 & 0.1 & 0.1 \cr 2 & 0.1 & 0.1 & 0.5 & 0.2 & 0.1 \cr 3 & 0.1 & 0.2 & 0.1 & 0.5 & 0.1 \cr 4 & 0 & 0 & 0 & 0 & 1 \cr } \qquad$

The Markov chain represented by $\mathbf{P}$ cycles through 5 states. Whenever the process reaches state 0 or state 4, it stays there and not move. These two states are called absorbing states. The other states (1, 2 and 3) are called transient states because the process stays in each of these states a finite amount of time. When the process starts at one of these states, it will eventually transition (be absorbed) into one of the absorbing states (0 or 4). Thus the matrix $\mathbf{P}$ is called an absorbing Markov chain.

A natural question: how likely that the process is absorbed into state 0 versus state 4? Of course the answer to this question depends on the starting state.

Another question: if starting at state 1, 2 or 3, how many steps on average it will take for the process to be absorbed? In other words, what is the mean time to absorption? Of course, the answer will depend on the starting state.

Let $T$ the time it takes the process to be absorbed into 0 or 4. More specifically let $T$ be defined by $T=\{ n \ge 0: X_n=0 \text{ or } X_n=4 \}$. If the starting state is $i$, let $U_i$ be the probability of the process being absorbed into state 0 and let $V_i$ be the expected time to absorption (into state 0 or state 4). The quantities $U_i$ and $V_i$ can also be defined as follows:

$U_i=P[X_T=0 \lvert X_0=i] \ \ \ \ i=1,2,3$

$V_i=E[T \lvert X_0=i] \ \ \ \ \ \ \ \ \ \ i=1,2,3$

The quantities $U_i$ are the probabilities of absorption into state 0 and the quantities $V_i$ are the expected times to absorption (from each transient state). Applying the first step analysis produces the following two sets of equations.

\displaystyle \begin{aligned}&U_1=0.4+0.3 \times U_1+0.1 \times U_2+0.1 \times U_3 \\&U_2=0.1+0.1 \times U_1+0.5 \times U_2+0.2 \times U_3 \\&U_3=0.1+0.2 \times U_1+0.1 \times U_2+0.5 \times U_3 \end{aligned}

\displaystyle \begin{aligned}&V_1=1+0.3 \times V_1+0.1 \times V_2+0.1 \times V_3 \\&V_2=1+0.1 \times V_1+0.5 \times V_2+0.2 \times V_3 \\&V_3=1+0.2 \times V_1+0.1 \times V_2+0.5 \times V_3 \end{aligned}

We rearrange the equations by putting the variables $U_i$ or $V_i$ on one side and the constant terms on the other side.

$\displaystyle \begin{array}{rcr} \displaystyle 0.7 \times U_1-0.1 \times U_2-0.1 \times U_3 & = & 0.4 \\ \displaystyle -0.1 \times U_1+0.5 \times U_2-0.2 \times U_3 & = & 0.1 \\ \displaystyle -0.2 \times U_1-0.1 \times U_2+0.5 \times U_3 & = & 0.1 \end{array} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (1a)$

$\displaystyle \begin{array}{rcr} \displaystyle 0.7 \times V_1-0.1 \times V_2-0.1 \times V_3 & = & 1 \\ \displaystyle -0.1 \times V_1+0.5 \times V_2-0.2 \times V_3 & = & 1 \\ \displaystyle -0.2 \times V_1-0.1 \times V_2+0.5 \times V_3 & = & 1 \end{array} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (2a)$

The probabilities of absorption into state 0 are found by solving the system of equations (1a). The mean times to absorption are found by solving the system of equations (2a). It will be informative to write (1) and (2) in matrix notations.

$\left[\begin{array}{rrr} 0.7 & -0.1 & -0.1 \\ -0.1 & 0.5 & -0.2 \\ -0.2 & -0.1 & 0.5 \end{array}\right] \left[\begin{array}{c} U_1 \\ U_2 \\ U_3 \end{array}\right] = \left[\begin{array}{c} 0.4 \\ 0.1 \\ 0.1 \end{array}\right] \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (1b)$

$\left[\begin{array}{rrr} 0.7 & -0.1 & -0.1 \\ -0.1 & 0.5 & -0.2 \\ -0.2 & -0.1 & 0.5 \end{array}\right] \left[\begin{array}{c} V_1 \\ V_2 \\ V_3 \end{array}\right] = \left[\begin{array}{c} 1 \\ 1 \\ 1 \end{array}\right] \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (2b)$

Note that the same matrix appears on the left side of both (1b) and (2b). The quantities $U_i$ and $V_i$ can be found by multiplying both sides by the following inverse matrix.

$\left[\begin{array}{rrr} 0.7 & -0.1 & -0.1 \\ -0.1 & 0.5 & -0.2 \\ -0.2 & -0.1 & 0.5 \end{array}\right]^{-1}=\left[\begin{array}{rrr} 230/141 & 60/141 & 70/141 \\ 90/141 & 330/141 & 150/141 \\ 110/141 & 90/141 & 340/141 \end{array}\right]$

Multiplying (1b) and (2b) by the inverse matrix produces the following.

$\left[\begin{array}{c} U_1 \\ U_2 \\ U_3 \end{array}\right] = \left[\begin{array}{rrr} 230/141 & 60/141 & 70/141 \\ 90/141 & 330/141 & 150/141 \\ 110/141 & 90/141 & 340/141 \end{array}\right] \times \left[\begin{array}{c} 0.4 \\ 0.1 \\ 0.1 \end{array}\right] =\left[\begin{array}{c} 35/47 \\ 28/47 \\ 29/47 \end{array}\right]=\left[\begin{array}{c} 0.745 \\ 0.596 \\ 0.617 \end{array}\right] \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (1c)$

$\left[\begin{array}{c} V_1 \\ V_2 \\ V_3 \end{array}\right] = \left[\begin{array}{rrr} 230/141 & 60/141 & 70/141 \\ 90/141 & 330/141 & 150/141 \\ 110/141 & 90/141 & 340/141 \end{array}\right] \times \left[\begin{array}{c} 1 \\ 1 \\ 1 \end{array}\right] =\left[\begin{array}{c} 360/141 \\ 570/141 \\ 540/141 \end{array}\right]=\left[\begin{array}{c} 2.55 \\ 4.04 \\ 3.83 \end{array}\right] \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (2c)$

The systems of equations (1a) and (2a) can certainly be solved by using other methods (e.g. Gaussian elimination). The inverse matrix approach is clean and can be implemented using software. More importantly, using the inverse matrix leads to a general description of the algorithm based on fundamental matrix.

How is the matrix in (1b) and (2b) obtained? Recall that the equations in (1a) and (2a) are obtained by rearranging the equations from the first step analysis. The first step analysis is of course based on the original transition probability matrix $\mathbf{P}$. The matrix in question is obtained from the transition probability matrix $\mathbf{P}$. The next two sections show how.

Before stating the algorithm, we continue to use the above example as illustration. The following is a rewrite of the transition probability matrix $\mathbf{P}$. Note that the transient states are grouped together.

$\mathbf{P} = \bordermatrix{ & 1 & 2 & 3 & \text{ } & 0 & 4\cr 1 & 0.3 & 0.1 & 0.1 & \text{ } & 0.4 & 0.1 \cr 2 & 0.1 & 0.5 & 0.2 & \text{ } & 0.1 & 0.1 \cr 3 & 0.2 & 0.1 & 0.5 & \text{ } & 0.1 & 0.1 \cr \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \cr 0 & 0 & 0 & 0 & \text{ } & 1 & 0 \cr 4 & 0 & 0 & 0 & \text{ } & 0 & 1 \cr } \qquad=\left[\begin{array}{rr} Q & R \\ \mathbf{0} & I \\ \end{array}\right]$

The matrix $Q$ is the 3 x 3 matrix of transition probabilities for the 3 transient states 1, 2 and 3. The matrix $Q$ describes the transition probabilities between transient states. The matrix $R$ is a 3 x 2 matrix that shows the transition probabilities from transient to absorbing states. The matrix $\mathbf{0}$ is a 2 x 3 matrix consisting of 0’s. The matrix $I$ is an identity matrix describing the absorbing states (in this case 2 x 2).

The key matrix is the matrix $I-Q$ where $I$ is a 3 x 3 identity matrix.

$I-Q=\left[\begin{array}{rrr} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{array}\right]-\left[\begin{array}{rrr} 0.3 & 0.1 & 0.1 \\ 0.1 & 0.5 & 0.2 \\ 0.2 & 0.1 & 0.5 \end{array}\right]=\left[\begin{array}{rrr} 0.7 & -0.1 & -0.1 \\ -0.1 & 0.5 & -0.2 \\ -0.2 & -0.1 & 0.5 \end{array}\right]$

The matrix $W=(I-Q)^{-1}$, the inverse of $I-Q$, is called the fundamental matrix for the absorbing chain in question. As shown above, the fundamental matrix is used in finding the probabilities of absorption and the mean times to absorption.

The Algorithm

Recall that an absorbing Markov chain has at least one absorbing state such that every non-absorbing state will eventually transition into an absorbing state (i.e. will eventually be absorbed). The non-absorbing states in such a chain are called transient states.

Consider an absorbing Markov chain with transition probability matrix $\mathbf{P}$. Suppose that the chain has $N+1$ states with transient states $0,1,2,\cdots,y-1$ and absorbing states $y,y+1,\cdots,N$. Such labeling is for convenience. As the above example shows, the rearranging of the states is not necessary as long as one can keep the transient states apart from the absorbing states.

Write the matrix $\mathbf{P}$ in the following form

$\mathbf{P}=\left[\begin{array}{rr} Q & R \\ \mathbf{0} & I \\ \end{array}\right]$

where $Q$ consists of $P_{ij}$ where $0 \le i,j \le y-1$ and $R$ consists of $P_{ij}$ where $0 \le i \le y-1$ and $y \le j \le N$. In other words, the matrix $Q$ contains the one-step transition probabilities between transient states and $R$ contains the one-step transition probabilities from transient states into absorbing states. The matrix $\mathbf{0}$ is a $(N+1-y) \times y$ matrix consisting of 0’s. The matrix $I$ is an $(N+1-y) \times (N+1-y)$ identity matrix describing the absorbing states.

Consider the matrix $I-Q$ where $I$ is the $y \times y$ identity matrix. Then $W=(I-Q)^{-1}$, the inverse of $I-Q$, is the fundamental matrix for the absorbing Markov chain.

Here is one important property of the matrix $W=[ W_{ij} ]$.

An element $W_{ij}$ of $W$ represents the total time the chain spends in the transient state $j$ if the starting state is the transient state $i$.

This leads to the following observation.

The sum of all entries in a row of $W$ is the total time the chain is expected to spend in all the transient states if the starting state is the transient state corresponding to that row.

Here’s the property concerning probability of absorption.

The matrix $U=W \times R$, the product of the matrix $W$ and the matrix $R$, gives the probabilities of absorption. For example, the element $U_{ij}$ of $U$ is the probability of being absorbed into the state $j$ if the starting state is $i$.

To make the mathematical properties even more precise, consider $T=\{ n \ge 0: y \le X_n \le N \}$. The variable $T$ is the random number of steps the process takes to reach an absorbing state. Define the following quantities.

$U_{ij}=P[T=j \lvert X_0=i]$

$V_{i}=E[T \lvert X_0=i]$

where $i$ is any transient state ($0 \le i \le y$) and $j$ is any absorbing state ($y \le j \le N$). Here’s how to tie $U_{ij}$ and $V_i$ to the above stated facts.

The quantity $V_i$ is simply the sum of all entries of the row corresponding to the transient state $i$ in the matrix $W$. Equivalently let $\mathbf{V}$ be the column vector with entries $V_i$. Then $\mathbf{V}=W \times \mathbf{1}$ where $\mathbf{1}$ is the column vector consisting of all 1’s.

The quantities $U_{ij}$ are simply the entries in the matrix $U=W \times R$.

To further illustrate the technique, the next section has examples. It is critical that calculator or software be available for computing inverse matrix. Here is a useful online matrix calculator.

Examples

Example 1
This is another calculation example to illustrate the technique. Use the following transition probability matrix to compute the quantities $U_{ij}=P[T=j \lvert X_0=i]$ (the probabilities of being absorbed into states 0 and 4) and $V_{i}=E[T \lvert X_0=i]$ (the expected time spent in transient states 1,2 and 3 given the starting state is $i$). Of course $T$ is the random time to absorption defined by $T=\{ n \ge 0: X_n=0 \text{ or } X_n=4 \}$.

$\mathbf{P} = \bordermatrix{ & 0 & 1 & 2 & 3 & 4 \cr 0 & 1 & 0 & 0 & 0 & 0 \cr 1 & 0.3 & 0.45 & 0.1 & 0.1 & 0.05 \cr 2 & 0.05 & 0.1 & 0.2 & 0.1 & 0.55 \cr 3 & 0.1 & 0.1 & 0.1 & 0.1 & 0.6 \cr 4 & 0 & 0 & 0 & 0 & 1 \cr } \qquad$

The following shows the matrices that are used in calculation.

$Q = \bordermatrix{ & 1 & 2 & 3 \cr 1 & 0.45 & 0.1 & 0.1 \cr 2 & 0.1 & 0.2 & 0.1 \cr 3 & 0.1 & 0.1 & 0.1 \cr } \qquad \ \ \ \ R = \bordermatrix{ & 0 & 4 \cr 1 & 0.3 & 0.05 \cr 2 & 0.05 & 0.55 \cr 3 & 0.1 & 0.6 \cr } \qquad$

$I-Q=\left[\begin{array}{rrr} 0.55 & -0.1 & -0.1 \\ -0.1 & 0.8 & -0.1 \\ -0.1 & -0.1 & 0.9 \end{array}\right]$

$W=(I-Q)^{-1}=\left[\begin{array}{rrr} 1420/743 & 200/743 & 180/743 \\ 200/743 & 970/743 & 130/743 \\ 180/743 & 130/743 & 860/743 \end{array}\right]$

Here’s the matrix calculation to obtain the desired quantities.

$\left[\begin{array}{r} V_1 \\ V_2 \\ V_3 \end{array}\right]=\left[\begin{array}{rrr} 1420/743 & 200/743 & 180/743 \\ 200/743 & 970/743 & 130/743 \\ 180/743 & 130/743 & 860/743 \end{array}\right] \times \left[\begin{array}{r} 1 \\ 1 \\ 1 \end{array}\right]=\left[\begin{array}{r} 2.4226 \\ 1.7497 \\ 1.5747 \end{array}\right]$

$U=W \times R=\left[\begin{array}{rr} 908/1486 & 578/1486 \\ 243/1486 & 1243/1486 \\ 293/1486 & 1193/1486 \end{array}\right]=\left[\begin{array}{rr} 0.6110 & 0.3890 \\ 0.1635 & 0.8365 \\ 0.1972 & 0.8028 \end{array}\right]=\left[\begin{array}{rr} U_{1,0} & U_{1,4} \\ U_{2,0} & U_{2,4} \\ U_{2,0} & U_{3,4} \end{array}\right]$

Example 2
A fair coin is tossed repeatedly until the appearance of 4 consecutive heads. Determine the mean number of tosses required.

Consider the 5-state Markov chain with states 0, H, HH, HHH and HHHH. The state 0 means that the last toss is not a head. These 5 states are labeled 0, 1, 2, 3, 4 and 5, respectively.

$\mathbf{P} = \bordermatrix{ & 0 & \text{1=H} & \text{2=HH} & \text{3=HHH} & \text{4=HHHH} \cr 0 & 0.5 & 0.5 & 0 & 0 & 0 \cr 1 & 0.5 & 0 & 0.5 & 0 & 0 \cr 2 & 0.5 & 0 & 0 & 0.5 & 0 \cr 3 & 0.5 & 0 & 0 & 0 & 0.5 \cr 4 & 0 & 0 & 0 & 0 & 1 \cr } \qquad$

Note that when a toss results in a tail, the process bounces back to the state 0. Otherwise, it advances to the next state (e.g. HH to HHH). The transient states are 0, 1, 2 and 3. State 4 is the absorbing state. The natural question is: on average how many tosses does it take to reach state 4 (to get 4 consecutive heads)? It is informative to obtain the matrix $I-Q$ and to find its inverse.

$I-Q=\left[\begin{array}{rrrr} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{array}\right]-\left[\begin{array}{rrrr} 0.5 & 0.5 & 0 & 0 \\ 0.5 & 0 & 0.5 & 0 \\ 0.5 & 0 & 0 & 0.5 \\ 0.5 & 0 & 0 & 0 \end{array}\right]=\left[\begin{array}{rrrr} 0.5 & -0.5 & 0 & 0 \\ -0.5 & 1 & -0.5 & 0 \\ -0.5 & 0 & 1 & -0.5 \\ -0.5 & 0 & 0 & 1 \end{array}\right]$

$(I-Q)^{-1} = \bordermatrix{ & 0 & \text{1=H} & \text{2=HH} & \text{3=HHH} \cr 0 & 16 & 8 & 4 & 2 \cr 1 & 14 & 8 & 4 & 2 \cr 2 & 12 & 6 & 4 & 2 \cr 3 & 8 & 4 & 2 & 2 \cr } \qquad$

The sum of the first row of $(I-Q)^{-1}$ is 30. It takes on average 30 tosses to get 4 consecutive heads if starting from scratch. After getting the first head, the process spends on average 28 tosses (the sum of the second row) before getting 4 consecutive heads. Even with three consecutive heads obtained, the chain still spends on average 16 tosses before reaching the state HHHH! On the surface, this may seem unbelievable. Remember that whenever a toss results in a tail, the process reverts back to 0 and the process essentially starts over again.

Example 3 (Occupancy Problem)
This example revisits the occupancy problem, which is discussed here. The occupancy problem is a classic problem in probability. The setting of the problem is that $k$ balls are randomly distributed into $N$ cells (or boxes or other containers) one at a time. After all the $k$ balls are distributed into the cells, we examine how many of the cells are occupied with balls. In this particular example, the number of balls is not fixed. Instead, balls are thrown into the cells one at a time until all cells are occupied. On average how many balls have to be thrown to achieve this?

More specifically, let $n=6$ (cells). On average, how many balls have to be thrown until all 6 cells are occupied? Let $X_n$ be the number of occupied cells after $n$ balls are thrown. The following is the transition probability matrix describing this Markov chain.

$\displaystyle \mathbf{P} = \bordermatrix{ & 0 & 1 & 2 & 3 & 4 & 5 & 6 \cr 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \cr \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \cr 1 & 0 & \frac{1}{6} & \frac{5}{6} & 0 & 0 & 0 & 0 \cr \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \cr 2 & 0 & 0 & \frac{2}{6} & \frac{4}{6} & 0 & 0 & 0 \cr \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \cr 3 & 0 & 0 & 0 & \frac{3}{6} & \frac{3}{6} & 0 & 0 \cr \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \cr 4 & 0 & 0 & 0 & 0 & \frac{4}{6} & \frac{2}{6} & 0 \cr \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \cr 5 & 0 & 0 & 0 & 0 & 0 & \frac{5}{6} & \frac{1}{6} \cr \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \cr 6 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \cr } \qquad$

For a more detailed discussion on this matrix, see the previous post on the occupancy problem. The transient states here are 0, 1, 2, 3, 4 and 5. Extract the matrix $Q$ with these 5 transient states. Derive the matrix $I-Q$. The following is the inverse of $I-Q$.

$(I-Q)^{-1} = \bordermatrix{ & 0 & 1 & 2 & 3 & 4 & 5 \cr 0 & 1 & 6/5 & 3/2 & 2 & 3 & 6 \cr 1 & 0 & 6/5 & 3/2 & 2 & 3 & 6 \cr 2 & 0 & 0 & 3/2 & 2 & 3 & 6 \cr 3 & 0 & 0 & 0 & 2 & 3 & 6 \cr 4 & 0 & 0 & 0 & 0 & 3 & 6 \cr 5 & 0 & 0 & 0 & 0 & 0 & 6 \cr } \qquad$

The sum of the first row (the row for state 0) is 14.7. If starting with all 6 cells empty, it will take on average the throwing of 14.7 balls to have all 6 cells occupied. If there is already one ball in the 6 cells, then it will take on average 13.7 balls to fill the 6 cells. The decrease in the number of throws is not linear. When there are already 4 occupied cells, it will take on average 9 throws to fill the cells. When there is only one empty cell, it will take on average 6 more throws to fill the cells.

Because the number of cells is 6, the problem can be interpreted as rolling a die. The question can be restated: in rolling a fair die, how many rolls does it take on average to have each side of the die appeared at least one?

There is yet another interpretation of the problem of throwing balls into cells until all cells are occupied. This is precisely the coupon collector problem. The 6 cells in the example would be like 6 different coupon types (e.g. promotional prizes that come with the purchase of a product). When such a product is purchased, a prize (a coupon) is given at random. How many units of the product do you need to buy in order to have collected each of the coupon types at least once? For a fuller discussion on the coupon collector problem, see this blog post in a companion blog.

Example 4 (Gambler’s Ruin)
Consider a small scaled gambler’s ruin problem. Say, $N=8$ and $p=0.48$. That is, the gambler makes a series of one-unit bets against the house such that the probability of winning a bet is 0.48 and such that the gambler stops whenever his total capital is 0 units (the gambler is in ruin) or 8 units.

Let $X_n$ be the gambler’s capital (the number of units) after the $n$th bet. The following is the transition probability matrix describing this Markov chain.

$\mathbf{P} = \bordermatrix{ & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \cr 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \cr 1 & 0.52 & 0 & 0.48 & 0 & 0 & 0 & 0 & 0 & 0 \cr 2 & 0 & 0.52 & 0 & 0.48 & 0 & 0 & 0 & 0 & 0 \cr 3 & 0 & 0 & 0.52 & 0 & 0.48 & 0 & 0 & 0 & 0 \cr 4 & 0 & 0 & 0 & 0.52 & 0 & 0.48 & 0 & 0 & 0 \cr 5 & 0 & 0 & 0 & 0 & 0.52 & 0 & 0.48 & 0 & 0 \cr 6 & 0 & 0 & 0 & 0 & 0 & 0.52 & 0 & 0.48 & 0 \cr 7 & 0 & 0 & 0 & 0 & 0 & 0 & 0.52 & 0 & 0.48 \cr 8 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \cr } \qquad$

There are two absorbing states in this chain, state 0 and state 8 (ruin or winning all units). The transient states are 1, 2, 3, 4, 5, 6 and 7. As discussed in this post, $Q$ is the matrix consisting of all transition probabilities $P_{ij}$ from the transient states into the transient states. Then derive the matrix $I-Q$ and compute its inverse.

$I-Q = \bordermatrix{ & 1 & 2 & 3 & 4 & 5 & 6 & 7 \cr 1 & 1 & -0.48 & 0 & 0 & 0 & 0 & 0 \cr 2 & -0.52 & 1 & -0.48 & 0 & 0 & 0 & 0 \cr 3 & 0 & -0.52 & 1 & -0.48 & 0 & 0 & 0 \cr 4 & 0 & 0 & -0.52 & 1 & -0.48 & 0 & 0 \cr 5 & 0 & 0 & 0 & -0.52 & 1 & -0.48 & 0 \cr 6 & 0 & 0 & 0 & 0 & -0.52 & 1 & -0.48 \cr 7 & 0 & 0 & 0 & 0 & 0 & -0.52 & 1 \cr } \qquad$

$W=(I-Q)^{-1} = \bordermatrix{ & 1 & 2 & 3 & 4 & 5 & 6 & 7 \cr 1 & 1.7444 & 1.4316 & 1.1429 & 0.8763 & 0.6303 & 0.4032 & 0.1935 \cr 2 & 1.5509 & 2.9825 & 2.3810 & 1.8257 & 1.3131 & 0.8399 & 0.4032 \cr 3 & 1.3413 & 2.5794 & 3.7223 & 2.8541 & 2.0528 & 1.3131 & 0.6303 \cr 4 & 1.1142 & 2.1426 & 3.0920 & 3.9683 & 2.8541 & 1.8257 & 0.8763 \cr 5 & 0.8681 & 1.6695 & 2.4092 & 3.0920 & 3.7223 & 2.3810 & 1.1429 \cr 6 & 0.6016 & 1.1569 & 1.6695 & 2.1426 & 2.5794 & 2.9825 & 1.4316 \cr 7 & 0.3128 & 0.6016 & 0.8681 & 0.1142 & 1.3413 & 1.5509 & 1.7444 \cr } \qquad$

The fundamental matrix $(I-Q)^{-1}$ gives a wealth of information about the prospect for the gambler in question. Suppose the gambler initially has 3 units in capital. The sum of the third row is 14.49, roughly 14.5. So on average the gambler makes 14.5 bets before reaching one of the two absorbing states (in ruin or owning all 8 units). With one bet = one period of time, the gambler spends on average 3.72 periods of time in state 3. Note that most of the time the gambler is in the lower capital states. The gambler that has initially 3 units only spends on average 0.6303 amount of time in state 7, an indication that the gambler does not have a great chance of winning all 8 units.

The matrix $W \times R$ gives the probabilities of absorption.

$R = \bordermatrix{ & 0 & 8 \cr 1 & 0.52 & 0 \cr 2 & 0 & 0 \cr 3 & 0 & 0 \cr 4 & 0 & 0 \cr 5 & 0 & 0 \cr 6 & 0 & 0 \cr 7 & 0 & 0.48 \cr } \qquad$

$W \times R = (I-Q)^{-1} \times R= \bordermatrix{ & 0 & 8 \cr 1 & 0.9071 & 0.0929 \cr 2 & 0.8065 & 0.1935 \cr 3 & 0.6975 & 0.3025 \cr 4 & 0.5794 & 0.4206 \cr 5 & 0.4514 & 0.5486 \cr 6 & 0.3128 & 0.6872 \cr 7 & 0.1627 & 0.8373 \cr } \qquad$

The first column of $W \times R$ gives the probabilities of ruin for the gambler. The smaller the initial capital, the greater the probability of ruin. With 3 units initially, the gambler has an almost 70% chance of ruin.

The fundamental matrix of an absorbing Markov chain is central to the technique described here. We now attempt to give an indication why the technique works.

Let $\mathbf{P}$ be the transition probability transition matrix for an absorbing Markov chain. Suppose that $\mathbf{P}$ is stated as follows

$\mathbf{P}=\left[\begin{array}{rr} Q & R \\ \mathbf{0} & I \\ \end{array}\right]$

such that $Q$ is the submatrix of $\mathbf{P}$ that consists of the one-step transition probabilities between transient states and $R$ is the submatrix of $\mathbf{P}$ that consists of the one-step transition probabilities from transient states into absorbing states. The fundamental matrix is $W=(I-Q)^{-1}$, which is the inverse of the matrix $I-Q$ where $I$ is the identity matrix of the same size as $Q$.

We sketch an answer to two questions. Why each entry $W_{ij}$ in the fundamental matrix $W$ is the mean time the process spent in state $j$ given that the process starts in state $i$? Why the matrix $W \times R$ gives the probabilities of absorption?

Note that the entries in $Q^n$, the $n$th power of the matrix $Q$, approach zero as $n$ becomes large. This is because $Q$ contains the one-step transition probabilities of the transient states (once in a transient state, the process has to leave the transient states at some point). We have the following equality.

$W=I+Q+Q^2+Q^3+\cdots+Q^n+\cdots \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (3)$

To see the equality (3), fix transient states $i$ and $j$ with $i \ne j$. We show the following:

$W_{ii}=1+P_{ii}+P_{ii}^2+P_{ii}^3+\cdots+P_{ii}^n+\cdots \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (4)$

$W_{ij}=P_{ij}+P_{ij}^2+P_{ij}^3+\cdots+P_{ij}^n+\cdots \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (5)$

Suppose the process starts at state $i$. Define the indicator variables $Y_n$ and $Z_n$ for $n \ge 1$. At time $n$, if the process is in state $i$, let $Y_n=1$. Otherwise $Y_n=0$. Likewise, at time $n$, if the process is in state $j$, let $Z_n=1$. Otherwise $Z_n=0$. By definition,

$\displaystyle 1+Y_1+Y_2+Y_3+\cdots+Y_n+\cdots$

$\displaystyle Z_1+Z_2+Z_3+\cdots+Z_n+\cdots$

are the times spent in state $i$ and in state $j$, respectively. The 1 in the first random time accounts for the fact that the process is already in state $i$ initially. Because we are dealing with transient states, only a finite number of $Y_n$ and $Z_n$ can be 1’s. So these two random time variables are well defined. Each $Y_n$ is a Bernoulli random variable with $P[Y_n=1]=P_{ii}^n$. Likewise each $Z_n$ is a Bernoulli random variable with $P[Z_n=1]=P_{ij}^n$. The following gives (4) and (5).

\displaystyle \begin{aligned} W_{ii}&=1+E[Y_1]+E[Y_2]+\cdots+E[Y_n]+\cdots \\&=1+P_{ii}+P_{ii}^2+\cdots+P_{ii}^n+\cdots \end{aligned}

\displaystyle \begin{aligned} W_{ij}&=E[Z_1]+E[Z_2]+\cdots+E[Z_n]+\cdots \\&=P_{ij}+P_{ij}^2+\cdots+P_{ij}^n+\cdots \end{aligned}

Expressing equalities (4) and (5) in matrix form gives the equality (3). The following derivation shows that $W=(I-Q)^{-1}$.

\displaystyle \begin{aligned} W&=I+Q+Q^2+Q^3+\cdots+Q^n+\cdots \\&=I+Q (I+Q+Q^2+Q^3+\cdots+Q^n+\cdots) \\& = I+ Q W\end{aligned}

$W-Q W=I$

$W (I-Q)=I$

Thus the mean times spent in the transient states are obtained by taking the inverse of the matrix $I-Q$. Based on the derivation, the sum $I+Q+Q^2+Q^3+\cdots+Q^n$ for a sufficiently large $n$ would give a good approximation of the waiting time matrix $W$. Of course, computationally speaking, it is much easier to compute the inverse matrix of $I-Q$.

For the second question, it is instructive to examine the power of $\mathbf{P}$. By induction on the power $n$, $\mathbf{P}^n$ can be expressed as follows:

$\mathbf{P}^n=\left[\begin{array}{rr} Q^n & M_n \\ \mathbf{0} & I \\ \end{array}\right]=\left[\begin{array}{rr} Q^n & (I+Q+Q^2+Q^3+\cdots+Q^{n-1}) R \\ \mathbf{0} & I \\ \end{array}\right]$

The entries in the matrix $M_n$ are the $n$-step transition probabilities from a transient state to an absorbing state. For example, if the starting state is the transient state $i$ and $k$ is an absorbing state, then $P_{ik}^n$ is an entry of $M_n=(I+Q+Q^2+Q^3+\cdots+Q^{n-1}) R$. Starting at a transient state, the process will be absorbed at some point in time. So the matrix $M_n$ gives good approximation for probabilities of absorption when $n$ is large. Note that $M_n=(I+Q+Q^2+Q^3+\cdots+Q^{n-1}) R$ approaches $W \times R$ as $n$ goes to infinity.

The type of Markov chains discussed here are called absorbing Markov chains, the subject of next post.

Practice Problems

Practice problems to reinforce the concepts discussed here are available in a companion blog. There are two problem sets. The first one is here and the second one is here.

Dan Ma Markov chains

Daniel Ma Markov chains

$\copyright$ 2017 – Dan Ma

# Introductory examples on first step analysis

This post gives some examples to demonstrate the useful technique called first step analysis. A great number of problems involving Markov chains can be evaluated by this technique. As we demonstrate here, the general idea of the method is to break down the possibilities resulting from the first step (first transition) in the Markov chain. Then use the law of total probability and Markov property to derive a set of relationship among the unknown variables. This topic will be further developed in subsequent posts. We start with the following examples.

Example 1
Consider a Markov process that cycles through the states 0, 1 and 2 according to the following transition probability matrix.

$\mathbf{P} = \bordermatrix{ & 0 & 1 & 2 \cr 0 & 1 & 0 & 0 \cr 1 & P_{10} & P_{11} & P_{12} \cr 2 & 0 & 0 & 1 \cr } \qquad$

Once the process reaches state 0 or state 2, it stays there forever. Hence these two states are called absorbing states. If the process starts at state 1, the process may stay in state 1 for a duration but eventually the process will leave state 1 and enter one of the absorbing states. Here are two natural questions.

• Which state is more likely (state 0 or state 2) for the process to enter into if the process starts at state 1?
• On average how long does it take to reach one of the absorbing states?

To answer these questions, we define the following random variable.

$T=\{ n \ge 0: X_n=0 \text{ or } X_n=2 \}$

The random variable $T$ is the amount of time it takes for the process to be absorbed (to enter into one of the absorbing states). In other words, $T$ is the time to absorption. Then the two questions are answered by finding the following two quantities.

$\displaystyle U=P[X_T=0 \lvert X_0=1]$

$\displaystyle V=E[T \lvert X_0=1]$

The event $X_T=0$ is the event that the process is absorbed into state 0. The quantity $U$ is the probability of being absorbed into state 0 if the starting state is 1. The quantity $V$ is mean time to absorption if the starting state is 1. Both are conditional quantities with $U$ being a conditional probability and $V$ being a conditional mean. Of course, once $U$ is known, $1-U$ would be the probability $P[X_T=2 \lvert X_0=1]$.

First, we calculate the quantity $U$. Consider the first step in the process. There are three possibilities, which are $X_1=0$, $X_1=1$ and $X_1=2$, with probabilities $P_{10}$, $P_{11}$ and $P_{12}$, respectively. In the first case, $T=1$ and $X_T=0$ (this means that the probability of absorption into state 0 at time 1 is 1). In the second case, $T>1$ and the problem repeats itself (this means that the probability of absorption into state 0 at time 1 is $U$ again). In the third case, $T=1$ and $X_T=2$ (the probability of absorption into state 0 at time 1 is 0). Weighting these three cases by their respective probabilities gives the following equation.

$U=P_{10} \times 1+P_{11} \times U+P_{12} \times 0$

Solving for $U$ gives the following.

$\displaystyle U=\frac{P_{10}}{1-P_{11}}=\frac{P_{10}}{P_{10}+P_{12}}$

The quantity $U$ is simply the conditional probability of a transition into state 0 given that a transition into state 0 or state 2 has occurred. So far the result from the first step analysis is encouraging.

To determine the quantity $V$, consider the first step again. Starting at state 1, the time to absorption is at least 1 (it takes at least one step to get to 0 or 2). There are the same three possibilities for the first step – $X_1=0$, $X_1=1$ and $X_1=2$. After the first step is taken, let’s look at the mean time to absorption in each case. In the first case the process is absorbed into state 0 at time 1. So the mean time to absorption at time 1 is 0. In the second case, the process goes back to its starting point. So the mean time to absorption at time 1 is $V$. In the third case, the process is absorbed into state 2. The mean time to absorption at time 1 is 0. Weighting these three cases by their respective probabilities gives the following equation.

$V=1+P_{10} \times 0+P_{11} \times V+P_{12} \times 0$

Solving for $V$ produces the following answer.

$\displaystyle V=\frac{1}{1-P_{11}}$

Note that if $P_{11}$ is large (i.e. it is highly likely that the process keeps staying in state 1), then it will make more steps on average to reach 0 or 2. Thus both $U$ and $V$ make sense.

We would like to comment that in this small example (only one transient state), the problem can be computed directly from the time to absorption $T$. With starting state being 1, each step of the process is like a Bernoulli event – either staying in state 1 or being absorbed into 0 or 2. Consider absorption as a success. The probability of success at each step is thus $1-P_{11}$. Then the number of steps it takes to be absorbed (i.e. the random variable $T$ as described above) has a geometric distribution with mean $V$ is shown above. Thus we can solve this simple problem (with only one transient state) without using first step analysis. But for a larger problem, the time to absorption does not have a simple characterization. The next example further demonstrates how first step analysis is done.

Example 2
Consider a process that cycles through three states (0, 1, 2 and 3) according to the following transition probability matrix.

$\mathbf{P} = \bordermatrix{ & 0 & 1 & 2 & 3 \cr 0 & 1 & 0 & 0 & 0 \cr 1 & P_{10} & P_{11} & P_{12} & P_{13} \cr 2 & P_{20} & P_{21} & P_{22} & P_{23} \cr 3 & 0 & 0 & 0 & 1 \cr } \qquad$

In this 4-state process, state 0 and state 3 are absorbing states. State 1 and state 2 are transient states. We are interested in the same questions: what is the probability of being absorbed into state 0? What is the mean time to absorption? The time to absorption (into 0 or 3) depends on the starting state (1 or 2). To answer these questions, we state the following quantities.

$U_i=P[X_T=0 \lvert X_0=i] \ \ \ \ i=1,2$

$V_i=E[T \lvert X_0=i] \ \ \ \ \ \ \ \ \ \ i=1,2$

The subscripts in the $U$ and $V$ reflect the starting state of the process. Based on the first reasoning, the following gives the equations involving $U_i$ and the equations involving $V_i$.

\displaystyle \begin{aligned}&U_1=P_{10}+P_{11} \times U_1+P_{12} \times U_2 +P_{13} \times 0 \\&U_2=P_{20}+P_{21} \times U_1+P_{22} \times U_2 +P_{23} \times 0 \end{aligned}

\displaystyle \begin{aligned}&V_1=1+P_{10} \times 0+P_{11} \times V_1+P_{12} \times V_2 +P_{13} \times 0 \\&V_2=1+P_{20} \times 0+P_{21} \times V_1+P_{22} \times V_2 +P_{23} \times 0 \end{aligned}

The following is the derivation for the equation for $U_1$.

\displaystyle \begin{aligned} U_1=P[X_T=0 \lvert X_0=1]&=\sum \limits_{k=0}^3 P[X_1=k \lvert X_0=1] \times P[X_T=0 \lvert X_0=1,X_1=k] \\&=\sum \limits_{k=0}^3 P_{1k} \times P[X_T=0 \lvert X_1=k] \\&=P_{10} \times 1 +P_{11} \times U_1+P_{12} \times U_2 +P_{13} \times 0 \end{aligned}

Note that the first line shows the four cases for the first step weighted by their probabilities (the law of total probability). The second line uses the Markov property. The third line lists out the 4 cases. the following shows the derivation of $V_1$.

\displaystyle \begin{aligned} V_1=E[T \lvert X_0=1]&=1+\sum \limits_{k=0}^3 P[X_1=k \lvert X_0=1] \times E[T \lvert X_0=1,X_1=k] \\&=1+\sum \limits_{k=0}^3 P_{1k} \times E[T \lvert X_1=k] \\&=1+ P_{10} \times 0 +P_{11} \times V_1+P_{12} \times V_2 +P_{13} \times 0 \end{aligned}

We carry out the numerical calculation using the following matrix.

$\mathbf{P} = \bordermatrix{ & 0 & 1 & 2 & 3 \cr 0 & 1 & 0 & 0 & 0 \cr 1 & 0.3 & 0.3 & 0.2 & 0.2 \cr 2 & 0.15 & 0.2 & 0.25 & 0.4 \cr 3 & 0 & 0 & 0 & 1 \cr } \qquad$

Plugging in the numerical values gives the following two sets of equations.

\displaystyle \begin{aligned}&U_1=0.3+0.3 \times U_1+0.2 \times U_2 \\&U_2=0.15+0.2 \times U_1+0.25 \times U_2 \end{aligned}

\displaystyle \begin{aligned}&V_1=1+0.3 \times V_1+0.2 \times V_2 \\&V_2=1+0.2 \times V_1+0.25 \times V_2 \end{aligned}

The solutions are:

$\displaystyle U_1=\frac{51}{97}=0.5278$

$\displaystyle U_2=\frac{33}{97}=0.34$

$\displaystyle V_1=\frac{9.5}{4.85}=1.9588$

$\displaystyle V_2=\frac{9}{4.85}=1.85567$

It is more likely to reach state 0 if the starting position is 1 (53% chance). If the process starts from state 2, it is more likely to reach state 3 (only 34% chance to reach state 0). However, the average number of steps it takes to reach state 0 is about the same (either starting from 1 or 2).

One More Example

We present one more example. This example will be used in the next post to demonstrate another angle of first step analysis.

Example 3
Consider the following transition probability matrix.

$\mathbf{P} = \bordermatrix{ & 0 & 1 & 2 & 3 & 4 \cr 0 & 1 & 0 & 0 & 0 & 0 \cr 1 & 0.4 & 0.3 & 0.1 & 0.1 & 0.1 \cr 2 & 0.1 & 0.1 & 0.5 & 0.2 & 0.1 \cr 3 & 0.1 & 0.2 & 0.1 & 0.5 & 0.1 \cr 4 & 0 & 0 & 0 & 0 & 1 \cr } \qquad$

The state space is one state bigger than the two examples shown above. There are two absorbing states as in the two previous examples. We focus on the same two problems. Given a starting state (1, 2 or 3), we want to see how fast the process transition to state 0 (one of the absorbing states). Secondly, we want to see on average how many steps it takes the process to transition to one of the absorbing states.

As before let $T$ the time it takes the process to be absorbed into 0 or 4. More specifically let $T$ be defined by $T=\{ n \ge 0: X_n=0 \text{ or } X_n=4 \}$. As before if the starting state is $i$, let $U_i$ be the probability of the process being absorbed into state 0 and let $V_i$ be the expected time to absorption (into state 0 or state 4). More specifically, define $U_i$ and $V_i$ as follows:

$U_i=P[X_T=0 \lvert X_0=i] \ \ \ \ i=1,2,3$

$V_i=E[T \lvert X_0=i] \ \ \ \ \ \ \ \ \ \ i=1,2,3$

Applying the first step analysis produces the following two sets of equations.

\displaystyle \begin{aligned}&U_1=0.4+0.3 \times U_1+0.1 \times U_2+0.1 \times U_3 \\&U_2=0.1+0.1 \times U_1+0.5 \times U_2+0.2 \times U_3 \\&U_3=0.1+0.2 \times U_1+0.1 \times U_2+0.5 \times U_3 \end{aligned}

\displaystyle \begin{aligned}&V_1=1+0.3 \times V_1+0.1 \times V_2+0.1 \times V_3 \\&V_2=1+0.1 \times V_1+0.5 \times V_2+0.2 \times V_3 \\&V_3=1+0.2 \times V_1+0.1 \times V_2+0.5 \times V_3 \end{aligned}

For a reason that will be made clear in the next post, we rearrange the equations by putting the variables $U_i$ or $V_i$ on one side and the constant terms on the other side.

$\displaystyle \begin{array}{rcr} \displaystyle 0.7 \times U_1-0.1 \times U_2-0.1 \times U_3 & = & 0.4 \\ \displaystyle -0.1 \times U_1+0.5 \times U_2-0.2 \times U_3 & = & 0.1 \\ \displaystyle -0.2 \times U_1-0.1 \times U_2+0.5 \times U_3 & = & 0.1 \end{array}$

$\displaystyle \begin{array}{rcr} \displaystyle 0.7 \times V_1-0.1 \times V_2-0.1 \times V_3 & = & 1 \\ \displaystyle -0.1 \times V_1+0.5 \times V_2-0.2 \times V_3 & = & 1 \\ \displaystyle -0.2 \times V_1-0.1 \times V_2+0.5 \times V_3 & = & 1 \end{array}$

Solving these two systems of equations produces the following solutions.

Starting State Probability of Absorption to State 0 Expected Time to Absorption
1 $\displaystyle U_1=\frac{35}{47}=0.745$ $\displaystyle V_1=\frac{360}{141}=2.55$
2 $\displaystyle U_2=\frac{28}{47}=0.596$ $\displaystyle V_2=\frac{350}{141}=2.48$
3 $\displaystyle U_3=\frac{29}{47}=0.617$ $\displaystyle V_3=\frac{540}{141}=3.83$

The introductory examples here show that first step analysis on time to absorption involves the solving of systems of linear equations. Thus the larger the state space in the Markov chain, the larger of the systems of equations. Using software when the matrix is large is an excellent approach. First step analysis can also be approached in other angles. Applications of first step analysis will be presented. The technique of first step analysis will be further explored in subsequent posts.

Dan Ma Markov chains

Daniel Ma Markov chains

$\copyright$ 2017 – Dan Ma

# Collapsing states into an absorbing state

This post discusses the technique of collapsing a set of states into an absorbing state to order to make certain Markov chain problems easier. We motivate the technique with the following examples. The method discussed here is to raise the new transition probability matrix (the one with the collapsed absorbing state) to an appropriate $n$th power (Chapman-Kolmogorov equations are discussed here). Other applications of Markov chains are discussed here. Markov chains are applied to the occupancy problem in this previous post.

Example 1
On any given day, the weather condition in a city is either sunny, cloudy (but not rainy) or rainy. The weather condition on a given day is identical to that of the previous day with probability 0.5. The weather condition on a given day is equally likely to be one of the other two weather conditions if it is different from the previous day. Suppose that it is sunny today in this city. Find the probability that none of the next five days will be rainy.

Example 2
A fair coin is flipped repeatedly until a run of 4 consecutive heads appear. We want to know how many flips are required to achieve this goal. Specifically we have the following questions.

• Find the probability that it takes at most 9 flips to get a run of 4 consecutive heads.
• Find the probability that it takes at exactly 9 flips to get a run of 4 consecutive heads.

Example 3
A maze is divided into 9 areas as shown below.

Area 1 = initial position of the mouse

A mouse is placed in area 1 initially. Suppose that areas 3, 7 and 9 contain food and that the other areas in the maze do not contain food. Further suppose that the mouse moves through the areas in the maze at random. That is, whenever the mouse is in an area that has $w$ exits, the next move is to one of these $w$ areas with probability $1/w$.

• Find the probability that after the 5 moves, the mouse is one area from a food source.
• Find the probability that after the 7 moves, the mouse has not found food and is in area 6 or area 8 after the 7th move.
• Find the probability that it takes at most 8 moves for the mouse to find food.
• Find the probability that it takes exactly 8 moves for the mouse to find food.

Working Example 1

We work Example 1 to illustrate the method. The following is the transition probability matrix for Example 1 where state 0 = sunny weather, state 1 = cloudy weather and state 2 = rainy weather.

$\mathbf{P} = \bordermatrix{ & 0 & 1 & 2 \cr 0 & 0.50 & 0.25 & 0.25 \cr 1 & 0.25 & 0.50 & 0.25 \cr 2 & 0.25 & 0.25 & 0.50 \cr } \qquad$

Since it is currently a sunny day, $X_0=0$. The probability of not having a rainy day in the next 5 days is the probability $P[X_5 \le 1, X_4 \le 1, X_3 \le 1, X_2 \le 1, X_1 \le 1 \lvert X_0=0]$. This is the probability that $X_j=0$ or $X_j=1$ for the next 5 periods given the current state is 0. Because the range of movement is restricted, using the Chapman-Kolmogorov equations, i.e. finding the 5th power of the matrix $\mathbf{P}$, does not work. For example, though $P_{01}^5$ is the probability that the chain will end up in state 1 at time 5 given that it is in state 0 at time 0, the probability would include the possibilities of a rainy day (state 2) in between time 0 and time 5. The technique is to make the state 2 (the state that is not to be visited in the next 5 days) an absorbing state. Doing so produces a new transition probability matrix.

$\mathbf{Q} = \bordermatrix{ & 0 & 1 & 2 \cr 0 & 0.50 & 0.25 & 0.25 \cr 1 & 0.25 & 0.50 & 0.25 \cr 2 & 0 & 0 & 1 \cr } \qquad$

The matrix $\mathbf{Q}$ is obtained by changing state 2 in the matrix $\mathbf{P}$ an absorbing state (i.e. the entry in the row for state 2 and in the column for state 2 has probability 1). It is a valid transition probability matrix since the sum of each row is 1. So it describes a Markov chain. How does the new Markov chain relate to the original Markov chain? The following is the 5th power of $\mathbf{Q}$.

$\displaystyle \mathbf{Q}^5 = \bordermatrix{ & 0 & 1 & 2 \cr 0 & \frac{122}{1024} &\frac{121}{1024} & \frac{781}{1024} \cr \text{ } & \text{ } & \text{ } & \text{ } \cr 1 & \frac{121}{1024} & \frac{122}{1024} & \frac{781}{1024} \cr \text{ } & \text{ } & \text{ } & \text{ } \cr 2 & 0 & 0 & 1 \cr } \qquad = \bordermatrix{ & 0 & 1 & 2 \cr 0 & 0.11914 & 0.11816 & 0.76270 \cr \text{ } & \text{ } & \text{ } & \text{ } \cr 1 & 0.11816 & 0.11914 & 0.76270 \cr \text{ } & \text{ } & \text{ } & \text{ } \cr 2 & 0 & 0 & 1 \cr } \qquad$

The transition probability $Q_{02}^5=0.7627$ is the probability of transitioning in the new process from state 0 at time 0 to state 2 at time 5 (the 5th day). Starting at state 0, there is a 76.27% chance that the new process will reach state 2 at time 5. This means that the original process will have reached state 2 at time 5 or prior to time 5. The complement of this probability is $Q_{00}^5+Q_{01}^5=0.2373$. So there is a 23.73% chance the original process will never reach state 2 at any time in the first 5 periods. If the weather is indeed governed the chain described in $\mathbf{P}$, it is likely (76.27% chance) that there will be a rainy day at some point within the next 5 periods (given that today is sunny). In fact, even if today is cloudy, there is also a 76.27% chance that there will be a rainy day within the next 5 days.

Discussion

Before working Example 2, let’s discuss the idea of collapsing states into an absorbing state. Consider a Markov chain with transition probability matrix $\mathbf{P}$. Let $\mathcal{S}$ be the state space of this chain. Let $\mathcal{M}$ be a set of states. Suppose that the states in $\mathcal{M}$ are either desirable to enter in or states that are negative situations to be avoided. In case of $\mathcal{M}$ being desirable or attractive, we wish to find the probability of entering into a state in $\mathcal{M}$ after a number of transitions. In the case of $\mathcal{M}$ being negative situations, we wish to find the probability of not in entering into any state in $\mathcal{M}$ after a number of transitions.

The key to finding these probabilities is to collapse the states in $\mathcal{M}$ into a state called $M$ that is an integer that is different from all states $i$ not in $\mathcal{M}$. For convenience, we let $M$ be the smallest integer that is greater than all states not in $\mathcal{M}$. For example, if $\mathcal{S}=\left\{0,1,2,3,4,5 \right\}$, and $\mathcal{M}=\left\{4,5 \right\}$, then $M=4$.

Now define a new Markov chain by letting all states not in $\mathcal{M}$ remain the same and by collapsing all states in $\mathcal{M}$ into the new state $M$. The new state $M$ is made an absorbing state. The state space of the new Markov chain is the original states not in $\mathcal{M}$ along with the new state $M$. Let $\mathbf{Q}$ be the transition probability matrix of the new Markov chain. The transition probabilities in $\mathbf{Q}$ are defined as follows:

$\displaystyle Q_{ij} = \left\{ \begin{array}{ll} \displaystyle P_{ij} &\ \ \ \ \ \ i \notin \mathcal{M}, j \notin \mathcal{M} \\ \text{ } & \text{ } \\ \displaystyle \sum \limits_{k \in \mathcal{M}} P_{ik} &\ \ \ \ \ \ i \notin \mathcal{M}, j=M \\ \text{ } & \text{ } \\ \displaystyle 1 &\ \ \ \ \ \ i=M, j=M \\ \end{array} \right.$

The transition probability matrix $\mathbf{Q}$ is derived from and is closely related to the matrix $\mathbf{P}$. For states $i,j$ not in $\mathcal{M}$, the transition probabilities are the same in both matrices (the first condition). The second condition says that $Q_{iM}$ is the sum of all $P_{ik}$ where $k \notin \mathcal{M}$. Thus $Q_{iM}$ is the probability of the original process entering one of the states in $\mathcal{M}$ when starting from a state $i \notin \mathcal{M}$. Since $M$ is an absorbing state, $Q_{MM}=1$.

To calculate the two probabilities concerning the process for $\mathbf{P}$ indicated earlier, compute powers of the matrix $\mathbf{Q}$. In the new Markov process for the matrix $\mathbf{Q}$, there are two kinds of states, the states $i \notin \mathcal{M}$ and the collapsed state $M$. Whenever the process is in state $i \ne M$, the process is not yet absorbed (into the state $M$). When the process starts at a state not in $M$ (i.e. not yet absorbed), the process will eventually be absorbed (after some number of transitions).

Consider $Q_{iM}^n$ where $i \ne M$. This is the probability that the process starts in state $i$ (not yet absorbed) and is absorbed at time $n$. This is the connection between the two processes: being absorbed into state $M$ in the new process at time $n$ means that the original process enters a state in $\mathcal{M}$ in some period prior to time $n$ or at time $n$. We have the following summary.

Consider a Markov process $\left\{X_n:n \ge 0 \right\}$ with the transition probability matrix $\mathbf{P}$. Collapse $\mathcal{M}$, a set of states, into an absorbing state called $M$ in the manner described above. Let $\mathbf{Q}$ be the resulting transition probability matrix of the new Markov chain. For $i \ne M$, $Q_{iM}^n$ is the probability of absorption given that the process starts out not being absorbed, i.e. the probability that the original process enters a state in $\mathcal{M}$ at time $n$ or prior to time $n$ given that initially the process is in state $i$. Equivalently $Q_{iM}^n=P[X_k \in \mathcal{M} \text{ for some } k=1,2,\cdots,n \lvert X_0=i]$.

The above observation gives a way to interpret the complement $1-Q_{iM}^n$. It would be the probability of the original process never entering any state in $\mathcal{M}$ during the $n$ time periods (given that the process starts at $i \notin \mathcal{M}$). This is the answer to Example 1. We have the following summary.

The complement of $Q_{iM}^n$ is $1-Q_{iM}^n$. It is the probability that the original process never enters any of the states in $\mathcal{M}$ during the $n$ time periods (given that the process starts at $i \notin \mathcal{M}$). Equivalently, $1-Q_{iM}^n$ can be stated as $1-Q_{iM}^n=P[X_k \notin \mathcal{M} \text{ for all } k=1,2,\cdots,n \lvert X_0=i]$. Note that the probability $1-Q_{iM}^n$ is also the sum $\sum_{j \notin \mathcal{M}} Q_{ij}^n$. Specifically, for $j \notin \mathcal{M}$, $Q_{ij}^n$ can be stated as $Q_{ij}^n=P[X_n=j, X_k \notin \mathcal{M} \text{ for all } k=1,2,\cdots,n-1 \lvert X_0=i]$.

The probability $Q_{iM}^n$ is the probability of absorption on or prior to time $n$. Another probability of interest is the probability of absorption exactly at time $n$.

As described above, let $\mathbf{Q}$ be the resulting transition probability matrix of the new Markov chain after collapsing some states into an absorbing state. For $i \ne M$, $Q_{iM}^n-Q_{iM}^{n-1}$ is the probability of absorption at time $n$ given that the process starts out not being absorbed, i.e. the probability that the original process enters a state in $\mathcal{M}$ at time $n$ given that initially the process is in state $i$. Equivalently $Q_{iM}^n-Q_{iM}^{n-1}=P[X_n \in \mathcal{M} \text{ and } X_k \notin \mathcal{M} \text{ for all } k=1,2,\cdots,n-1 \lvert X_0=i]$.

In a nutshell, the idea of collapsing states into an absorbing state is to inform on two probabilities concerning the original process. To illustrate this process, consider the following example.

Example 0
Consider a Markov chain with the following transition probability matrix.

$\mathbf{P} = \bordermatrix{ & 0 & 1 & 2 & 3 & 4 & 5 \cr 0 & 0.6 & 0.08 & 0.08 & 0.08 & 0.08 & 0.08 \cr 1 & 0.5 & 0.10 & 0.10 & 0.10 & 0.10 & 0.10 \cr 2 & 0.4 & 0.12 & 0.12 & 0.12 & 0.12 & 0.12 \cr 3 & 0.3 & 0.14 & 0.14 & 0.14 & 0.14 & 0.14 \cr 4 & 0.2 & 0.16 & 0.16 & 0.16 & 0.16 & 0.16 \cr 5 & 0.1 & 0.18 & 0.18 & 0.18 & 0.18 & 0.18 \cr } \qquad$

Now collapse state 4 and state 5 into a new state called 4. The following is the resulting matrix $\mathbf{Q}$.

$\mathbf{Q} = \bordermatrix{ & 0 & 1 & 2 & 3 & 4 \cr 0 & 0.6 & 0.08 & 0.08 & 0.08 & 0.16 \cr 1 & 0.5 & 0.10 & 0.10 & 0.10 & 0.20 \cr 2 & 0.4 & 0.12 & 0.12 & 0.12 & 0.24 \cr 3 & 0.3 & 0.14 & 0.14 & 0.14 & 0.28 \cr 4 & 0 & 0 & 0 & 0 & 1 \cr } \qquad$

The following is the 6th power of $\mathbf{Q}$.

$\mathbf{Q}^6 = \bordermatrix{ & 0 & 1 & 2 & 3 & 4 \cr 0 & 0.195466 & 0.034574 & 0.034574 & 0.034574 & 0.700812 \cr 1 & 0.184167 & 0.032578 & 0.032578 & 0.032578 & 0.718099 \cr 2 & 0.172869 & 0.030582 & 0.030582 & 0.030582 & 0.735386 \cr 3 & 0.161570 & 0.028586 & 0.028586 & 0.028586 & 0.752673 \cr 4 & 0 & 0 & 0 & 0 & 1 \cr } \qquad$

As a result, $Q_{04}^6=0.700812$ is the probability that the new process will be in state 4 at time 4 (given the process starts at state 0). This means that it is the probability that the original process will reach state 4 or state 5 by time 6 (at or before time 6) given the original process starts in state 0. Note that regardless of the initial state, the probability of being absorbed into state 4 or state 5 by time 6 is roughly the same (70% to 75%).

On the other hand, $Q_{04}^5=0.631665$. Thus $Q_{04}^6-Q_{04}^5=0.700812-0.631665=0.069147$. Thus there is a 6.9% chance that absorption will take place exactly at time 6.

More Examples

We now work Example 2 and Example 3.

Example 2 (Statement of the example given above)
Define a Markov chain with states 0, 1, 2, 3 and 4 with $i \le 3$ meaning that there is currently a run of $i$ consecutive heads (the last $i$ flips are heads) and with state 4 meaning that a run of 4 consecutive heads has already occurred (the last 4 or more flips are heads). Thus state 4 is an absorbing state. The following is the transition probability matrix.

$\mathbf{P} = \bordermatrix{ & 0 & 1 & 2 & 3 & 4 \cr 0 & 0.5 & 0.5 & 0 & 0 & 0 \cr 1 & 0.5 & 0 & 0.5 & 0 & 0 \cr 2 & 0.5 & 0 & 0 & 0.5 & 0 \cr 3 & 0.5 & 0 & 0 & 0 & 0.5 \cr 4 & 0 & 0 & 0 & 0 & 1 \cr } \qquad$

Here’s how $\mathbf{P}$ is derived. For example, if there is currently a run of 2 heads (the third row), the next state is either 0 (the next flip is a tail) or 3 (the next flip is a head) with equal probability. To find the two probabilities in question, raise $\mathbf{P}$ to the 8th and 9th powers.

$\mathbf{P}^9 = \bordermatrix{ & 0 & 1 & 2 & 3 & 4 \cr 0 & 0.406250 & 0.210938 & 0.109375 & 0.056641 & 0.216797 \cr 1 & 0.376953 & 0.195313 & 0.101563 & 0.052734 & 0.273438 \cr 2 & 0.320313 & 0.166016 & 0.085938 & 0.044922 & 0.382813 \cr 3 & 0.210938 & 0.109375 & 0.056641 & 0.029297 & 0.593750 \cr 4 & 0 & 0 & 0 & 0 & 1 \cr } \qquad$

$\mathbf{P}^8 = \bordermatrix{ & 0 & 1 & 2 & 3 & 4 \cr 0 & 0.421875 & 0.218750 & 0.113281 & 0.058594 & 0.187500 \cr 1 & 0.390625 & 0.203125 & 0.105469 & 0.054688 & 0.246094 \cr 2 & 0.332031 & 0.171875 & 0.089844 & 0.046875 & 0.359375 \cr 3 & 0.218750 & 0.113281 & 0.058594 & 0.031250 & 0.578125 \cr 4 & 0 & 0 & 0 & 0 & 1 \cr } \qquad$

The starting state is 0 since the coin flips are counted from the very first one. With $P_{04}^9=0.216797$, we know that the probability of taking 9 flips or less to get 4 consecutive heads is 0.216797.

With $P_{04}^9-P_{04}^8=0.216797-0.187500=0.029297$, there is a roughly 3% chance that the run of 4 consecutive heads is achieved at the 9th flip.

Example 3 (Statement of the example given above)
Let $X_n$ be the area the mouse is in after making $n$ moves. Then $\left\{X_n: n=0,1,2,\cdots \right\}$ is a Markov chain. The following is the transition probability matrix. To help see the matrix, the diagram of the maze is repeated.

Area 1 = initial position of the mouse

$\mathbf{P} = \bordermatrix{ & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 \cr 1 & 0 & 1/2 & 0 & 0 & 0 & 1/2 & 0 & 0 & 0 \cr 2 & 1/3 & 0 & 1/3 & 0 & 1/3 & 0 & 0 & 0 & 0 \cr 3 & 0 & 1/2 & 0 & 1/2 & 0 & 0 & 0 & 0 & 0 \cr 4 & 0 & 0 & 1/3 & 0 & 1/3 & 0 & 0 & 0 & 1/3 \cr 5 & 0 & 1/4 & 0 & 1/4 & 0 & 1/4 & 0 & 1/4 & 0 \cr 6 & 1/3 & 0 & 0 & 0 & 1/3 & 0 & 1/3 & 0 & 0 \cr 7 & 0 & 0 & 0 & 0 & 0 & 1/2 & 0 & 1/2 & 0 \cr 8 & 0 & 0 & 0 & 0 & 1/3 & 0 & 1/3 & 0 & 1/3 \cr 9 & 0 & 0 & 0 & 1/2 & 0 & 0 & 0 & 1/2 & 0 \cr } \qquad$

States 3, 7 and 9 are desirable states since these are the areas with food. Let $\mathcal{M}=\left\{3,7,9 \right\}$. We then collapse $\mathcal{M}$ into a new state called 9. The state 9 in the new process is an absorbing state, which means that the original process has reached a state with food (states 3, 7 and 9). The following is the transition probability matrix for the new (collapsed) process.

$\mathbf{Q} = \bordermatrix{ & 1 & 2 & 4 & 5 & 6 & 8 & 9 \cr 1 & 0 & 1/2 & 0 & 0 & 1/2 & 0 & 0 \cr 2 & 1/3 & 0 & 0 & 1/3 & 0 & 0 & 1/3 \cr 4 & 0 & 0 & 0 & 1/3 & 0 & 0 & 2/3 \cr 5 & 0 & 1/4 & 1/4 & 0 & 1/4 & 1/4 & 0 \cr 6 & 1/3 & 0 & 0 & 1/3 & 0 & 0 & 1/3 \cr 8 & 0 & 0 & 0 & 1/3 & 0 & 0 & 2/3 \cr 9 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \cr } \qquad$

The state 9 in the matrix $\mathbf{Q}$ refers to the situation in the original process reaching states 3, 7 or 9. Then all the questions in the example can be answered by raising $\mathbf{Q}$ to an appropriate power. In particular, raising $\mathbf{Q}$ to the 5th, 7th and 8th power.

$\mathbf{Q}^5 = \bordermatrix{ & 1 & 2 & 4 & 5 & 6 & 8 & 9 \cr 1 & 0 & 5/36 & 1/18 & 0 & 5/36 & 1/18 & 11/18 \cr 2 & 5/54 & 0 & 0 & 7/54 & 0 & 0 & 7/9 \cr 4 & 1/27 & 0 & 0 & 1/18 & 0 & 0 & 49/54 \cr 5 & 0 & 7/72 & 1/24 & 0 & 7/72 & 1/24 & 13/18 \cr 6 & 5/54 & 0 & 0 & 7/54 & 0 & 0 & 7/9 \cr 8 & 1/27 & 0 & 0 & 1/18 & 0 & 0 & 49/54 \cr 9 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \cr } \qquad$

$\mathbf{Q}^7 = \bordermatrix{ & 1 & 2 & 4 & 5 & 6 & 8 & 9 \cr 1 & 0 & 17/216 & 7/216 & 0 & 17/216 & 7/216 & 7/9 \cr 2 & 17/324 & 0 & 0 & 2/27 & 0 & 0 & 283/324 \cr 4 & 7/324 & 0 & 0 & 5/162 & 0 & 0 & 307/324 \cr 5 & 0 & 1/18 & 5/216 & 0 & 1/18 & 5/216 & 91/108 \cr 6 & 17/324 & 0 & 0 & 2/27 & 0 & 0 & 283/324 \cr 8 & 7/324 & 0 & 0 & 5/162 & 0 & 0 & 307/324 \cr 9 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \cr } \qquad$

$\mathbf{Q}^8 = \bordermatrix{ & 1 & 2 & 4 & 5 & 6 & 8 & 9 \cr 1 & 17/324 & 0 & 0 & 2/27 & 0 & 0 & 283/324 \cr 2 & 0 & 29/648 & 1/54 & 0 & 29/648 & 1/54 & 283/324 \cr 4 & 0 & 1/54 & 5/648 & 0 & 1/54 & 5/648 & 307/324 \cr 5 & 1/27 & 0 & 0 & 17/324 & 0 & 0 & 295/324 \cr 6 & 0 & 29/648 & 1/54 & 0 & 29/648 & 1/54 & 283/324 \cr 8 & 0 & 1/54 & 5/648 & 0 & 1/54 & 5/648 & 307/324 \cr 9 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \cr } \qquad$

The statements for the probabilities to be sought are repeated here.

• Find the probability that after the 5 moves, the mouse is one area from a food source.
• Find the probability that after the 7 moves, the mouse has not found food and is in area 6 or area 8 after the 7th move.
• Find the probability that it takes at most 8 moves for the mouse to find food.
• Find the probability that it takes exactly 8 moves for the mouse to find food.

To obtain the answers, it is a matter of picking the entries from the matrices.

• $Q_{12}^5+Q_{14}^5+Q_{16}^5+Q_{18}^5=7/18=0.3889$
• $Q_{16}^7+Q_{18}^7=17/216+7/216=24/216=0.1111$
• $Q_{19}^8=283/324=0.8735$
• $Q_{19}^8-Q_{19}^7=283/324-7/9=0.095679$

One area from a food source means being in areas 2, 4, 6 and 8. Thus the first probability is $Q_{12}^5+Q_{14}^5+Q_{16}^5+Q_{18}^5$. The probability of being in areas 6 or 8 after 7 moves is $Q_{16}^7+Q_{18}^7$. The probability that it will take 8 moves or less to reach food is $Q_{19}^8$. The probability of reaching food in exactly 8 moves is $Q_{19}^8-Q_{19}^7$.

With the matrices $\mathbf{Q}^5$, $\mathbf{Q}^7$ and $\mathbf{Q}^8$, we can also find out the probability of reaching food. Just from the diagram, it is clear that the likelihood of reaching food is greater if the initial position is area 8 or area 6. Between these two areas, it is easier to reach food if the mouse is in area 8 initially since area 8 is adjacent to two food sources. This is also confirmed by the calculation. For example, $Q_{69}^5=7/9=0.7778$ and $Q_{89}^5=49/54=0.9074$. So it is over 90% certain that the mouse will find food if it is placed in area 8 initially.

$\text{ }$

$\text{ }$

$\text{ }$

$\copyright$ 2017 – Dan Ma

# The occupancy problem

The occupancy problem is a classic problem in probability. The setting of the problem is that balls are randomly distributed into cells (or boxes or other containers) one at a time. After all the balls are distributed into the cells, we examine how many of the cells are occupied with balls. In this post, we examine the occupancy problem using Markov chains.

The Occupancy Problem

As indicated above, we consider the random experiment of randomly placing $k$ balls into $n$ cells. The following diagram shows the result of randomly distributing 8 balls into 6 cells (or boxes).

Figure 1 – Occupancy Problem – throwing balls into cells

One ball is thrown into the 6 boxes for a total of 8 balls. In Figure 1, a total of four cells are occupied. In the current discussion, the focus is on the number of occupied cells and not on which of the cells are occupied. Since Figure 1 has 6 cells, we can also view the throwing a ball into the 6 boxes as rolling a fair die (the occupied cell would be the face value that turns up). Thus we can view the experiment in Figure 1 a rolling a die 8 times. Then at the end, we look at the number faces that appear.

Figure 2 – Occupancy Problem – rolling die repeatedly and counting the faces that appear

The top of Figure 2 shows the 8 rolls of the die. Four faces turn up in these 8 rolls – faces 2, 4, 5 and 6, the same result described in Figure 1. Another way to look at the occupancy problem is that of random sampling. Each time a ball is thrown into $n$ cells, the action can be viewed as randomly selecting one of the $n$ cells. Thus the occupancy problem can be viewed as randomly selecting $k$ balls (one at a time with replacement) from an urn with $n$ balls labeled $1,2,\cdots,n$. The following diagram shows an urn with 6 balls labeled 1 through 6. Then Figure 1 and Figure 2 would be equivalent to randomly selecting 8 balls with replacement from this urn.

Figure 3 – Occupancy Problem – random selection of balls with replacement from an urn and counting the distinct numbers that are drawn

Figure 1 describes a generic setting of the occupancy problem – throwing balls at random into cells. The interpretation in Figure 2 is ideal for the setting of six cells. If the number of cells is not six, we can view the problem as rolling an $n$-sided die $k$ times. The random selection of balls with replacement (Figure 3) is also a good way to view the occupancy problem. Regardless of how we view the occupancy problem, we consider the following questions. In randomly assigning $k$ balls into $n$ cells,

• What is the probability that exactly $j$ cells are occupied with balls where $j=1,2,\cdots,n$?
• What is the expected number of occupied cells?

The first question is about the probability distribution of the number of occupied cells after the balls are thrown. The second question is about the mean of the probability distribution of the number of occupied cells. Once the probability distribution is known, we can calculate the distributional quantities such as the mean and variance. Thus the main focus is on the first question.

The occupancy problem has been discussed in a companion blog. In this blog post, the first question is answered by using the multinomial theorem (applied twice). Another blog post develops a formula for the probability distribution of the number of occupied cells. These previous blog posts use counting methods to solve the occupancy problem. For example, in throwing eight balls into 6 cells, there are $6^8$ many distinct outcomes. How many of these correspond to only one occupied cell, two occupied cells and so on? In this post, we present another way to answer the same question using Markov chain.

The occupancy problem discussed here and in the previous blog posts assumes that each ball is equally likely to be assigned into any one of the cells. Thus the occupancy problem where the weights of assignment of the cells are not uniform would be an interesting extension of the problem.

Applying Markov Chains

The notion is Markov chains is introduced here. The calculation involving transition probability matrix is discussed here.

To demonstrate, we use the specific example of randomly distributing balls into 6 cells. First, define a Markov chain. The state of the Markov chain is the number of occupied cells at any given time. Let $X_0$ denote the number of occupied cells at the beginning of the experiment (before any balls are thrown). If all cells are unoccupied at the beginning, then $X_0=0$. For $n \ge 1$, $X_n$ be the number of occupied cells after the $n$th ball has been distributed. The resulting stochastic process $\left\{X_n: n=0,1,2,\cdots \right\}$ is a Markov chain since the state $X_{n+1}$ only depends on the preceding state $X_n$. The following is the transition probability matrix.

$\displaystyle \mathbf{P} = \bordermatrix{ & 0 & 1 & 2 & 3 & 4 & 5 & 6 \cr 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \cr \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \cr 1 & 0 & \frac{1}{6} & \frac{5}{6} & 0 & 0 & 0 & 0 \cr \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \cr 2 & 0 & 0 & \frac{2}{6} & \frac{4}{6} & 0 & 0 & 0 \cr \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \cr 3 & 0 & 0 & 0 & \frac{3}{6} & \frac{3}{6} & 0 & 0 \cr \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \cr 4 & 0 & 0 & 0 & 0 & \frac{4}{6} & \frac{2}{6} & 0 \cr \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \cr 5 & 0 & 0 & 0 & 0 & 0 & \frac{5}{6} & \frac{1}{6} \cr \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \cr 6 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \cr } \qquad$

The matrix $\mathbf{P}$ contains the one-step transition probabilities, i.e. the probabilities of the the next state given the current state. If the Markov chain is in state 0 (no occupied cells), the the next state is 1 for certain. If the current state is 1 (one occupied cell), then the next ball is is the occupied cell with probability 1/6 and is in one of the empty cells with probability 5/6. In general, if there are currently $i$ occupied cells, then the next ball is either in one of the $i$ occupied cells with probability $i/6$ (meaning the next state remains at $i$) or in one of the unoccupied cells with probability $(6-i)/6$ (meaning the next state is $i+1$). Since the number of occupied cells cannot decrease, the transition probabilities in $\mathbf{P}$ have the appearance of moving diagonally from the upper left corner to the lower right corner. State 6 (all cells are occupied) is a called an absorbing state. When the Markov chain is transitioned to state 6, it stays there for ever no matter how many more additional balls are thrown into the cells.

The transition probability matrix $\mathbf{P}$ holds the key to answering the first question indicated above. In throwing $k$ balls into 6 cells, the answer lies in the matrix $\mathbf{P}^k$, the $k$th power of $\mathbf{P}$ (the matrix obtained by multiplying $\mathbf{P}$ by itself $k$ times). This can be done by using software, e.g. using an online matrix calculator.

Consider the example of throwing eight balls into 6 cells. The following is the matrix $\mathbf{P}^8$.

$\displaystyle \mathbf{P}^8 = \bordermatrix{ & 0 & 1 & 2 & 3 & 4 & 5 & 6 \cr 0 & 0 & 0.00000357 & 0.00226838 & 0.06901578 & 0.36458333 & 0.45010288 & 0.11402606 \cr \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \cr 1 & 0 & 0.0000006 & 0.0007591 & 0.03602014 & 0.27756344 & 0.49661351 & 0.18904321 \cr \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \cr 2 & 0 & 0 & 0.00015242 & 0.01501534 & 0.18815015 & 0.50831619 & 0.28836591 \cr \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \cr 3 & 0 & 0 & 0 & 0.00390625 & 0.10533658 & 0.47531221 & 0.41544496 \cr \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \cr 4 & 0 & 0 & 0 & 0 & 0.03901844 & 0.38709919 & 0.57388236 \cr \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \cr 5 & 0 & 0 & 0 & 0 & 0 & 0.23256804 & 0.76743196 \cr \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \cr 6 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \cr } \qquad$

Each entry in $\mathbf{P}^8$ is denoted by $P_{ij}^8$, the 8-step transition probability, located in the row corresponding to the state $i$ and in the column corresponding to the state $j$. For example, $P_{03}^8=0.069$, indicating that there is a 6.9% chance that 3 cells are occupied after eight balls are thrown (with all cells being empty at the beginning). Another example: $P_{13}^8=0.036$. So if there is one occupied cell at the beginning, there is a 3.6% chance that 3 cells are occupied after throwing 8 balls into the 6 cells. So each row in $\mathbf{P}^8$ gives a probability distribution of the number of occupied cells depending the initial state.

The first question stated above assumes that the all cells are empty at the beginning. Thus the first row of the matrix $\mathbf{P}^8$ gives the answers.

$P[\text{1 occupied cell}]=P_{01}^8=0.00000357$

$P[\text{2 occupied cell}]=P_{02}^8=0.00226838$

$P[\text{3 occupied cell}]=P_{03}^8=0.06901578$

$P[\text{4 occupied cell}]=P_{04}^8=0.36458333$

$P[\text{5 occupied cell}]=P_{05}^8=0.45010288$

$P[\text{6 occupied cell}]=P_{06}^8=0.11402606$

In randomly throwing 8 balls into 6 cells, the more likely outcomes would be having 4, 5 or 6 occupied cells with the mostly likely being 4 occupied cells. Let $Y$ be the number of occupied cells after 8 balls are thrown into 6 cells. The mean number of occupied cells is:

\displaystyle \begin{aligned} E[Y]&=1 \times 0.00000357+2 \times 0.00226838+3 \times 0.06901578 \\&\ \ + 4 \times 0.36458333+5 \times 0.45010288+6 \times 0.11402606 \\&=4.60459175 \end{aligned}

To give an indication that the probability distribution is correct, we simulate 8 rolls of a die 1,000 times using the =RANDBETWEEN(1, 6) function in Excel. The results: 0% (1 occupied cell), 0.4% (2 occupied cells), 7.3% (3 occupied cells), 35.2% (4 occupied cells), 47.3% (5 occupied cells) and 9.8% (6 occupied cells). These results are in general agreement with the calculated distribution. Of course, the larger the simulation, the closer the simulated results will be to the theoretical distribution.

Here’s the algorithm for solving the occupancy problem using the Markov chain approach.

If the occupancy problem involves throwing balls into $n$ cells, the state of the Markov chain is the number of occupied cells at any given time. Set up the one-step transition probability matrix $\mathbf{P}$ as follows. Set $P_{01}=1$ and $P_{n,n-1}=1$. For $0, set $P_{ii}=i/n$, $P_{i,i+1}=(n-i)/n$. Then the matrix $\mathbf{P}^k$ gives the probability distribution of the number of occupied cells after $k$ balls are randomly distributed into the $n$ cells. The row in $\mathbf{P}^k$ corresponding to the state $i$ gives the probability of the number of occupied cells when there are $i$ occupied cells initially. If initially all cells are empty, then the first row of $\mathbf{P}^k$ gives the probability distribution of the number of occupied cells.

We would like to make further comment on the example. Assuming that all cells are empty at the beginning, the example can be worked by eliminating the first row of $\mathbf{P}$, the row for state 0. The first ball will always go into one empty cell. Thus in throwing 8 balls, we can focus on 7 balls going into 6 cells with one cell already occupied (by the first ball). Thus we can focus on the following transition probability matrix.

$\displaystyle \mathbf{P} = \bordermatrix{ & 1 & 2 & 3 & 4 & 5 & 6 \cr 1 & \frac{1}{6} & \frac{5}{6} & 0 & 0 & 0 & 0 \cr \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \cr 2 & 0 & \frac{2}{6} & \frac{4}{6} & 0 & 0 & 0 \cr \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \cr 3 & 0 & 0 & \frac{3}{6} & \frac{3}{6} & 0 & 0 \cr \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \cr 4 & 0 & 0 & 0 & \frac{4}{6} & \frac{2}{6} & 0 \cr \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \cr 5 & 0 & 0 & 0 & 0 & \frac{5}{6} & \frac{1}{6} \cr \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \cr 6 & 0 & 0 & 0 & 0 & 0 & 1 \cr } \qquad$

Assuming that the first ball goes into one of the empty cells, we focus on the next 7 balls. So we compute the matrix $\mathbf{P}^7$.

$\displaystyle \mathbf{P}^7 = \bordermatrix{ & 1 & 2 & 3 & 4 & 5 & 6 \cr 1 & 0.00000357 & 0.00226838 & 0.06901578 & 0.36458333 & 0.45010288 & 0.11402606 \cr \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \cr 2 & 0 & 0.00045725 & 0.02942101 & 0.26015947 & 0.50591564 & 0.20404664 \cr \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \cr 3 & 0 & 0 & 0.0078125 & 0.15214549 & 0.50951646 & 0.33052555 \cr \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \cr 4 & 0 & 0 & 0 & 0.05852766 & 0.44110797 & 0.50036437 \cr \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \cr 5 & 0 & 0 & 0 & 0 & 0.27908165 & 0.72091835 \cr \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \cr 6 & 0 & 0 & 0 & 0 & 0 & 1 \cr } \qquad$

Note that the first row of $\mathbf{P}^7$ is identical to the first row of $\mathbf{P}^8$ earlier. So a slightly different algorithm is to put one ball in a cell and then focus on the random behavior of the next $k-1$ balls. In other words, set up the transition probability matrix $\mathbf{P}$ with only the states $1, 2,\cdots,n$ and then raise $\mathbf{P}$ to $k-1$.

Remarks

As the above example demonstrates, the occupancy problem is a matter of calculating the power of a transition probability matrix. It is an algorithm that is suitable for computer implementation. It illustrates the versatility of the Markov chain method. The combinatorial approach of using multinomial theorem (discussed here and here) is also a valuable approach. Each approach provides its own unique insight. The post ends with an exercise.

Exercise
Suppose that eleven candies are randomly distributed to 4 children. Suppose that the names of the 4 children are Marcus, Issac, Samantha and Paul.

• Given the one-step transition probability matrix for the Markov chain describing the process of distributing candies to 4 children.
• What is the probability distribution of the number of children who have received candies?
• What is the probability all 4 children receiving candies?
• What is the mean number of children who have received candies?
• What is the probability that Marcus receives 1 candy, Issac receives 4 candies, Samantha receives 4 candies and Paul receives 2 candies?
• What is the probability that one of the children receives 1 candy, another child receives 4 candies, another child receives 4 candies and the last child receives 2 candies?

$\text{ }$

$\text{ }$

$\text{ }$

$\copyright$ 2017 – Dan Ma

# A first look at applications of Markov chains

This post presents several interesting examples of Markov chain models. Previous posts on Markov chains are: an introduction, how to work with transition probabilities (here and here).

Examples

The following gives several examples of Markov chains, discussed at an introductory level. They will be further developed in subsequent posts.

Example 1 – The Ehrenfest Chain

The Ehrenfest chain, named for the physicist Paul Ehrenfest, is a simple, discrete model for the exchange of gas molecules contained in a volume divided into two regions A and B by a permeable membrane. Suppose that the gas has $N$ molecules. At each period of time, a molecule is chosen at random from the set of $N$ molecules and moved from the region that it is in to the other region.

Figure 1 – The Efrenfest Chain

The model has a simple mathematical description using balls and urns. Suppose that two urns, labeled A and B, contain $N$ balls, labeled $1, 2, 3, \cdots, N$. At each step (i.e. at each discrete time unit), an integer is selected at random from $1, 2, 3, \cdots, N$ independent of the past selections. Then the ball labeled by the selected integer is removed from its urn and placed in the other urn. The procedure is repeated indefinitely.

The state of the system at time $n=0,1,2,\cdots$ is the number of balls in urn A and is denoted by $X_n$. Then $\left\{X_n: n=0,1,2,\cdots \right\}$ is a Markov chain on the state space $\left\{0,1,\cdots,N \right\}$.

When the process is in state 0 (urn A is empty), the next state is 1. When the process is in state $N$ (urn A is full), the next state is $N-1$. When the process is in state $i$ where $0, the next state of the process is either $i-1$ or $i+1$. The following gives the one-step transition probabilities:

$\displaystyle P_{ij} = \left\{ \begin{array}{ll} \displaystyle 1 &\ \ \ \ \ \ i=0 \text{ and } j=1 \\ \text{ } & \text{ } \\ \displaystyle 1 &\ \ \ \ \ \ i=N \text{ and } j=N-1 \\ \text{ } & \text{ } \\ \displaystyle \frac{i}{N} &\ \ \ \ \ \ 0

When urn A has $i$ balls, there is a probability of $\frac{i}{N}$ such that the randomly selected ball is from urn A and thus urn A loses a ball. On the other hand, there is a probability of $1-\frac{i}{N}$ such that the randomly selected ball is from urn B and thus urn A gains a ball. As an illustration, the following gives the transition probability matrix for $N=5$ balls.

$\mathbf{P} = \bordermatrix{ & 0 & 1 & 2 & 3 & 4 & 5 \cr 0 & 0 & 1 & 0 & 0 & 0 & 0 \cr 1 & \frac{1}{5} & 0 & \frac{4}{5} & 0 & 0 & 0 \cr 2 & 0 & \frac{2}{5} & 0 & \frac{3}{5} & 0 & 0 \cr 3 & 0 & 0 & \frac{3}{5} & 0 & \frac{2}{5} & 0 \cr 4 & 0 & 0 & 0 & \frac{4}{5} & 0 & \frac{1}{5} \cr 5 & 0 & 0 & 0 & 0 & 1 & 0 \cr } \qquad$

An important problem in the Ehrenfest chain is the long-term, or equilibrium, distribution of $X_n$, the number of balls in urn A. Another interesting problem concerns the changes in the composition of the two regions over time. For example, if all molecules are in one region at the beginning, how long on average will it be before each region has half of the molecules? The theory of Markov chains provide a good method for answering such questions.

Example 2 – Random Walk

The simplest random walk is a Markov chain $\left\{X_n: n=0,1,2,\cdots \right\}$ such that each state is the result of a random one-unit up or down move from the previous state.

Figure 2 – Five Simulations of a Random Walk

In the random walk in Figure 1, each state is one unit above or below the preceding state with equal probability. Instead of a random one-unit up or down move, let’s give a more general description of a random walk. The moves in the random walk are determined by a predetermined discrete distribution. Let $Y_1, Y_2, Y_3 \cdots$ be integer-valued random variables that are independent and identically distributed. Let $P[Y=y]$ be the common probability function. Let $X_0$ be an integer-valued random variable that is independent of the random variables $Y_n$. The random variable $X_0$ will be the initial position of the random walk. The random variables $Y_n$ are the increments (they are the amounts added to the stochastic process as time increases).

For $n \ge 1$, define $X_n=X_0+Y_1+\cdots+Y_{n-1}+Y_n$. Then $\left\{X_n: n=0,1,2,\cdots \right\}$ is called a random walk. It is a Markov chain with the state space the set of all integers. Wherever the process is at any given time, the next step in the process is always a walk in a distance and direction that is dictated by the independent random variables $Y_n$. If the random variables $Y_n$ are independent Bernoulli variables with $P(Y_n=1)=p$ and $P(Y_n=-1)=1-p$, then we revert back to the simple random walks described in Figure 1.

To transition from state $i$ to state $j$, the increment $Y_n$ must be $j-i$. Thus the one-step transition probabilities $P_{ij}$ are defined by $P_{ij}=P[Y=j-i]$. Recall that $P[Y=y]$ is the common probability function for the sequence $Y_1, Y_2, Y_3 \cdots$. For a more detailed description, see this previous post.

Example 3 – Gambler’s Ruin

A gambler’s ruin is a simple random walk that can take on only finitely many states. At each time period, the state is one-unit higher or lower than the preceding state as a result of a random bet. More specifically, suppose that a gambler starts out with $d$ units in capital (in dollars or other monetary units) and makes a series of one-unit bets against the house. For the gambler, the probabilities of winning and losing each bet are $p$ and $1-p$, respectively. Whenever the capital reaches zero, the gambler is in ruin and his capital remains zero thereafter. On the other hand, if the capital of the gambler increases to $m$ where $d, then the gambler quits playing. Let $X_n$ be the capital of the gambler after the $n$th bet. Then $\left\{X_n: n=0,1,2,3,\cdots \right\}$ is a random walk. The starting state is $X_0=d$. The states 0 and $m$ are absorbing states. The following gives the transition probabilities.

$\displaystyle P_{ij} = \left\{ \begin{array}{ll} \displaystyle p &\ \ \ \ \ \ j=i+1,i \ne 0, i \ne m \\ \text{ } & \text{ } \\ \displaystyle 1-p &\ \ \ \ \ \ j=i-1,i \ne 0, i \ne m \\ \text{ } & \text{ } \\ \displaystyle 1 &\ \ \ \ \ \ i=0,j=0 \\ \text{ } & \text{ } \\ \displaystyle 1 &\ \ \ \ \ \ i=m,j=m \\ \text{ } & \text{ } \\ 0 &\ \ \ \ \ \ \text{otherwise} \end{array} \right.$

Whenever the gambler reaches the absorbing state of 0, the gambler is in ruin. If the process reaches the absorbing state of $m$, the gambler strikes gold. If the initial capital of the gambler is small relative to the casino, there is a virtually 100% chance that the gambler will be in ruin even if each bet has even odds (see here for the calculation). One interesting questions: on average how long does it take for the gambler to be in ruin? Such questions will be answered after necessary tools are built in subsequent posts.

An alternative interpretation of the gambler’s ruin random walk is this. Assume that two players make a series of one-unit bets against each other and that the total capital between the two players is $m$ units with the first player having $d$ units in capital. Further suppose that the first player has probability $p$ of winning a bet and the second player has probability $1-p$ of winning a bet. Then the two gamblers play until one of them goes broke (is in ruin). Let $X_n$ be the capital of the first player after the $n$th play. Then $\left\{X_n: n=0,1,2,3,\cdots \right\}$ is a Markov chain that is a gambler’s ruin. The following graph shows five simulations of a gambler’s ruin from this previous post.

Figure 3 – Five Simulations of a Gambler’s Ruin

Example 4 – Discrete Birth and Death Chain

In a birth and death chain, the current state $i$ in the process is transitioned to $i+1$ (a birth), $i-1$ (a death) or $i$. The state space is either $\left\{0,1,2,3,\cdots \right\}$ or $\left\{0,1,2,3,\cdots, m \right\}$. The following gives the one-step transition probabilities.

$\displaystyle P_{ij} = \left\{ \begin{array}{ll} \displaystyle q_i &\ \ \ \ \ \ j=i-1 \\ \text{ } & \text{ } \\ \displaystyle r_i &\ \ \ \ \ \ j=i \\ \text{ } & \text{ } \\ \displaystyle p_i &\ \ \ \ \ \ j=i+1 \\ \text{ } & \text{ } \\ 0 &\ \ \ \ \ \ \text{otherwise} \end{array} \right.$

where for each state $i$, $q_i$ (the probability of down), $r_i$ (the probability of same) and $p_i$ (the probability of up) are non-negative real numbers such that $q_i+r_i+p_i=1$. The following gives the transition probability matrix for an illustrative case of $m=5$.

$\mathbf{P} = \bordermatrix{ & 0 & 1 & 2 & 3 & 4 & 5 \cr 0 & r_0 & p_0 & 0 & 0 & 0 & 0 \cr 1 & q_1 & r_1 & p_1 & 0 & 0 & 0 \cr 2 & 0 & q_2 & r_2 & p_2 & 0 & 0 \cr 3 & 0 & 0 & q_3 & r_3 & p_3 & 0 \cr 4 & 0 & 0 & 0 & q_4 & r_4 & p_4 \cr 5 & 0 & 0 & 0 & 0 & q_5 & r_5 \cr } \qquad$

In the above matrix, $q_0=0$ and $p_5=0$. If $r_0=1$, then state 0 is absorbing. If $p_0=1$, then state 0 is reflecting. On the other hand, if $r_5=1$, then state 5 is absorbing. If $q_5=1$, then state 5 is reflecting.

The Ehrenfest chain is an example of a birth and death chain (with the boundary states being reflecting). When the probability of down $q_i$, the probability of same $r_i$ and the probability of up $p_i$ do not vary according to the current state (i.e. they are constant), the resulting birth and death chain is a random walk. Thus the gambler’s ruin chain is also an example of a birth and death chain. The birth and death chain is suitable model for applications in which the state of the process is the population of a living system.

Example 5 – Discrete Queueing Markov Chain

Though continuous-time queueing models are much more realistic, a simple discrete-time queueing model is introduced here in order to illustrate applications of discrete Markov chains. Suppose that time is measured in a convenient time interval such as one minute or some appropriate group of minutes. The model assumes that during one period of time, only one customer is served if at least one customer is present. The model also assumes that a random number of new customers arrive at any given time period and that the numbers of customers arriving in the time periods form an independent identically distributed sequence of random variables.

More specifically, let $Y_1,Y_2,Y_3,\cdots$ be independent and identically distributed random variables that take on non-negative integers. Let $P[Y=k]$, $k=0,1,2,\cdots$, be the common probability function. The random variable $Y_n$ is the number of customers that arrive into the system during the $n$th period. Let $X_0$ be the number of customers present initially in the system. For $n \ge 1$, let $X_n$ be the number of customers in the system during the $n$th period. If the current state is $X_n$, then the next state is:

$\displaystyle X_{n+1} = \left\{ \begin{array}{ll} \displaystyle Y_{n+1} &\ \ \ \ \ \ X_n=0 \\ \text{ } & \text{ } \\ \displaystyle (X_n-1)+Y_{n+1} &\ \ \ \ \ \ X_n>0 \\ \end{array} \right.$

where $n \ge 0$. In essence, the number of customers in the system at any given time period is the number of new customers arriving plus the customers already in the system less one (if the system is not empty). If the system is empty, the number of customers is simply the number of new customers. It follows that $\left\{X_n: n \ge 0 \right\}$ is a Markov chain with the state space $\left\{0,1,2,\cdots \right\}$, the set of all non-negative integers. The one-step transition probabilities $P_{ij}$ are given by:

$\displaystyle P_{ij} = \left\{ \begin{array}{ll} \displaystyle P[Y=j] &\ \ \ \ \ \ i=0 \text{ and } j \in \left\{0,1,2,\cdots \right\} \\ \text{ } & \text{ } \\ \displaystyle P[Y=j-(i-1)] &\ \ \ \ \ \ i>0 \text{ and } j \in \left\{i-1,i,i+1,\cdots \right\} \\ \end{array} \right.$

where $P[Y=k]$ is the common probability function for the independent numbers of arrivals of customers $Y_1,Y_2,Y_3,\cdots$. If the $Y_n$ have a common binomial distributions with parameters 3 and $p=0.5$. Then the following gives the transition probability matrix.

$\displaystyle \mathbf{P} = \bordermatrix{ & 0 & \text{ } & 1 & \text{ } & 2 & \text{ } & 3 & \text{ } & 4 & \text{ } & 5 & \text{ } & 6 & \text{ } & 7 & \text{ } & 8 & \cdots \cr 0 & 0.125 & \text{ } & 0.375 & \text{ } & 0.375 & \text{ } & 0.125 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0 & \cdots \cr 1 & 0.125 & \text{ } & 0.375 & \text{ } & 0.375 & \text{ } & 0.125 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0 & \cdots \cr 2 & 0 & \text{ } & 0.125 & \text{ } & 0.375 & \text{ } & 0.375 & \text{ } & 0.125 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0 & \cdots \cr 3 & 0 & \text{ } & 0 & \text{ } & 0.125 & \text{ } & 0.375 & \text{ } & 0.375 & \text{ } & 0.125 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0 & \cdots \cr 4 & 0 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0.125 & \text{ } & 0.375 & \text{ } & 0.375 & \text{ } & 0.125 & \text{ } & 0 & \text{ } & 0 & \cdots \cr 5 & 0 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0.125 & \text{ } & 0.375 & \text{ } & 0.375 & \text{ } & 0.125 & \text{ } & 0 & \cdots \cr 6 & 0 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0.125 & \text{ } & 0.375 & \text{ } & 0.375 & \text{ } & 0.125 & \cdots \cr 7 & 0 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0.125 & \text{ } & 0.375 & \text{ } & 0.375 & \cdots \cr 8 & 0 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0.125 & \text{ } & 0.375 & \cdots \cr \vdots & \vdots & \text{ } & \vdots & \text{ } & \vdots & \text{ } & \vdots & \text{ } & \vdots & \text{ } & \vdots & \text{ } & \vdots & \text{ } & \vdots & \text{ } & \vdots & \text{ } \cr } \qquad$

If the customer arrivals have a common Poisson distribution with parameter $\lambda=0.2$ (per period), then the following gives the transition probabilities.

$\displaystyle P_{ij} = \left\{ \begin{array}{ll} \displaystyle \frac{e^{-0.2} \ 0.2^j}{j!} &\ \ \ \ \ \ i=0 \text{ and } j \in \left\{0,1,2,\cdots \right\} \\ \text{ } & \text{ } \\ \displaystyle \frac{e^{-0.2} \ 0.2^{j-(i-1)}}{[j-(i-1)]!} &\ \ \ \ \ \ i>0 \text{ and } j \in \left\{i-1,i,i+1,\cdots \right\} \\ \end{array} \right.$

Example 6 – Discrete Branching Chain

Suppose that we have a system of particles that can generate new particles of the same type, e.g. particles such as neutrons and bacteria.

In such a system, we assume that a particle is replaced by a ransom number of new particles (regarded as offspring of the original particle). Furthermore, we assume that each particle generates offspring independently and offspring generation follows the same distribution. Let $Y$ be the random variable that is the common distribution for offspring generation across all particles in the system. Let $P[Y=k]$ be the common probability function of the number of offspring of a particle.

For $n \ge 0$, let $X_n$ be the number of particles in the $n$th generation in the system. It follows from the assumptions that with $X_n=i$

$\displaystyle X_{n+1}=\sum \limits_{k=1}^{i} Y_k$

where $Y_1,Y_2, \cdots, Y_i$ are independent random variables with common probability function $P[Y=k]$. Then $\left\{X_n: n \ge 0 \right\}$ is a Markov chain with the state space the set of all non-negative integers. The one-step transition probabilities $P_{ij}$ are

$\displaystyle P_{ij}=P \biggl[\sum \limits_{k=1}^{i} Y_k = j \biggr]$

In other words, the transition probability $P_{ij}$ is defined by the probability that the independent sum $Y_1+\cdots+Y_i$ taking on the next state $j$.

Note that state 0 is an absorbing state, which means extinction. An interesting problems in branching chains is the computation of the probability of extinction. One approach is to compute the probability of extinction for a branching chain with a single particle, i.e. the probability $\rho$ of starting in state 1 and being absorbed into state 0. Then the probability of extinction for a branching chain starting with $k$ particles is then $\rho^k$ since the particles are assumed to generate new particles independently.

$\text{ }$

$\text{ }$

$\text{ }$

$\copyright$ 2017 – Dan Ma