Collapsing states into an absorbing state

This post discusses the technique of collapsing a set of states into an absorbing state in order to make certain Markov chain problems easier. We motivate the technique with the following examples. The method is to raise the new transition probability matrix (the one containing the collapsed absorbing state) to an appropriate power (the Chapman-Kolmogorov equations are discussed here). Other applications of Markov chains are discussed here. Markov chains are applied to the occupancy problem in this previous post.

Example 1
On any given day, the weather condition in a city is either sunny, cloudy (but not rainy) or rainy. The weather condition on a given day is identical to that of the previous day with probability 0.5. If the weather condition changes from the previous day, it is equally likely to be either of the other two conditions. Suppose that it is sunny today in this city. Find the probability that none of the next five days will be rainy.

Example 2
A fair coin is flipped repeatedly until a run of 4 consecutive heads appears. We want to know how many flips are required to achieve this goal. Specifically, we have the following questions.

  • Find the probability that it takes at most 9 flips to get a run of 4 consecutive heads.
  • Find the probability that it takes exactly 9 flips to get a run of 4 consecutive heads.

Example 3
A maze is divided into 9 areas as shown below.

The maze is a 3-by-3 grid of areas, numbered as follows (the layout can be read off from the transition probabilities given below):

    +---+---+---+
    | 1 | 2 | 3 |
    +---+---+---+
    | 6 | 5 | 4 |
    +---+---+---+
    | 7 | 8 | 9 |
    +---+---+---+

Area 1 = initial position of the mouse

A mouse is placed in area 1 initially. Suppose that areas 3, 7 and 9 contain food and that the other areas in the maze do not contain food. Further suppose that the mouse moves through the areas in the maze at random. That is, whenever the mouse is in an area that has w exits, it moves to each of the w adjacent areas with probability 1/w.

  • Find the probability that after 5 moves, the mouse is one area away from a food source.
  • Find the probability that after 7 moves, the mouse has not found food and is in area 6 or area 8.
  • Find the probability that it takes at most 8 moves for the mouse to find food.
  • Find the probability that it takes exactly 8 moves for the mouse to find food.

Working Example 1

We work Example 1 to illustrate the method. The following is the transition probability matrix for Example 1 where state 0 = sunny weather, state 1 = cloudy weather and state 2 = rainy weather.

    \mathbf{P} =       \bordermatrix{ & 0 & 1 & 2 \cr        0 & 0.50 & 0.25 & 0.25   \cr        1 & 0.25 & 0.50 & 0.25  \cr        2 & 0.25 & 0.25 & 0.50   \cr           } \qquad

Since it is currently a sunny day, X_0=0. The probability of not having a rainy day in the next 5 days is the probability P[X_5 \le 1, X_4 \le 1, X_3 \le 1, X_2 \le 1, X_1 \le 1 \lvert X_0=0]. This is the probability that X_j=0 or X_j=1 for each of the next 5 periods given that the current state is 0. Because the event restricts every intermediate state, applying the Chapman-Kolmogorov equations directly, i.e. finding the 5th power of the matrix \mathbf{P}, does not work. For example, though P_{01}^5 is the probability that the chain will end up in state 1 at time 5 given that it is in state 0 at time 0, this probability includes the possibility of a rainy day (state 2) between time 0 and time 5. The technique is to make state 2 (the state that is not to be visited in the next 5 days) an absorbing state. Doing so produces a new transition probability matrix.

    \mathbf{Q} =       \bordermatrix{ & 0 & 1 & 2 \cr        0 & 0.50 & 0.25 & 0.25   \cr        1 & 0.25 & 0.50 & 0.25  \cr        2 & 0 & 0 & 1   \cr           } \qquad

The matrix \mathbf{Q} is obtained by changing state 2 in the matrix \mathbf{P} into an absorbing state (i.e. the entry in the row for state 2 and in the column for state 2 has probability 1). It is a valid transition probability matrix since the sum of each row is 1. So it describes a Markov chain. How does the new Markov chain relate to the original Markov chain? The following is the 5th power of \mathbf{Q}.

    \displaystyle \mathbf{Q}^5 =       \bordermatrix{ & 0 & 1 & 2 \cr        0 & \frac{122}{1024} &\frac{121}{1024} & \frac{781}{1024}   \cr        \text{ } & \text{ } & \text{ } & \text{ }  \cr        1 & \frac{121}{1024} & \frac{122}{1024} & \frac{781}{1024}  \cr        \text{ } & \text{ } & \text{ } & \text{ }  \cr        2 & 0 & 0 & 1   \cr } \qquad        =       \bordermatrix{ & 0 & 1 & 2 \cr        0 & 0.11914 & 0.11816 & 0.76270   \cr        \text{ } & \text{ } & \text{ } & \text{ }  \cr        1 & 0.11816 & 0.11914 & 0.76270  \cr        \text{ } & \text{ } & \text{ } & \text{ }  \cr        2 & 0 & 0 & 1   \cr   } \qquad

The transition probability Q_{02}^5=0.7627 is the probability of transitioning in the new process from state 0 at time 0 to state 2 at time 5 (the 5th day). Starting at state 0, there is a 76.27% chance that the new process will be in state 2 at time 5. This means that the original process will have reached state 2 at or prior to time 5. The complement of this probability is Q_{00}^5+Q_{01}^5=0.2373. So there is a 23.73% chance that the original process never reaches state 2 at any time in the first 5 periods. If the weather is indeed governed by the chain described by \mathbf{P}, it is likely (a 76.27% chance) that there will be a rainy day at some point within the next 5 days (given that today is sunny). In fact, even if today is cloudy, there is also a 76.27% chance that there will be a rainy day within the next 5 days.
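The 5th power of \mathbf{Q} can be computed with software. The following is a minimal sketch in Python with NumPy that reproduces the numbers above:

    import numpy as np

    # Transition matrix Q for Example 1 with state 2 (rainy) made absorbing
    Q = np.array([[0.50, 0.25, 0.25],
                  [0.25, 0.50, 0.25],
                  [0.00, 0.00, 1.00]])

    Q5 = np.linalg.matrix_power(Q, 5)
    print(Q5[0, 2])             # 0.7627..., a rainy day occurs within 5 days
    print(Q5[0, 0] + Q5[0, 1])  # 0.2373..., no rainy day in the next 5 days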

Discussion

Before working Example 2, let’s discuss the idea of collapsing states into an absorbing state. Consider a Markov chain with transition probability matrix \mathbf{P}. Let \mathcal{S} be the state space of this chain. Let \mathcal{M} be a set of states. Suppose that the states in \mathcal{M} are either desirable states to enter or undesirable states to be avoided. In the case of \mathcal{M} being desirable or attractive, we wish to find the probability of entering a state in \mathcal{M} after a number of transitions. In the case of \mathcal{M} being undesirable, we wish to find the probability of not entering any state in \mathcal{M} after a number of transitions.

The key to finding these probabilities is to collapse the states in \mathcal{M} into a single state called M, where M is an integer different from all states not in \mathcal{M}. For convenience, we let M be the smallest integer that is greater than all states not in \mathcal{M}. For example, if \mathcal{S}=\left\{0,1,2,3,4,5 \right\} and \mathcal{M}=\left\{4,5 \right\}, then M=4.

Now define a new Markov chain by letting all states not in \mathcal{M} remain the same and by collapsing all states in \mathcal{M} into the new state M. The new state M is made an absorbing state. The state space of the new Markov chain is the original states not in \mathcal{M} along with the new state M. Let \mathbf{Q} be the transition probability matrix of the new Markov chain. The transition probabilities in \mathbf{Q} are defined as follows:

    \displaystyle  Q_{ij} = \left\{ \begin{array}{ll}           \displaystyle  P_{ij} &\ \ \ \ \ \ i \notin \mathcal{M}, j \notin \mathcal{M} \\            \text{ } & \text{ } \\           \displaystyle  \sum \limits_{k \in \mathcal{M}} P_{ik} &\ \ \ \ \ \ i \notin \mathcal{M}, j=M \\            \text{ } & \text{ } \\          \displaystyle  1 &\ \ \ \ \ \ i=M, j=M \\                   \end{array} \right.

The transition probability matrix \mathbf{Q} is derived from and is closely related to the matrix \mathbf{P}. For states i,j not in \mathcal{M}, the transition probabilities are the same in both matrices (the first condition). The second condition says that Q_{iM} is the sum of all P_{ik} where k \in \mathcal{M}. Thus Q_{iM} is the probability of the original process entering one of the states in \mathcal{M} in one step when starting from a state i \notin \mathcal{M}. Since M is an absorbing state, Q_{MM}=1.
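The definition of \mathbf{Q} translates directly into code. The following is a minimal sketch in Python with NumPy (the function name collapse and the choice of placing the collapsed state last, rather than labeling it with a particular integer, are made here for illustration):

    import numpy as np

    def collapse(P, M):
        """Collapse the states in M (a set of indices of P) into a single
        absorbing state, which becomes the last state of the new matrix."""
        keep = [i for i in range(P.shape[0]) if i not in M]
        k = len(keep)
        Q = np.zeros((k + 1, k + 1))
        Q[:k, :k] = P[np.ix_(keep, keep)]                  # Q_ij = P_ij for i, j not in M
        Q[:k, k] = P[np.ix_(keep, sorted(M))].sum(axis=1)  # Q_iM = sum of P_ik over k in M
        Q[k, k] = 1.0                                      # the collapsed state M is absorbing
        return Q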

To calculate the two probabilities concerning the original process indicated earlier, compute powers of the matrix \mathbf{Q}. In the new Markov chain for the matrix \mathbf{Q}, there are two kinds of states, the states i \notin \mathcal{M} and the collapsed state M. Whenever the process is in a state i \ne M, the process is not yet absorbed (into the state M). When the process starts in a state other than M (i.e. not yet absorbed), the process will eventually be absorbed after some number of transitions, provided that the state M is reachable from every such state.

Consider Q_{iM}^n where i \ne M. This is the probability that the process starts in state i (not yet absorbed) and is absorbed at or before time n. This is the connection between the two processes: being absorbed into state M in the new process by time n means that the original process enters a state in \mathcal{M} at time n or in some period prior to time n. We have the following summary.

Consider a Markov process \left\{X_n:n \ge 0  \right\} with the transition probability matrix \mathbf{P}. Collapse \mathcal{M}, a set of states, into an absorbing state called M in the manner described above. Let \mathbf{Q} be the resulting transition probability matrix of the new Markov chain. For i \ne M, Q_{iM}^n is the probability of absorption by time n given that the process starts out not being absorbed, i.e. the probability that the original process enters a state in \mathcal{M} at time n or prior to time n given that initially the process is in state i. Equivalently Q_{iM}^n=P[X_k \in \mathcal{M} \text{ for some } k=1,2,\cdots,n \lvert X_0=i].

The above observation gives a way to interpret the complement 1-Q_{iM}^n. It would be the probability of the original process never entering any state in \mathcal{M} during the n time periods (given that the process starts at i \notin \mathcal{M}). This is the answer to Example 1. We have the following summary.

The complement of Q_{iM}^n is 1-Q_{iM}^n. It is the probability that the original process never enters any of the states in \mathcal{M} during the n time periods (given that the process starts at i \notin \mathcal{M}). Equivalently, 1-Q_{iM}^n can be stated as 1-Q_{iM}^n=P[X_k \notin \mathcal{M} \text{ for all } k=1,2,\cdots,n \lvert X_0=i]. Note that the probability 1-Q_{iM}^n is also the sum \sum_{j \notin \mathcal{M}} Q_{ij}^n. Specifically, for j \notin \mathcal{M}, Q_{ij}^n can be stated as Q_{ij}^n=P[X_n=j, X_k \notin \mathcal{M} \text{ for all } k=1,2,\cdots,n-1 \lvert X_0=i].

The probability Q_{iM}^n is the probability of absorption on or prior to time n. Another probability of interest is the probability of absorption exactly at time n.

As described above, let \mathbf{Q} be the resulting transition probability matrix of the new Markov chain after collapsing some states into an absorbing state. For i \ne M, Q_{iM}^n-Q_{iM}^{n-1} is the probability of absorption at time n given that the process starts out not being absorbed, i.e. the probability that the original process enters a state in \mathcal{M} at time n given that initially the process is in state i. Equivalently Q_{iM}^n-Q_{iM}^{n-1}=P[X_n \in \mathcal{M} \text{ and } X_k \notin \mathcal{M} \text{ for all } k=1,2,\cdots,n-1 \lvert X_0=i].

In a nutshell, collapsing states into an absorbing state yields two probabilities concerning the original process: the probability of entering \mathcal{M} within n transitions and the probability of entering \mathcal{M} for the first time at the nth transition. To illustrate this process, consider the following example.

Example 0
Consider a Markov chain with the following transition probability matrix.

    \mathbf{P} =       \bordermatrix{ & 0 & 1 & 2 & 3 & 4  & 5 \cr        0 & 0.6 & 0.08 & 0.08  & 0.08 & 0.08 & 0.08 \cr        1 & 0.5 & 0.10 & 0.10  & 0.10 & 0.10 & 0.10 \cr        2 & 0.4 & 0.12 & 0.12  & 0.12 & 0.12 & 0.12 \cr        3 & 0.3 & 0.14 & 0.14  & 0.14 & 0.14 & 0.14 \cr        4 & 0.2 & 0.16 & 0.16  & 0.16 & 0.16 & 0.16 \cr        5 & 0.1 & 0.18 & 0.18  & 0.18 & 0.18 & 0.18 \cr   } \qquad

Now collapse state 4 and state 5 into a new state called 4. The following is the resulting matrix \mathbf{Q}.

    \mathbf{Q} =       \bordermatrix{ & 0 & 1 & 2 & 3 & 4 \cr        0 & 0.6 & 0.08 & 0.08  & 0.08 & 0.16  \cr        1 & 0.5 & 0.10 & 0.10  & 0.10 & 0.20  \cr        2 & 0.4 & 0.12 & 0.12  & 0.12 & 0.24  \cr        3 & 0.3 & 0.14 & 0.14  & 0.14 & 0.28  \cr        4 & 0 & 0 & 0  & 0 & 1  \cr           } \qquad

The following is the 6th power of \mathbf{Q}.

    \mathbf{Q}^6 =       \bordermatrix{ & 0 & 1 & 2 & 3 & 4 \cr        0 & 0.195466 & 0.034574 & 0.034574  & 0.034574 & 0.700812  \cr        1 & 0.184167 & 0.032578 & 0.032578  & 0.032578 & 0.718099  \cr        2 & 0.172869 & 0.030582 & 0.030582  & 0.030582 & 0.735386  \cr        3 & 0.161570 & 0.028586 & 0.028586  & 0.028586 & 0.752673  \cr        4 & 0 & 0 & 0  & 0 & 1  \cr           } \qquad

As a result, Q_{04}^6=0.700812 is the probability that the new process will be in state 4 at time 6 (given the process starts at state 0). This means that it is the probability that the original process will reach state 4 or state 5 by time 6 (at or before time 6) given the original process starts in state 0. Note that regardless of the initial state, the probability of being absorbed into state 4 or state 5 by time 6 is roughly the same (70% to 75%).

On the other hand, Q_{04}^5=0.631665, so Q_{04}^6-Q_{04}^5=0.700812-0.631665=0.069147. Thus there is a 6.9% chance that absorption will take place exactly at time 6.
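These numbers can be reproduced with the collapse sketch given above:

    import numpy as np

    P = np.array([[0.6, 0.08, 0.08, 0.08, 0.08, 0.08],
                  [0.5, 0.10, 0.10, 0.10, 0.10, 0.10],
                  [0.4, 0.12, 0.12, 0.12, 0.12, 0.12],
                  [0.3, 0.14, 0.14, 0.14, 0.14, 0.14],
                  [0.2, 0.16, 0.16, 0.16, 0.16, 0.16],
                  [0.1, 0.18, 0.18, 0.18, 0.18, 0.18]])

    Q = collapse(P, {4, 5})              # states 4 and 5 become the absorbing state
    Q6 = np.linalg.matrix_power(Q, 6)
    Q5 = np.linalg.matrix_power(Q, 5)
    print(Q6[0, 4])                      # 0.700812, absorption by time 6
    print(Q6[0, 4] - Q5[0, 4])           # 0.069147, absorption exactly at time 6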

More Examples

We now work Example 2 and Example 3.

Example 2 (Statement of the example given above)
Define a Markov chain with states 0, 1, 2, 3 and 4 with i \le 3 meaning that there is currently a run of i consecutive heads (the last i flips are heads) and with state 4 meaning that a run of 4 consecutive heads has already occurred (the last 4 or more flips are heads). Thus state 4 is an absorbing state. The following is the transition probability matrix.

    \mathbf{P} =       \bordermatrix{ & 0 & 1 & 2 & 3 & 4 \cr        0 & 0.5 & 0.5 & 0  & 0 & 0  \cr        1 & 0.5 & 0 & 0.5  & 0 & 0  \cr        2 & 0.5 & 0 & 0  & 0.5 & 0  \cr        3 & 0.5 & 0 & 0  & 0 & 0.5  \cr        4 & 0 & 0 & 0  & 0 & 1  \cr           } \qquad

Here’s how \mathbf{P} is derived. For example, if there is currently a run of 2 heads (the third row), the next state is either 0 (the next flip is a tail) or 3 (the next flip is a head) with equal probability. To find the two probabilities in question, raise \mathbf{P} to the 8th and 9th powers.

    \mathbf{P}^9 =       \bordermatrix{ & 0 & 1 & 2 & 3 & 4 \cr        0 & 0.406250 & 0.210938 & 0.109375  & 0.056641 & 0.216797  \cr        1 & 0.376953 & 0.195313 & 0.101563  & 0.052734 & 0.273438  \cr        2 & 0.320313 & 0.166016 & 0.085938  & 0.044922 & 0.382813  \cr        3 & 0.210938 & 0.109375 & 0.056641  & 0.029297 & 0.593750  \cr        4 & 0 & 0 & 0  & 0 & 1  \cr           } \qquad

    \mathbf{P}^8 =       \bordermatrix{ & 0 & 1 & 2 & 3 & 4 \cr        0 & 0.421875 & 0.218750 & 0.113281  & 0.058594 & 0.187500  \cr        1 & 0.390625 & 0.203125 & 0.105469  & 0.054688 & 0.246094  \cr        2 & 0.332031 & 0.171875 & 0.089844  & 0.046875 & 0.359375  \cr        3 & 0.218750 & 0.113281 & 0.058594  & 0.031250 & 0.578125  \cr        4 & 0 & 0 & 0  & 0 & 1  \cr           } \qquad

The starting state is 0 since the coin flips are counted from the very first one. With P_{04}^9=0.216797, we know that the probability of taking 9 flips or less to get 4 consecutive heads is 0.216797.

With P_{04}^9-P_{04}^8=0.216797-0.187500=0.029297, there is a roughly 3% chance that the run of 4 consecutive heads is achieved at the 9th flip.
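As a check, both probabilities fall out of a short computation. The following is a sketch in Python with NumPy:

    import numpy as np

    # States 0-3: current run of i consecutive heads; state 4: a run of 4 achieved
    P = np.array([[0.5, 0.5, 0.0, 0.0, 0.0],
                  [0.5, 0.0, 0.5, 0.0, 0.0],
                  [0.5, 0.0, 0.0, 0.5, 0.0],
                  [0.5, 0.0, 0.0, 0.0, 0.5],
                  [0.0, 0.0, 0.0, 0.0, 1.0]])

    P9 = np.linalg.matrix_power(P, 9)
    P8 = np.linalg.matrix_power(P, 8)
    print(P9[0, 4])             # 0.216797, at most 9 flips
    print(P9[0, 4] - P8[0, 4])  # 0.029297, exactly 9 flips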

Example 3 (Statement of the example given above)
Let X_n be the area the mouse is in after making n moves. Then \left\{X_n: n=0,1,2,\cdots \right\} is a Markov chain. The following is the transition probability matrix. To help see the matrix, the diagram of the maze is repeated.

    +---+---+---+
    | 1 | 2 | 3 |
    +---+---+---+
    | 6 | 5 | 4 |
    +---+---+---+
    | 7 | 8 | 9 |
    +---+---+---+

Area 1 = initial position of the mouse

    \mathbf{P} =       \bordermatrix{ & 1 & 2 & 3 & 4  & 5 & 6 & 7 & 8 & 9 \cr        1 & 0 & 1/2 & 0  & 0 & 0 & 1/2 & 0  & 0 & 0 \cr        2 & 1/3 & 0 & 1/3  & 0 & 1/3 & 0 & 0  & 0 & 0 \cr        3 & 0 & 1/2 & 0  & 1/2 & 0 & 0 & 0  & 0 & 0 \cr        4 & 0 & 0 & 1/3  & 0 & 1/3 & 0 & 0  & 0 & 1/3 \cr        5 & 0 & 1/4 & 0  & 1/4 & 0 & 1/4 & 0  & 1/4 & 0 \cr        6 & 1/3 & 0 & 0  & 0 & 1/3 & 0 & 1/3  & 0 & 0 \cr        7 & 0 & 0 & 0  & 0 & 0 & 1/2 & 0  & 1/2 & 0 \cr        8 & 0 & 0 & 0  & 0 & 1/3 & 0 & 1/3  & 0 & 1/3 \cr        9 & 0 & 0 & 0  & 1/2 & 0 & 0 & 0  & 1/2 & 0 \cr   } \qquad

States 3, 7 and 9 are desirable states since these are the areas with food. Let \mathcal{M}=\left\{3,7,9 \right\}. We then collapse \mathcal{M} into a new state called 9. The state 9 in the new process is an absorbing state; reaching it means that the original process has reached an area with food (area 3, 7 or 9). The following is the transition probability matrix for the new (collapsed) process.

    \mathbf{Q} =       \bordermatrix{ & 1 & 2 & 4 & 5  & 6 & 8 & 9  \cr        1 & 0 & 1/2 & 0  & 0 & 1/2 & 0 & 0  \cr        2 & 1/3 & 0 & 0  & 1/3 & 0 & 0 & 1/3  \cr        4 & 0 & 0 & 0  & 1/3 & 0 & 0 & 2/3  \cr        5 & 0 & 1/4 & 1/4  & 0 & 1/4 & 1/4 & 0  \cr        6 & 1/3 & 0 & 0  & 1/3 & 0 & 0 & 1/3  \cr        8 & 0 & 0 & 0  & 1/3 & 0 & 0 & 2/3  \cr        9 & 0 & 0 & 0  & 0 & 0 & 0 & 1  \cr           } \qquad

The state 9 in the matrix \mathbf{Q} refers to the original process reaching state 3, 7 or 9. All the questions in the example can then be answered by raising \mathbf{Q} to an appropriate power, in particular the 5th, 7th and 8th powers.

    \mathbf{Q}^5 =       \bordermatrix{ & 1 & 2 & 4 & 5  & 6 & 8 & 9  \cr        1 & 0 & 5/36 & 1/18  & 0 & 5/36 & 1/18 & 11/18  \cr        2 & 5/54 & 0 & 0  & 7/54 & 0 & 0 & 7/9  \cr        4 & 1/27 & 0 & 0  & 1/18 & 0 & 0 & 49/54  \cr        5 & 0 & 7/72 & 1/24  & 0 & 7/72 & 1/24 & 13/18  \cr        6 & 5/54 & 0 & 0  & 7/54 & 0 & 0 & 7/9  \cr        8 & 1/27 & 0 & 0  & 1/18 & 0 & 0 & 49/54  \cr        9 & 0 & 0 & 0  & 0 & 0 & 0 & 1  \cr           } \qquad

    \mathbf{Q}^7 =       \bordermatrix{ & 1 & 2 & 4 & 5  & 6 & 8 & 9  \cr        1 & 0 & 17/216 & 7/216  & 0 & 17/216 & 7/216 & 7/9  \cr        2 & 17/324 & 0 & 0  & 2/27 & 0 & 0 & 283/324  \cr        4 & 7/324 & 0 & 0  & 5/162 & 0 & 0 & 307/324  \cr        5 & 0 & 1/18 & 5/216  & 0 & 1/18 & 5/216 & 91/108  \cr        6 & 17/324 & 0 & 0  & 2/27 & 0 & 0 & 283/324  \cr        8 & 7/324 & 0 & 0  & 5/162 & 0 & 0 & 307/324  \cr        9 & 0 & 0 & 0  & 0 & 0 & 0 & 1  \cr           } \qquad

    \mathbf{Q}^8 =       \bordermatrix{ & 1 & 2 & 4 & 5  & 6 & 8 & 9  \cr        1 & 17/324 & 0 & 0  & 2/27 & 0 & 0 & 283/324  \cr        2 & 0 & 29/648 & 1/54  & 0 & 29/648 & 1/54 & 283/324  \cr        4 & 0 & 1/54 & 5/648  & 0 & 1/54 & 5/648 & 307/324  \cr        5 & 1/27 & 0 & 0  & 17/324 & 0 & 0 & 295/324  \cr        6 & 0 & 29/648 & 1/54  & 0 & 29/648 & 1/54 & 283/324  \cr        8 & 0 & 1/54 & 5/648  & 0 & 1/54 & 5/648 & 307/324  \cr        9 & 0 & 0 & 0  & 0 & 0 & 0 & 1  \cr           } \qquad

The statements for the probabilities to be sought are repeated here.

  • Find the probability that after 5 moves, the mouse is one area away from a food source.
  • Find the probability that after 7 moves, the mouse has not found food and is in area 6 or area 8.
  • Find the probability that it takes at most 8 moves for the mouse to find food.
  • Find the probability that it takes exactly 8 moves for the mouse to find food.

To obtain the answers, it is a matter of picking the entries from the matrices.

  • Q_{12}^5+Q_{14}^5+Q_{16}^5+Q_{18}^5=7/18=0.3889
  • Q_{16}^7+Q_{18}^7=17/216+7/216=24/216=0.1111
  • Q_{19}^8=283/324=0.8735
  • Q_{19}^8-Q_{19}^7=283/324-7/9=0.095679

One area away from a food source means being in area 2, 4, 6 or 8. Thus the first probability is Q_{12}^5+Q_{14}^5+Q_{16}^5+Q_{18}^5. The probability of being in area 6 or 8 after 7 moves (without having found food) is Q_{16}^7+Q_{18}^7. The probability that it will take 8 moves or less to reach food is Q_{19}^8. The probability of reaching food in exactly 8 moves is Q_{19}^8-Q_{19}^7.

With the matrices \mathbf{Q}^5, \mathbf{Q}^7 and \mathbf{Q}^8, we can also find the probability of reaching food from other starting areas. Just from the diagram, it is clear that the likelihood of reaching food is greater if the initial position is adjacent to a food source. It is easiest to reach food if the mouse starts in area 4 or area 8, since each of these areas is adjacent to two food sources. This is confirmed by the calculation. For example, Q_{69}^5=7/9=0.7778 while Q_{89}^5=49/54=0.9074. So it is over 90% certain that the mouse will find food within 5 moves if it is placed in area 8 initially.
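The maze calculations can also be reproduced with the collapse sketch from the discussion section. The sketch below renumbers the areas 0 through 8 (area k becomes index k-1) so that the areas can serve as matrix indices; the adjacency lists encode the maze layout:

    import numpy as np

    # Neighbors of each area (areas renumbered 0-8, so area k is index k-1)
    adj = {0: [1, 5], 1: [0, 2, 4], 2: [1, 3], 3: [2, 4, 8],
           4: [1, 3, 5, 7], 5: [0, 4, 6], 6: [5, 7], 7: [4, 6, 8], 8: [3, 7]}

    P = np.zeros((9, 9))
    for i, nbrs in adj.items():
        P[i, nbrs] = 1.0 / len(nbrs)    # each exit is equally likely

    Q = collapse(P, {2, 6, 8})          # food areas 3, 7, 9 are indices 2, 6, 8
    Q8 = np.linalg.matrix_power(Q, 8)
    Q7 = np.linalg.matrix_power(Q, 7)
    print(Q8[0, 6])                     # 283/324 = 0.8735, food within 8 moves
    print(Q8[0, 6] - Q7[0, 6])          # 31/324 = 0.0957, food at exactly move 8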


The occupancy problem

The occupancy problem is a classic problem in probability. The setting of the problem is that balls are randomly distributed into cells (or boxes or other containers) one at a time. After all the balls are distributed into the cells, we examine how many of the cells are occupied with balls. In this post, we examine the occupancy problem using Markov chains.

The Occupancy Problem

As indicated above, we consider the random experiment of randomly placing k balls into n cells. The following diagram shows the result of randomly distributing 8 balls into 6 cells (or boxes).

Figure 1 – Occupancy Problem – throwing balls into cells

Balls are thrown one at a time into the 6 boxes, for a total of 8 balls. In Figure 1, a total of four cells are occupied. In the current discussion, the focus is on the number of occupied cells and not on which of the cells are occupied. Since Figure 1 has 6 cells, we can view throwing a ball into the 6 boxes as rolling a fair die (the occupied cell corresponds to the face value that turns up). Thus we can view the experiment in Figure 1 as rolling a die 8 times. Then at the end, we look at the number of faces that appear.

Figure 2 – Occupancy Problem – rolling die repeatedly and counting the faces that appear

The top of Figure 2 shows the 8 rolls of the die. Four faces turn up in these 8 rolls – faces 2, 4, 5 and 6, the same result described in Figure 1. Another way to look at the occupancy problem is that of random sampling. Each time a ball is thrown, the action can be viewed as randomly selecting one of the n cells. Thus the occupancy problem can be viewed as randomly selecting k balls (one at a time with replacement) from an urn with n balls labeled 1,2,\cdots,n. The following diagram shows an urn with 6 balls labeled 1 through 6. Then Figure 1 and Figure 2 would be equivalent to randomly selecting 8 balls with replacement from this urn.

Figure 3 – Occupancy Problem – random selection of balls with replacement from an urn and counting the distinct numbers that are drawn

Figure 1 describes a generic setting of the occupancy problem – throwing balls at random into cells. The interpretation in Figure 2 is ideal for the setting of six cells. If the number of cells is not six, we can view the problem as rolling an n-sided die k times. The random selection of balls with replacement (Figure 3) is also a good way to view the occupancy problem. Regardless of how we view the occupancy problem, we consider the following questions. In randomly assigning k balls into n cells,

  • What is the probability that exactly j cells are occupied with balls where j=1,2,\cdots,n?
  • What is the expected number of occupied cells?

The first question is about the probability distribution of the number of occupied cells after the balls are thrown. The second question is about the mean of the probability distribution of the number of occupied cells. Once the probability distribution is known, we can calculate the distributional quantities such as the mean and variance. Thus the main focus is on the first question.

The occupancy problem has been discussed in a companion blog. In that blog post, the first question is answered by using the multinomial theorem (applied twice). Another blog post develops a formula for the probability distribution of the number of occupied cells. These previous blog posts use counting methods to solve the occupancy problem. For example, in throwing eight balls into 6 cells, there are 6^8 many distinct outcomes. How many of these correspond to only one occupied cell, two occupied cells and so on? In this post, we present another way to answer the same question using Markov chains.

The occupancy problem discussed here and in the previous blog posts assumes that each ball is equally likely to be assigned to any one of the cells. An occupancy problem in which the cells are not equally likely to receive each ball would be an interesting extension of the problem.

Applying Markov Chains

The notion of Markov chains is introduced here. The calculation involving the transition probability matrix is discussed here.

To demonstrate, we use the specific example of randomly distributing balls into 6 cells. First, define a Markov chain. The state of the Markov chain is the number of occupied cells at any given time. Let X_0 denote the number of occupied cells at the beginning of the experiment (before any balls are thrown). If all cells are unoccupied at the beginning, then X_0=0. For n \ge 1, let X_n be the number of occupied cells after the nth ball has been distributed. The resulting stochastic process \left\{X_n: n=0,1,2,\cdots \right\} is a Markov chain since the state X_{n+1} only depends on the preceding state X_n. The following is the transition probability matrix.

    \displaystyle \mathbf{P} =    \bordermatrix{ & 0 & 1 & 2 & 3 & 4 & 5 & 6 \cr    0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \cr    \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \cr    1 & 0 & \frac{1}{6} & \frac{5}{6} & 0 & 0 & 0 & 0 \cr  \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \cr  2 & 0 & 0 & \frac{2}{6} & \frac{4}{6} & 0 & 0 & 0 \cr    \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \cr  3 & 0 & 0 & 0 & \frac{3}{6} & \frac{3}{6} & 0 & 0 \cr    \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \cr  4 & 0 & 0 & 0 & 0 & \frac{4}{6} & \frac{2}{6} & 0 \cr    \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \cr  5 & 0 & 0 & 0 & 0 & 0 & \frac{5}{6} & \frac{1}{6} \cr    \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \cr  6 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \cr  } \qquad

The matrix \mathbf{P} contains the one-step transition probabilities, i.e. the probabilities of the next state given the current state. If the Markov chain is in state 0 (no occupied cells), then the next state is 1 for certain. If the current state is 1 (one occupied cell), then the next ball is in the occupied cell with probability 1/6 and is in one of the empty cells with probability 5/6. In general, if there are currently i occupied cells, then the next ball is either in one of the i occupied cells with probability i/6 (meaning the next state remains at i) or in one of the unoccupied cells with probability (6-i)/6 (meaning the next state is i+1). Since the number of occupied cells cannot decrease, the transition probabilities in \mathbf{P} have the appearance of moving diagonally from the upper left corner to the lower right corner. State 6 (all cells are occupied) is called an absorbing state. When the Markov chain transitions into state 6, it stays there forever no matter how many additional balls are thrown into the cells.

The transition probability matrix \mathbf{P} holds the key to answering the first question indicated above. In throwing k balls into 6 cells, the answer lies in the matrix \mathbf{P}^k, the kth power of \mathbf{P} (the matrix obtained by multiplying \mathbf{P} by itself k times). This can be done using software, e.g. an online matrix calculator.
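For example, in Python with NumPy, the matrix and its powers can be computed as in the following sketch (the function name occupancy_matrix is a label chosen here for illustration):

    import numpy as np

    def occupancy_matrix(n):
        """One-step transition matrix for the number of occupied cells among n cells."""
        P = np.zeros((n + 1, n + 1))
        for i in range(n):
            P[i, i] = i / n             # the ball lands in an already occupied cell
            P[i, i + 1] = (n - i) / n   # the ball lands in an empty cell
        P[n, n] = 1.0                   # all cells occupied: absorbing state
        return P

    P8 = np.linalg.matrix_power(occupancy_matrix(6), 8)
    print(P8[0])   # distribution of the number of occupied cells after 8 balls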

Consider the example of throwing eight balls into 6 cells. The following is the matrix \mathbf{P}^8.

      \displaystyle \mathbf{P}^8 =    \bordermatrix{ & 0 & 1 & 2 & 3 & 4 & 5 & 6 \cr    0 & 0 & 0.00000357 & 0.00226838 & 0.06901578 & 0.36458333 & 0.45010288 & 0.11402606 \cr    \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \cr    1 & 0 & 0.0000006 & 0.0007591 & 0.03602014 & 0.27756344 & 0.49661351 & 0.18904321 \cr  \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \cr  2 & 0 & 0 & 0.00015242 & 0.01501534 & 0.18815015 & 0.50831619 & 0.28836591 \cr    \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \cr  3 & 0 & 0 & 0 & 0.00390625 & 0.10533658 & 0.47531221 & 0.41544496 \cr    \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \cr  4 & 0 & 0 & 0 & 0 & 0.03901844 & 0.38709919 & 0.57388236 \cr    \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \cr  5 & 0 & 0 & 0 & 0 & 0 & 0.23256804 & 0.76743196 \cr    \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \cr  6 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \cr  } \qquad

Each entry in \mathbf{P}^8 is denoted by P_{ij}^8, the 8-step transition probability, located in the row corresponding to the state i and in the column corresponding to the state j. For example, P_{03}^8=0.069, indicating that there is a 6.9% chance that 3 cells are occupied after eight balls are thrown (with all cells being empty at the beginning). Another example: P_{13}^8=0.036. So if there is one occupied cell at the beginning, there is a 3.6% chance that 3 cells are occupied after throwing 8 balls into the 6 cells. Thus each row in \mathbf{P}^8 gives a probability distribution of the number of occupied cells depending on the initial state.

The first question stated above assumes that all cells are empty at the beginning. Thus the first row of the matrix \mathbf{P}^8 gives the answers.

    P[\text{1 occupied cell}]=P_{01}^8=0.00000357

    P[\text{2 occupied cells}]=P_{02}^8=0.00226838

    P[\text{3 occupied cells}]=P_{03}^8=0.06901578

    P[\text{4 occupied cells}]=P_{04}^8=0.36458333

    P[\text{5 occupied cells}]=P_{05}^8=0.45010288

    P[\text{6 occupied cells}]=P_{06}^8=0.11402606

In randomly throwing 8 balls into 6 cells, the more likely outcomes would be having 4, 5 or 6 occupied cells, with the most likely being 5 occupied cells. Let Y be the number of occupied cells after 8 balls are thrown into 6 cells. The mean number of occupied cells is:

      \displaystyle \begin{aligned} E[Y]&=1 \times 0.00000357+2 \times 0.00226838+3 \times 0.06901578 \\&\ \ + 4 \times 0.36458333+5 \times 0.45010288+6 \times 0.11402606 \\&=4.60459175 \end{aligned}

To give an indication that the probability distribution is correct, we simulate 8 rolls of a die 1,000 times using the =RANDBETWEEN(1, 6) function in Excel. The results: 0% (1 occupied cell), 0.4% (2 occupied cells), 7.3% (3 occupied cells), 35.2% (4 occupied cells), 47.3% (5 occupied cells) and 9.8% (6 occupied cells). These results are in general agreement with the calculated distribution. Of course, the larger the simulation, the closer the simulated results will be to the theoretical distribution.
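The same simulation can also be done outside of Excel. The following is a sketch in Python, with NumPy's random number generator playing the role of =RANDBETWEEN(1, 6) (the trial count of 10,000 is a choice made for illustration):

    import numpy as np

    rng = np.random.default_rng()
    trials = 10000
    counts = np.zeros(7)
    for _ in range(trials):
        occupied = len(set(rng.integers(1, 7, size=8)))   # 8 rolls of a fair die
        counts[occupied] += 1
    print(counts[1:] / trials)   # simulated distribution of the number of occupied cells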

Here’s the algorithm for solving the occupancy problem using the Markov chain approach.

If the occupancy problem involves throwing balls into n cells, the state of the Markov chain is the number of occupied cells at any given time. Set up the one-step transition probability matrix \mathbf{P} as follows. Set P_{01}=1 and P_{nn}=1. For 0<i<n, set P_{ii}=i/n and P_{i,i+1}=(n-i)/n. Then the matrix \mathbf{P}^k gives the probability distribution of the number of occupied cells after k balls are randomly distributed into the n cells. The row in \mathbf{P}^k corresponding to the state i gives the probability distribution of the number of occupied cells when there are i occupied cells initially. If initially all cells are empty, then the first row of \mathbf{P}^k gives the probability distribution of the number of occupied cells.

We would like to make a further comment on the example. Assuming that all cells are empty at the beginning, the example can be worked by eliminating the first row of \mathbf{P}, the row for state 0. The first ball always goes into an empty cell. Thus in throwing 8 balls, we can focus on 7 balls going into 6 cells with one cell already occupied (by the first ball). Thus we can focus on the following transition probability matrix.

      \displaystyle \mathbf{P} =    \bordermatrix{ & 1 & 2 & 3 & 4 & 5 & 6 \cr  1 & \frac{1}{6} & \frac{5}{6} & 0 & 0 & 0 & 0 \cr    \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \cr  2 & 0 & \frac{2}{6} & \frac{4}{6} & 0 & 0 & 0 \cr    \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \cr  3 & 0 & 0 & \frac{3}{6} & \frac{3}{6} & 0 & 0 \cr    \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \cr  4 & 0 & 0 & 0 & \frac{4}{6} & \frac{2}{6} & 0 \cr    \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \cr  5 & 0 & 0 & 0 & 0 & \frac{5}{6} & \frac{1}{6} \cr    \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \cr  6 & 0 & 0 & 0 & 0 & 0 & 1 \cr  } \qquad

Assuming that the first ball goes into one of the empty cells, we focus on the next 7 balls. So we compute the matrix \mathbf{P}^7.

    \displaystyle \mathbf{P}^7 =    \bordermatrix{ & 1 & 2 & 3 & 4 & 5 & 6 \cr    1 & 0.00000357 & 0.00226838 & 0.06901578 & 0.36458333 & 0.45010288 & 0.11402606 \cr    \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \cr    2 & 0 & 0.00045725 & 0.02942101 & 0.26015947 & 0.50591564 & 0.20404664 \cr  \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \cr  3 & 0 & 0 & 0.0078125 & 0.15214549 & 0.50951646 & 0.33052555 \cr    \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \cr  4 & 0 & 0 & 0 & 0.05852766 & 0.44110797 & 0.50036437 \cr    \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \cr  5 & 0 & 0 & 0 & 0 & 0.27908165 & 0.72091835 \cr    \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \cr  6 & 0 & 0 & 0 & 0 & 0 & 1 \cr    } \qquad

Note that the first row of \mathbf{P}^7 is identical to the first row of \mathbf{P}^8 earlier. So a slightly different algorithm is to put one ball in a cell and then focus on the random behavior of the next k-1 balls. In other words, set up the transition probability matrix \mathbf{P} with only the states 1, 2,\cdots,n and then raise \mathbf{P} to the power k-1.

Remarks

As the above example demonstrates, the occupancy problem is a matter of calculating a power of a transition probability matrix. It is an algorithm that is suitable for computer implementation. It illustrates the versatility of the Markov chain method. The combinatorial approach of using the multinomial theorem (discussed here and here) is also a valuable approach. Each approach provides its own unique insight. The post ends with an exercise.

Exercise
Suppose that eleven candies are randomly distributed to 4 children. Suppose that the names of the 4 children are Marcus, Issac, Samantha and Paul.

  • Give the one-step transition probability matrix for the Markov chain describing the process of distributing candies to the 4 children.
  • What is the probability distribution of the number of children who have received candies?
  • What is the probability that all 4 children receive candies?
  • What is the mean number of children who have received candies?
  • What is the probability that Marcus receives 1 candy, Issac receives 4 candies, Samantha receives 4 candies and Paul receives 2 candies?
  • What is the probability that one of the children receives 1 candy, another child receives 4 candies, another child receives 4 candies and the last child receives 2 candies?


A first look at applications of Markov chains

This post presents several interesting examples of Markov chain models. Previous posts on Markov chains are: an introduction, how to work with transition probabilities (here and here).

Examples

The following gives several examples of Markov chains, discussed at an introductory level. They will be further developed in subsequent posts.

Example 1 – The Ehrenfest Chain

The Ehrenfest chain, named for the physicist Paul Ehrenfest, is a simple, discrete model for the exchange of gas molecules contained in a volume divided into two regions A and B by a permeable membrane. Suppose that the gas has N molecules. At each period of time, a molecule is chosen at random from the set of N molecules and moved from the region that it is in to the other region.

Figure 1 – The Ehrenfest Chain

The model has a simple mathematical description using balls and urns. Suppose that two urns, labeled A and B, contain N balls, labeled 1, 2, 3, \cdots, N. At each step (i.e. at each discrete time unit), an integer is selected at random from 1, 2, 3, \cdots, N independent of the past selections. Then the ball labeled by the selected integer is removed from its urn and placed in the other urn. The procedure is repeated indefinitely.

The state of the system at time n=0,1,2,\cdots is the number of balls in urn A and is denoted by X_n. Then \left\{X_n: n=0,1,2,\cdots \right\} is a Markov chain on the state space \left\{0,1,\cdots,N \right\}.

When the process is in state 0 (urn A is empty), the next state is 1. When the process is in state N (urn A is full), the next state is N-1. When the process is in state i where 0<i<N, the next state of the process is either i-1 or i+1. The following gives the one-step transition probabilities:

    \displaystyle  P_{ij} = \left\{ \begin{array}{ll}           \displaystyle  1 &\ \ \ \ \ \ i=0 \text{ and } j=1 \\            \text{ } & \text{ } \\           \displaystyle  1 &\ \ \ \ \ \ i=N \text{ and } j=N-1 \\            \text{ } & \text{ } \\          \displaystyle  \frac{i}{N} &\ \ \ \ \ \ 0<i<N \text{ and } j=i-1 \\           \text{ } & \text{ } \\          \displaystyle  1-\frac{i}{N} &\ \ \ \ \ \ 0<i<N \text{ and } j=i+1 \\           \text{ } & \text{ } \\           0 &\ \ \ \ \ \ \text{otherwise}           \end{array} \right.

When urn A has i balls, there is a probability of \frac{i}{N} that the randomly selected ball is from urn A, in which case urn A loses a ball. On the other hand, there is a probability of 1-\frac{i}{N} that the randomly selected ball is from urn B, in which case urn A gains a ball. As an illustration, the following gives the transition probability matrix for N=5 balls.

    \mathbf{P} =       \bordermatrix{ & 0 & 1 & 2 & 3 & 4 & 5 \cr        0 & 0 & 1 & 0 & 0 & 0 & 0 \cr        1 & \frac{1}{5} & 0 & \frac{4}{5} & 0 & 0 & 0 \cr        2 & 0 & \frac{2}{5} & 0 & \frac{3}{5} & 0 & 0 \cr        3 & 0 & 0 & \frac{3}{5} & 0 & \frac{2}{5} & 0 \cr        4 & 0 & 0 & 0 & \frac{4}{5} & 0 & \frac{1}{5} \cr        5 & 0 & 0 & 0 & 0 & 1 & 0 \cr   } \qquad
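For general N, the matrix can be generated with a short program. The following is a minimal sketch in Python with NumPy (the function name ehrenfest_matrix is a label chosen here for illustration):

    import numpy as np

    def ehrenfest_matrix(N):
        """One-step transition matrix for the Ehrenfest chain with N balls."""
        P = np.zeros((N + 1, N + 1))
        for i in range(N + 1):
            if i > 0:
                P[i, i - 1] = i / N       # a ball in urn A is selected and moved out
            if i < N:
                P[i, i + 1] = 1 - i / N   # a ball in urn B is selected and moved in
        return P

    print(ehrenfest_matrix(5))   # reproduces the matrix above for N = 5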

An important problem in the Ehrenfest chain is the long-term, or equilibrium, distribution of X_n, the number of balls in urn A. Another interesting problem concerns the changes in the composition of the two regions over time. For example, if all molecules are in one region at the beginning, how long on average will it be before each region has half of the molecules? The theory of Markov chains provides a good method for answering such questions.

Example 2 – Random Walk

The simplest random walk is a Markov chain \left\{X_n: n=0,1,2,\cdots \right\} such that each state is the result of a random one-unit up or down move from the previous state.

Figure 2 – Five Simulations of a Random Walk

In the random walk in Figure 2, each state is one unit above or below the preceding state with equal probability. Instead of a random one-unit up or down move, let’s give a more general description of a random walk. The moves in the random walk are determined by a predetermined discrete distribution. Let Y_1, Y_2, Y_3, \cdots be integer-valued random variables that are independent and identically distributed. Let P[Y=y] be the common probability function. Let X_0 be an integer-valued random variable that is independent of the random variables Y_n. The random variable X_0 will be the initial position of the random walk. The random variables Y_n are the increments (they are the amounts added to the stochastic process as time increases).

For n \ge 1, define X_n=X_0+Y_1+\cdots+Y_{n-1}+Y_n. Then \left\{X_n: n=0,1,2,\cdots \right\} is called a random walk. It is a Markov chain with the state space the set of all integers. Wherever the process is at any given time, the next step is always a move whose distance and direction are dictated by the independent random variables Y_n. If the random variables Y_n take on the values 1 and -1 with P(Y_n=1)=p and P(Y_n=-1)=1-p, then we revert back to the simple random walk described in Figure 2.

To transition from state i to state j, the increment Y_n must be j-i. Thus the one-step transition probabilities P_{ij} are defined by P_{ij}=P[Y=j-i]. Recall that P[Y=y] is the common probability function for the sequence Y_1, Y_2, Y_3 \cdots. For a more detailed description, see this previous post.

Example 3 – Gambler’s Ruin

A gambler’s ruin is a simple random walk that can take on only finitely many states. At each time period, the state is one-unit higher or lower than the preceding state as a result of a random bet. More specifically, suppose that a gambler starts out with d units in capital (in dollars or other monetary units) and makes a series of one-unit bets against the house. For the gambler, the probabilities of winning and losing each bet are p and 1-p, respectively. Whenever the capital reaches zero, the gambler is in ruin and his capital remains zero thereafter. On the other hand, if the capital of the gambler increases to m where d<m, then the gambler quits playing. Let X_n be the capital of the gambler after the nth bet. Then \left\{X_n: n=0,1,2,3,\cdots \right\} is a random walk. The starting state is X_0=d. The states 0 and m are absorbing states. The following gives the transition probabilities.

    \displaystyle  P_{ij} = \left\{ \begin{array}{ll}           \displaystyle  p &\ \ \ \ \ \ j=i+1,i \ne 0, i \ne m \\            \text{ } & \text{ } \\          \displaystyle  1-p &\ \ \ \ \ \ j=i-1,i \ne 0, i \ne m \\           \text{ } & \text{ } \\           \displaystyle  1 &\ \ \ \ \ \ i=0,j=0 \\           \text{ } & \text{ } \\           \displaystyle  1 &\ \ \ \ \ \ i=m,j=m \\           \text{ } & \text{ } \\           0 &\ \ \ \ \ \ \text{otherwise}           \end{array} \right.

Whenever the gambler reaches the absorbing state of 0, the gambler is in ruin. If the process reaches the absorbing state of m, the gambler strikes gold. If the initial capital of the gambler is small relative to the casino, there is a virtually 100% chance that the gambler will be in ruin even if each bet has even odds (see here for the calculation). One interesting question: on average how long does it take for the gambler to be in ruin? Such questions will be answered after necessary tools are built in subsequent posts.

An alternative interpretation of the gambler’s ruin random walk is this. Assume that two players make a series of one-unit bets against each other and that the total capital between the two players is m units with the first player having d units in capital. Further suppose that the first player has probability p of winning a bet and the second player has probability 1-p of winning a bet. Then the two gamblers play until one of them goes broke (is in ruin). Let X_n be the capital of the first player after the nth play. Then \left\{X_n: n=0,1,2,3,\cdots \right\} is a Markov chain that is a gambler’s ruin. The following graph shows five simulations of a gambler’s ruin from this previous post.

Figure 3 – Five Simulations of a Gambler’s Ruin
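A simulation like the ones in Figure 3 also gives a quick estimate of the ruin probability. The following is a sketch in Python with NumPy (the function name ruin_probability and the trial count are choices made for illustration); for even odds, the exact probability of ruin starting with d units out of a total of m units is (m-d)/m:

    import numpy as np

    def ruin_probability(d, m, p, trials=100000):
        """Estimate the probability that a gambler starting with d units goes
        broke before reaching m units, winning each one-unit bet with probability p."""
        rng = np.random.default_rng()
        ruined = 0
        for _ in range(trials):
            capital = d
            while 0 < capital < m:
                capital += 1 if rng.random() < p else -1
            ruined += (capital == 0)
        return ruined / trials

    print(ruin_probability(5, 20, 0.5))   # exact answer: (20 - 5)/20 = 0.75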

Example 4 – Discrete Birth and Death Chain

In a birth and death chain, the process moves from the current state i to i+1 (a birth), to i-1 (a death) or stays at i. The state space is either \left\{0,1,2,3,\cdots \right\} or \left\{0,1,2,3,\cdots, m \right\}. The following gives the one-step transition probabilities.

    \displaystyle  P_{ij} = \left\{ \begin{array}{ll}           \displaystyle  q_i &\ \ \ \ \ \ j=i-1 \\            \text{ } & \text{ } \\          \displaystyle  r_i &\ \ \ \ \ \ j=i \\           \text{ } & \text{ } \\           \displaystyle  p_i &\ \ \ \ \ \ j=i+1 \\           \text{ } & \text{ } \\                    0 &\ \ \ \ \ \ \text{otherwise}           \end{array} \right.

where for each state i, q_i (the probability of down), r_i (the probability of same) and p_i (the probability of up) are non-negative real numbers such that q_i+r_i+p_i=1. The following gives the transition probability matrix for an illustrative case of m=5.

    \mathbf{P} =       \bordermatrix{ & 0 & 1 & 2 & 3 & 4 & 5 \cr        0 & r_0 & p_0 & 0 & 0 & 0 & 0 \cr        1 & q_1 & r_1 & p_1 & 0 & 0 & 0 \cr        2 & 0 & q_2 & r_2 & p_2 & 0 & 0 \cr        3 & 0 & 0 & q_3 & r_3 & p_3 & 0 \cr        4 & 0 & 0 & 0 & q_4 & r_4 & p_4 \cr        5 & 0 & 0 & 0 & 0 & q_5 & r_5 \cr   } \qquad

In the above matrix, q_0=0 and p_5=0. If r_0=1, then state 0 is absorbing. If p_0=1, then state 0 is reflecting. On the other hand, if r_5=1, then state 5 is absorbing. If q_5=1, then state 5 is reflecting.

The Ehrenfest chain is an example of a birth and death chain (with the boundary states being reflecting). When the probability of down q_i, the probability of same r_i and the probability of up p_i do not vary according to the current state (i.e. they are constant), the resulting birth and death chain is a random walk. Thus the gambler’s ruin chain is also an example of a birth and death chain. The birth and death chain is a suitable model for applications in which the state of the process is the population of a living system.

Example 5 – Discrete Queueing Markov Chain

Though continuous-time queueing models are much more realistic, a simple discrete-time queueing model is introduced here in order to illustrate applications of discrete Markov chains. Suppose that time is measured in a convenient time interval such as one minute or some appropriate group of minutes. The model assumes that during one period of time, only one customer is served if at least one customer is present. The model also assumes that a random number of new customers arrive at any given time period and that the numbers of customers arriving in the time periods form an independent identically distributed sequence of random variables.

More specifically, let Y_1,Y_2,Y_3,\cdots be independent and identically distributed random variables that take on non-negative integers. Let P[Y=k], k=0,1,2,\cdots, be the common probability function. The random variable Y_n is the number of customers that arrive into the system during the nth period. Let X_0 be the number of customers present initially in the system. For n \ge 1, let X_n be the number of customers in the system during the nth period. If the current state is X_n, then the next state is:

    \displaystyle  X_{n+1} = \left\{ \begin{array}{ll}           \displaystyle  Y_{n+1} &\ \ \ \ \ \ X_n=0 \\            \text{ } & \text{ } \\          \displaystyle  (X_n-1)+Y_{n+1} &\ \ \ \ \ \ X_n>0 \\                      \end{array} \right.

where n \ge 0. In essence, the number of customers in the system at any given time period is the number of new customers arriving plus the customers already in the system less one (if the system is not empty). If the system is empty, the number of customers is simply the number of new customers. It follows that \left\{X_n: n \ge 0 \right\} is a Markov chain with the state space \left\{0,1,2,\cdots \right\}, the set of all non-negative integers. The one-step transition probabilities P_{ij} are given by:

    \displaystyle  P_{ij} = \left\{ \begin{array}{ll}           \displaystyle  P[Y=j] &\ \ \ \ \ \ i=0 \text{ and } j \in \left\{0,1,2,\cdots \right\} \\            \text{ } & \text{ } \\          \displaystyle  P[Y=j-(i-1)] &\ \ \ \ \ \ i>0 \text{ and } j \in \left\{i-1,i,i+1,\cdots \right\} \\                      \end{array} \right.

where P[Y=k] is the common probability function for the independent numbers of arrivals of customers Y_1,Y_2,Y_3,\cdots. If the Y_n have a common binomial distribution with parameters 3 and p=0.5, then the following gives the transition probability matrix.

    \displaystyle \mathbf{P} =       \bordermatrix{ & 0 & \text{ } & 1 & \text{ } & 2 & \text{ } & 3 & \text{ } & 4 & \text{ } & 5 & \text{ } & 6 & \text{ } & 7 & \text{ } & 8   & \cdots \cr        0 & 0.125 & \text{ } & 0.375 & \text{ } & 0.375 & \text{ } & 0.125 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0 &  \cdots \cr                1 & 0.125 & \text{ } & 0.375 & \text{ } & 0.375 & \text{ } & 0.125 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0 &    \cdots \cr          2 & 0 & \text{ } & 0.125 & \text{ } & 0.375 & \text{ } & 0.375 & \text{ } & 0.125 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0    & \cdots \cr          3 & 0 & \text{ } & 0 & \text{ } & 0.125 & \text{ } & 0.375 & \text{ } & 0.375 & \text{ } & 0.125 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0   & \cdots \cr          4 & 0 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0.125 & \text{ } & 0.375 & \text{ } & 0.375 & \text{ } & 0.125 & \text{ } & 0 & \text{ } & 0    & \cdots \cr          5 & 0 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0.125 & \text{ } & 0.375 & \text{ } & 0.375 & \text{ } & 0.125 & \text{ } & 0  & \cdots \cr          6 & 0 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0.125 & \text{ } & 0.375 & \text{ } & 0.375 & \text{ } & 0.125   & \cdots \cr         7 & 0 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0.125 & \text{ } & 0.375 & \text{ } & 0.375    & \cdots \cr        8 & 0 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0 & \text{ } & 0.125 & \text{ } & 0.375    & \cdots \cr       \vdots & \vdots & \text{ } & \vdots & \text{ } & \vdots & \text{ } & \vdots & \text{ } & \vdots & \text{ } & \vdots & \text{ } & \vdots & \text{ } & \vdots & \text{ } & \vdots    & \text{ } \cr       } \qquad
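A truncated version of this infinite matrix can be built with a short program. The following is a sketch in Python with NumPy (the truncation size of 9 states is a display choice; the actual state space is infinite):

    import numpy as np
    from math import comb

    # Arrival distribution: binomial with parameters 3 and p = 0.5
    arrivals = [comb(3, k) * 0.5**3 for k in range(4)]   # 0.125, 0.375, 0.375, 0.125

    size = 9                              # truncate the infinite state space for display
    P = np.zeros((size, size + 3))
    for i in range(size):
        base = 0 if i == 0 else i - 1     # one customer is served whenever i > 0
        for k, prob in enumerate(arrivals):
            P[i, base + k] = prob         # P_ij = P[Y = j] if i = 0, else P[Y = j - (i-1)]
    print(P[:, :size])                    # matches the displayed portion of the matrix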

If the customer arrivals have a common Poisson distribution with parameter \lambda=0.2 (per period), then the following gives the transition probabilities.

    \displaystyle  P_{ij} = \left\{ \begin{array}{ll}           \displaystyle  \frac{e^{-0.2} \ 0.2^j}{j!} &\ \ \ \ \ \ i=0 \text{ and } j \in \left\{0,1,2,\cdots \right\} \\            \text{ } & \text{ } \\          \displaystyle  \frac{e^{-0.2} \ 0.2^{j-(i-1)}}{[j-(i-1)]!} &\ \ \ \ \ \ i>0 \text{ and } j \in \left\{i-1,i,i+1,\cdots \right\} \\                      \end{array} \right.

Example 6 – Discrete Branching Chain

Suppose that we have a system of particles that can generate new particles of the same type, e.g. particles such as neutrons and bacteria.

In such a system, we assume that a particle is replaced by a random number of new particles (regarded as offspring of the original particle). Furthermore, we assume that each particle generates offspring independently and that offspring generation follows the same distribution for all particles. Let Y be a random variable with this common offspring distribution. Let P[Y=k] be the common probability function of the number of offspring of a particle.

For n \ge 0, let X_n be the number of particles in the nth generation in the system. It follows from the assumptions that, given X_n=i,

    \displaystyle X_{n+1}=\sum \limits_{k=1}^{i} Y_k

where Y_1,Y_2, \cdots, Y_i are independent random variables with common probability function P[Y=k]. Then \left\{X_n: n \ge 0 \right\} is a Markov chain with the state space the set of all non-negative integers. The one-step transition probabilities P_{ij} are

    \displaystyle P_{ij}=P \biggl[\sum \limits_{k=1}^{i} Y_k = j \biggr]

In other words, the transition probability P_{ij} is the probability that the independent sum Y_1+\cdots+Y_i takes on the value j.

Note that state 0 is an absorbing state, which means extinction. An interesting problem in branching chains is the computation of the probability of extinction. One approach is to compute the probability of extinction for a branching chain with a single particle, i.e. the probability \rho of starting in state 1 and being absorbed into state 0. The probability of extinction for a branching chain starting with k particles is then \rho^k since the particles are assumed to generate new particles independently.
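The extinction probability can also be estimated by simulation. The following sketch in Python uses an offspring distribution chosen purely for illustration, P[Y=0]=0.25, P[Y=1]=0.25 and P[Y=2]=0.5, for which the exact extinction probability is \rho=0.5, the smallest root of \rho=0.25+0.25 \rho+0.5 \rho^2:

    import numpy as np

    rng = np.random.default_rng()
    offspring = [0.25, 0.25, 0.5]   # P[Y=0], P[Y=1], P[Y=2]; mean offspring 1.25

    def goes_extinct(max_gen=200, cap=10000):
        """Simulate one branching chain started from a single particle."""
        x = 1
        for _ in range(max_gen):
            if x == 0:
                return True
            if x > cap:     # a population this large almost surely survives
                return False
            x = rng.choice(3, size=x, p=offspring).sum()
        return x == 0

    trials = 10000
    print(sum(goes_extinct() for _ in range(trials)) / trials)   # approximately 0.5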


Chapman-Kolmogorov Equations

Stochastic processes and Markov chains are introduced in this previous post. Transition probabilities are an integral part of the theory of Markov chains. The post preceding this one is a beginning look at transition probabilities. This post shows how to calculate the n-step transition probabilities. The Chapman-Kolmogorov equations are also discussed and derived.

Introductory Example

Suppose \left\{X_n: n=0,1,2,\cdots \right\} is a Markov chain with the transition probability matrix \mathbf{P}. The entries of the matrix are the one-step transition probabilities P_{ij}. The number P_{ij} is the probability that the Markov chain will move to state j at time m+1 given that it is in state i at time m, independent of where the chain was prior to time m. Thus P_{ij} can be expressed as the following conditional probability.

    (1) \ \ \ \ \ P_{ij}=P[X_{m+1}=j \lvert X_m=i]

Thus the future state only depends on the period immediately preceding it. This is called the Markov property. Also note that P_{ij} is independent of the time period m. Any Markov chain with this property is called a time-homogeneous Markov chain or stationary Markov chain.

The focus of this post is to show how to calculate the probability that the Markov chain will move to state j at time n+m given that it is in state i at time m. This probability is denoted by P_{ij}^n. In other words, P_{ij}^n is the probability that the chain will be in state j after the Markov chain goes through n more periods given that it is in state i in the current period. The probability P_{ij}^n, as a conditional probability, can be notated as follows:

    (2) \ \ \ \ \ P_{ij}^n=P[X_{m+n}=j \lvert X_m=i]

In (2), the n in the n-step transition probabilities satisfies n \ge 0. Note that when n=0, P_{ij}^0=1 for i=j and P_{ij}^0=0 for i \ne j. Including the case n=0 makes the Chapman-Kolmogorov equations below valid for all n, m \ge 0.

Before discussing the general method, we use examples to illustrate how to compute 2-step and 3-step transition probabilities. Consider a Markov chain with the following transition probability matrix.

    \mathbf{P} =       \bordermatrix{ & 0 & 1 & 2 \cr        0 & 0.6 & 0.2 & 0.2   \cr        1 & 0.3 & 0.5 & 0.2  \cr        2 & 0.4 & 0.3 & 0.3   \cr           } \qquad

Calculate the two-step transition probabilities P_{02}^2, P_{12}^2 and P_{22}^2. Then calculate the three-step transition probability P_{02}^3 using the two-step transition probabilities.

First, let’s handle P_{02}^2. We can condition on the first step. To go from state 0 to state 2 in two steps, the chain must first go to an interim state and then from that state go to state 2.

    \displaystyle \begin{aligned}P_{02}^2&=\sum \limits_{k=0}^2 P_{0k} \ P_{k2} \\&=P_{00} \ P_{02}+P_{01} \ P_{12}+P_{02} \ P_{22} \\&=0.6 \times 0.2+0.2 \times 0.2+0.2 \times 0.3 \\&=0.22  \end{aligned}

Note that the above calculation lists out the three possible paths to go from state 0 to state 2 in two steps – from state 0 to state 0 and then from state 0 to state 2, from state 0 to state 1 and then from state 1 to state 2 and from state 0 to state 2 and then from state 2 to state 2. Looking at the calculation more closely, the above calculation is basically the first row of \mathbf{P} (the row corresponding to state 0) multiplying the third column of \mathbf{P} (the column corresponding to state 2).

    P_{02}^2= \left[\begin{array}{ccc}      0.6 & 0.2 & 0.2  \\          \end{array}\right]     \left[\begin{array}{c}      0.2  \\ 0.2 \\ 0.3          \end{array}\right]=    \left[\begin{array}{c}      0.22          \end{array}\right]

By the same idea, the following gives the other two 2-step transition probabilities.

    \displaystyle \begin{aligned}P_{12}^2&=\sum \limits_{k=0}^2 P_{1k} \ P_{k2} \\&=P_{10} \ P_{02}+P_{11} \ P_{12}+P_{12} \ P_{22} \\&=0.3 \times 0.2+0.5 \times 0.2+0.2 \times 0.3 \\&=0.22  \end{aligned}

    \displaystyle \begin{aligned}P_{22}^2&=\sum \limits_{k=0}^2 P_{2k} \ P_{k2} \\&=P_{20} \ P_{02}+P_{21} \ P_{12}+P_{22} \ P_{22} \\&=0.4 \times 0.2+0.3 \times 0.2+0.3 \times 0.3 \\&=0.23  \end{aligned}

As discussed, the above two calculations can be viewed as the sum over all the possible paths from the beginning state to the end state (conditioning on the interim state in the middle) or as a row in the transition probability matrix \mathbf{P} multiplying a column in \mathbf{P}. The following shows all three calculations in terms of matrix calculation.

    P_{02}^2= \left[\begin{array}{ccc}      0.6 & 0.2 & 0.2  \\          \end{array}\right]     \left[\begin{array}{c}      0.2  \\ 0.2 \\ 0.3          \end{array}\right]=    \left[\begin{array}{c}      0.22          \end{array}\right]

    P_{12}^2= \left[\begin{array}{ccc}      0.3 & 0.5 & 0.2  \\          \end{array}\right]     \left[\begin{array}{c}      0.2  \\ 0.2 \\ 0.3          \end{array}\right]=    \left[\begin{array}{c}      0.22          \end{array}\right]

    P_{22}^2= \left[\begin{array}{ccc}      0.4 & 0.3 & 0.3  \\          \end{array}\right]     \left[\begin{array}{c}      0.2  \\ 0.2 \\ 0.3          \end{array}\right]=    \left[\begin{array}{c}      0.23          \end{array}\right]

The view of matrix calculation will be crucial in understanding the Chapman-Kolmogorov equations discussed below. To conclude the example, consider the 3-step transition probability P_{02}^3. We can also condition on the first step. The chain goes from state 0 to the next state (3 possibilities) and then goes from that state to state 2 in two steps.

    \displaystyle \begin{aligned} P_{02}^3&=\sum \limits_{k=0}^2 P_{0k} \ P_{k2}^2 \\&=P_{00} \ P_{02}^2+P_{01} \ P_{12}^2+P_{02} \ P_{22}^2 \\&=0.6 \times 0.22+0.2 \times 0.22+0.2 \times 0.23 \\&=0.222  \end{aligned}

The example shows that the calculation of a 3-step transition probability is based on 2-step transition probabilities. Thus longer-length transition probabilities can be built up from shorter-length transition probabilities. However, it pays to set up a more general framework before carrying out more calculations. The view of matrix calculation demonstrated above will help in understanding that general framework.
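The introductory example can also be checked with a few lines of code. The following is a minimal sketch in Python with NumPy (the post itself uses no code, so the language choice is ours); it reproduces the 2-step and 3-step probabilities by conditioning on the intermediate state.

    import numpy as np

    # the transition probability matrix of the introductory example
    P = np.array([[0.6, 0.2, 0.2],
                  [0.3, 0.5, 0.2],
                  [0.4, 0.3, 0.3]])

    # P_{02}^2: condition on the intermediate state k at time 1
    p02_2 = sum(P[0, k] * P[k, 2] for k in range(3))
    print(p02_2)   # approximately 0.22

    # P_{02}^3: condition on the first step, then use the 2-step probabilities
    P2 = P @ P
    p02_3 = sum(P[0, k] * P2[k, 2] for k in range(3))
    print(p02_3)   # approximately 0.222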

Chapman-Kolmogorov Equations

The examples indicate that finding n-step transition probabilities involves matrix calculation. Let \mathbf{Q}_n be the n-step transition probability matrix. The goal now is to have a systematic way to compute the entries in the matrix \mathbf{Q}_n. The computation is based on the Chapman-Kolmogorov equations. These equations are:

    \displaystyle (3) \ \ \ \ \ P_{ij}^{n+m}=\sum \limits_{k=0}^\infty P_{ik}^n \times P_{kj}^m

for n, m \ge 0 and for all i,j. For a finite-state Markov chain, the summation in (3) does not go up to infinity and would have an upper limit. The number P_{ij}^{n+m} is the probability that the chain will be in state j after taking n+m steps if it is currently in state i. The following gives the derivation of (3).

    \displaystyle \begin{aligned} P_{ij}^{n+m}&=P[X_{n+m}=j \lvert X_0=i] \\&=\sum \limits_{k=0}^\infty  P[X_{n+m}=j, X_n=k \lvert X_0=i]\\&=\sum \limits_{k=0}^\infty \frac{P[X_{n+m}=j, X_n=k, X_0=i]}{P[X_0=i]} \\&=\sum \limits_{k=0}^\infty \frac{P[X_n=k,X_0=i]}{P[X_0=i]} \times \frac{P[X_{n+m}=j, X_n=k, X_0=i]}{P[X_n=k,X_0=i]} \\&=\sum \limits_{k=0}^\infty P[X_n=k \lvert X_0=i] \times P[X_{n+m}=j \lvert X_n=k, X_0=i] \\&=\sum \limits_{k=0}^\infty P[X_n=k \lvert X_0=i] \times P[X_{n+m}=j \lvert X_n=k] \ * \\&=\sum \limits_{k=0}^\infty P_{ik}^n \times P_{kj}^m \end{aligned}

Here’s the idea behind the derivation. The path from state i to state j in n+m steps can be broken down into two paths, one from state i to an intermediate state k in the first n transitions, and the other from state k to state j in the remaining m transitions. Summing over all intermediate states k yields the probability that the process will go from state i to state j in n+m transitions. The step marked with * uses the Markov property: given X_n=k, the evolution after time n does not depend on X_0.

The entries in the matrix \mathbf{Q}_n can then be computed by (3). There is an interesting and useful interpretation of (3). The following is another way to state the Chapman-Kolmogorov equations:

    (4) \ \ \ \ \ \mathbf{Q}_{n+m}=\mathbf{Q}_n \times \mathbf{Q}_m.

A typical element of the matrix \mathbf{Q}_n is P_{ik}^n and a typical element of the matrix \mathbf{Q}_m is P_{kj}^m. Note that P_{ik}^n with k varying is a row in \mathbf{Q}_n (the row corresponding to the state i) and that P_{kj}^m with k varying is a column in \mathbf{Q}_m (the column corresponding to the state j).

    \left[\begin{array}{cccccc}      P_{i0}^n &  P_{i1}^n & \cdots  &  P_{ik}^n & \cdots  &  P_{iw}^n \\          \end{array}\right]     \left[\begin{array}{c}      P_{0j}^m \\ \text{ } \\ P_{1j}^m  \\ \vdots \\ P_{kj}^m \\ \vdots \\ P_{wj}^m          \end{array}\right]

The product of the above row and column is the transition probability P_{ij}^{n+m}.

Powers of the One-Step Transition Probability Matrix

Let \mathbf{P} be the one-step transition probability matrix of a Markov chain. Let \mathbf{Q}_n be the n-step transition probability matrix, which can be computed by using the Chapman-Kolmogorov equations. We now derive another important fact.

The n-step transition probability matrix \mathbf{Q}_n is obtained by multiplying the one-step transition probability matrix \mathbf{P} by itself n times, i.e. \mathbf{Q}_n is the nth power of \mathbf{P}.

This fact is important in terms of calculation of transition probabilities. Compute \mathbf{P}^n, the nth power of \mathbf{P} (in terms of matrix calculation). Then P_{ij}^n is simply an entry in \mathbf{P}^n (in the row that corresponds to state i and in the column that corresponds to state j). If the matrix size is large and/or n is large, the matrix multiplication can be done using software or by using an online matrix calculator (here’s one matrix calculator).
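For instance, NumPy provides matrix_power for exactly this purpose. The following sketch (our own, not part of the original calculation) reproduces the 2-step probabilities computed in the introductory example.

    import numpy as np
    from numpy.linalg import matrix_power

    P = np.array([[0.6, 0.2, 0.2],
                  [0.3, 0.5, 0.2],
                  [0.4, 0.3, 0.3]])

    P2 = matrix_power(P, 2)                 # the 2-step transition matrix
    print(P2[0, 2], P2[1, 2], P2[2, 2])     # 0.22, 0.22, 0.23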

Of course, the above fact is also important in Markov chain theory in general. Almost all mathematical properties of Markov chains are at root based on this basic fact.

We can establish this basic fact using an induction argument. Clearly \mathbf{Q}_1=\mathbf{P}^1=\mathbf{P}. Suppose that the fact is true for n-1. Based on (4), \mathbf{Q}_{n}=\mathbf{Q}_{n-1} \times \mathbf{Q}_1. Continuing the derivation: \mathbf{Q}_{n}=\mathbf{Q}_{n-1} \times \mathbf{Q}_1=\mathbf{P}^{n-1} \times \mathbf{P}=\mathbf{P}^n.

The Row Vector and the Column Vector

As demonstrated in the preceding section, \mathbf{P}^n, the nth power of \mathbf{P}, is precisely the n-step transition probability matrix. Let’s examine the matrix \mathbf{P}^n more closely. Suppose that the Markov chain has a state space \left\{0,1,2,\cdots,w \right\}. The following shows the matrix \mathbf{P}^n with special focus on a generic row and a generic column.

    \mathbf{P}^n= \left[\begin{array}{ccccccc}      P_{0,0}^n & P_{0,1}^n & \cdots & P_{0,k}^n & \cdots & P_{0,w}^n \\      P_{1,0}^n & P_{1,1}^n & \cdots & P_{1,k}^n & \cdots & P_{1,w}^n \\      \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\      P_{i,0}^n & P_{i,1}^n & \cdots & P_{i,k}^n & \cdots & P_{i,w}^n \\      \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\  P_{w,0}^n & P_{w,1}^n & \cdots & P_{w,k}^n & \cdots & P_{w,w}^n \\    \end{array}\right]

Now look at the row corresponding to the state i and call it R_i. Also look at the column corresponding to the state k and call it C_k. They are separated from the matrix \mathbf{P}^n below.

        R_i=\left[\begin{array}{cccccc}      P_{i,0}^n &  P_{i,1}^n & \cdots  &  P_{i,k}^n & \cdots  &  P_{i,w}^n \\          \end{array}\right] \ \ \ \ \ \text{row vector}

        C_k=\left[\begin{array}{c}      P_{0,k}^n \\ \text{ } \\ P_{1,k}^n  \\ \vdots \\ P_{i,k}^n \\ \vdots \\ P_{w,k}^n           \end{array}\right] \ \ \ \ \ \text{column vector}

The row vector R_i is a conditional distribution. It gives the probabilities of the process transitioning (after n transitions) into one of the states given the process starts in state i. If it is certain that the process begins in state i, then R_i gives the probability function for the random variable X_n since P[X_n=k]=P_{i,k}^n.

On the other hand, if the initial state is not certain, then the column vector C_k gives the probabilities of the process transitioning (after n transitions) into state k from any one of the possible initial states. Thus a column from the matrix \mathbf{P}^n gives the probability of the process in a given state at the nth period from all possible initial states. The following gives the probability distribution for X_n (the state of the Markov chain at time n).

    \displaystyle (5) \ \ \ \ \ P[X_n=k]=\sum \limits_{i=0}^\infty P_{i,k}^n \times P[X_0=i]

The calculation in (5) can be viewed as matrix calculation. If we put the probabilities P[X_0=i] in a row vector, then the product of this row vector with the column vector C_k would be P[X_n=k]. The following is (5) in matrix multiplication.

    (6) \ \ \ \ \ P[X_n=k]=\left[\begin{array}{cccccc}      P(X_0=0) &  P(X_0=1) & \cdots  &  P(X_0=j) & \cdots  &  P(X_0=w) \\          \end{array}\right]     \left[\begin{array}{c}      P_{0k}^n \\ \text{ } \\ P_{1k}^n  \\ \vdots \\ P_{jk}^n \\ \vdots \\ P_{wk}^n          \end{array}\right]
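As a small illustration of (5) and (6), the following sketch computes the distribution of X_5 under an assumed initial distribution that places probability 0.5 on each of states 0 and 1 (an arbitrary choice for illustration).

    import numpy as np
    from numpy.linalg import matrix_power

    P = np.array([[0.6, 0.2, 0.2],
                  [0.3, 0.5, 0.2],
                  [0.4, 0.3, 0.3]])

    mu0 = np.array([0.5, 0.5, 0.0])      # assumed distribution of X_0
    mu5 = mu0 @ matrix_power(P, 5)       # row vector of P[X_5=k] for k=0,1,2
    print(mu5)                           # the entries sum to 1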

Additional Comments on Chapman-Kolmogorov

Even though the matrix calculation for P_{ij}^n should be done using software, it pays to understand the orientation of the matrix \mathbf{P}^n. In the preceding section, we learn that a row of \mathbf{P}^n is the conditional distribution X_n \lvert X_0 (the state of the process at time n given the initial state). We also learn that a column in the matrix \mathbf{P}^n gives the probabilities of the process landing in a fixed state k from any one of the initial states.

The Chapman-Kolmogorov equations in (3) tell us that an entry in the matrix \mathbf{P}^{n+m} is simply the product of a row in \mathbf{P}^n and a column in \mathbf{P}^m. This observation makes it possible to focus just on the transition probability that is asked for in a given problem rather than calculating the entire matrix. For example, to calculate an entry P_{ij}^2 of the matrix \mathbf{P}^2, multiply a row of \mathbf{P} (the row corresponding to the state i) and a column of \mathbf{P} (the column corresponding to state j). There is no need to calculate the entire matrix \mathbf{P}^2 if the goal is just one entry of \mathbf{P}^2.

To calculate an entry of \mathbf{P}^n, there are many ways to split up the multiplication. For example, multiply a row of \mathbf{P} and a column of \mathbf{P}^{n-1}, or multiply a row of \mathbf{P}^{n-1} and a column of \mathbf{P}, or multiply a row of \mathbf{P}^2 and a column of \mathbf{P}^{n-2}.

If a row of transition probabilities multiplies a transition matrix, the result is a row in a higher transition probability matrix. For example, if a row in \mathbf{P} multiplies the matrix \mathbf{P}^n, the result is a row in \mathbf{P}^{n+1}. More specifically, the first row in \mathbf{P} multiplying the matrix \mathbf{P} gives the first row of \mathbf{P}^2. The first row of \mathbf{P} multiplying the matrix \mathbf{P}^2 gives the first row in \mathbf{P}^3, etc.

On the other hand, when a transition probability matrix multiplies a column of transition probabilities, the result is a column in a higher transition probability matrix. For example, if the matrix \mathbf{P}^n multiplies a column in \mathbf{P}, the result is a column in \mathbf{P}^{n+1}. More specifically, when the matrix \mathbf{P} multiplies the first column of the matrix \mathbf{P}, the result is the first column of \mathbf{P}^2. When the matrix \mathbf{P}^2 multiplies the first column of the matrix \mathbf{P}, the result is the first column of \mathbf{P}^3, etc.
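Both versions are easy to carry out in code. The following sketch uses the row version: starting with a row of \mathbf{P} and repeatedly multiplying by \mathbf{P} produces the corresponding row of each successive power, without ever forming the full matrix power.

    import numpy as np

    P = np.array([[0.6, 0.2, 0.2],
                  [0.3, 0.5, 0.2],
                  [0.4, 0.3, 0.3]])

    row = P[0].copy()      # first row of P: 1-step probabilities from state 0
    for n in range(2, 5):
        row = row @ P      # now the first row of P^n
        print(n, row)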

Examples

We now give some examples on how to calculate transition probabilities.

Example 1
Consider a Markov chain with the following transition probability matrix.

    \mathbf{P} =       \bordermatrix{ & 0 & 1 & 2 \cr        0 & 0.6 & 0.2 & 0.2   \cr        1 & 0.3 & 0.5 & 0.2  \cr        2 & 0.4 & 0.3 & 0.3   \cr           } \qquad

Determine P[X_3=1 \lvert X_0=0] and P[X_5=2 \lvert X_2=0].

Of course, the most straightforward way would be to calculate \mathbf{P}^3. Then P[X_3=1 \lvert X_0=0]=P_{01}^3 would be the (0,1)th entry of \mathbf{P}^3 and P[X_5=2 \lvert X_2=0]=P_{02}^3 would be the (0,2)th entry of \mathbf{P}^3. In fact, it is a good practice to use an online matrix calculator for this task. Doing so produces the following matrices.

    \mathbf{P}^2 =       \bordermatrix{ & 0 & 1 & 2 \cr        0 & 0.50 & 0.28 & 0.22   \cr        1 & 0.41 & 0.37 & 0.22  \cr        2 & 0.45 & 0.32 & 0.23   \cr           } \qquad \ \ \ \ \ \   \mathbf{P}^3 =       \bordermatrix{ & 0 & 1 & 2 \cr        0 & 0.472 & 0.306 & 0.222   \cr        1 & 0.445 & 0.333 & 0.222  \cr        2 & 0.458 & 0.319 & 0.223   \cr           } \qquad

From the matrix \mathbf{P}^3, we see that P_{01}^3=0.306 and P_{02}^3=0.222. Note that P_{02}^3=0.222 agrees with the value calculated in the introductory example earlier.

Example 2
For the transition matrix \mathbf{P} in Example 1, use the ideas discussed in the section Additional Comments on Chapman-Kolmogorov to compute the first row and the first column in the transition matrix \mathbf{P}^4.

    \left[\begin{array}{ccc}      0.6 &  0.2 & 0.2 \\          \end{array}\right]     \left[\begin{array}{ccc}     0.472 & 0.306 & 0.222  \\      0.445 & 0.333 & 0.222  \\      0.458 & 0.319 & 0.223            \end{array}\right] = \left[\begin{array}{ccc}      0.4638 &  0.314 & 0.2222           \end{array}\right]= \left[\begin{array}{ccc}      P_{00}^4 &  P_{01}^4 & P_{02}^4           \end{array}\right]

    \left[\begin{array}{ccc}     0.472 & 0.306 & 0.222  \\      0.445 & 0.333 & 0.222  \\      0.458 & 0.319 & 0.223            \end{array}\right]     \left[\begin{array}{c}      0.6 \\  0.3 \\ 0.4          \end{array}\right]    = \left[\begin{array}{c}      0.4638 \\  0.4557 \\ 0.4597          \end{array}\right]= \left[\begin{array}{c}      P_{00}^4 \\  P_{10}^4 \\ P_{20}^4          \end{array}\right]

As discussed earlier, the first row of \mathbf{P}^4 is obtained by multiplying the first row of \mathbf{P} with the matrix \mathbf{P}^3. The first column of \mathbf{P}^4 is obtained by multiplying the matrix \mathbf{P}^3 with the first column of the matrix \mathbf{P}.

Example 3
A particle moves among states 0, 1 and 2 according to a Markov process with the following transition probability matrix.

    \mathbf{P} =       \bordermatrix{ & 0 & 1 & 2 \cr        0 & 0.6 & 0.3 & 0.1   \cr        1 & 0.3 & 0.3 & 0.4  \cr        2 & 0.3 & 0.2 & 0.5   \cr           } \qquad

Let X_n be the position of the particle after the nth move. Suppose that at the beginning, the particle is in state 1. Determine the probability P[X_2=k] where k=0,1,2.

Since the initial state of the process is certain, P[X_2=k]=P[X_2=k \lvert X_0=1]=P_{1k}^2. Thus the problem is to find the second row of \mathbf{P}^2.

    \left[\begin{array}{ccc}      0.3 &  0.3 & 0.4 \\          \end{array}\right]     \left[\begin{array}{ccc}     0.6 & 0.3 & 0.1  \\      0.3 & 0.3 & 0.4  \\      0.3 & 0.2 & 0.5            \end{array}\right] = \left[\begin{array}{ccc}      0.39 &  0.26 & 0.35           \end{array}\right]= \left[\begin{array}{ccc}      P_{10}^2 &  P_{11}^2 & P_{12}^2           \end{array}\right]

Example 4
Consider a Markov chain with the following transition probability matrix.

    \mathbf{P} =       \bordermatrix{ & 0 & 1 & 2 \cr        0 & 0.3 & 0.2 & 0.5   \cr        1 & 0.1 & 0.3 & 0.6  \cr        2 & 0.4 & 0.3 & 0.3   \cr           } \qquad

Suppose that initially the process is equally likely to be in state 0 or state 1. Determine P[X_3=0] and P[X_3=1].

To find P[X_3=0], we need the first column of \mathbf{P}^3. To find P[X_3=1], we need the second column of \mathbf{P}^3. Using an online calculator produces \mathbf{P}^3.

    \mathbf{P}^3 =       \bordermatrix{ & 0 & 1 & 2 \cr        0 & 0.288 & 0.269 & 0.443   \cr        1 & 0.283 & 0.270 & 0.447  \cr        2 & 0.295 & 0.273 & 0.432   \cr           } \qquad

The following calculation gives the answers.

    P[X_3=0]=\left[\begin{array}{ccc}      0.5 & 0.5 & 0   \\          \end{array}\right]     \left[\begin{array}{c}      0.288 \\ 0.283 \\ 0.295           \end{array}\right]=0.2855

    P[X_3=1]=\left[\begin{array}{ccc}      0.5 & 0.5 & 0   \\          \end{array}\right]     \left[\begin{array}{c}      0.269 \\ 0.270 \\ 0.273           \end{array}\right]=0.2695

    P[X_3=2]=1-P[X_3=0]-P[X_3=1]=0.445

Example 5
Consider a Markov chain with the following transition probability matrix.

    \mathbf{P} =       \bordermatrix{ & 0 & 1 & 2 \cr        0 & 0.6 & 0.2 & 0.2   \cr        1 & 0.2 & 0.2 & 0.6  \cr        2 & 0 & 0 & 1   \cr           } \qquad

At time 0, the Markov process is in state 0. Let T=\text{min}\left\{n \ge 0:X_n=2 \right\}. Note that state 2 is called an absorbing state. The Markov process will eventually reach and be absorbed in state 2 (it stays there forever whenever the process reaches state 2). Thus T is the first time period in which the process reaches state 2. Suppose that the Markov process is being observed and that absorption has not taken place. Then we would be interested in this conditional probability: the probability that the process is in state 0 or state 1 given that the process has not been absorbed. Determine P[X_4=0 \lvert T>4].

The key idea here is that for the event T>4 to happen, X_4=0 or X_4=1. Thus

    P[T>4]=P[X_4=0 \text{ or } X_4=1]=P[X_4=0]+P[X_4=1]=P_{00}^4+P_{01}^4.

Note that since the initial state is certain to be state 0, P[X_4=0]=P_{00}^4 and P[X_4=1]=P_{01}^4. Using an online calculator gives the following matrix.

    \mathbf{P}^4 =       \bordermatrix{ & 0 & 1 & 2 \cr        0 & 0.1856 & 0.0768 & 0.7376   \cr        1 & 0.0768 & 0.032 & 0.8912  \cr        2 & 0 & 0 & 1   \cr           } \qquad

The following gives the desired result.

    \displaystyle P[X_4=0 \lvert T>4]=\frac{P_{00}^4}{P_{00}^4+P_{01}^4}=\frac{0.1856}{0.1856+0.0768}=0.7073

Thus if absorption has not taken place by the 4th period, the process is in state 0 about 71% of the time.

Example 6
Continue with the Markov chain in Example 5. Determine P[T=4].

Note that P[T=4]=P[T>3]-P[T>4]. The probability P[T>4] is already determined using \mathbf{P}^4. To determine P[T>3], we need the top row of \mathbf{P}^3.

    \mathbf{P}^3 =       \bordermatrix{ & 0 & 1 & 2 \cr        0 & 0.272 & 0.112 & 0.616   \cr        1 & 0.112 & 0.048 & 0.84  \cr        2 & 0 & 0 & 1   \cr           } \qquad

Thus P[T>3]=P_{00}^3+P_{01}^3=0.272+0.112=0.384 and P[T=4]=0.384-0.2624=0.1216. Thus absorption takes place in the 4th period about 12% of the time.
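Both Example 5 and Example 6 reduce to sums of entries in the top rows of powers of \mathbf{P}, so the whole computation fits in a short sketch (the helper name surv is ours).

    import numpy as np
    from numpy.linalg import matrix_power

    P = np.array([[0.6, 0.2, 0.2],
                  [0.2, 0.2, 0.6],
                  [0.0, 0.0, 1.0]])

    def surv(n):
        # P[T > n] starting in state 0: the chain is in state 0 or 1 at time n
        row = matrix_power(P, n)[0]
        return row[0] + row[1]

    print(matrix_power(P, 4)[0, 0] / surv(4))   # P[X_4=0 | T>4], about 0.7073
    print(surv(3) - surv(4))                    # P[T=4] = 0.384 - 0.2624 = 0.1216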

Exercises

We give a few practice problems to reinforce the concepts and calculations discussed here.

Exercise 1
Consider a Markov chain with the following transition probability matrix.

    \mathbf{P} =       \bordermatrix{ & 0 & 1 & 2 \cr        0 & 0.1 & 0.2 & 0.7   \cr        1 & 0.2 & 0.2 & 0.6  \cr        2 & 0.6 & 0.1 & 0.3   \cr           } \qquad

Determine these conditional probabilities.

  • P[X_3=2 \lvert X_1=1]
  • P[X_3=2 \lvert X_0=1]
  • P[X_4=2 \lvert X_0=1]

Exercise 2
A particle moves among states 1, 2 and 3 according to a Markov process with the following transition probability matrix.

    \mathbf{P} =       \bordermatrix{ & 1 & 2 & 3 \cr        1 & 0 & 0.6 & 0.4   \cr        2 & 0.6 & 0 & 0.4  \cr        3 & 0.6 & 0.4 & 0   \cr           } \qquad

Let X_n be the position of the particle after the nth move. Determine the probabilities P[X_3=1 \lvert X_0=1] and P[X_4=1 \lvert X_0=1].

Exercise 3
Consider a Markov chain with the following transition probability matrix.

    \mathbf{P} =       \bordermatrix{ & 0 & 1 & 2 \cr        0 & 0.1 & 0.2 & 0.7   \cr        1 & 0.9 & 0.1 & 0  \cr        2 & 0.1 & 0.8 & 0.1   \cr           } \qquad

Suppose that initially the process is equally likely to be in state 0 or state 1 or state 2. Determine the distribution for X_3.

Exercise 4
Continue with Example 5 and Example 6. Work these two examples assuming that the Markov chain starts in state 1 initially.


Answers to Exercises

Exercise 1

  • P[X_3=2 \lvert X_1=1]=P_{12}^2=0.44
  • P[X_3=2 \lvert X_0=1]=P_{12}^3=0.51
  • P[X_4=2 \lvert X_0=1]=P_{12}^4=0.4804

Exercise 2

  • P[X_3=1 \lvert X_0=1]=P_{11}^3=0.24
  • P[X_4=1 \lvert X_0=1]=P_{11}^4=0.456

Exercise 3

  • \displaystyle P[X_3=0]=\frac{1.076}{3}=0.3587
  • \displaystyle P[X_3=1]=\frac{1.013}{3}=0.3377
  • \displaystyle P[X_3=2]=\frac{0.911}{3}=0.3037

Exercise 4

  • \displaystyle P[X_4=0 \lvert T>4]=\frac{P_{10}^4}{P_{10}^4+P_{11}^4}=\frac{0.0768}{0.0768+0.032}=0.70588
  • P[T=4]=P[T>3]-P[T>4]=0.16-0.1088=0.0512


\copyright 2017 – Dan Ma

A beginning look at transition probabilities

The previous post introduces the notion of Markov chains, more specifically discrete Markov chains. This and the next post focus on calculating transition probabilities.

Suppose that \left\{X_n: n \ge 0 \right\} is a Markov chain with transition probability matrix \mathbf{P}. Elements of \mathbf{P} are the one-step transition probabilities P_{ij}. As the Markov process moves through the states over time, the probabilities in the matrix \mathbf{P} show how likely the process will transition to state j in the next time period if the process is currently in state i. Elements of \mathbf{P} are conditional probabilities:

    (1) \ \ \ \ \ P_{ij}=P[X_{n+1}=j \lvert X_n=i]

The transition probabilities described in (1) are independent of the time period n. These are called stationary transition probabilities. Markov chains with stationary transition probabilities are called stationary Markov chains or time-homogeneous Markov chains. One immediate goal is to develop the n-step transition probabilities P_{ij}^n from the one-step transition probabilities P_{ij}. This is to be developed in the next post. This post focuses on basic calculations with P_{ij}.

Examples of Conditional Probabilities

We use examples to demonstrate conditional probabilities that can be calculated from the one-step transition probabilities P_{ij}. The calculation is based on the following transition probability matrix.

    \mathbf{P} =       \bordermatrix{ & 0 & 1 & 2 \cr        0 & 0.6 & 0.2 & 0.2   \cr        1 & 0.3 & 0.5 & 0.2  \cr        2 & 0.4 & 0.3 & 0.3   \cr           } \qquad

Example 1
Determine the following conditional probabilities:

  • P[X_1=0, X_2=1 \lvert X_0=0]
  • P[X_2=0, X_3=1 \lvert X_1=0]
  • P[X_1=2, X_2=1,X_3=0 \lvert X_0=1]

Note that the first two are identical since the transition probabilities are stationary. To calculate the first and third probabilities, it is a matter of linking up the one-step transition probabilities from one state to the next.

    \displaystyle \begin{aligned} P[X_1=0, X_2=1 \lvert X_0=0]&=P[X_1=0 \lvert X_0=0] \times P[X_2=1 \lvert X_1=0] \\&=P_{00} \times P_{01} \\&=0.6 \times 0.2=0.12 \end{aligned}

    \displaystyle P[X_2=0, X_3=1 \lvert X_1=0]=P[X_1=0, X_2=1 \lvert X_0=0]=0.12

    \displaystyle \begin{aligned} P[X_1=2, X_2=1,X_3=0 \lvert X_0=1]&=P[X_1=2 \lvert X_0=1] \times P[X_2=1 \lvert X_1=2] \times P[X_3=0 \lvert X_2=1] \\&=P_{12} \times P_{21} \times P_{10} \\&=0.2 \times 0.3 \times 0.3=0.018 \end{aligned}

Example 2
Example 1 shows that the probability of a chain moving along a path, conditional on a starting state, is simply the product of the transition probabilities on that path. We now calculate the unconditional probability of a path. For this to be possible, we need to assume a distribution for the initial state.

Assume that P[X_0=0]=0.4, P[X_0=1]=0.3 and P[X_0=2]=0.3. Determine the following unconditional probabilities:

  • P[X_0=0,X_1=0,X_2=1]
  • P[X_1=0,X_2=0,X_3=1]
  • P[X_0=1,X_1=2, X_2=1,X_3=0]

To calculate these unconditional probabilities, we multiply the transition probabilities on the path along with the probability of the first state in the path.

    \displaystyle \begin{aligned} P[X_0=0,X_1=0,X_2=1]&=P[X_0=0] \times P[X_1=0 \lvert X_0=0] \times P[X_2=1 \lvert X_1=0] \\&=P[X_0=0] \times P_{00} \times P_{01} \\&=0.4 \times 0.6 \times 0.2=0.048 \end{aligned}

To calculate the second probability, we need to first compute P[X_1=0]. This can be done by conditioning on the initial state X_0.

    \displaystyle \begin{aligned} P[X_1=0]&=P[X_0=0] \times P_{00}+ P[X_0=1] \times P_{10} +P[X_0=2] \times P_{20} \\&=0.4 \times 0.6 +0.3 \times 0.3+0.3 \times 0.4=0.45 \end{aligned}

    \displaystyle \begin{aligned}  P[X_1=0,X_2=0,X_3=1]&=P[X_1=0] \times P[X_2=0 \lvert X_1=0] \times  P[X_3=1 \lvert X_2=0]\\&=P[X_1=0] \times P_{00} \times P_{01} \\&=0.45 \times 0.6 \times 0.2 \\&=0.054  \end{aligned}

The following gives the third probability.

    \displaystyle \begin{aligned} P[X_0=1,X_1=2, X_2=1,X_3=0]&=P[X_0=1] \times P[X_1=2 \lvert X_0=1] \times P[X_2=1 \lvert X_1=2] \times P[X_3=0 \lvert X_2=1] \\&=P[X_0=1] \times P_{12} \times P_{21} \times P_{10} \\&=0.3 \times 0.2 \times 0.3 \times 0.3 \\&=0.0054 \end{aligned}

Note that each unconditional probability in this example is obtained by multiplying the corresponding conditional probability in Example 1 with the probability of the conditioned event.

Example 3
Sometimes we know for certain that the Markov chain starts in a given state. Then the unconditional probability of a path is simply the conditional probability of that path (conditioned on the initial state). Determine the three probabilities in Example 2 given that the chain starts in state 0 in the first two probabilities and the chain starts in state 1 in the third probability. The probabilities are:

  • P[X_0=0,X_1=0,X_2=1]
  • P[X_1=0,X_2=0,X_3=1]
  • P[X_0=1,X_1=2, X_2=1,X_3=0]

For the first probability, P[X_0=0,X_1=0,X_2=1]=P[X_0=0] \times P[X_1=0,X_2=1 \lvert X_0=0]. With P[X_0=0]=1, P[X_0=0,X_1=0,X_2=1]=P[X_1=0,X_2=1 \lvert X_0=0]=0.12, the answer from Example 1.

For the second probability, we need to first calculate P[X_1=0]. Since P[X_0=0]=1, P[X_1=0]=P[X_0=0] \times P_{00}=0.6. Thus P[X_1=0,X_2=0,X_3=1]=P[X_1=0] \times P_{00} \times P_{01}=0.6 \times 0.6 \times 0.2=0.072, which is 0.6 times the corresponding conditional probability in Example 1.

Similarly, the third probability is

    \displaystyle \begin{aligned} P[X_0=1,X_1=2, X_2=1,X_3=0]&=P[X_1=2, X_2=1,X_3=0 \lvert X_0=1] \\& =0.018\end{aligned}

Conditional Probabilities of Paths in Markov Chains

The conditional probabilities in the above examples are not difficult to compute as long as the basic idea of conditional probability and the Markov property are understood. First, consider the idea of conditional probability. The idea is that the probability of the intersection of two events can be obtained by conditioning on one of the events. For example, given two events A and B, we have

    (2a) \ \ \ \ \ P[A \cap B]=P[A] \times P[B \lvert A]

    \displaystyle (2b) \ \ \ \ \ P[B \lvert A]=\frac{P[A \cap B]}{P[A]}

The probability of the intersection of two events is the product of a conditional probability of one event given another event times the probability of the event being conditioned. Relation (2b) is the probability of the event B given that event A has occurred. Of course, the conditioned event has to have positive probability, i.e. P[A]>0.

Suppose that the event A is X=x, the discrete random variable X taking on the value x and that the event B is Y=y. Then we have the following relations.

    (3a) \ \ \ \ \ P[X=x,Y=y]=P[X=x] \times P[Y=y \lvert X=x]

    \displaystyle (3b) \ \ \ \ \ P[Y=y \lvert X=x]=\frac{P[X=x,Y=y]}{P[X=x]}

We can also write the conditional probability of X=x conditioning on the event Y=y. Suppose we have a path in a Markov chain, X_0=x_0,X_1=x_1,X_2=x_2,\cdots,X_n=x_n. This is the event that the chain starts in state x_0 initially, then moves to state x_1 at time 1 and so on, and at time n the chain is in state x_n. Conditioning on the initial state, consider the following conditional probability.

    \displaystyle (4) \ \ \ \ \ P[X_1=x_1,X_2=x_2,\cdots,X_n=x_n \lvert X_0=x_0]=\frac{P[X_0=x_0,X_1=x_1,X_2=x_2,\cdots,X_n=x_n]}{P[X_0=x_0]}

Relation (4) can be further simplified since the random variables X_0=x_0,X_1=x_1,X_2=x_2,\cdots,X_n=x_n satisfy the Markov property.

    \displaystyle \begin{aligned} (5) \ \ \ \ \ P[X_1=x_1,X_2=x_2,\cdots,X_n=x_n \lvert X_0=x_0]&=P[X_1=x_1 \lvert X_0=x_0] \times P[X_2=x_2 \lvert X_1=x_1] \\& \ \ \ \times \cdots \times P[X_n=x_n \lvert X_{n-1}=x_{n-1}] \\&=P_{x_0,x_1} \times P_{x_1,x_2} \times \cdots \times P_{x_{n-1},x_n}  \end{aligned}

Relation (5) says that the probability of a path in a Markov chain conditioning on the initial state is simply the product of the transition probabilities along the path. This is derived using the Markov property. Relation (5) can be established by an induction argument on the subscript n, i.e. the length of a path.

Clearly P[X_1=j \lvert X_0=i]=P_{ij}. This is the fundamental assumption in Markov chains. Now consider P[X_1=j,X_2=k \lvert X_0=i].

    \displaystyle \begin{aligned} P[X_1=j,X_2=k \lvert X_0=i]&=\frac{P[X_0=i,X_1=j,X_2=k]}{P[X_0=i]} \\&=\frac{P[X_0=i,X_1=j,X_2=k]}{P[X_0=i,X_1=j]} \times \frac{P[X_0=i,X_1=j]}{P[X_0=i]} \\&=P[X_2=k \lvert X_0=i,X_1=j] \times P[X_1=j \lvert X_0=i] \\&=P[X_2=k \lvert X_1=j] \times P[X_1=j \lvert X_0=i] \\&=P_{jk} \times P_{ij} \\&=P_{ij} \times P_{jk} \end{aligned}

Note that P[X_2=k \lvert X_0=i,X_1=j]=P[X_2=k \lvert X_1=j] in the above derivation. This is the Markov property in action since the probability of the future state depends only on the state immediately preceding it. Now that relation (5) is true for a path of length 2, the following shows how it is derived for length 3.

    \displaystyle \begin{aligned} P[X_1=j,X_2=k,X_3=l \lvert X_0=i]&=\frac{P[X_0=i,X_1=j,X_2=k,X_3=l]}{P[X_0=i]} \\&=\frac{P[X_0=i,X_1=j,X_2=k,X_3=l]}{P[X_0=i,X_1=j,X_2=k]} \times \frac{P[X_0=i,X_1=j,X_2=k]}{P[X_0=i]} \\&=P[X_3=l \lvert X_0=i,X_1=j,X_2=k] \times P[X_1=j,X_2=k \lvert X_0=i]  \\&=P[X_3=l \lvert X_2=k] \times P[X_1=j,X_2=k \lvert X_0=i] \\&=P_{ij} \times P_{jk} \times P_{kl} \end{aligned}

Note that going from the third line to the fourth line the Markov property is used. Relation (5) for length 2 is used at the end. The inductive proof keeps going and relation (5) is true for any path of arbitrary length.

By itself, the conditional probability in (4) is nothing special. With the additional assumption of the Markov property, the conditional probability becomes a “chain” of one-step transition probabilities.

Using the idea of (3a), the unconditional probability of a path is:

    \displaystyle \begin{aligned} (6) \ \ \ \ \ P[X_0=x_0,X_1=x_1,X_2=x_2,\cdots,X_n=x_n]&=P[X_0=x_0] \times P_{x_0,x_1} \\& \ \ \times P_{x_1,x_2} \times \cdots \times P_{x_{n-1},x_n}   \end{aligned}

Relation (6) says that the unconditional probability of a path in a Markov chain is simply the conditional probability of the path conditioning on the initial state times the probability of the initial state.
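Relations (5) and (6) translate directly into code. The following is a minimal sketch (the helper name path_prob is ours), applied to the third path of Examples 1 and 2.

    import numpy as np

    P = np.array([[0.6, 0.2, 0.2],
                  [0.3, 0.5, 0.2],
                  [0.4, 0.3, 0.3]])

    def path_prob(path, P, init_prob=1.0):
        # product of one-step transition probabilities along the path, per (5);
        # set init_prob = P[X_0 = x_0] to get the unconditional probability (6)
        prob = init_prob
        for i, j in zip(path, path[1:]):
            prob *= P[i, j]
        return prob

    print(path_prob([1, 2, 1, 0], P))                   # 0.018, as in Example 1
    print(path_prob([1, 2, 1, 0], P, init_prob=0.3))    # 0.0054, as in Example 2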

More Examples

The earlier examples demonstrate relation (5) and relation (6). We now work a few more examples.

Example 4
There is a slightly different way to look at the unconditional probability P[X_1=0,X_2=0,X_3=1] in Example 2. The first state in this path is X_1=0. We can condition on the initial state X_0.

    \displaystyle P[X_1=0,X_2=0,X_3=1]=\sum \limits_{k=0}^2 P[X_0=k,X_1=0,X_2=0,X_3=1]

Then every one of the unconditional probabilities within the summation sign can be computed according to the idea in (6). Note that Example 2 gives the distribution of the initial state.

    \displaystyle \begin{aligned} P[X_1=0,X_2=0,X_3=1]&=P[X_0=0,X_1=0,X_2=0,X_3=1]+ \\& \ \ P[X_0=1,X_1=0,X_2=0,X_3=1]+P[X_0=2,X_1=0,X_2=0,X_3=1] \\&=P[X_0=0] \times P_{00} \times P_{00} \times P_{01} +\\&\ \ P[X_0=1] \times P_{10} \times P_{00} \times P_{01}+P[X_0=2] \times P_{20} \times P_{00} \times P_{01} \\&=0.4 \times 0.6 \times 0.6 \times 0.2+\\&\ \ 0.3 \times 0.3 \times 0.6 \times 0.2 + 0.3 \times 0.4 \times 0.6 \times 0.2\\&=0.054 \end{aligned}

In calculating probabilities in a Markov chain, conditioning on the first step is a useful technique that will be used often in the subsequent posts.

Example 5
Consider a Markov chain with the following transition probability matrix.

    \mathbf{P} =       \bordermatrix{ & 0 & 1 & 2 & 3 \cr        0 & 0.5 & 0.2 & 0.2 & 0.1 \cr        1 & 0.3 & 0.3 & 0.2 & 0.2 \cr        2 & 0.3 & 0.2 & 0.2 & 0.3 \cr        3 & 0 & 0 & 0 & 1   } \qquad

State 3 is called an absorbing state since P_{33}=1. In other words, when the chain reaches state 3, it will never leave. Suppose that initially the Markov chain is in state 0 50% of the time, in state 1 30% of the time and in state 2 20% of the time. Calculate the probability P[X_1=0,X_2=1,X_3=0,X_4=2].

We can condition on the initial state as in Example 4.

    \displaystyle \begin{aligned} P[X_1=0,X_2=1,X_3=0,X_4=2]&=\sum \limits_{k=0}^2 P[X_0=k,X_1=0,X_2=1,X_3=0,X_4=2] \\&=P[X_0=0,X_1=0,X_2=1,X_3=0,X_4=2]+ \\&\ \ P[X_0=1,X_1=0,X_2=1,X_3=0,X_4=2]+\\&\ \ P[X_0=2,X_1=0,X_2=1,X_3=0,X_4=2] \\&=P[X_0=0] \times P_{00} \times P_{01} \times P_{10} \times P_{02}+ \\& \ \ P[X_0=1] \times P_{10} \times P_{01} \times P_{10} \times P_{02}+\\&\ \ P[X_0=2] \times P_{20} \times P_{01} \times P_{10} \times P_{02} \\&=0.0048\end{aligned}

Example 6
This is Example 1 in this previous post.

When a binary digit, 0 or 1, is transmitted through a communication system, it passes through several stages. At each stage, there is a probability p that the digit is transmitted in error. Let X_n be the digit that is the output at the nth stage. Then \left\{X_n: n \ge 0 \right\} is a two-state Markov chain with the following transition probability matrix.

    \mathbf{P} = \bordermatrix{     & 0 &  \text{ } & 1 \cr   0 & 1-p & \text{ } & p \cr  \text{ } & \text{ } & \text{ } &\text{ } \cr   1 & p & \text{ } & 1-p \cr  }

Determine the following:

  • If the digit to be sent is 0, determine the probability that no error occurs up to the second stage.
  • If the digit to be sent is 0, determine the probability that a correct message is received at stage 2 through stage 4.

The first probability is P[X_1=0,X_2=0 \lvert X_0=0]. This is P_{00} \times P_{00}=(1-p)^2. To calculate the second probability, we can condition on the state in time 1. We first calculate P[X_1=0] and P[X_1=1]. Since we know for certain that the initial state is 0, P[X_1=0]=P_{00}=1-p and P[X_1=1]=P_{01}=p.

    \displaystyle \begin{aligned} P[X_2=0,X_3=0,X_4=0]&=P[X_1=0,X_2=0,X_3=0,X_4=0]+P[X_1=1,X_2=0,X_3=0,X_4=0] \\&=P[X_2=0,X_3=0,X_4=0 \lvert X_1=0] \times P[X_1=0]+ \\&\ \ P[X_2=0,X_3=0,X_4=0 \lvert X_1=1] \times P[X_1=1] \\&=P_{00} \times P_{00} \times P_{00} \times (1-p)+ \\&\ \ P_{10} \times P_{00} \times P_{00} \times p\\&=(1-p)^4+p^2 \times (1-p)^2 \end{aligned}

Remarks

The calculation in this post involves very basic calculation with one-step transition probabilities, i.e. the elements of the transition probability matrix given in a Markov chain. The n-step transition probabilities are key to the development of the theory of Markov chains. The next post begins the discussion of Chapman-Kolmogorov equations.


\copyright 2017 – Dan Ma

Introducing Markov Chains

This post continues the discussion of the previous two posts – the previous post that looks at random walks informally through graphs and the previous post that defines random walks more formally. This post introduces the notion of Markov chains.

Stochastic Process

A stochastic process is a collection of random variables \left\{X_t: t \in T \right\}. Each element of the collection is a random variable X_t. The index t is often regarded as time. Thus we refer to X_t as the state of the process at time t. In short, a stochastic process is a family of random variables that describes the evolution through time of some physical process.

The set T is called the index set of the stochastic process. When T is a countable set, e.g. T=\left\{0,1,2,3,\cdots \right\}, the stochastic process is said to be a discrete-time process. When the index set T is an interval of the real number line, the stochastic process is said to be a continuous-time process. For example, \left\{X_n: n=0,1,2,\cdots \right\} is a discrete-time stochastic process indexed by the non-negative integers whereas \left\{X_t: t \ge 0 \right\} is a continuous-time stochastic process indexed by the non-negative real numbers. The state space of a stochastic process is defined as the set of all possible values that can be taken by the random variables X_t.

In the present discussion, we focus on discrete-time stochastic processes \left\{X_n: n=0,1,2,\cdots \right\} where the state space S is finite or countably infinite. When the state space is finite, the process is said to be a finite-state stochastic process.

We also assume that the state space S consists of integers. Thus S is either \mathbb{N}=\left\{0,1,2,3,\cdots \right\} (the natural numbers) or \mathbb{Z}=\left\{\cdots,-3,-2,-1,0,1,2,3,\cdots \right\} (the integers), or some subset of \mathbb{N} or \mathbb{Z}.

If we know nothing about X_n other than the fact that they are random variables, then there is not much we can say about the stochastic process \left\{X_n: n=0,1,2,\cdots \right\}. To make the discussion more meaningful, it is necessary to impose some additional structure on these random variables.

The simplest structure we can assume is that the random variables X_n are independent random variables. This would be a good model for phenomena that can be regarded as random experiments in which the future states of the process are independent of past and present states. However, in many situations the assumption of independence is often unjustified. In many stochastic processes that arise in practice, past and present states can influence the future states.

Markov Chains

We consider stochastic processes that possess the following property: the future states of the process (conditional on both past and present states) depend only upon the present state, not on the past states leading up to the present state. This property is called the Markov property. Any stochastic process that possesses the Markov property is called a Markov chain. More specifically, to satisfy the Markov property, given the present state X_n and the past states X_{n-1},X_{n-2},\cdots,X_2,X_1,X_0, the conditional distribution of the future state X_{n+1} depends only on the present state X_n. Even more specifically, to satisfy the Markov property, the process must satisfy the following requirement:

    \displaystyle (1) \ \ \ \ P[X_{n+1}=j \lvert X_n=i,X_{n-1}=x_{n-1},\cdots,X_0=x_0]=P[X_{n+1}=j \lvert X_n=i]

for all possible states x_0,x_1, \cdots,x_{n-1},i,j and all non-negative integers n. Thus under the Markov property, the conditional distribution of the future state X_{n+1} depends only on the present state X_n and that all the states preceding the present state have no influence on the future state.

The conditional probabilities P[X_{n+1}=j \lvert X_n=i] in (1) are called transition probabilities of the Markov chain. In the present discussion, we consider only the Markov chains in which the transition probabilities P[X_{n+1}=j \lvert X_n=i] are independent of the current period n. Such Markov chains are called time-homogeneous Markov chains or stationary Markov chains. So the additional assumption of time-homogeneity means that the probability of transitioning into state j from state i is identical regardless of where the process is in the time scale (at the beginning in the process or later in the process).

With the transition probability P[X_{n+1}=j \lvert X_n=i] independent of n, we can denote this probability by the states i and j only:

    \displaystyle  (2) \ \ \ \ P_{ij}=P[X_{n+1}=j \lvert X_n=i]

The probability P_{ij} represents the probability that the process, starting in state i, will enter into state j in the next period. Since the process must transition into one of the states in the state space in the next time period, P_{ij}, with i fixed and j varying, must represent a probability distribution. Thus P_{i0}+  P_{i1}+\cdots=1. Here we assume that the state space is the set of all non-negative integers.

The probabilities P_{ij} are called one-step transition probabilities since they give the probabilities of transitioning from one state to another state in one period of time. The one-step transition probabilities are usually expressed in a matrix or a transition diagram. The following shows what a transition probability matrix looks like.

    \mathbf{P}= \left[\begin{array}{cccccc}      P_{0,0} & P_{0,1} & P_{0,2} & \cdots & P_{0,m} \\      P_{1,0} & P_{1,1} & P_{1,2} & \cdots & P_{1,m} \\      P_{2,0} & P_{2,1} & P_{2,2} & \cdots & P_{2,m} \\      \vdots & \vdots & \vdots & \vdots & \vdots \\  P_{m,0} & P_{m,1} & P_{m,2} & \cdots & P_{m,m} \\    \end{array}\right]

The above transition matrix is for a finite-state Markov chain with state space being \left\{0,1,\cdots,m \right\}. The probabilities in each row sum to 1. If the process is currently in state i, the row with the first index as i (in this case the (i+1)st row) will give the probabilities of transitioning into all the states. If the state space is countably infinite, then each row would have infinitely many terms and there would be infinitely many rows.

In many situations, it is helpful to have a column label and row label to specify the states in the Markov chain. For example, a particle moves through the states 0, 1, 2, 3 and 4 according to a Markov chain described by the following transition probability matrix.

    \mathbf{P} =       \bordermatrix{ & 0 & 1 & 2 & 3 & 4 \cr        0 & 0 & 0.6 & 0.2  & 0.1  & 0.1 \cr        1 & 0.5 & 0 & 0.2  & 0.2  & 0.1 \cr        2 & 0.4 & 0.2 & 0  & 0.1  & 0.3 \cr        3 & 0.2 & 0.3 & 0.2  & 0  & 0.3 \cr        4 & 0.6 & 0.2 & 0.1  & 0.1  & 0 \cr   } \qquad

In the above matrix, P_{31}=0.3, which means that when the Markov process is in state 3, it moves to state 1 with probability 0.3. As discussed earlier, the probabilities in a given row sum to 1, which makes sense since the probabilities in a row describe all possible moves when the process is in a given state.

When it is helpful to do so, the transition probability matrices will have a row label and a column label. If the row label and column label are not given, we assume that the rows and columns are listed in ascending order (recall that the states are integers). The transition probability matrices can be displayed using square brackets or parentheses.
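Since each row of a transition probability matrix is a conditional distribution, a matrix can be validated and simulated with a few lines of code. The following sketch uses the 5-state matrix above.

    import numpy as np

    P = np.array([[0.0, 0.6, 0.2, 0.1, 0.1],
                  [0.5, 0.0, 0.2, 0.2, 0.1],
                  [0.4, 0.2, 0.0, 0.1, 0.3],
                  [0.2, 0.3, 0.2, 0.0, 0.3],
                  [0.6, 0.2, 0.1, 0.1, 0.0]])

    assert np.allclose(P.sum(axis=1), 1.0)   # every row sums to 1

    rng = np.random.default_rng(0)
    state = 3
    next_state = rng.choice(5, p=P[state])   # sample the next state from row 3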

Examples

The key to working with Markov chains is the transition probabilities P_{ij}. These probabilities allow us to describe the random transitions through the states over time. The remainder of this post gives several examples of Markov chains focusing on transition probability matrices.

Example 1
When a binary digit, 0 or 1, is transmitted through a communication system, it passes through several stages. At each stage, there is a probability p that the digit is transmitted in error. Let X_n be the digit that is the output at the nth stage. Then \left\{X_n: n \ge 0 \right\} is a two-state Markov chain with the following transition probability matrix.

    \mathbf{P} = \bordermatrix{     & 0 &  \text{ } & 1 \cr   0 & 1-p & \text{ } & p \cr  \text{ } & \text{ } & \text{ } &\text{ } \cr   1 & p & \text{ } & 1-p \cr  }

Some typical problems might be:

  • If the digit to be sent is 0, determine the probability that no error occurs up to the second stage.
  • If the digit to be sent is 0, determine the probability that a correct message is received at stage 2.
  • If the digit to be sent is 0, determine the probability that a correct message is received at stage n where n \ge 1.

Example 2
Let’s consider the example of gambler’s ruin with finitely many states. Suppose that a gambler makes a series of one-unit bets against the house. For the gambler, the probabilities of winning and losing each bet are p and 1-p, respectively. Whenever the capital reaches zero, the gambler is in ruin and his capital remains zero thereafter. On the other hand, if the capital of the gambler increases to m, a large level of capital, the gambler quits playing. Let X_n be the capital of the gambler after the nth bet. Then \left\{ X_n: n=0,1,2,\cdots \right\} is a random walk, as discussed in the previous post. Since the future capital X_{n+1} depends only on the current level of capital X_n (up one unit or down one unit), this is also a Markov chain. The following is the transition probability matrix.

    \mathbf{P} =       \bordermatrix{ & 0 & 1 & 2 & 3 & 4 & \cdots & m \cr        0 & 1 & 0 & 0 & 0 & 0 & \cdots & 0 \cr        1 & 1-p & 0 & p & 0 & 0 & \cdots & 0 \cr        2 & 0 & 1-p & 0 & p & 0 &\cdots & 0 \cr        3 & 0 & 0 & 1-p & 0 & p &\cdots & 0 \cr        \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \cr        m & 0 & 0 & 0 & 0 & 0 &\cdots & 1    } \qquad

The above is an (m+1) \times (m+1) matrix. The states of the process are 0,1,2,\cdots,m. The states 0 and m are absorbing states since the process stays there and does not leave once it arrives at either of these two states. Note that P_{00}=1 and P_{mm}=1. State 0 would be the state of ruin. State m would mean the gambler becomes fabulously rich (if m is large).

For the rows in between, the transition probabilities are P_{i,i-1}=1-p and P_{i,i+1}=p. In other words, from state i, the process would go to state i+1 with probability p and would go down to i-1 with probability 1-p. More specifically, if m=5, the following is the transition probability matrix.

    \mathbf{P} =       \bordermatrix{ & 0 & 1 & 2 & 3 & 4 & 5 \cr        0 & 1 & 0 & 0  & 0  & 0 & 0 \cr        1 & 1-p & 0 & p  & 0  & 0 & 0 \cr        2 & 0 & 1-p & 0  & p  & 0 & 0 \cr        3 & 0 & 0 & 1-p  & 0  & p & 0 \cr        4 & 0 & 0 & 0  & 1-p  & 0 & p \cr        5 & 0 & 0 & 0  & 0  & 0 & 1   } \qquad
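For any m and p, this matrix can be generated programmatically rather than written out by hand. A sketch (the function name is ours):

    import numpy as np

    def gamblers_ruin_matrix(m, p):
        # (m+1) x (m+1) transition matrix; states 0 and m are absorbing
        P = np.zeros((m + 1, m + 1))
        P[0, 0] = 1.0
        P[m, m] = 1.0
        for i in range(1, m):
            P[i, i - 1] = 1 - p   # lose the one-unit bet
            P[i, i + 1] = p       # win the one-unit bet
        return P

    print(gamblers_ruin_matrix(5, 0.5))   # the m = 5 matrix displayed above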

Assume that the gambler has initial capital d, which is positive but small in comparison to m. For example, d=10 and m=1000. Some interesting questions are:

  • If the starting capital is 10, what is the probability that the gambler’s capital will be 5 after 8 plays of the game?
  • What is the probability of ruin for the gambler, i.e. reaching state 0?
  • What is the mean time until reaching the absorbing states (ruin or fabulously rich)?

Example 3 (Restless Mouse)
A maze has eight areas as shown below.

The mouse moves through the areas in the maze at random. That is, if an area has exits to w areas, the mouse moves to each of these w areas with probability 1/w. Let X_n be the area the mouse is located in after the nth move. Then \left\{X_n: n \ge 0 \right\} is a Markov chain with the following transition probability matrix.

    \mathbf{P} =       \bordermatrix{ & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \cr        1 & 0 & 1/2 & 0  & 1/2  & 0 & 0 & 0 & 0 \cr        2 & 1/2 & 0 & 1/2  & 0  & 0 & 0 & 0 & 0 \cr        3 & 0 & 1/2 & 0  & 0  & 1/2 & 0 & 0 & 0 \cr        4 & 1/3 & 0 & 0  & 0  & 1/3 & 1/3 & 0 & 0 \cr        5 & 0 & 0 & 1/3  & 1/3  & 0 & 0 & 0 & 1/3 \cr        6 & 0 & 0 & 0  & 1/2  & 0 & 0 & 1/2 & 0 \cr        7 & 0 & 0 & 0  & 0  & 0 & 1/2 & 0 & 1/2 \cr        8 & 0 & 0 & 0  & 0  & 1/2 & 0 & 1/2 & 0   } \qquad
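Since row i of this matrix is simply 1/w in the w columns corresponding to the exits of area i, the matrix can be built from the exit lists rather than typed in. A sketch (the exit lists are read off the matrix above):

    import numpy as np

    # exits of each area, matching the matrix above (areas labeled 1 to 8)
    exits = {1: [2, 4],    2: [1, 3],    3: [2, 5],    4: [1, 5, 6],
             5: [3, 4, 8], 6: [4, 7],    7: [6, 8],    8: [5, 7]}

    P = np.zeros((8, 8))
    for area, nbrs in exits.items():
        P[area - 1, [n - 1 for n in nbrs]] = 1.0 / len(nbrs)

    assert np.allclose(P.sum(axis=1), 1.0)   # each row sums to 1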

Here are some interesting questions about this Markov chain. Suppose that area 8 contains food and area 6 has a device for an electric shock.

  • Given that the mouse is placed in area 1 at the beginning, what is the probability that the mouse will reach area 8 in 4 moves, and in general in n moves?
  • Given that the mouse is placed in area 1 at the beginning, what is the mean number of moves for the mouse to reach area 8 or area 6?
  • Given that the mouse is placed in area 1 at the beginning, what is the probability that the mouse will reach the food area before the electric shock area?

Next Steps

The concept of Markov chains has been introduced and illustrated with examples of random walks (in earlier posts) and other examples in this post. All the properties of Markov chains (in this case time-homogeneous Markov chains) are derived from the transition probability matrix. One of the most urgent tasks is to develop a way to calculate the transition probabilities. Recall that the elements of the transition probability matrix \mathbf{P} are the one-step transition probabilities P_{ij}, which give the probability of going from state i into state j. The natural next step is to calculate the n-step transition probabilities P^n_{ij}, which give the probability that the process will be in state j in the nth period given the process is initially in state i. The next post is a beginning look at transition probabilities. The post that follows is on Chapman-Kolmogorov equations, a systematic way of calculating transition probabilities using matrix multiplication.


\copyright 2017 – Dan Ma

More on random walks

This post continues the discussion of the previous post on random walks. The previous post looks at random walks graphically. This post introduces the notion of random walks more formally.

Consider \left\{X_n: n=0,1,2,\cdots \right\}, which is a sequence of random variables indexed by the non-negative integers. By definition this is a stochastic process, to be precise a discrete-time stochastic process since the random variables are indexed by integers. The integers in the subscripts can be interpreted as time. Thus the random variables X_n describe the states of some process over time. In the stochastic processes discussed in the previous post, the future state X_{n+1} of the process is obtained by adding one (with probability p) or subtracting one (with probability 1-p) to the current state X_n. So a given state of the process X_n is one higher or lower than the state in the previous period. The random walks in the previous post go up or down with equal chance, i.e. p=0.5. The following graph is from the previous post, which shows 5 simulations of such random walks.

Figure 1 – Examples of Random Walks in One Dimension

Random Walks – A General Description

We now give a more general description of the random walk. Instead of a random one-unit up or down move, the moves in the random walk are determined by a predetermined discrete distribution. Let Y_1,Y_2,Y_3,\cdots be integer-valued random variables that are independent and identically distributed. Let P[Y=y] be the common probability function. Let X_0 be an integer-valued random variable that is independent of the random variables Y_n. The random variable X_0 will be the initial position of the random walk. The random variables Y_n are the increments (they are the amounts added to the stochastic process as time increases).

For n \ge 1, define X_n=X_0+Y_1+\cdots+Y_{n-1}+Y_n. Then \left\{X_n: n=0,1,2,\cdots \right\} is called a random walk. Wherever the process is at any given time, the next step in the process is always a walk whose distance and direction are dictated by the independent random variables Y_n. For convenience, we will call the random variables Y_n the increments in the random walk since they tell the process how far (and in what direction) to walk in the next step.

The state space is the set of all the values that can be taken on by the random variables X_n. In this case, the state space is either the set of all integers \mathbb{Z}=\left\{\cdots,-3,-2,-1,0,1,2,3,\cdots \right\} or the set of all non-negative integers \mathbb{N}=\left\{0,1,2,3,\cdots \right\} or some appropriate subset of \mathbb{Z} or \mathbb{N}.

Given the current state of the process X_n, the next state X_{n+1} is obtained by adding the random value Y_{n+1} to the current state X_n. If the random variables Y_n are independent Bernoulli variables with P[Y_n=1]=p and P[Y_n=-1]=1-p, then each state X_n is the result of an up or down move by one unit from the previous state. The examples in the previous post are based on this type of Bernoulli one-unit up and down move. In the general definition, the future state X_{n+1}, instead of being adjusted by a uniform up or down adjustment, is adjusted by the random variable Y_{n+1}.

One important task is to establish the probabilities of the transition from one state to another. For example, if the process is currently in state i, what is the probability that it will be in state j in the next period? More specifically, given that the random walk is in state i, i.e. X_n=i, what is the probability that the process is in state j in the next period, i.e. X_{n+1}=j? Such probabilities are called one-step transition probabilities.

The probabilities that we wish to obtain are P[X_{n+1}=j \lvert X_n=i] for n \ge 0. Recall that P[Y=y] is the common probability function for the random variables Y_n. Given states i,j, define P_{ij}=P[Y=j-i].

When X_n=i, in order for X_{n+1}=j to happen, the increment must be Y_{n+1}=j-i. Note that P[Y_{n+1}=j-i]=P[Y=j-i]=P_{ij}. Observe that this probability is independent of the time period n since the Y_n have a common distribution Y. Thus it is appropriate to use P_{ij} to express this probability, i.e. there is no need to include the time n in the subscript of P_{ij}. For all n \ge 0,

    P[X_{n+1}=j \lvert X_n=i]=P[Y_{n+1}=j-i]=P[Y=j-i]=P_{ij}

The number P_{ij} is the probability of the process transitioning from state i to state j in one step, which is evaluated based on the common probability function of the increments Y_n.
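The definition X_n=X_0+Y_1+\cdots+Y_n is also a recipe for simulation. The following is a minimal sketch with the Bernoulli increments of Figure 1 (any discrete increment distribution can be substituted for values and probs).

    import numpy as np

    rng = np.random.default_rng(0)
    values, probs = np.array([1, -1]), np.array([0.5, 0.5])   # P[Y=1]=P[Y=-1]=0.5

    x = 0          # initial position X_0
    path = [x]
    for _ in range(100):
        x += rng.choice(values, p=probs)   # add an independent increment Y_n
        path.append(x)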

Now that we know the probability of the process going from one state to another state in one step, other probability questions are: what is the probability of a path and what is the probability of going from state i to state j in n steps? The probabilities in the second question are called n-step transition probabilities. The probability of a path is discussed here. The n-step transition probabilities are discussed here.

The first question has straightforward answers. For example, if the process starts in state i_0 at time 0, moves to state i_1 at time 1 and so on until reaching state i_n at time n, then the following is the probability of this path:

    \displaystyle \begin{aligned} P[X_0=i_0,X_1=i_1,\cdots,X_n=i_n]&=P[X_0=i_0,Y_1=i_1-i_0,\cdots,Y_n=i_n-i_{n-1}] \\&=P[X_0=i_0] \times P[Y_1=i_1-i_0] \times \cdots \times P[Y_n=i_n-i_{n-1}] \\&=P[X_0=i_0] \times P_{i_0,i_1} \times \cdots \times P_{i_{n-1},i_n}  \end{aligned}

    \displaystyle \begin{aligned} P[X_1=i_1,\cdots,X_n=i_n \lvert X_0=i_0]&=\frac{P[X_0=i_0,X_1=i_1,\cdots,X_n=i_n]}{P[X_0=i_0]} \\&=P_{i_0,i_1} \times \cdots \times P_{i_{n-1},i_n}  \end{aligned}

Conditional on the initial state X_0=i_0, the probability of the process going through the path X_1=i_1,\cdots,X_n=i_n is simply the product of the one-step transition probabilities.

If the path is observed to start at a time period other than time 0, conditional on the first state in the path, the probability of the process going through the path would be:

    \displaystyle P[X_{k+1}=i_1,\cdots,X_{k+n}=i_n \lvert X_k=i_0]=P_{i_0,i_1} \times \cdots \times P_{i_{n-1},i_n}

Note that the transition probabilities P_{ij} and the probabilities of paths are independent of the current period n. Random walks with such property are called time-homogeneous or stationary. Thus the probability of transitioning into state j from state i or the probability of transitioning through a path is identical regardless of where the process is in the time scale (at the beginning in the process or later in the process).

Specific Examples

For special cases of the random walks discussed above, let the increments Y_n be defined by a Bernoulli random variable with P[Y=1]=p and P[Y=-1]=1-p. Then the resulting random walk is a series of up and down moves as shown in Figure 1 above. The one-step transition probabilities are:

    \displaystyle  P_{ij} = \left\{ \begin{array}{ll}           \displaystyle  p &\ \ \ \ \ \ j=i+1 \\            \text{ } & \text{ } \\          \displaystyle  1-p &\ \ \ \ \ \ j=i-1 \\           \text{ } & \text{ } \\           0 &\ \ \ \ \ \ \text{otherwise}           \end{array} \right.

If the increments Y_n are defined by the distribution where P[Y=1]=p, P[Y=-1]=q and P[Y=0]=1-p-q, then the following gives the transition probabilities:

    \displaystyle  P_{ij} = \left\{ \begin{array}{ll}           \displaystyle  p &\ \ \ \ \ \ j=i+1 \\            \text{ } & \text{ } \\          \displaystyle  q &\ \ \ \ \ \ j=i-1 \\           \text{ } & \text{ } \\          \displaystyle  1-p-q &\ \ \ \ \ \ j=i \\           \text{ } & \text{ } \\           0 &\ \ \ \ \ \ \text{otherwise}           \end{array} \right.

This random walk has three possible moves from any given state: an up move by one unit, a down move by one unit, and the possibility that the process stays at the same state. Of course, the possibilities for modeling are limitless. We can use other distributions for the common distribution of the increments Y_n, e.g. a binomial or a Poisson distribution, preferably shifted so that both up moves and down moves can occur.
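
As one concrete possibility among many, the following sketch uses a shifted Poisson distribution for the increments: Y = N - 1 with N ~ Poisson(lam), where the rate lam = 1 is an arbitrary choice. With this shift the walk can move down by one unit or up by any number of units.

    import math

    # A sketch of a random walk whose increments have a shifted Poisson
    # distribution: Y = N - 1 with N ~ Poisson(lam). The walk can move
    # down by at most 1 but up by any amount. lam = 1 is arbitrary.
    lam = 1.0

    def transition_prob(i, j):
        """P_{ij} = P[Y = j - i] = P[N = j - i + 1]."""
        k = j - i + 1                # the required value of the count N
        if k < 0:
            return 0.0               # the walk cannot move down by more than 1
        return math.exp(-lam) * lam**k / math.factorial(k)

    print(transition_prob(5, 4))     # down move: P[N = 0] is about 0.368
    print(transition_prob(5, 6))     # up move:   P[N = 2] is about 0.184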

Finite-State Random Walks

The random walks discussed above potentially have an infinite state space, i.e. the process can move arbitrarily far from the initial state without bound. Depending on the common distribution of the increments Y_1,\cdots,Y_n,\cdots, the process can drift up indefinitely toward +\infty or down toward -\infty. The random walks in Figure 1 have increments 1 and -1 and can reach states that are arbitrarily high or low, though it would take a great many steps to reach such extreme levels.

Any random walk with an unbounded state space can be modified to have a finite state space. Pick two states that serve as boundaries, one lower and one upper. The states between the boundary states are called interior states. Whenever the process reaches a boundary, it either stays there without moving any further or transitions back into an interior state. The transition probabilities at the interior states remain the same as discussed above. A boundary state at which the process always stays after reaching it is called an absorbing state.
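
The following minimal Python sketch illustrates one way to impose boundaries on the up-down walk. Making the lower boundary absorbing and the upper boundary reflecting, as well as the values of p, m and the starting state, are all arbitrary choices for illustration.

    import random

    # A sketch of restricting an up-down walk to the states 0, 1, ..., m.
    # Here the lower boundary 0 is absorbing and the upper boundary m is
    # reflecting; both choices (and p, m) are arbitrary for illustration.
    p, m = 0.5, 5

    def step(i):
        if i == 0:                   # absorbing lower boundary: stay forever
            return 0
        if i == m:                   # reflecting upper boundary: move back inside
            return m - 1
        return i + 1 if random.random() < p else i - 1   # interior state

    x = 3              # initial state; an arbitrary choice
    for _ in range(20):
        x = step(x)
    print(x)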

A handy example of a finite-state random walk is the gambler’s ruin. Suppose that a gambler starts out with d units in capital (in dollars or other monetary units) and makes a series of one-unit bets against the house. For the gambler, the probabilities of winning and losing each bet are p and 1-p, respectively. Whenever the capital reaches zero, the gambler is in ruin and his capital remains zero thereafter. On the other hand, if the capital of the gambler increases to m where d<m, then the gambler quits playing. Let X_n be the capital of the gambler after the nth bet. Then \left\{X_n: n=0,1,2,3,\cdots \right\} is a random walk. The starting state is X_0=d. The states 0 and m are absorbing states. The following gives the transition probabilities.

    \displaystyle  P_{ij} = \left\{ \begin{array}{ll}           \displaystyle  p &\ \ \ \ \ \ j=i+1,i \ne 0, i \ne m \\            \text{ } & \text{ } \\          \displaystyle  1-p &\ \ \ \ \ \ j=i-1,i \ne 0, i \ne m \\           \text{ } & \text{ } \\           \displaystyle  1 &\ \ \ \ \ \ i=0,j=0 \\           \text{ } & \text{ } \\           \displaystyle  1 &\ \ \ \ \ \ i=m,j=m \\           \text{ } & \text{ } \\           0 &\ \ \ \ \ \ \text{otherwise}           \end{array} \right.
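
To make the structure of this chain visible, the following sketch builds the gambler's ruin transition matrix row by row; the values m = 4 and p = 0.5 are arbitrary illustrative choices.

    # A sketch that builds the gambler's ruin transition matrix for the
    # states 0, 1, ..., m; the values m = 4 and p = 0.5 are illustrative.
    m, p = 4, 0.5

    P = [[0.0] * (m + 1) for _ in range(m + 1)]
    P[0][0] = 1.0          # state 0 (ruin) is absorbing
    P[m][m] = 1.0          # state m (goal reached) is absorbing
    for i in range(1, m):  # interior states
        P[i][i + 1] = p        # win one bet
        P[i][i - 1] = 1 - p    # lose one bet

    for row in P:
        print(row)

Once the matrix is in hand, raising it to the nth power gives the n-step transition probabilities of the chain.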

Whenever the gambler reaches the absorbing state 0, the gambler is in ruin. If the process reaches the absorbing state m, the gambler strikes gold. If the initial capital of the gambler is small relative to that of the casino, there is a virtually 100% chance that the gambler will be in ruin, even if each bet has even odds (see here for the calculation). One interesting question: on average, how long does it take for the gambler to be in ruin? Such questions will be answered after the necessary tools are built in subsequent posts.
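
While the exact answers are deferred to subsequent posts, a rough Monte Carlo estimate of the ruin probability is easy to sketch; all parameter values below are arbitrary illustrative choices.

    import random

    # A rough Monte Carlo sketch estimating the probability of ruin for a
    # gambler starting with d units, playing one-unit bets with win
    # probability p, quitting at m; all values are arbitrary choices.
    d, m, p, trials = 3, 10, 0.5, 100_000

    ruined = 0
    for _ in range(trials):
        x = d                                # starting capital
        while 0 < x < m:                     # play until an absorbing state
            x += 1 if random.random() < p else -1
        ruined += (x == 0)
    print(ruined / trials)   # for p = 0.5 the exact value is 1 - d/m = 0.7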

Remarks

A random walk is a stochastic process \left\{X_n: n=0,1,2,\cdots \right\} such that at each time point, the process moves a distance dictated by an independent and identically distributed sequence of random variables. In such a stochastic process, the future state depends only on the preceding state and not on the states before it. For example, in the gambler’s ruin, the gambler’s capital at any given point depends only on his capital in the previous play.

In a stochastic process \left\{X_n: n=0,1,2,\cdots \right\}, the property that the future state X_{n+1} depends only on the preceding state X_n and not on the past states X_0,X_1,\cdots,X_{n-1} is called the Markov property. Any stochastic process with the Markov property is called a Markov chain. A random walk is a Markov chain, but there are Markov chains that are not random walks. The notion of Markov chains is introduced in this post.

\copyright 2017 – Dan Ma