Markov chain - Transition Structure and Formal Definitions
Understand transition matrices and probabilities, stationary and absorbing state concepts, and key state classifications such as irreducibility, periodicity, and recurrence.
Summary
Markov Chains: Transitions and Key Properties
Introduction
Markov chains are a fundamental tool for modeling systems that evolve randomly over time. The key insight is that the future behavior of the system depends only on its current state, not on how it arrived there. This "memoryless" property makes Markov chains mathematically tractable while still capturing realistic dynamics in many real-world applications. In this section, we'll build up the formal framework and examine the properties that determine how Markov chains behave in the long run.
Transition Probability Matrix
The transition probability matrix is the core mathematical object for describing a Markov chain. For a chain with states $\{1, 2, \ldots, n\}$, we define the transition matrix $P$ as:
$$P = \begin{pmatrix} P_{11} & P_{12} & \cdots & P_{1n} \\ P_{21} & P_{22} & \cdots & P_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ P_{n1} & P_{n2} & \cdots & P_{nn} \end{pmatrix}$$
The entry $P_{ij}$ in row $i$ and column $j$ represents the probability of transitioning from state $i$ to state $j$ in exactly one time step. Since these are probabilities, each row must sum to 1: $\sum_{j} P_{ij} = 1$ for all $i$.
Example: Consider a simple two-state system with states E and A. The transition matrix is:
$$P = \begin{pmatrix} 0.3 & 0.7 \\ 0.4 & 0.6 \end{pmatrix}$$
Here, from state E: you stay in E with probability 0.3 or move to A with probability 0.7. From state A: you move to E with probability 0.4 or stay in A with probability 0.6.
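As a quick sanity check, the example matrix can be written down in code and its row sums verified. This is a minimal sketch using plain nested lists (no libraries assumed):

```python
# Transition matrix for the two-state chain, with E as index 0 and A as index 1.
P = [
    [0.3, 0.7],  # from E: stay in E (0.3) or move to A (0.7)
    [0.4, 0.6],  # from A: move to E (0.4) or stay in A (0.6)
]

# Every row of a transition matrix must sum to 1.
for row in P:
    assert abs(sum(row) - 1.0) < 1e-12
```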
Initial Distribution
The initial distribution is a probability vector that specifies where the chain starts at time 0. If we denote the initial distribution as a row vector $\mu^{(0)}$, then $\mu^{(0)}_i$ represents the probability that the chain begins in state $i$.
For instance, you might start with probability 1 in state E and probability 0 in state A, written as $\mu^{(0)} = (1, 0)$. Alternatively, you might start in each state with equal probability: $\mu^{(0)} = (0.5, 0.5)$.
The initial distribution, combined with the transition matrix, allows us to compute the probability distribution at any later time. After one step:
$$\mu^{(1)} = \mu^{(0)} P$$
After $k$ steps:
$$\mu^{(k)} = \mu^{(0)} P^k$$
where $P^k$ is the transition matrix multiplied by itself $k$ times.
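The propagation rule $\mu^{(k)} = \mu^{(0)} P^k$ amounts to repeatedly multiplying the row vector by $P$. The `step` helper below is an illustrative sketch, not a library function:

```python
def step(mu, P):
    """One step of the chain: the j-th entry of mu @ P (row vector times matrix)."""
    n = len(P)
    return [sum(mu[i] * P[i][j] for i in range(n)) for j in range(n)]

P = [[0.3, 0.7], [0.4, 0.6]]
mu = [1.0, 0.0]      # start in state E with probability 1
for _ in range(3):   # after the loop, mu holds mu^(3) = mu^(0) P^3
    mu = step(mu, P)
```

Starting from $(1, 0)$, the distribution moves to $(0.3, 0.7)$, then $(0.37, 0.63)$, then $(0.363, 0.637)$, and each iterate remains a probability vector.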
The Markov Property and Formal Definition
The defining characteristic of a Markov chain is the Markov property: the future depends only on the present, not the past. Formally, a sequence of random variables $X_0, X_1, X_2, \ldots$ is a discrete-time Markov chain if for all $n$ and all states $i, j$:
$$P(X_{n+1} = j \mid X_n = i, X_{n-1} = i_{n-1}, \ldots, X_0 = i_0) = P(X_{n+1} = j \mid X_n = i)$$
This equation says: given that we know the current state $X_n = i$, knowledge of all previous states provides no additional information about where we'll go next. Only the current state matters.
This is why these chains are called "memoryless"—they don't need to remember their history. This property is what makes them computationally manageable and leads to the elegant theory we explore below.
Time-Homogeneous Chains
A chain is time-homogeneous (or stationary) if the transition probabilities don't change over time. In other words, the probability of going from state $i$ to state $j$ is the same whether we're at time 0, time 1, time 100, or any other time.
This is a powerful assumption because it allows us to use the same transition matrix $P$ at every step. As a consequence, the $k$-step transition probabilities—the probability of moving from state $i$ to state $j$ in exactly $k$ steps—are given by the $(i,j)$ entry of $P^k$:
$$P_{ij}^{(k)} = (P^k)_{ij}$$
Unless otherwise stated, we assume all Markov chains are time-homogeneous.
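The $k$-step rule can be checked directly by powering the matrix. The `matmul` and `matpow` helpers below are small hand-rolled sketches, not standard APIs:

```python
def matmul(A, B):
    """Product of two square matrices stored as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matpow(P, k):
    """P^k by repeated multiplication, starting from the identity for k = 0."""
    n = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = matmul(result, P)
    return result

P = [[0.3, 0.7], [0.4, 0.6]]
P2 = matpow(P, 2)
# P2[0][1] is the probability of going from E to A in exactly two steps:
# 0.3*0.7 + 0.7*0.6 = 0.63.
```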
Communicating States and Irreducibility
Understanding the long-run behavior of a Markov chain requires knowing which states can reach which other states. We say two states communicate if each can be reached from the other with positive probability (in some number of steps). The communication relation partitions the state space into communicating classes—groups of states that can all reach each other.
A chain is irreducible if there is only one communicating class, meaning every state can reach every other state (not necessarily in one step, but eventually).
Irreducibility is important because in an irreducible chain, the long-run behavior doesn't depend on where you start—you'll eventually visit all states. In contrast, if a chain has multiple communicating classes, you'll be trapped in whichever class you start in.
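Irreducibility is ultimately a reachability question, so it can be checked with a graph search over the positive entries of $P$. In this sketch, `P_red` is an assumed illustration of a reducible chain with an absorbing state:

```python
def reachable(P, start):
    """States reachable from `start` with positive probability in some number of steps."""
    seen = {start}
    stack = [start]
    while stack:
        i = stack.pop()
        for j, p in enumerate(P[i]):
            if p > 0 and j not in seen:
                seen.add(j)
                stack.append(j)
    return seen

def is_irreducible(P):
    """True if every state can reach every other state (one communicating class)."""
    n = len(P)
    return all(reachable(P, i) == set(range(n)) for i in range(n))

P_irr = [[0.3, 0.7], [0.4, 0.6]]   # the running example: irreducible
P_red = [[1.0, 0.0], [0.5, 0.5]]   # state 0 is absorbing: not irreducible
```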
Periodicity
The period of a state is the greatest common divisor (GCD) of all possible return times to that state. More precisely, if $f_{ii}^{(n)}$ denotes the probability of returning to state $i$ for the first time in exactly $n$ steps, then the period $d_i$ is:
$$d_i = \gcd\{n : f_{ii}^{(n)} > 0\}$$
A state with period 1 is called aperiodic. Intuitively, an aperiodic state is one where you can return to it in both odd and even numbers of steps (not locked into a fixed rhythm).
An important fact: if an irreducible chain has even one aperiodic state, then all states are aperiodic. This is because in an irreducible chain, you can reach any state from any other state, so they all "behave the same way."
Why does periodicity matter? A periodic chain may never settle into a steady distribution—it might oscillate. An aperiodic chain is more likely to converge to a long-run distribution.
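Periods can be computed numerically by taking the gcd of the step counts $n$ at which $(P^n)_{ii} > 0$; truncating at a fixed `max_steps` is an assumption made for this sketch, not part of the definition:

```python
from math import gcd

def period(P, i, max_steps=50):
    """Approximate period of state i: gcd of all n <= max_steps with (P^n)_{ii} > 0."""
    n_states = len(P)
    d = 0
    Pn = [row[:] for row in P]  # Pn holds P^n as n increases
    for n in range(1, max_steps + 1):
        if Pn[i][i] > 0:
            d = gcd(d, n)       # gcd(0, n) == n, so the first hit initializes d
        Pn = [[sum(Pn[a][k] * P[k][b] for k in range(n_states))
               for b in range(n_states)] for a in range(n_states)]
    return d

P_example = [[0.3, 0.7], [0.4, 0.6]]  # self-loops allow return in 1 step: aperiodic
P_flip = [[0.0, 1.0], [1.0, 0.0]]     # deterministic alternation: period 2
```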
Recurrence and Transience
A state is recurrent if, starting from that state, the chain returns to it with probability 1. Conversely, a state is transient if there's a positive probability of never returning.
For recurrent states, we can further classify them:
Positive recurrent: The expected return time is finite. In practical terms, you'll return to the state, on average, in a finite number of steps.
Null recurrent: The chain returns with probability 1, but the expected return time is infinite. This is rare in applications and requires an infinite state space; the simple symmetric random walk on the integers is the classic example.
A key result: in an irreducible chain, either all states are transient, or all states are recurrent. You can't have some states that are transient and others that are recurrent in an irreducible chain.
The intuition: in a transient state, you eventually leave and never come back. So if one state is transient in an irreducible chain, you'd eventually leave the entire state space—but the chain has to stay somewhere. So all states must be recurrent.
Stationary Distribution
A stationary distribution (also called a steady-state distribution) is a probability distribution $\pi = (\pi_1, \pi_2, \ldots, \pi_n)$ that satisfies:
$$\pi P = \pi$$
This equation means: if the chain is distributed according to $\pi$ at time $n$, it will still be distributed according to $\pi$ at time $n+1$. The distribution doesn't change.
Mathematically, $\pi$ is a left eigenvector of $P$ with eigenvalue 1. (We use "left eigenvector" because $\pi$ multiplies $P$ on the left, rather than $P$ multiplying a column vector on the right.)
Why does this matter? If a Markov chain reaches its stationary distribution, it will remain there. More importantly, for many well-behaved chains (irreducible, aperiodic, positive recurrent), the long-run proportion of time spent in each state converges to the stationary distribution, regardless of where the chain started.
To find $\pi$, solve the system:
$\pi P = \pi$ (equivalently, $\pi(P - I) = 0$)
$\sum_i \pi_i = 1$ (probabilities sum to 1)
Example: For the 2-state chain with $P = \begin{pmatrix} 0.3 & 0.7 \\ 0.4 & 0.6 \end{pmatrix}$, we'd solve for $\pi = (\pi_1, \pi_2)$ such that:
$$(\pi_1, \pi_2) \begin{pmatrix} 0.3 & 0.7 \\ 0.4 & 0.6 \end{pmatrix} = (\pi_1, \pi_2)$$
This gives us $0.3\pi_1 + 0.4\pi_2 = \pi_1$ and $0.7\pi_1 + 0.6\pi_2 = \pi_2$, along with $\pi_1 + \pi_2 = 1$. The first equation reduces to $0.7\pi_1 = 0.4\pi_2$, and combining it with the normalization condition yields $\pi = (4/11, 7/11)$.
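The system for the example can be solved with exact fractions; for a two-state chain with "leave" probabilities $p$ (from E) and $q$ (from A), the balance equation $0.7\pi_1 = 0.4\pi_2$ gives $\pi = \left(\frac{q}{p+q}, \frac{p}{p+q}\right)$. The names `p` and `q` are just labels for this sketch:

```python
from fractions import Fraction

p = Fraction(7, 10)  # probability of moving E -> A
q = Fraction(4, 10)  # probability of moving A -> E

# From 0.7*pi_1 = 0.4*pi_2 together with pi_1 + pi_2 = 1:
pi_1 = q / (p + q)   # = 4/11
pi_2 = p / (p + q)   # = 7/11

# Verify pi P = pi componentwise.
assert Fraction(3, 10) * pi_1 + q * pi_2 == pi_1
assert p * pi_1 + Fraction(6, 10) * pi_2 == pi_2
```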
Absorbing States
A state $i$ is absorbing if $P_{ii} = 1$ and $P_{ij} = 0$ for all $j \neq i$. In other words, once you enter an absorbing state, you can never leave—you stay there forever.
Absorbing states are important in applications like gambler's ruin (where "ruin" is an absorbing state) or epidemic models (where "recovered and immune" might be absorbing). In an irreducible chain, there can be no absorbing states, because an absorbing state would form its own communicating class.
When a chain has absorbing states, an important question is: what is the probability of eventually being absorbed into each absorbing state, starting from a given initial state? This can be computed by setting up a system of linear equations based on the structure of the transition matrix.
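As an illustration of the linear-equation approach, consider a hypothetical four-state gambler's-ruin chain with a fair coin: from an interior state you move up or down with probability 0.5 each, and states 0 ("ruin") and 3 ("goal") are absorbing. With only two interior unknowns, the system can be solved by substitution:

```python
# States {0, 1, 2, 3}; states 0 and 3 absorbing; fair coin on interior states.
# Let h[i] = probability of eventually being absorbed at state 3, starting from i.
# Boundary conditions: h[0] = 0, h[3] = 1.
# Interior equations:  h[1] = 0.5*h[0] + 0.5*h[2] = 0.5*h[2]
#                      h[2] = 0.5*h[1] + 0.5*h[3] = 0.5*h[1] + 0.5
# Substituting the first into the second: h[2] = 0.25*h[2] + 0.5.
h2 = 0.5 / (1 - 0.25)  # = 2/3
h1 = 0.5 * h2          # = 1/3
```

The answers $h_1 = 1/3$ and $h_2 = 2/3$ match the classical gambler's-ruin formula $h_i = i/N$ for a fair game.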
Flashcards
What does the element in row $i$ and column $j$ of a transition matrix represent?
The probability of moving from state $i$ to state $j$ in one time step.
What is the name for the probability distribution over states at time zero?
Initial distribution.
In a time-homogeneous chain, how are the $k$-step transition probabilities calculated from the transition matrix $P$?
By taking the $k$-th power of the transition matrix ($P^k$).
Formally, what condition must the sequence of random variables $X_1, X_2, \dots$ satisfy to have the Markov property?
$P(X_{n+1}=j \mid X_n=i, X_{n-1}, \dots, X_1) = P(X_{n+1}=j \mid X_n=i)$ for all states $i, j$.
What matrix equation defines a row vector $\pi$ as a stationary distribution for a transition matrix $P$?
$\pi P = \pi$.
In terms of linear algebra, what is a stationary distribution relative to the transition matrix?
A left eigenvector associated with eigenvalue $1$.
When are two states in a Markov chain said to communicate?
When each state can be reached from the other with positive probability.
When is a Markov chain considered irreducible?
When there is only one communicating class (every state can reach every other state).
How is the period of a state in a Markov chain defined?
The greatest common divisor (GCD) of the lengths of all possible return paths to that state.
What is a state called if its period is $1$?
Aperiodic.
What is the difference between a recurrent state and a transient state?
A recurrent state returns with probability $1$; a transient state has a probability less than $1$ of returning.
What are the two types of recurrent states based on expected return time?
Positive recurrent (finite expected return time)
Null recurrent (infinite expected return time)
What defines an absorbing state in a Markov chain?
A state $i$ with $P_{ii} = 1$, so the probability of leaving is zero.
Quiz
Question 1: In a transition probability matrix, what does the entry in row i and column j represent?
- The probability of moving from state i to state j in one time step (correct)
- The long‑run proportion of time spent in state j
- The probability of being in state i after j steps
- The expected number of transitions from state i to state j
Question 2: What term describes the probability distribution over the states of a Markov chain at time 0?
- Initial distribution (correct)
- Transition matrix
- Stationary distribution
- Absorbing distribution
Question 3: For a row vector π to be a stationary distribution of a transition matrix P, which equation must hold?
- πP = π (correct)
- πP = 1
- πP = 0
- πP = Pπ
Key Concepts
Markov Chain Fundamentals
Transition probability matrix
Initial distribution
Time‑homogeneous Markov chain
Discrete‑time Markov chain
Markov property
State Properties
Stationary distribution
Irreducibility (communicating classes)
Periodicity
Recurrence
Absorbing state
Definitions
Transition probability matrix
A matrix whose entry in row i, column j gives the probability of moving from state i to state j in one time step.
Initial distribution
The probability distribution over the states of a Markov chain at time zero.
Time‑homogeneous Markov chain
A Markov chain whose transition matrix does not change over time, so k‑step transitions are given by the k‑th power of the matrix.
Discrete‑time Markov chain
A stochastic process with a countable state space that satisfies the Markov property at discrete time steps.
Stationary distribution
A probability vector π that satisfies πP = π, making it an invariant left eigenvector of the transition matrix with eigenvalue 1.
Irreducibility (communicating classes)
A property where every state can be reached from every other state, meaning the state space consists of a single communicating class.
Periodicity
The greatest common divisor of the lengths of all possible return paths to a state; a state with period 1 is aperiodic.
Recurrence
The property of a state being visited infinitely often with probability 1; positive recurrence means finite expected return time, while null recurrence means infinite expected return time.
Absorbing state
A state that, once entered, cannot be left because the probability of transitioning to any other state is zero.
Markov property
The memoryless condition that the future state depends only on the present state and not on the sequence of events that preceded it.