
Introduction to Markov Chains

Understand the definition and core concepts of Markov chains, how transition matrices and stationary distributions work, and their common applications.

Summary

Markov chains are fundamental mathematical models for studying random processes that evolve over time. They are widely used across disciplines, from predicting weather patterns to modeling financial markets, because they capture the essential idea that the future behavior of a system depends only on its current state, not on its entire history. This property makes Markov chains both tractable to analyze and powerful enough to model real-world phenomena.

Definition and Core Idea

A Markov chain is a random process that evolves step by step through time, transitioning between different states according to specific probabilistic rules. The key insight is that the process is memoryless: what happens next depends only on where you are now, not on how you got there. Think of a gambler moving between different betting positions, or weather changing from day to day. At each time step, the system is in some state, and at the next time step it transitions to another state (or possibly stays in the same state) according to fixed probabilities.

The Markov Property

The Markov property is the defining characteristic of a Markov chain. It states that:

$$P(\text{future state} \mid \text{current state, past history}) = P(\text{future state} \mid \text{current state})$$

In other words, given the present state, the future is independent of the past. The chain has no memory: it doesn't matter how the system arrived at its current state; only the current state matters for predicting what comes next.

This property is what makes Markov chains tractable. Without it, you would need to track the entire history of the system, making analysis intractable. With the Markov property, you only need to know the current state.

Components of a Markov Chain

Every Markov chain requires two essential components:

States: The system must have a well-defined set of states that it can occupy. This set may be finite (like "sunny," "cloudy," "rainy") or countably infinite (like the number of customers in a queue). We typically denote the set of states as $\{1, 2, 3, \ldots, n\}$ for a chain with $n$ states.

Transition Probabilities: For each pair of states $i$ and $j$, there is a fixed probability $p_{ij}$ that represents the likelihood of transitioning from state $i$ to state $j$ in one time step. These probabilities are always non-negative and reflect the fundamental randomness of the process.

The Transition Matrix

The transition probabilities are organized into a transition matrix $P$, where the entry in row $i$ and column $j$ is:

$$p_{ij} = P(\text{next state is } j \mid \text{current state is } i)$$

A crucial property of the transition matrix is that each row sums to 1. This makes sense: from any state $i$, the chain must transition to some state at the next step, so the probabilities over all possible next states must sum to 1:

$$\sum_{j=1}^{n} p_{ij} = 1 \text{ for each row } i$$

Let's see a concrete example. Consider a simple two-state Markov chain with states E and A (perhaps representing "Excellent" and "Average" conditions), ordered so that the first row and column correspond to E. The transition matrix might look like:

$$P = \begin{pmatrix} 0.7 & 0.3 \\ 0.4 & 0.6 \end{pmatrix}$$

This means: from state E, there's a 0.7 probability of staying in E and a 0.3 probability of moving to A. From state A, there's a 0.4 probability of moving to E and a 0.6 probability of staying in A.
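As a quick sanity check, this matrix can be written down and validated in a few lines of Python with NumPy (a minimal sketch; the state ordering (E, A) follows the example above):

```python
import numpy as np

# Transition matrix for the two-state example above,
# with row/column order (E, A).
P = np.array([
    [0.7, 0.3],  # from E: stay in E with prob 0.7, move to A with prob 0.3
    [0.4, 0.6],  # from A: move to E with prob 0.4, stay in A with prob 0.6
])

# Every row of a valid transition matrix must sum to 1.
assert np.allclose(P.sum(axis=1), 1.0)
```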
Example: Generating Weather Sequences

To generate a sequence of states over time, you start with an initial state distribution (which state you begin in) and repeatedly apply the transition matrix. If you begin in state E with certainty, your initial distribution is:

$$\mu^{(0)} = \begin{pmatrix} 1 & 0 \end{pmatrix}$$

After one step, your distribution becomes:

$$\mu^{(1)} = \mu^{(0)} P = \begin{pmatrix} 1 & 0 \end{pmatrix} \begin{pmatrix} 0.7 & 0.3 \\ 0.4 & 0.6 \end{pmatrix} = \begin{pmatrix} 0.7 & 0.3 \end{pmatrix}$$

This means after one step, you have a 70% chance of being in E and a 30% chance of being in A. After two steps:

$$\mu^{(2)} = \mu^{(1)} P = \begin{pmatrix} 0.7 & 0.3 \end{pmatrix} \begin{pmatrix} 0.7 & 0.3 \\ 0.4 & 0.6 \end{pmatrix} = \begin{pmatrix} 0.61 & 0.39 \end{pmatrix}$$

By repeatedly multiplying by $P$, you can evolve the distribution forward through time.

n-Step Transition Probabilities

While $p_{ij}$ tells you the one-step transition probability, you often want to know the probability of reaching state $j$ from state $i$ in exactly $n$ steps. These are called $n$-step transition probabilities, and they're computed by raising the transition matrix to the $n$-th power:

$$P^{(n)} = P^n$$

The entry $p_{ij}^{(n)}$ (row $i$, column $j$ of $P^n$) gives the probability of being in state $j$ after $n$ steps, starting from state $i$. For example, with our transition matrix above, $P^2$ gives all two-step transition probabilities:

$$P^2 = \begin{pmatrix} 0.7 & 0.3 \\ 0.4 & 0.6 \end{pmatrix} \begin{pmatrix} 0.7 & 0.3 \\ 0.4 & 0.6 \end{pmatrix} = \begin{pmatrix} 0.61 & 0.39 \\ 0.52 & 0.48 \end{pmatrix}$$

So the probability of starting in E and being in A after exactly two steps is 0.39. (Note that $\mu^{(2)}$ above is exactly the first row of $P^2$, since the chain started in E with certainty.)

Stationary Distribution

Over many steps, many Markov chains converge to a stationary distribution (also called the steady-state distribution). A distribution $\pi = \begin{pmatrix} \pi_1 & \pi_2 & \cdots & \pi_n \end{pmatrix}$ is stationary if it satisfies:

$$\pi P = \pi$$

This equation says that if the system starts in the stationary distribution, it will remain in that distribution after taking one step (and after any number of steps). The stationary distribution represents the long-run proportion of time the chain spends in each state.

To find the stationary distribution, you solve $\pi P = \pi$ along with the constraint that $\sum_i \pi_i = 1$ (the probabilities must sum to 1). For our example transition matrix, we can verify that $\pi = \begin{pmatrix} 4/7 & 3/7 \end{pmatrix}$ is stationary:

$$\begin{pmatrix} 4/7 & 3/7 \end{pmatrix} \begin{pmatrix} 0.7 & 0.3 \\ 0.4 & 0.6 \end{pmatrix} = \begin{pmatrix} 4/7 & 3/7 \end{pmatrix}$$
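All three computations above (evolving a distribution, powering the matrix, and finding $\pi$) take only a few lines of NumPy. The sketch below is one standard approach; the final step solves $\pi P = \pi$ by finding the left eigenvector of $P$ with eigenvalue 1:

```python
import numpy as np

P = np.array([[0.7, 0.3],
              [0.4, 0.6]])

# Evolve an initial distribution forward: mu^(n) = mu^(n-1) P.
mu = np.array([1.0, 0.0])        # start in E with certainty
for step in range(1, 3):
    mu = mu @ P
    print(f"mu^({step}) =", mu)  # (0.7, 0.3), then (0.61, 0.39)

# n-step transition probabilities are the entries of P^n.
print("P^2 =\n", np.linalg.matrix_power(P, 2))

# Stationary distribution: left eigenvector of P for eigenvalue 1,
# normalized so that its entries sum to 1.
eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.isclose(eigvals, 1.0)]).flatten()
pi /= pi.sum()
print("pi =", pi)                # approximately (4/7, 3/7) = (0.5714..., 0.4285...)
```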
Ergodic Chains: Irreducible and Aperiodic

Not all Markov chains converge to a stationary distribution. The chains that do are called ergodic chains. For a chain to be ergodic, it must satisfy two conditions:

Irreducibility: The chain is irreducible if every state can eventually be reached from every other state. In other words, there's a nonzero probability of traveling from any state to any other state (possibly in multiple steps). If a state is unreachable from some other state, the chain is not irreducible.

Aperiodicity: The chain is aperiodic if it doesn't get locked into cycles. More precisely, the greatest common divisor of the lengths of all possible paths from a state back to itself must be 1. For example, if a chain can only return to a state in an even number of steps, it is periodic (specifically, with period 2).

A simple way to ensure aperiodicity is to have at least one state with a nonzero self-loop (a positive probability of staying in the same state). Most "natural" Markov chains are aperiodic.

Convergence to Stationary Distribution

Here's a powerful and useful result: if a Markov chain is ergodic, then regardless of the initial state, the distribution of the chain converges to the stationary distribution $\pi$ as the number of steps goes to infinity. This means the long-run behavior of an ergodic chain is independent of where it started. Eventually, the chain "forgets" its initial state, and the probability of being in each state stabilizes at the values given by $\pi$.

This is why the stationary distribution is practically important: even if you don't know the exact initial state, you can predict the long-run behavior by computing $\pi$. (A short simulation sketch at the end of this summary illustrates this convergence for the two-state example.)

Absorbing States

Not all Markov chains have stationary distributions that describe long-run behavior. Some chains have absorbing states: states that, once entered, can never be left. Formally, state $i$ is absorbing if $p_{ii} = 1$ (the probability of staying in state $i$ is 1). In the transition matrix, the row corresponding to an absorbing state contains a 1 on the diagonal and 0s everywhere else.

For example, in a gambler's ruin model, bankruptcy is an absorbing state: once you run out of money, you can't continue gambling. Another example is death in population models: death is an absorbing state.

For chains with absorbing states, the question of interest is often "What is the probability of eventually being absorbed into each absorbing state?" rather than "What is the stationary distribution?" These chains don't converge to a single steady-state distribution; instead, they eventually reach an absorbing state and remain there forever.

Applications of Markov Chains

Markov chains are used across many fields:

Queueing Theory: Markov chains model customers moving through service stations, helping businesses optimize staffing and service times.

Genetics: Markov chains describe how allele frequencies change over generations in a population, important for understanding evolution and inheritance patterns.

Economics: Stock price movements and economic states can be modeled with Markov chains, aiding in risk assessment and forecasting.

Computer Science: Markov chains underpin random walks on networks and are fundamental to PageRank, the algorithm Google uses to rank web pages. They also appear in machine learning and information retrieval.
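To make the convergence result concrete, here is the simulation sketch referenced above (a minimal illustration, reusing the two-state chain): the fraction of steps the chain spends in each state should settle near $\pi = (4/7, 3/7)$, whichever state it starts in.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])
states = ["E", "A"]

# Run the chain for many steps, starting in E (index 0).
n_steps = 100_000
state = 0
counts = np.zeros(len(states))
for _ in range(n_steps):
    state = rng.choice(len(states), p=P[state])  # sample next state from row `state`
    counts[state] += 1

# Empirical long-run fractions; roughly {'E': 0.571, 'A': 0.429}, i.e. (4/7, 3/7).
print(dict(zip(states, counts / n_steps)))
```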
Flashcards
What are the two primary components required to define a Markov chain?
A set of states and transition probabilities.
What does the Markov property state regarding the future behavior of a process?
It depends only on the present state, not on the past path.
In a transition matrix $P$, what does the entry $p_{ij}$ represent?
The probability of moving from state $i$ to state $j$ in one step.
What must each row of a transition matrix $P$ sum to?
1
How are $n$‑step transition probabilities $P^{(n)}$ calculated using the one-step transition matrix $P$?
$P^{(n)} = P^{n}$
What equation defines a stationary distribution $\pi$ for a Markov chain with transition matrix $P$?
$\pi P = \pi$
What two conditions must a Markov chain meet to be considered ergodic?
It must be irreducible and aperiodic.
What happens to the distribution of states in an ergodic chain over time?
It converges to the stationary distribution $\pi$ regardless of the initial state.
What is the defining characteristic of an absorbing state in a Markov chain?
Once entered, the state cannot be left.
How is an absorbing state represented in the transition matrix $P$?
The row contains a 1 on the diagonal and 0 elsewhere.

Key Concepts
Markov Chain Fundamentals
Markov chain
Markov property
Transition matrix
n‑step transition probability
Stationary distribution
Ergodic chain
Applications of Markov Chains
Absorbing state
Random walk
PageRank algorithm
Queueing theory
Genetic drift (Markov model)
Markov model in finance