
Introduction to Markov Chains

Understand the definition and core concepts of Markov chains, how transition matrices and stationary distributions work, and their common applications.

Summary

Markov chains are fundamental mathematical models for studying random processes that evolve over time. They are widely used across disciplines, from predicting weather patterns to modeling financial markets, because they capture the essential idea that the future behavior of a system depends only on its current state, not on its entire history. This property makes Markov chains both tractable to analyze and powerful enough to model real-world phenomena.

Definition and Core Idea

A Markov chain is a random process that evolves step by step through time, transitioning between different states according to specific probabilistic rules. The key insight is that the process is memoryless: what happens next depends only on where you are now, not on how you got there. Think of a gambler moving between different betting positions, or weather changing from day to day. At each time step, the system is in some state, and at the next time step it transitions to another state (or possibly stays in the same state) according to fixed probabilities.

The Markov Property

The Markov property is the defining characteristic of a Markov chain. It states that:

$$P(\text{future state} \mid \text{current state, past history}) = P(\text{future state} \mid \text{current state})$$

In other words, given the present state, the future is independent of the past. The chain has no memory: it doesn't matter how the system arrived at its current state; only the current state matters for predicting what comes next.

This property is what makes Markov chains tractable. Without it, you would need to track the entire history of the system, making analysis intractable. With the Markov property, you only need to know the current state.

Components of a Markov Chain

Every Markov chain requires two essential components:

States: The system must have a well-defined set of states that it can occupy. This set may be finite (like "sunny," "cloudy," "rainy") or countably infinite (like the number of customers in a queue). We typically denote the set of states as $\{1, 2, 3, \ldots, n\}$ for a chain with $n$ states.

Transition Probabilities: For each pair of states $i$ and $j$, there is a fixed probability $p_{ij}$ that represents the likelihood of transitioning from state $i$ to state $j$ in one time step. These probabilities are always non-negative and reflect the fundamental randomness of the process.

The Transition Matrix

The transition probabilities are organized into a transition matrix $P$, where the entry in row $i$ and column $j$ is:

$$p_{ij} = P(\text{next state is } j \mid \text{current state is } i)$$

A crucial property of the transition matrix is that each row sums to 1. This makes sense: from any state $i$, the chain must transition to some state at the next step, so the probabilities over all possible next states must sum to 1:

$$\sum_{j=1}^{n} p_{ij} = 1 \text{ for each row } i$$

Let's see a concrete example. Consider a simple two-state Markov chain with states E and A (perhaps representing "Excellent" and "Average" conditions), ordered so that the first row and column correspond to E. The transition matrix might look like:

$$P = \begin{pmatrix} 0.7 & 0.3 \\ 0.4 & 0.6 \end{pmatrix}$$

This means: from state E, there's a 0.7 probability of staying in E and a 0.3 probability of moving to A. From state A, there's a 0.4 probability of moving to E and a 0.6 probability of staying in A.
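As a quick sanity check, this matrix can be written down and validated in a few lines of Python with NumPy (a minimal sketch; the state ordering (E, A) follows the example above):

```python
import numpy as np

# Transition matrix for the two-state example above,
# with row/column order (E, A).
P = np.array([
    [0.7, 0.3],  # from E: stay in E with prob 0.7, move to A with prob 0.3
    [0.4, 0.6],  # from A: move to E with prob 0.4, stay in A with prob 0.6
])

# Every row of a valid transition matrix must sum to 1.
assert np.allclose(P.sum(axis=1), 1.0)
```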
Example: Generating Weather Sequences

To generate a sequence of states over time, you start with an initial state distribution (which state you begin in) and repeatedly apply the transition matrix. If you begin in state E with certainty, your initial distribution is:

$$\mu^{(0)} = \begin{pmatrix} 1 & 0 \end{pmatrix}$$

After one step, your distribution becomes:

$$\mu^{(1)} = \mu^{(0)} P = \begin{pmatrix} 1 & 0 \end{pmatrix} \begin{pmatrix} 0.7 & 0.3 \\ 0.4 & 0.6 \end{pmatrix} = \begin{pmatrix} 0.7 & 0.3 \end{pmatrix}$$

This means after one step, you have a 70% chance of being in E and a 30% chance of being in A. After two steps:

$$\mu^{(2)} = \mu^{(1)} P = \begin{pmatrix} 0.7 & 0.3 \end{pmatrix} \begin{pmatrix} 0.7 & 0.3 \\ 0.4 & 0.6 \end{pmatrix} = \begin{pmatrix} 0.61 & 0.39 \end{pmatrix}$$

By repeatedly multiplying by $P$, you can evolve the distribution forward through time.

n-Step Transition Probabilities

While $p_{ij}$ tells you the one-step transition probability, you often want to know the probability of reaching state $j$ from state $i$ in exactly $n$ steps. These are called $n$-step transition probabilities, and they're computed by raising the transition matrix to the $n$-th power:

$$P^{(n)} = P^n$$

The entry $p_{ij}^{(n)}$ (row $i$, column $j$ of $P^n$) gives the probability of being in state $j$ after $n$ steps, starting from state $i$. For example, with our transition matrix above, $P^2$ gives all two-step transition probabilities:

$$P^2 = \begin{pmatrix} 0.7 & 0.3 \\ 0.4 & 0.6 \end{pmatrix} \begin{pmatrix} 0.7 & 0.3 \\ 0.4 & 0.6 \end{pmatrix} = \begin{pmatrix} 0.61 & 0.39 \\ 0.52 & 0.48 \end{pmatrix}$$

So the probability of starting in E and being in A after exactly two steps is 0.39. (Note that $\mu^{(2)}$ above is exactly the first row of $P^2$, since the chain started in E with certainty.)

Stationary Distribution

Over many steps, many Markov chains converge to a stationary distribution (also called the steady-state distribution). A distribution $\pi = \begin{pmatrix} \pi_1 & \pi_2 & \cdots & \pi_n \end{pmatrix}$ is stationary if it satisfies:

$$\pi P = \pi$$

This equation says that if the system starts in the stationary distribution, it will remain in that distribution after taking one step (and after any number of steps). The stationary distribution represents the long-run proportion of time the chain spends in each state.

To find the stationary distribution, you solve $\pi P = \pi$ along with the constraint that $\sum_i \pi_i = 1$ (the probabilities must sum to 1). For our example transition matrix, we can verify that $\pi = \begin{pmatrix} 4/7 & 3/7 \end{pmatrix}$ is stationary:

$$\begin{pmatrix} 4/7 & 3/7 \end{pmatrix} \begin{pmatrix} 0.7 & 0.3 \\ 0.4 & 0.6 \end{pmatrix} = \begin{pmatrix} 4/7 & 3/7 \end{pmatrix}$$
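All three computations above (evolving a distribution, powering the matrix, and finding $\pi$) take only a few lines of NumPy. The sketch below is one standard approach; the final step solves $\pi P = \pi$ by finding the left eigenvector of $P$ with eigenvalue 1:

```python
import numpy as np

P = np.array([[0.7, 0.3],
              [0.4, 0.6]])

# Evolve an initial distribution forward: mu^(n) = mu^(n-1) P.
mu = np.array([1.0, 0.0])        # start in E with certainty
for step in range(1, 3):
    mu = mu @ P
    print(f"mu^({step}) =", mu)  # (0.7, 0.3), then (0.61, 0.39)

# n-step transition probabilities are the entries of P^n.
print("P^2 =\n", np.linalg.matrix_power(P, 2))

# Stationary distribution: left eigenvector of P for eigenvalue 1,
# normalized so that its entries sum to 1.
eigvals, eigvecs = np.linalg.eig(P.T)
pi = np.real(eigvecs[:, np.isclose(eigvals, 1.0)]).flatten()
pi /= pi.sum()
print("pi =", pi)                # approximately (4/7, 3/7) = (0.5714..., 0.4285...)
```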
Ergodic Chains: Irreducible and Aperiodic

Not all Markov chains converge to a stationary distribution. The chains that do are called ergodic chains. For a chain to be ergodic, it must satisfy two conditions:

Irreducibility: The chain is irreducible if every state can eventually be reached from every other state. In other words, there's a nonzero probability of traveling from any state to any other state (possibly in multiple steps). If a state is unreachable from some other state, the chain is not irreducible.

Aperiodicity: The chain is aperiodic if it doesn't get locked into cycles. More precisely, the greatest common divisor of the lengths of all possible paths from a state back to itself must be 1. For example, if a chain can only return to a state in an even number of steps, it is periodic (specifically, with period 2).

A simple way to ensure aperiodicity is to have at least one state with a nonzero self-loop (a positive probability of staying in the same state). Most "natural" Markov chains are aperiodic.

Convergence to Stationary Distribution

Here's a powerful and useful result: if a Markov chain is ergodic, then regardless of the initial state, the distribution of the chain converges to the stationary distribution $\pi$ as the number of steps goes to infinity. This means the long-run behavior of an ergodic chain is independent of where it started. Eventually, the chain "forgets" its initial state, and the probability of being in each state stabilizes at the values given by $\pi$.

This is why the stationary distribution is practically important: even if you don't know the exact initial state, you can predict the long-run behavior by computing $\pi$. (A short simulation sketch at the end of this summary illustrates this convergence for the two-state example.)

Absorbing States

Not all Markov chains have stationary distributions that describe long-run behavior. Some chains have absorbing states: states that, once entered, can never be left. Formally, state $i$ is absorbing if $p_{ii} = 1$ (the probability of staying in state $i$ is 1). In the transition matrix, the row corresponding to an absorbing state contains a 1 on the diagonal and 0s everywhere else.

For example, in a gambler's ruin model, bankruptcy is an absorbing state: once you run out of money, you can't continue gambling. Another example is death in population models: death is an absorbing state.

For chains with absorbing states, the question of interest is often "What is the probability of eventually being absorbed into each absorbing state?" rather than "What is the stationary distribution?" These chains don't converge to a single steady-state distribution; instead, they eventually reach an absorbing state and remain there forever.

Applications of Markov Chains

Markov chains are used across many fields:

Queueing Theory: Markov chains model customers moving through service stations, helping businesses optimize staffing and service times.

Genetics: Markov chains describe how allele frequencies change over generations in a population, important for understanding evolution and inheritance patterns.

Economics: Stock price movements and economic states can be modeled with Markov chains, aiding in risk assessment and forecasting.

Computer Science: Markov chains underpin random walks on networks and are fundamental to PageRank, the algorithm Google uses to rank web pages. They also appear in machine learning and information retrieval.
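To make the convergence result concrete, here is the simulation sketch referenced above (a minimal illustration, reusing the two-state chain): the fraction of steps the chain spends in each state should settle near $\pi = (4/7, 3/7)$, whichever state it starts in.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])
states = ["E", "A"]

# Run the chain for many steps, starting in E (index 0).
n_steps = 100_000
state = 0
counts = np.zeros(len(states))
for _ in range(n_steps):
    state = rng.choice(len(states), p=P[state])  # sample next state from row `state`
    counts[state] += 1

# Empirical long-run fractions; roughly {'E': 0.571, 'A': 0.429}, i.e. (4/7, 3/7).
print(dict(zip(states, counts / n_steps)))
```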
Flashcards
What are the two primary components required to define a Markov chain?
A set of states and transition probabilities.
What does the Markov property state regarding the future behavior of a process?
It depends only on the present state, not on the past path.
In a transition matrix $P$, what does the entry $p_{ij}$ represent?
The probability of moving from state $i$ to state $j$ in one step.
What must each row of a transition matrix $P$ sum to?
1
How are $n$‑step transition probabilities $P^{(n)}$ calculated using the one-step transition matrix $P$?
$P^{(n)} = P^{n}$
What equation defines a stationary distribution $\pi$ for a Markov chain with transition matrix $P$?
$\pi P = \pi$
What two conditions must a Markov chain meet to be considered ergodic?
It must be irreducible and aperiodic.
What happens to the distribution of states in an ergodic chain over time?
It converges to the stationary distribution $\pi$ regardless of the initial state.
What is the defining characteristic of an absorbing state in a Markov chain?
Once entered, the state cannot be left.
How is an absorbing state represented in the transition matrix $P$?
The row contains a 1 on the diagonal and 0 elsewhere.

Key Concepts
Markov Chain Fundamentals
Markov chain
Markov property
Transition matrix
n‑step transition probability
Stationary distribution
Ergodic chain
Applications of Markov Chains
Absorbing state
Random walk
PageRank algorithm
Queueing theory
Genetic drift (Markov model)
Markov model in finance