Foundations of the Central Limit Theorem
Understand the core statement, assumptions, and extensions of the Central Limit Theorem—from the classical i.i.d. case to Lyapunov, Lindeberg–Feller, and multivariate formulations.
Summary
Introduction to the Central Limit Theorem
The Central Limit Theorem (CLT) is one of the most powerful and important results in statistics. It explains why normal distributions appear so frequently in real-world data, even when we're measuring things that don't naturally follow a normal distribution. Understanding the CLT is essential for hypothesis testing, confidence intervals, and many statistical inference methods you'll encounter.
What is the Central Limit Theorem?
The Central Limit Theorem states that when you take the average of a random sample from a population, that average follows an approximately normal (bell-shaped) distribution—provided your sample is large enough. Remarkably, this holds true regardless of what the original population distribution looks like.
Here's the key insight: imagine you repeated an experiment many times, calculating the sample mean each time. If you plotted all those sample means, they would form a bell curve. This is powerful because it lets us use normal distribution methods for inference even when dealing with populations that aren't normally distributed.
(Figure: on the left, a non-normal population distribution; on the right, the sampling distribution of the sample mean—approximately normal, even though the population wasn't.)
The Mathematical Formulation
Let's define the problem formally. Suppose you have a random sample $X_1, X_2, \ldots, X_n$ from a population with:
Mean: $\mu$
Variance: $\sigma^2$ (which must be positive and finite)
The sample mean is:
$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n}X_i$$
The Central Limit Theorem tells us that the standardized sample mean converges to a standard normal distribution as $n$ grows large:
$$\frac{\sqrt{n}\,(\bar{X}_n-\mu)}{\sigma} \xrightarrow{d} N(0,1)$$
The notation $\xrightarrow{d}$ means "converges in distribution to," and $N(0,1)$ is the standard normal distribution with mean 0 and variance 1.
Why standardize? Without standardization, as $n$ increases, the sample mean gets closer and closer to $\mu$, and its distribution would collapse to a point. By multiplying by $\sqrt{n}$, we "rescale" the problem to see how the sample mean fluctuates around $\mu$.
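This convergence can be checked with a short simulation. The sketch below is illustrative, not from the source: the Exponential(1) population (strongly skewed, with $\mu = 1$ and $\sigma = 1$), the sample size, and the replication count are all assumptions chosen for the demo.

```python
import random
import statistics

# Illustrative check of the CLT: draw many samples from a skewed
# Exponential(1) population (mu = 1, sigma = 1), standardize each
# sample mean as sqrt(n) * (xbar - mu) / sigma, and look at how the
# resulting values scatter.
random.seed(0)
n = 200        # sample size (an arbitrary "large enough" choice)
reps = 5000    # number of repeated experiments

mu, sigma = 1.0, 1.0
z_values = []
for _ in range(reps):
    xbar = sum(random.expovariate(1.0) for _ in range(n)) / n
    z_values.append((n ** 0.5) * (xbar - mu) / sigma)

# If the CLT is at work, the empirical mean is near 0 and the
# empirical standard deviation is near 1.
print(statistics.mean(z_values), statistics.stdev(z_values))
```

Swapping in any other finite-variance population (uniform, Bernoulli, Poisson, …) leaves the conclusion unchanged, which is exactly the universality the theorem asserts.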
The Classical Central Limit Theorem
The classical CLT is the version you'll use most often. It applies when your random variables satisfy two key requirements.
Required Assumptions
The classical CLT requires that your random variables satisfy these conditions:
Independence and Identical Distribution (i.i.d.): Each $X_i$ must be independent of all others, and they must all come from the same population distribution.
Finite Mean and Variance: The population must have a finite mean $\mu$ and finite positive variance $\sigma^2$.
These assumptions are crucial. If they're violated, the theorem may not apply, or a different version of the CLT may be needed.
The Statement
For i.i.d. random variables, we can also express the CLT in terms of the standardized sum:
$$S_n^{*} = \frac{\sum_{i=1}^{n}X_i - n\mu}{\sigma\sqrt{n}} \xrightarrow{d} N(0,1)$$
This is equivalent to the formulation using the sample mean (since the sum equals $n$ times the mean). Both forms tell us the same thing: the properly standardized average converges to a normal distribution.
Practical Implication
In practice, this means that for reasonably large $n$, you can approximate the distribution of $\bar{X}_n$ as:
$$\bar{X}_n \approx N\left(\mu, \frac{\sigma^2}{n}\right)$$
Notice that the variance of the sample mean is $\frac{\sigma^2}{n}$, which decreases as $n$ increases. This is why larger samples give us more precise estimates—the sample mean varies less.
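The $\sigma^2/n$ scaling is easy to verify numerically. In the sketch below, the Uniform(0,1) population (so $\mu = 0.5$ and $\sigma^2 = 1/12$) and the replication count are illustrative assumptions:

```python
import random
import statistics

# Empirically compare Var(xbar_n) with sigma^2 / n for a Uniform(0,1)
# population, where sigma^2 = 1/12.
random.seed(1)
reps = 4000  # Monte Carlo replications per sample size

def var_of_sample_mean(n):
    """Monte Carlo estimate of the variance of the mean of n uniforms."""
    means = [sum(random.random() for _ in range(n)) / n for _ in range(reps)]
    return statistics.variance(means)

sigma2 = 1.0 / 12.0
for n in (10, 40, 160):
    print(n, var_of_sample_mean(n), sigma2 / n)
```

Each fourfold increase in $n$ cuts the variance of the sample mean by a factor of about four.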
Connection to the Law of Large Numbers
The Law of Large Numbers (LLN) and the CLT work together but describe different things:
Law of Large Numbers: Tells us that $\bar{X}_n \xrightarrow{p} \mu$ (the sample mean converges to the true mean as $n \to \infty$)
Central Limit Theorem: Tells us how fast this convergence happens and how the sample mean fluctuates around $\mu$ in finite samples
Think of it this way: the LLN guarantees that your sample average eventually hits the bullseye (the true mean), while the CLT describes the pattern of shots around the bullseye as you increase the number of shots.
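The contrast can be made concrete with a simulation; the fair-coin population ($\mu = 0.5$, $\sigma = 0.5$) and the sample sizes below are illustrative assumptions:

```python
import random

# LLN vs. CLT for a fair coin: the raw deviation xbar - mu shrinks
# toward 0 (Law of Large Numbers), while the rescaled deviation
# sqrt(n) * (xbar - mu) / sigma keeps fluctuating on a constant,
# N(0,1)-sized scale (Central Limit Theorem).
random.seed(2)
mu, sigma = 0.5, 0.5
for n in (100, 10_000, 1_000_000):
    xbar = sum(random.randint(0, 1) for _ in range(n)) / n
    raw = xbar - mu
    scaled = (n ** 0.5) * raw / sigma
    print(f"n={n:>9}  raw deviation={raw:+.5f}  rescaled={scaled:+.3f}")
```

The raw deviation column shrinks toward zero as $n$ grows; the rescaled column does not—it keeps bouncing around on a unit scale, which is the fluctuation pattern the CLT describes.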
Uniformity of Convergence
One important technical detail: the convergence of the cumulative distribution functions (CDFs) is uniform. This means that the approximation quality doesn't depend on what specific value you're evaluating—the normal approximation works equally well across the entire distribution, not just in the center. This uniformity ensures that the CLT is reliable for practical applications.
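The uniform guarantee is a statement about exact CDFs, but it can be visualized by estimating the worst-case gap between the CDF of the standardized mean and the standard normal CDF $\Phi$. The sketch below is an illustration, not part of the source: the Exponential(1) population, seed, and replication count are assumptions, and the estimate is limited by Monte Carlo noise.

```python
import math
import random

# Estimate sup_x |P(Z_n <= x) - Phi(x)| for the standardized sample
# mean Z_n of n Exponential(1) variables, using the empirical CDF of
# many simulated replications (a Kolmogorov-style distance).
def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def sup_cdf_gap(n, reps=4000, seed=4):
    rng = random.Random(seed)
    zs = sorted(
        (n ** 0.5) * (sum(rng.expovariate(1.0) for _ in range(n)) / n - 1.0)
        for _ in range(reps)
    )
    # The empirical CDF jumps at each sorted z; check both sides of the jump.
    return max(
        max(abs((i + 1) / reps - phi(z)), abs(i / reps - phi(z)))
        for i, z in enumerate(zs)
    )

for n in (5, 50, 500):
    print(n, sup_cdf_gap(n))
```

The worst-case gap shrinks as $n$ grows (down to the resolution of the simulation noise), illustrating that the approximation improves everywhere at once rather than only near the center.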
When the Classical Assumptions Don't Hold: Generalizations
The classical CLT assumes i.i.d. variables with finite variance. But what if your data violates these assumptions? Fortunately, mathematicians have developed generalizations.
The Lyapunov Central Limit Theorem
The Lyapunov CLT relaxes the requirement that variables be identically distributed. This is important because in real applications, you often have independent measurements that come from slightly different distributions.
What Changed?
Variables need only be independent, not identical
Each variable can have a different distribution
All variables must have finite means $\mu_i$ and finite variances (which may differ)
The Lyapunov Condition
The Lyapunov CLT replaces the "identical distribution" requirement with a technical condition called the Lyapunov condition.
Define the sum of variances:
$$s_n^{2} = \sum_{i=1}^{n}\operatorname{Var}(X_i)$$
The Lyapunov condition requires that for some $r > 2$:
$$\lim_{n\to\infty}\frac{1}{s_n^{r}}\sum_{i=1}^{n}E\left[|X_i-\mu_i|^{\,r}\right]=0$$
What does this mean intuitively? The condition ensures that no single variable's variability dominates the sum. In other words, the variability is "spread out" across many roughly comparable terms, with no single term being extreme.
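To make the condition concrete, here is a small worked sketch. The choice of i.i.d. Bernoulli(1/2) summands and $r = 3$ is an illustrative assumption; for this particular example the ratio simplifies to exactly $1/\sqrt{n}$.

```python
# Evaluate the Lyapunov ratio (1 / s_n^r) * sum_i E|X_i - mu_i|^r for
# r = 3 and independent Bernoulli(1/2) summands. Here |X_i - 1/2| is
# always 1/2, so E|X_i - 1/2|^3 = 1/8 and Var(X_i) = 1/4, and the
# ratio works out to exactly 1 / sqrt(n), which vanishes as n grows.
def lyapunov_ratio(n):
    var_i = 0.25               # Var(X_i) for Bernoulli(1/2)
    third_abs_moment = 0.125   # E|X_i - 1/2|^3
    s_n = (n * var_i) ** 0.5   # s_n^2 = sum of the n variances
    return n * third_abs_moment / s_n ** 3

for n in (10, 1000, 100_000):
    print(n, lyapunov_ratio(n))
```

Since the ratio tends to 0, the Lyapunov condition holds and the standardized sum is asymptotically normal—consistent with the classical CLT, which already covers this i.i.d. case.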
The Conclusion
If the Lyapunov condition holds, then:
$$\frac{\sum_{i=1}^{n}(X_i-\mu_i)}{s_n} \xrightarrow{d} N(0,1)$$
The Lindeberg–Feller Central Limit Theorem
<extrainfo>
The Lindeberg–Feller CLT provides an even weaker (and more general) condition than Lyapunov.
Lindeberg Condition
For each $\varepsilon > 0$, the condition requires:
$$\frac{1}{s_n^{2}}\sum_{i=1}^{n}E\left[(X_i-\mu_i)^{2}\,\mathbf{1}\{|X_i-\mu_i|>\varepsilon s_n\}\right]\to 0$$
This condition says: among all the variability, the contribution from "large deviations" (values far from their means) must vanish relative to the total variation.
Relationship to Lyapunov
An important theoretical fact: satisfying the Lyapunov condition implies the Lindeberg condition, but not vice versa. This means Lindeberg is the weaker (more general) condition—it applies in more situations. However, the Lyapunov condition is easier to check in practice.
</extrainfo>
The Multidimensional Central Limit Theorem
The CLT extends naturally to situations where you're measuring multiple variables simultaneously. This is important for understanding how multiple measurements behave together.
Random Vectors
Suppose you observe random vectors (points in $d$-dimensional space):
$$\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_n$$
each coming from the same distribution in $\mathbb{R}^{d}$ with:
Mean vector: $\boldsymbol{\mu}$
Covariance matrix: $\Sigma$
The Multivariate CLT
The sample mean vector:
$$\bar{\mathbf{X}}_n = \frac{1}{n}\sum_{i=1}^{n}\mathbf{X}_i$$
satisfies:
$$\sqrt{n}(\bar{\mathbf{X}}_n - \boldsymbol{\mu}) \xrightarrow{d} N(\mathbf{0}, \Sigma)$$
This means the sample mean vector is approximately normally distributed with the given covariance structure.
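A simulation sketch of the multivariate statement. The 2-dimensional construction $\mathbf{X} = (U, U+V)$ with $U, V$ independent Uniform(0,1) is an illustrative assumption; it gives $\boldsymbol{\mu} = (0.5, 1.0)$ and $\Sigma$ with entries $\Sigma_{11} = 1/12$, $\Sigma_{12} = \Sigma_{21} = 1/12$, $\Sigma_{22} = 2/12$.

```python
import random
import statistics

# Simulate sqrt(n) * (xbar - mu) for 2-d vectors X = (U, U + V) and
# compare its empirical variances/covariance with the theoretical
# covariance matrix Sigma = [[1/12, 1/12], [1/12, 2/12]].
random.seed(3)
n, reps = 400, 3000
z1s, z2s = [], []
for _ in range(reps):
    s1 = s2 = 0.0
    for _ in range(n):
        u, v = random.random(), random.random()
        s1 += u          # first coordinate: U
        s2 += u + v      # second coordinate: U + V
    z1s.append((n ** 0.5) * (s1 / n - 0.5))
    z2s.append((n ** 0.5) * (s2 / n - 1.0))

m1, m2 = statistics.mean(z1s), statistics.mean(z2s)
cov12 = sum((a - m1) * (b - m2) for a, b in zip(z1s, z2s)) / (reps - 1)
print(statistics.variance(z1s))  # theory: 1/12
print(statistics.variance(z2s))  # theory: 2/12
print(cov12)                     # theory: 1/12
```

Note that the limit preserves the dependence between coordinates: the off-diagonal entry of $\Sigma$ survives in the limiting multivariate normal.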
How This is Proven
The clever proof technique is called the Cramér–Wold device. Instead of proving the multivariate result directly, mathematicians project the vector onto arbitrary one-dimensional directions and show that each projection satisfies the one-dimensional CLT. Since this works for all possible directions, the multivariate result follows. This is an elegant example of reducing a complex problem to simpler, known cases.
<extrainfo>
Alternative Formulations: The Local Limit Theorem
Besides the distributional convergence we've discussed, there's another perspective on the CLT called the Local Limit Theorem.
Density Function Perspective
Instead of asking "what is the distribution of the sample mean?", we can ask "what does the probability density function look like?"
The Local Limit Theorem states that under suitable regularity conditions, the probability density function of the normalized sample mean approaches the normal density function.
More concretely: if you convolve many probability densities together (convolution is the operation that gives the density of a sum of independent variables), the result, suitably normalized, becomes increasingly normal. This shows why the normal distribution is so natural—it's what you get when you combine many independent random influences.
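A discrete sketch of this density view; the fair six-sided die (with $\mu = 3.5$ and $\sigma^2 = 35/12$) and the number of convolutions are illustrative assumptions:

```python
import math

# Convolve the pmf of a fair die with itself k-1 times to get the pmf
# of the sum of k dice, then compare the exact probability at the mean
# with the local-limit normal approximation
# (1 / (sigma * sqrt(2 * pi * k))) * exp(-z^2 / 2).
def convolve(p, q):
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

die = [1 / 6] * 6        # index 0 corresponds to the face value 1
pmf = die
k = 30
for _ in range(k - 1):
    pmf = convolve(pmf, die)   # now pmf[i] = P(sum of k dice = i + k)

mu, sigma = 3.5, math.sqrt(35 / 12)
m = int(k * mu)                # an integer point at the mean (k even)
exact = pmf[m - k]
z = (m - k * mu) / (sigma * math.sqrt(k))
approx = math.exp(-z * z / 2) / (sigma * math.sqrt(2 * math.pi * k))
print(exact, approx)           # the two values agree closely
```

The pointwise agreement between the convolved pmf and the normal density is exactly what the Local Limit Theorem formalizes.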
</extrainfo>
Summary of Key Takeaways
The Central Limit Theorem is foundational to statistics because it justifies using normal-based inference methods broadly:
Classical CLT: For i.i.d. samples with finite variance, the sample mean is approximately normal for large $n$
Extensions exist: Lyapunov and Lindeberg generalizations handle non-identical distributions
Works in multiple dimensions: The theorem extends to multivariate settings naturally
Remarkably general: The theorem applies regardless of the original population distribution—as long as the assumptions hold
This universality is what makes the CLT so powerful for practical statistics.
Flashcards
What happens to the distribution of a normalized sample mean as the sample size grows according to the central limit theorem?
It converges to a standard normal distribution.
Does the central limit theorem require the original random variables to be normally distributed?
No, it applies even when they are not normally distributed.
What is the primary practical significance of the central limit theorem regarding non-normal distributions?
Methods that assume normality can often be used for many other distributions.
In the basic statistical formulation of the CLT, what expression represents the random variable that converges to $Z \sim N(0,1)$?
$\frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma}$ (where $n$ is sample size, $\bar{X}_n$ is sample mean, $\mu$ is population mean, and $\sigma$ is standard deviation).
Intuitively, what distribution is formed by repeating an experiment many times and computing the average each time?
An approximately normal distribution (for large sample sizes).
What are the two primary assumptions for the random variables in the classical CLT?
They must be independent and identically distributed (i.i.d.).
Each must have a finite mean $\mu$ and finite variance $\sigma^2$.
What is the formula for the normalized sum $S_n^{*}$ that converges to $N(0,1)$ in the classical CLT?
$S_n^{*} = \frac{\sum_{i=1}^{n}X_i - n\mu}{\sigma\sqrt{n}}$
While the Law of Large Numbers guarantees the sample mean converges to $\mu$, what does the CLT describe specifically?
How the fluctuations around $\mu$ behave when scaled by $\sqrt{n}$.
What is the nature of the convergence of the cumulative distribution functions in the classical CLT?
The convergence is uniform in the argument.
What may happen to the limiting distribution if the underlying variables have infinite variance?
The CLT may fail and other stable laws (like the Cauchy distribution) become the limits.
How does the requirement for random variables in the Lyapunov CLT differ from the classical CLT?
They only need to be independent; they do not need to be identically distributed.
What moment condition must be satisfied for the Lyapunov CLT?
Each variable must have a finite $r$-th moment for some $r > 2$.
In the Lyapunov CLT, if the Lyapunov condition holds, what does the standardized sum $\frac{\sum_{i=1}^{n}(X_i-\mu_i)}{s_n}$ converge to?
$N(0,1)$ (the standard normal distribution).
What is the relationship between the Lyapunov condition and the Lindeberg condition?
The Lyapunov condition implies the Lindeberg condition, but not vice versa.
Is the Lindeberg condition stronger or weaker than the Lyapunov condition?
It is a weaker condition.
What is the limiting distribution for a normalized sum of i.i.d. random vectors in $\mathbb{R}^d$?
A multivariate normal distribution $N(\mathbf{0}, \Sigma)$ (where $\Sigma$ is the covariance matrix).
Which mathematical tool is used to reduce the multivariate CLT case to a one-dimensional CLT by projecting onto arbitrary directions?
The Cramér–Wold device.
What does the density-function view (Local Limit Theorem) state regarding the convolution of many probability densities?
It approaches the normal density under suitable regularity conditions.
Quiz
Foundations of the Central Limit Theorem Quiz
Question 1: What does the central limit theorem assert about the distribution of a normalized sample mean when the sample size becomes large?
- It converges to a standard normal distribution (correct)
- It converges to the original population distribution
- It becomes a uniform distribution
- It diverges to infinity
Question 2: In the classical CLT, what is the limiting distribution of the normalized sum \(S_n^{*}= \frac{\sum_{i=1}^{n}X_i - n\mu}{\sigma\sqrt{n}}\) for i.i.d. variables?
- Standard normal distribution \(N(0,1)\) (correct)
- T‑distribution with \(n-1\) degrees of freedom
- Chi‑squared distribution with 1 degree of freedom
- Exponential distribution
Question 3: In the local limit theorem view of the CLT, what does the convolution of many probability densities approach under suitable regularity conditions?
- The normal (Gaussian) density (correct)
- The uniform density
- The exponential density
- The Poisson mass function
Question 4: If the Lyapunov condition holds for a sequence of independent variables, which other condition is automatically satisfied?
- The Lindeberg condition (correct)
- The uniform integrability condition
- The Cramér condition
- The Kolmogorov condition
Question 5: Which device is used to reduce the multivariate Central Limit Theorem to the one‑dimensional case?
- The Cramér–Wold device (correct)
- The Slutsky theorem
- The Delta method
- The Skorokhod representation
Question 6: How does the Lindeberg condition compare to the Lyapunov condition?
- It is weaker (less restrictive) (correct)
- It is stronger (more restrictive)
- It is equivalent
- It is unrelated
Question 7: Which statement correctly describes the asymptotic behavior of the standardized sample mean \(\displaystyle \frac{\sqrt{n}\,(\bar{X}_n-\mu)}{\sigma}\) as the sample size \(n\) grows?
- It converges in distribution to a standard normal variable \(N(0,1)\). (correct)
- It converges almost surely to zero.
- It diverges to infinity in probability.
- It converges in probability to the population mean \(\mu\).
Question 8: When the Lindeberg condition is satisfied, to which distribution does the standardized sum \(\displaystyle \frac{\sum_{i=1}^{n}(X_i-\mu_i)}{s_n}\) converge?
- Standard normal distribution \(N(0,1)\). (correct)
- Student’s t‑distribution with \(n-1\) degrees of freedom.
- Cauchy distribution.
- Chi‑square distribution with \(n\) degrees of freedom.
Question 9: In the multidimensional Central Limit Theorem, the limiting multivariate normal distribution has covariance matrix equal to which of the following?
- \(\Sigma\) (correct)
- \(I\) (the identity matrix)
- \(2\Sigma\)
- \(\Sigma^{2}\) (the matrix square of \(\Sigma\))
Question 10: For a sequence of i.i.d. random vectors in \(\mathbb{R}^{d}\) with mean vector \(\boldsymbol{\mu}\) and covariance matrix \(\Sigma\), to which distribution does the scaled sum \(\frac{1}{\sqrt{n}}\sum_{i=1}^{n}(\mathbf{X}_i-\boldsymbol{\mu})\) converge?
- Multivariate normal N(0, Σ) (correct)
- Multivariate t‑distribution
- Multivariate uniform distribution
- Multivariate exponential distribution
Question 11: In the Lyapunov Central Limit Theorem, how is the quantity \(s_n^{2}\) defined?
- \(s_n^{2}= \displaystyle\sum_{i=1}^{n}\operatorname{Var}(X_i)\) (correct)
- \(s_n^{2}= \displaystyle\frac{1}{n}\sum_{i=1}^{n}\operatorname{Var}(X_i)\)
- \(s_n^{2}= \bigg(\displaystyle\sum_{i=1}^{n}E\!\big|X_i-\mu_i\big|\bigg)^{2}\)
- \(s_n^{2}= \displaystyle\prod_{i=1}^{n}\operatorname{Var}(X_i)\)
Question 12: When the Lyapunov condition holds, to what distribution does the standardized sum \(\displaystyle\frac{\sum_{i=1}^{n}(X_i-\mu_i)}{s_n}\) converge?
- Standard normal distribution \(N(0,1)\) (correct)
- Student's t‑distribution with \(n-1\) degrees of freedom
- Cauchy distribution
- Chi‑square distribution with \(n\) degrees of freedom
Question 13: Which of the following is an example of a stable distribution that can arise as the limiting law when the underlying variables have infinite variance?
- Cauchy distribution (correct)
- Standard normal distribution
- Poisson distribution
- Exponential distribution
Question 14: In the Lyapunov Central Limit Theorem, the moment condition involves an exponent $r$. Which statement about $r$ is correct?
- $r$ must be greater than 2 (correct)
- $r$ must equal 2
- $r$ can be any positive number
- $r$ must be less than 2
Question 15: When an experiment is performed repeatedly and the average of each large sample is plotted, what shape does the resulting histogram tend to approach?
- Approximately normal (bell‑shaped) distribution (correct)
- Uniform distribution
- Skewed distribution matching the original population
- Exponential decay distribution
Question 16: For each random variable in the classical CLT, which two quantities must be finite?
- A finite mean and a finite positive variance (correct)
- An infinite variance but finite mean
- A finite mean only; variance may be infinite
- No finiteness requirement; only independence matters
Question 17: Which theorem guarantees that the sample mean \(\bar{X}_n\) converges almost surely to the population mean \(\mu\) as the number of observations grows?
- The Law of Large Numbers (correct)
- The Central Limit Theorem
- The Chebyshev Inequality
- The Markov Inequality
Question 18: In the Central Limit Theorem, the deviation of the sample mean from \(\mu\) must be multiplied by which factor to obtain a distribution that approaches the standard normal?
- \(\sqrt{n}\) (correct)
- \(n\)
- \(\frac{1}{\sqrt{n}}\)
- \(\log n\)
Question 19: How does the convergence of the cumulative distribution functions of the standardized sum \(S_n^{*}\) to the standard normal cdf \(\Phi\) occur?
- Uniformly in the argument (correct)
- Only pointwise for each fixed \(x\)
- In probability but not uniformly
- In the mean‑square sense
Key Concepts
Central Limit Theorems
Central Limit Theorem
Lyapunov Central Limit Theorem
Lindeberg–Feller Central Limit Theorem
Multivariate Central Limit Theorem
Cramér–Wold device
Convergence Theorems
Law of Large Numbers
Uniform convergence of distribution functions
Local limit theorem
Special Distributions
Stable distribution
Definitions
Central Limit Theorem
A fundamental result stating that the normalized sum (or average) of a large number of independent, identically distributed random variables with finite variance converges in distribution to a standard normal distribution.
Law of Large Numbers
A theorem asserting that the sample mean of independent, identically distributed random variables converges almost surely to the true population mean as the sample size grows.
Lyapunov Central Limit Theorem
An extension of the CLT that requires only independence (not identical distribution) and a Lyapunov moment condition to guarantee convergence of the standardized sum to a normal distribution.
Lindeberg–Feller Central Limit Theorem
A more general CLT that replaces Lyapunov’s moment condition with the Lindeberg condition, a weaker requirement on the tails of the summands.
Multivariate Central Limit Theorem
The generalization of the CLT to vector‑valued random variables, stating that the normalized sum of i.i.d. random vectors converges to a multivariate normal distribution.
Cramér–Wold device
A technique that proves convergence in distribution of random vectors by showing convergence of all one‑dimensional linear projections.
Stable distribution
A class of probability distributions that remain stable under convolution; they appear as limiting distributions when the CLT fails due to infinite variance.
Uniform convergence of distribution functions
The property that the cumulative distribution functions of standardized sums converge to the normal cdf uniformly over all real arguments.
Local limit theorem
A refinement of the CLT describing the pointwise convergence of the probability density (or mass) functions of normalized sums to the normal density under additional regularity conditions.