RemNote Community

Foundations of the Central Limit Theorem

Understand the core statement, assumptions, and extensions of the Central Limit Theorem—from the classical i.i.d. case to Lyapunov, Lindeberg–Feller, and multivariate formulations.


Summary

Introduction to the Central Limit Theorem

The Central Limit Theorem (CLT) is one of the most powerful and important results in statistics. It explains why normal distributions appear so frequently in real-world data, even when the quantities being measured do not naturally follow a normal distribution. Understanding the CLT is essential for hypothesis testing, confidence intervals, and many other statistical inference methods you'll encounter.

What is the Central Limit Theorem?

The Central Limit Theorem states that when you take the average of a random sample from a population, that average follows an approximately normal (bell-shaped) distribution, provided your sample is large enough. Remarkably, this holds regardless of the shape of the original population distribution.

Here's the key insight: imagine repeating an experiment many times, calculating the sample mean each time. If you plotted all those sample means, they would form a bell curve. This is powerful because it lets us use normal-distribution methods for inference even when the population itself is not normally distributed.

A typical illustration shows a non-normal population distribution on the left and, on the right, the sampling distribution of the sample mean, which is approximately normal even though the population is not.

The Mathematical Formulation

Let's define the problem formally. Suppose you have a random sample $X_1, X_2, \ldots, X_n$ from a population with:

Mean: $\mu$
Variance: $\sigma^2$ (which must be positive and finite)

The sample mean is:

$$\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n}X_i$$

The Central Limit Theorem tells us that the standardized sample mean converges to a standard normal distribution as $n$ grows large:

$$\frac{\sqrt{n}\,(\bar{X}_n-\mu)}{\sigma} \xrightarrow{d} N(0,1)$$

The notation $\xrightarrow{d}$ means "converges in distribution to," and $N(0,1)$ is the standard normal distribution with mean 0 and variance 1.

Why standardize?
Without standardization, as $n$ increases the sample mean gets closer and closer to $\mu$, and its distribution would collapse to a single point. Multiplying by $\sqrt{n}$ rescales the problem so we can see how the sample mean fluctuates around $\mu$.

The Classical Central Limit Theorem

The classical CLT is the version you'll use most often. It applies when your random variables satisfy two key requirements.

Required Assumptions

Independence and Identical Distribution (i.i.d.): Each $X_i$ must be independent of the others, and all must come from the same population distribution.
Finite Mean and Variance: The population must have a finite mean $\mu$ and finite positive variance $\sigma^2$.

These assumptions are crucial. If they are violated, the theorem may not apply, or a different version of the CLT may be needed.

The Statement

For i.i.d. random variables, the CLT can also be expressed in terms of the standardized sum:

$$S_n^{*} = \frac{\sum_{i=1}^{n}X_i - n\mu}{\sigma\sqrt{n}} \xrightarrow{d} N(0,1)$$

This is equivalent to the formulation using the sample mean, since the sum equals $n$ times the mean. Both forms say the same thing: the properly standardized average converges to a normal distribution.

Practical Implication

In practice, this means that for reasonably large $n$, you can approximate the distribution of $\bar{X}_n$ as:

$$\bar{X}_n \approx N\left(\mu, \frac{\sigma^2}{n}\right)$$

Notice that the variance of the sample mean is $\frac{\sigma^2}{n}$, which decreases as $n$ increases. This is why larger samples give more precise estimates: the sample mean varies less.
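The classical statement above is easy to check empirically. Here is a minimal simulation sketch (assuming NumPy; the exponential population is an arbitrary choice of a clearly non-normal distribution): it draws many samples of size $n$, computes each sample mean, and confirms that the means behave like $N(\mu, \sigma^2/n)$.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Population: exponential with rate 1 (clearly skewed, non-normal).
# Its mean and standard deviation are both 1.
mu, sigma = 1.0, 1.0
n = 100            # sample size
reps = 20_000      # number of repeated experiments

# Draw `reps` independent samples of size n; compute each sample mean.
samples = rng.exponential(scale=1.0, size=(reps, n))
means = samples.mean(axis=1)

# CLT predictions: mean(means) ~ mu, var(means) ~ sigma^2 / n,
# and the standardized means are approximately N(0, 1).
z = np.sqrt(n) * (means - mu) / sigma
print(means.mean())               # close to mu = 1.0
print(means.var())                # close to sigma^2 / n = 0.01
print((np.abs(z) < 1.96).mean())  # close to 0.95
```

Increasing $n$ tightens the normal approximation; for strongly skewed populations like this one, small $n$ leaves visible skew in the sampling distribution.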
Connection to the Law of Large Numbers

The Law of Large Numbers (LLN) and the CLT work together but describe different things:

Law of Large Numbers: $\bar{X}_n \xrightarrow{p} \mu$, i.e. the sample mean converges to the true mean as $n \to \infty$.
Central Limit Theorem: describes how fast this convergence happens and how the sample mean fluctuates around $\mu$ in finite samples.

Think of it this way: the LLN guarantees that your sample average eventually hits the bullseye (the true mean), while the CLT describes the pattern of shots around the bullseye as the number of shots increases.

Uniformity of Convergence

One important technical detail: the convergence of the cumulative distribution functions (CDFs) is uniform. The quality of the approximation does not depend on the specific value at which you evaluate the CDF; the normal approximation works equally well across the entire distribution, not just in the center. This uniformity makes the CLT reliable in practical applications.

When the Classical Assumptions Don't Hold: Generalizations

The classical CLT assumes i.i.d. variables with finite variance. What if your data violates these assumptions? Fortunately, mathematicians have developed generalizations.

The Lyapunov Central Limit Theorem

The Lyapunov CLT relaxes the requirement that the variables be identically distributed. This matters because in real applications you often have independent measurements drawn from slightly different distributions.

What Changed?

Variables need only be independent, not identically distributed.
Each variable can have a different distribution.
All variables must have finite means $\mu_i$ (which may differ).

The Lyapunov Condition

The Lyapunov CLT replaces the "identical distribution" requirement with a technical condition called the Lyapunov condition.
Define the sum of variances:

$$s_n^{2} = \sum_{i=1}^{n}\operatorname{Var}(X_i)$$

The Lyapunov condition requires that for some $r > 2$:

$$\lim_{n\to\infty}\frac{1}{s_n^{r}}\sum_{i=1}^{n}E\left[|X_i-\mu_i|^{\,r}\right]=0$$

What does this mean intuitively? The condition ensures that no single variable's variability dominates the sum. In other words, the variability is spread across many roughly comparable terms, with no single term being extreme.

The Conclusion

If the Lyapunov condition holds, then:

$$\frac{\sum_{i=1}^{n}(X_i-\mu_i)}{s_n} \xrightarrow{d} N(0,1)$$

The Lindeberg–Feller Central Limit Theorem

<extrainfo>
The Lindeberg–Feller CLT provides an even weaker (and more general) condition than Lyapunov's.

Lindeberg Condition

For each $\varepsilon > 0$, the condition requires:

$$\frac{1}{s_n^{2}}\sum_{i=1}^{n}E\left[(X_i-\mu_i)^{2}\,\mathbf{1}\{|X_i-\mu_i|>\varepsilon s_n\}\right]\to 0$$

This condition says: among all the variability, the contribution from large deviations (values far from their means) must vanish relative to the total variation.

Relationship to Lyapunov

An important theoretical fact: the Lyapunov condition implies the Lindeberg condition, but not vice versa. Lindeberg is therefore the weaker (more general) condition and applies in more situations. The Lyapunov condition, however, is usually easier to check in practice.
</extrainfo>

The Multidimensional Central Limit Theorem

The CLT extends naturally to situations where you measure multiple variables simultaneously. This is important for understanding how multiple measurements behave together.
Random Vectors

Suppose you observe random vectors (points in $d$-dimensional space):

$$\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_n$$

each coming from the same distribution in $\mathbb{R}^{d}$ with:

Mean vector: $\boldsymbol{\mu}$
Covariance matrix: $\Sigma$

The Multivariate CLT

The sample mean vector

$$\bar{\mathbf{X}}_n = \frac{1}{n}\sum_{i=1}^{n}\mathbf{X}_i$$

satisfies:

$$\sqrt{n}\,(\bar{\mathbf{X}}_n - \boldsymbol{\mu}) \xrightarrow{d} N(\mathbf{0}, \Sigma)$$

This means the sample mean vector is approximately normally distributed with the given covariance structure.

How This is Proven

The proof technique is the Cramér–Wold device. Instead of proving the multivariate result directly, one projects the random vector onto arbitrary one-dimensional directions and shows that each projection satisfies the one-dimensional CLT. Since this holds for every direction, the multivariate result follows. This is an elegant example of reducing a complex problem to simpler, known cases.

<extrainfo>
Alternative Formulations: The Local Limit Theorem

Besides the distributional convergence discussed above, there is another perspective on the CLT called the Local Limit Theorem.

Density Function Perspective

Instead of asking "what is the distribution of the sample mean?", we can ask "what does its probability density function look like?" The Local Limit Theorem states that, under suitable regularity conditions, the probability density function of the normalized sample mean approaches the normal density function. More concretely: if you convolve many probability densities together (as happens when summing independent variables), the resulting density becomes increasingly normal. This shows why the normal distribution is so natural: it is what you get when many independent random influences combine.
</extrainfo>

Summary of Key Takeaways

The Central Limit Theorem is foundational to statistics because it justifies using normal-based inference methods broadly:

Classical CLT: for i.i.d. samples with finite variance, the sample mean is approximately normal for large $n$.
Extensions exist: the Lyapunov and Lindeberg generalizations handle non-identical distributions.
Works in multiple dimensions: the theorem extends naturally to multivariate settings.
Remarkably general: the theorem applies regardless of the original population distribution, as long as the assumptions hold.

This universality is what makes the CLT so powerful for practical statistics.
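The multivariate statement can be checked empirically in the same spirit. Below is a minimal simulation sketch (assuming NumPy; the population in $\mathbb{R}^2$ with independent exponential and uniform coordinates is an arbitrary non-normal choice). It also illustrates the Cramér–Wold idea: a fixed one-dimensional projection of $\sqrt{n}\,(\bar{\mathbf{X}}_n - \boldsymbol{\mu})$ has variance close to the projected covariance.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Population in R^2: independent exponential(1) and uniform(0,1) coordinates,
# a clearly non-normal joint distribution.
mu = np.array([1.0, 0.5])           # E[exp(1)] = 1, E[unif(0,1)] = 0.5
Sigma = np.diag([1.0, 1.0 / 12.0])  # Var = 1 and 1/12; coordinates independent

n, reps = 200, 20_000
x1 = rng.exponential(scale=1.0, size=(reps, n))
x2 = rng.uniform(0.0, 1.0, size=(reps, n))
means = np.stack([x1.mean(axis=1), x2.mean(axis=1)], axis=1)

# Multivariate CLT: sqrt(n) * (mean vector - mu) has covariance close to Sigma.
z = np.sqrt(n) * (means - mu)
print(np.cov(z, rowvar=False))   # close to diag(1, 1/12)

# Cramér–Wold flavor: a fixed projection behaves like a 1-D normal
# with variance u' Sigma u.
u = np.array([0.6, 0.8])
proj = z @ u
print(proj.var())                # close to u @ Sigma @ u
```

Here the seed, sample sizes, and projection direction are illustrative choices; any fixed direction $u$ should give a projected variance near $u^\top \Sigma\, u$.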
Flashcards
What happens to the distribution of a normalized sample mean as the sample size grows according to the central limit theorem?
It converges to a standard normal distribution.
Does the central limit theorem require the original random variables to be normally distributed?
No, it applies even when they are not normally distributed.
What is the primary practical significance of the central limit theorem regarding non-normal distributions?
Methods that assume normality can often be used for many other distributions.
In the basic statistical formulation of the CLT, what expression represents the random variable that converges to $Z \sim N(0,1)$?
$\frac{\sqrt{n}(\bar{X}_n - \mu)}{\sigma}$ (where $n$ is sample size, $\bar{X}_n$ is the sample mean, $\mu$ is the population mean, and $\sigma$ is the population standard deviation).
Intuitively, what distribution is formed by repeating an experiment many times and computing the average each time?
An approximately normal distribution (for large sample sizes).
What are the two primary assumptions for the random variables in the classical CLT?
They must be independent and identically distributed (i.i.d.). Each must have a finite mean $\mu$ and finite variance $\sigma^2$.
What is the formula for the normalized sum $S_n^{*}$ that converges to $N(0,1)$ in the classical CLT?
$S_n^{*} = \frac{\sum_{i=1}^{n}X_i - n\mu}{\sigma\sqrt{n}}$
While the Law of Large Numbers guarantees the sample mean converges to $\mu$, what does the CLT describe specifically?
How the fluctuations around $\mu$ behave when scaled by $\sqrt{n}$.
What is the nature of the convergence of the cumulative distribution functions in the classical CLT?
The convergence is uniform in the argument.
What may happen to the limiting distribution if the underlying variables have infinite variance?
The CLT may fail and other stable laws (like the Cauchy distribution) become the limits.
How does the requirement for random variables in the Lyapunov CLT differ from the classical CLT?
They only need to be independent; they do not need to be identically distributed.
What moment condition must be satisfied for the Lyapunov CLT?
Each variable must have a finite $r$-th moment for some $r > 2$.
In the Lyapunov CLT, if the Lyapunov condition holds, what does the standardized sum $\frac{\sum_{i=1}^{n}(X_i-\mu_i)}{s_n}$ converge to?
$N(0,1)$ (the standard normal distribution).
What is the relationship between the Lyapunov condition and the Lindeberg condition?
The Lyapunov condition implies the Lindeberg condition, but not vice versa.
Is the Lindeberg condition stronger or weaker than the Lyapunov condition?
It is a weaker condition.
What is the limiting distribution for a normalized sum of i.i.d. random vectors in $\mathbb{R}^d$?
A multivariate normal distribution $N(\mathbf{0}, \Sigma)$ (where $\Sigma$ is the covariance matrix).
Which mathematical tool is used to reduce the multivariate CLT case to a one-dimensional CLT by projecting onto arbitrary directions?
The Cramér–Wold device.
What does the density-function view (Local Limit Theorem) state regarding the convolution of many probability densities?
It approaches the normal density under suitable regularity conditions.

Key Concepts
Central Limit Theorems
Central Limit Theorem
Lyapunov Central Limit Theorem
Lindeberg–Feller Central Limit Theorem
Multivariate Central Limit Theorem
Cramér–Wold device
Convergence Theorems
Law of Large Numbers
Uniform convergence of distribution functions
Local limit theorem
Special Distributions
Stable distribution