RemNote Community

Normal distribution - Statistical Inference Assessment and Computational Techniques

Understand how to estimate and infer normal distribution parameters, assess normality, apply Bayesian analysis, and generate normal random variables using computational techniques.


Summary

Statistical Inference for the Normal Distribution

Introduction

Statistical inference allows us to learn about population parameters from sample data. When we assume our data follow a normal distribution, we can precisely estimate the population mean and variance, construct confidence intervals, and test hypotheses. This chapter covers the key techniques for inference with normally distributed data, including both classical frequentist and Bayesian approaches.

Estimating the Mean and Variance

Maximum Likelihood Estimation

When we have a sample $x_1, x_2, \ldots, x_n$ drawn from a normal distribution, we can estimate the population parameters using maximum likelihood estimation (MLE). The maximum likelihood estimator of the mean is simply the sample mean: $$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i$$ The MLE of the variance uses a divisor of $n$: $$\hat{\sigma}^2_{\text{MLE}} = \frac{1}{n}\sum_{i=1}^{n}(x_i - \hat{\mu})^2$$

Unbiased Estimation of Variance

Here's an important distinction: while $\hat{\sigma}^2_{\text{MLE}}$ is the maximum likelihood estimator, it systematically underestimates the population variance. To correct for this bias, we use the sample variance with divisor $n-1$: $$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \hat{\mu})^2$$ This estimator is unbiased, meaning that on average across all possible samples it equals the true variance $\sigma^2$. The divisor $n-1$ corrects for the fact that we estimated the mean from the same data—we lose one "degree of freedom" in the process.

Properties of These Estimators

The sample mean $\hat{\mu}$ has several desirable properties. It is the uniformly minimum-variance unbiased (UMVU) estimator for $\mu$: among all unbiased estimators, it has the smallest variance. It is also consistent, meaning that as $n$ increases, $\hat{\mu}$ converges in probability to the true mean $\mu$. For variance estimation, the situation is more nuanced: $s^2$ is unbiased but has slightly higher variance than $\hat{\sigma}^2_{\text{MLE}}$.
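The two variance estimators above differ only in their divisor; a minimal sketch in Python (with a made-up sample) shows how they correspond to NumPy's `ddof` argument:

```python
import numpy as np

# Hypothetical sample; any data assumed to come from a normal distribution.
x = np.array([4.2, 5.1, 3.8, 4.9, 5.5, 4.4, 5.0, 4.7])
n = len(x)

mu_hat = x.mean()                          # MLE of the mean: the sample mean
var_mle = np.sum((x - mu_hat) ** 2) / n    # MLE of the variance (divisor n)
s2 = np.sum((x - mu_hat) ** 2) / (n - 1)   # unbiased sample variance (divisor n-1)

# NumPy exposes both conventions through the ddof ("delta degrees of freedom")
# argument: ddof=0 gives the MLE, ddof=1 gives the unbiased estimator.
print(np.isclose(var_mle, x.var(ddof=0)))  # True
print(np.isclose(s2, x.var(ddof=1)))       # True
```

Note that $s^2 > \hat{\sigma}^2_{\text{MLE}}$ always holds, since the two sums are identical and $n-1 < n$.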
$\hat{\sigma}^2_{\text{MLE}}$ is biased (it underestimates) but has lower mean-squared error overall. Both are consistent estimators—they converge to $\sigma^2$ as $n \to \infty$. In practice, $s^2$ is preferred because unbiasedness is valued in inference, particularly when constructing confidence intervals.

Confidence Intervals for the Mean

The t-Distribution Approach

When data are normally distributed, we can construct exact confidence intervals (not just approximate ones). A key fact from Cochran's theorem is that $\hat{\mu}$ and $s^2$ are independent, which allows us to construct a pivotal quantity that follows a known distribution. The pivotal quantity is $$t = \frac{\hat{\mu} - \mu}{s/\sqrt{n}}$$ which follows a Student's $t$ distribution with $n-1$ degrees of freedom. (The degrees of freedom equal $n-1$ because we estimated the mean from the data.)

Constructing the Interval

A $(1-\alpha)$ confidence interval for $\mu$ is $$\hat{\mu} \pm t_{\alpha/2,\, n-1} \cdot \frac{s}{\sqrt{n}}$$ where $t_{\alpha/2,\, n-1}$ is the critical value from the $t$-distribution such that the area in each tail is $\alpha/2$. What this means: we can be $(1-\alpha) \times 100\%$ confident that the true population mean falls within this interval. For example, with $\alpha = 0.05$ we obtain a 95% confidence interval. Important note: the $t$-distribution has heavier tails than the standard normal distribution. For small sample sizes, $t$ critical values are larger than standard normal critical values, resulting in wider confidence intervals. As $n$ increases, the $t$-distribution approaches the standard normal distribution.

Confidence Intervals for the Variance

Chi-Square Distribution

Confidence intervals for the variance use the chi-square distribution. The key fact is that the quantity $$\frac{(n-1)s^2}{\sigma^2}$$ follows a chi-square distribution with $n-1$ degrees of freedom.

Computing the Interval

Here's where care is needed: the chi-square distribution is asymmetric.
A $(1-\alpha)$ confidence interval for $\sigma^2$ is: $$\frac{(n-1)s^2}{\chi^2_{\alpha/2,\, n-1}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\, n-1}}$$ Notice the inversion: the lower bound uses the upper critical value $\chi^2_{\alpha/2,\, n-1}$ in its denominator, and the upper bound uses the lower critical value $\chi^2_{1-\alpha/2,\, n-1}$. This inverse relationship can be confusing, but it arises naturally from rearranging the inequality.

Standard Deviation Confidence Interval

To obtain a confidence interval for the standard deviation $\sigma$, simply take the square root of the variance interval bounds.

Hypothesis Testing

The t-Test for the Mean

To test whether the population mean equals some hypothesized value $\mu_0$, we compute the test statistic $$t = \frac{\hat{\mu} - \mu_0}{s/\sqrt{n}}$$ Under the null hypothesis that $\mu = \mu_0$, this statistic follows a $t$-distribution with $n-1$ degrees of freedom. We reject $H_0$ when $|t|$ exceeds the critical value $t_{\alpha/2,\, n-1}$.

The F-Test for Equality of Variances

When comparing variances from two independent normal samples, the F-test uses the ratio of the sample variances. Since each sample variance, when scaled by $\sigma^2$, follows a chi-square distribution, their ratio follows an F-distribution. A ratio far from 1 suggests unequal variances.

Assessing Normality

Before using normal-distribution-based inference, we should verify that our data are approximately normal. Several diagnostic and formal tests are available.

Q-Q Plot

A Q-Q plot (quantile-quantile plot) is a visual diagnostic tool. It plots the ordered sample values $x_{(1)}, x_{(2)}, \ldots, x_{(n)}$ on the vertical axis against the corresponding theoretical normal quantiles $\Phi^{-1}(p_k)$ on the horizontal axis, where the $p_k$ are plotting positions (typically $p_k = k/(n+1)$). If the data are truly normal, the points follow approximately a straight line.
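The t-based interval for the mean and the chi-square interval for the variance described earlier can both be sketched with SciPy (an illustrative example on a made-up sample, not a prescribed workflow):

```python
import numpy as np
from scipy import stats

x = np.array([4.2, 5.1, 3.8, 4.9, 5.5, 4.4, 5.0, 4.7])  # hypothetical sample
n = len(x)
alpha = 0.05
mu_hat, s2 = x.mean(), x.var(ddof=1)
s = np.sqrt(s2)

# t-based confidence interval for the mean, with n-1 degrees of freedom
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)
ci_mean = (mu_hat - t_crit * s / np.sqrt(n), mu_hat + t_crit * s / np.sqrt(n))

# Chi-square confidence interval for the variance; note the inverted bounds:
# the UPPER critical value produces the LOWER bound, and vice versa.
chi2_hi = stats.chi2.ppf(1 - alpha / 2, df=n - 1)
chi2_lo = stats.chi2.ppf(alpha / 2, df=n - 1)
ci_var = ((n - 1) * s2 / chi2_hi, (n - 1) * s2 / chi2_lo)

# Square roots give the interval for the standard deviation.
ci_sd = tuple(np.sqrt(b) for b in ci_var)
```

The variance interval is visibly asymmetric around $s^2$, reflecting the skew of the chi-square distribution at small $n$.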
Deviations from linearity—such as curvature at the ends or a systematic S-shape—suggest departures from normality (e.g., heavy tails, skewness, or bimodality).

P-P Plot

A P-P plot (probability-probability plot) offers an alternative visual check: plot the empirical cumulative probabilities $\Phi(z_{(k)})$ on the vertical axis against the theoretical probabilities $p_k$ on the horizontal axis. For normal data, the plot should follow the 45° line from $(0,0)$ to $(1,1)$. The P-P plot is less sensitive to departures in the tails than the Q-Q plot.

Formal Goodness-of-Fit Tests

<extrainfo> Moment-based tests detect non-normality using sample skewness and kurtosis. D'Agostino's $K^2$ test combines a skewness component and a kurtosis component into a single statistic. The Jarque-Bera test similarly uses sample skewness and kurtosis; it is particularly popular in time-series analysis. Empirical distribution tests measure the overall discrepancy between the empirical and theoretical CDFs. The Anderson-Darling test gives extra weight to deviations in the tails of the distribution, making it sensitive to outliers. The Lilliefors test adapts the classical Kolmogorov-Smirnov test for the case where the mean and variance are estimated from the data (rather than being known in advance). </extrainfo>

The Shapiro-Wilk test is widely recommended. It compares a least-squares estimate of scale based on the slope of the Q-Q plot (essentially, the correlation between the ordered data and theoretical quantiles) with the sample standard deviation; large discrepancies suggest the data deviate from normality. The Shapiro-Wilk test is particularly powerful for detecting non-normality in small to moderate samples.

Bayesian Inference for Normal Data

Bayesian inference combines prior distributions with data to obtain posterior inference. The normal distribution admits particularly tractable conjugate priors, for which the posterior is in the same family as the prior.
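The Shapiro-Wilk and Anderson-Darling tests discussed above are both available in `scipy.stats`; a minimal sketch on simulated data (seed and parameters chosen arbitrarily):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
x = rng.normal(loc=10.0, scale=2.0, size=50)  # simulated, truly normal data

# Shapiro-Wilk: small p-values are evidence against normality.
w_stat, p_value = stats.shapiro(x)

# Anderson-Darling: no p-value; instead, compare the statistic against
# tabulated critical values (levels for the normal case: 15, 10, 5, 2.5, 1 %).
ad = stats.anderson(x, dist='norm')
reject_at_5pct = ad.statistic > ad.critical_values[2]
```

For truly normal data like this, both tests should usually fail to reject, though any test will reject a fraction $\alpha$ of normal samples by construction.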
Known Variance, Unknown Mean

When the variance $\sigma^2$ is known, the conjugate prior for the mean $\mu$ is a normal distribution. It is convenient to work with precision, defined as the inverse variance; let $\tau = 1/\sigma^2$ denote the data precision. Suppose: Prior: $\mu \sim N(\mu_0, \tau_0^{-1})$ with mean $\mu_0$ and precision $\tau_0$. Data: $x_i \sim N(\mu, \sigma^2)$. The posterior is also normal: $$\mu \mid \text{data} \sim N(\mu_n, \tau_n^{-1})$$ with posterior precision $$\tau_n = \tau_0 + n\tau$$ (the prior precision plus $n$ copies of the data precision) and posterior mean (a precision-weighted average of prior and sample information) $$\mu_n = \frac{\tau_0 \mu_0 + n\tau \bar{x}}{\tau_n} = \frac{\tau_0 \mu_0 + n\bar{x}/\sigma^2}{\tau_0 + n/\sigma^2}$$ The posterior mean is a compromise between the prior mean $\mu_0$ and the sample mean $\bar{x}$, with weights determined by their respective precisions.

Known Mean, Unknown Variance

When the mean $\mu$ is known, the conjugate prior for the variance is a scaled inverse-chi-squared distribution (equivalently, an inverse-gamma distribution). If the prior has scale parameter $s_0^2$ and degrees of freedom $\nu_0$, the posterior has degrees of freedom $\nu_n = \nu_0 + n$ and scale $$s_n^2 = \frac{\nu_0 s_0^2 + \sum_{i=1}^n (x_i - \mu)^2}{\nu_n}$$ The posterior combines prior information ($\nu_0$ "pseudo-observations") with sample information.

Unknown Mean and Unknown Variance

The most realistic case has both parameters unknown.
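The known-variance conjugate update above is simple enough to sketch in plain Python (the function name and prior settings below are made up for illustration):

```python
def posterior_normal_mean(xbar, n, sigma2, mu0, tau0):
    """Conjugate update for a normal mean when the variance sigma2 is known.

    tau0 is the prior precision; returns the posterior mean and precision.
    """
    tau = 1.0 / sigma2                             # data precision
    tau_n = tau0 + n * tau                         # posterior precision
    mu_n = (tau0 * mu0 + n * tau * xbar) / tau_n   # precision-weighted average
    return mu_n, tau_n

# A vague prior (low precision) gets pulled strongly toward the data:
mu_n, tau_n = posterior_normal_mean(xbar=5.0, n=20, sigma2=4.0, mu0=0.0, tau0=0.1)
# Here n*tau = 5.0 dwarfs tau0 = 0.1, so mu_n sits close to the sample mean 5.0.
```

With more data ($n$ growing), the data precision term dominates and the posterior mean converges to $\bar{x}$ regardless of the prior.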
The conjugate prior is the normal-inverse-gamma distribution: the product of a normal prior for $\mu$ (conditional on $\sigma^2$) and an inverse-gamma prior for $\sigma^2$. Key hyperparameters include $\mu_0$ (the prior mean for $\mu$), $\lambda_0$ (the prior precision for $\mu$, interpretable as the number of pseudo-observations informing the prior mean), and $\alpha_0$, $\beta_0$ (the shape and scale of the inverse-gamma prior for $\sigma^2$). The posterior is also normal-inverse-gamma, with updated shape $$\alpha_n = \alpha_0 + \frac{n}{2}$$ and updated scale $$\beta_n = \beta_0 + \frac{1}{2}\sum_{i=1}^n (x_i - \bar{x})^2 + \frac{\lambda_0 n}{2(\lambda_0 + n)}(\bar{x} - \mu_0)^2$$ The final term in the scale accounts for the discrepancy between the sample mean and the prior mean.

Why the Normal Distribution Arises

The Central Limit Theorem

Many real-world phenomena produce approximately normal distributions through the Central Limit Theorem: when many small, independent additive effects combine, their sum approaches a normal distribution regardless of the distribution of the individual effects. This explains why normal distributions appear so commonly in nature.

<extrainfo> Classic applications: binomial random variables approach normality for large numbers of trials, and Poisson random variables approach normality when the mean is large. These convergences can be understood through the Central Limit Theorem—a binomial is a sum of independent Bernoulli trials, and a Poisson can be approximated as a sum of many rare events. </extrainfo>
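The Central Limit Theorem is easy to see numerically. Summing twelve independent $U(0,1)$ draws and subtracting six yields an approximately standard normal variable, since each uniform contributes mean $1/2$ and variance $1/12$. A quick NumPy check (sample size and seed chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(0)

# Each row holds 12 independent U(0,1) draws. Their sum has mean 6 and
# variance 1, so subtracting 6 gives an approximately N(0, 1) variable
# by the Central Limit Theorem.
z = rng.uniform(size=(100_000, 12)).sum(axis=1) - 6.0

print(z.mean(), z.var())  # should be close to 0 and 1, respectively
```

This is a demonstration of convergence, not a recommended generator: the result has bounded support $[-6, 6]$, so its extreme tails are too thin.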
Flashcards
What is the maximum-likelihood estimate (MLE) of the mean $\mu$ for a sample $x_1,\dots,x_n$ from a normal distribution?
The sample mean $\hat\mu = \frac{1}{n}\sum x_i$
What divisor is used in the maximum-likelihood estimate (MLE) of the variance $\hat\sigma^{2}_{\text{MLE}}$?
$n$
What is the formula for the unbiased estimator of the variance $s^{2}$?
$s^{2} = \frac{1}{n-1}\sum (x_i-\hat\mu)^{2}$
How do the biased MLE variance estimator and the unbiased $s^2$ estimator compare in terms of mean-squared error?
The biased $\hat\sigma^{2}_{\text{MLE}}$ has a lower mean-squared error.
Which estimator is considered the uniformly minimum‑variance unbiased (UMVU) estimator for the mean $\mu$?
The sample mean $\hat\mu$
What property describes $\hat\mu$ and $s^{2}$ as they converge in probability to the true parameters as $n \to \infty$?
Consistency
According to Cochran's theorem, what distribution does the pivotal quantity $t = \frac{\hat\mu-\mu}{s/\sqrt{n}}$ follow?
Student’s $t$ distribution with $n-1$ degrees of freedom
What is the formula for a $(1-\alpha)$ confidence interval for the mean $\mu$?
$\hat\mu \pm t_{\alpha/2,\, n-1} \frac{s}{\sqrt{n}}$
Which distribution is used to calculate the confidence interval for the variance $\sigma^{2}$?
The chi‑square distribution
What are the lower and upper bounds for a $(1-\alpha)$ confidence interval for the variance $\sigma^2$?
Lower: $\frac{(n-1)s^{2}}{\chi^{2}_{\alpha/2,\, n-1}}$; Upper: $\frac{(n-1)s^{2}}{\chi^{2}_{1-\alpha/2,\, n-1}}$
What specific test is used to compare the equality of variances between two independent groups?
The $F$‑test
What is the conjugate prior for the mean of a normal distribution when the variance is known?
A normal distribution
What is the conjugate prior for the variance $\sigma^{2}$ of a normal distribution when the mean is known?
A scaled inverse‑chi‑squared distribution (or inverse‑gamma distribution)
How is the confidence interval for the standard deviation $\sigma$ derived from the variance interval?
By taking the square root of the variance interval bounds
In a $(1-\alpha)$ confidence interval, what does $\alpha$ represent?
The probability that the true parameter falls outside the interval
For a large sample size $n$, what distribution quantiles can replace chi‑square quantiles in variance confidence intervals?
Standard-normal quantiles $z_{\alpha/2}$
In a Q–Q plot, what is the ordered data $x_{(k)}$ graphed against?
Theoretical normal quantiles $\Phi^{-1}(p_{k})$
What is the expected appearance of a P–P plot if the data follows a normal distribution?
Points should follow the 45° line from $(0,0)$ to $(1,1)$
Which normality test specifically compares the least‑squares slope of the Q–Q plot with the sample variance?
The Shapiro‑Wilk test
Which normality test is known for giving more weight to the tails of the distribution?
The Anderson–Darling test
If $\tau$ is data precision and $\tau_0$ is prior precision, what is the formula for the posterior precision $\tau_n$?
$\tau_{n} = \tau_{0} + n\tau$
How is the posterior mean $\mu_n$ calculated in the case of a known variance?
As a precision-weighted average: $\mu_{n} = \frac{\tau_{0}\mu_{0} + n\tau\bar{x}}{\tau_{0} + n\tau}$
According to the Central Limit Theorem, when does a resulting distribution become approximately normal?
When many small, independent, additive effects combine
What is the central-limit approximation (Irwin–Hall) method for generating a standard normal variable?
Sum twelve independent $U(0,1)$ variables and subtract six
What is the relationship between the standard normal CDF $\Phi(z)$ and the error function $\operatorname{erf}(x)$?
$\Phi(z) = \frac{1}{2}[1 + \operatorname{erf}(\frac{z}{\sqrt{2}})]$

Key Concepts
Estimation and Inference
Maximum likelihood estimation
Unbiased estimator
Confidence interval
Bayesian conjugate prior
Statistical Distributions
Student's t-distribution
Chi-square distribution
Central limit theorem
Error function
Normality Tests
Shapiro–Wilk test
Anderson–Darling test
Random Variate Generation
Box–Muller transform
Marsaglia polar method
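The list above closes with two generation methods. As a minimal sketch (not the chapter's own presentation), the Box–Muller transform converts two independent $U(0,1)$ draws into two independent standard normal draws:

```python
import math
import random

def box_muller():
    """Return two independent standard normal draws from two uniforms."""
    u1, u2 = random.random(), random.random()
    r = math.sqrt(-2.0 * math.log(1.0 - u1))  # 1 - u1 avoids log(0)
    theta = 2.0 * math.pi * u2                # uniformly distributed angle
    return r * math.cos(theta), r * math.sin(theta)

random.seed(1)
sample = [z for _ in range(50_000) for z in box_muller()]
# The 100,000 draws should have mean near 0 and variance near 1.
```

The Marsaglia polar method is a rejection-based variant of the same idea that avoids the trigonometric calls.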