# Study Guide

## 📖 Core Concepts

- **Statistical inference** – using a sample to draw conclusions about an (unobserved) population distribution.
- **Descriptive statistics** – summarizes only the data you have; makes no population claim.
- **Model** – a mathematical description of how data are generated (e.g., normal with unknown µ, σ²).
- **Point estimate** – single-value best guess of a population parameter (e.g., sample mean $\bar{x}$).
- **Interval estimate** – range likely to contain the true parameter; can be a confidence interval (frequentist) or credible interval (Bayesian).
- **Null hypothesis (H₀)** – a specific statement about a parameter; rejection means the data are inconsistent with H₀.
- **Randomization distribution** – sampling distribution obtained by enumerating all possible random assignments under H₀.
- **Central Limit Theorem (CLT)** – for large $n$, the sampling distribution of $\bar{X}$ is approximately normal, regardless of the original shape (provided the tails aren't extreme).

---

## 📌 Must Remember

- **Parametric vs. non-parametric**: parametric → finite-dimensional parameter vector; non-parametric → minimal shape assumptions.
- **CLT rule of thumb**: $n \ge 10$ is often enough for a normal approximation of means.
- **Confidence level**: a 95% CI means 95% of such intervals (over repeated samples) will cover the true parameter.
- **Credible interval**: 95% of the posterior probability lies inside; the interpretation is subjective belief.
- **AIC**: $\text{AIC} = 2k - 2\ln(L_{\max})$; lower AIC → preferred model (balances fit vs. complexity).
- **Maximum Likelihood Estimate (MLE)**: $\hat{\theta}_{\text{MLE}} = \arg\max_{\theta} L(\theta \mid x)$.
- **p-value**: probability, under H₀, of obtaining a statistic at least as extreme as the one observed.

---

## 🔄 Key Processes

**Statistical Inference Workflow**
1. Choose a model that reflects the data-generating mechanism.
2. Estimate parameters (point estimate, MLE, Bayesian posterior).
3. Quantify uncertainty (CI, credible interval, bootstrap).
4. Test hypotheses or make predictions.
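The workflow above can be sketched end-to-end in a few lines of Python. The data, sample size, and use of the $z$ critical value 1.96 are illustrative assumptions, not part of the notes; for small $n$ with unknown σ, a $t$ critical value would be the better choice.

```python
import math
import random
import statistics

random.seed(42)

# Hypothetical data: 50 draws from a Normal(mu=10, sigma=2) population.
sample = [random.gauss(10, 2) for _ in range(50)]

# Point estimate of the population mean.
x_bar = statistics.mean(sample)

# Quantify uncertainty: standard error and a 95% CI using z = 1.96
# (with n = 50 the normal approximation is a reasonable shortcut here).
se = statistics.stdev(sample) / math.sqrt(len(sample))
ci = (x_bar - 1.96 * se, x_bar + 1.96 * se)

print(f"point estimate: {x_bar:.2f}, 95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
```

Over repeated samples, about 95% of intervals built this way would cover the true mean of 10; any single interval either covers it or not.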
**Constructing a Frequentist CI (e.g., for a mean)**
1. Compute the sample mean $\bar{x}$ and standard error $SE = s/\sqrt{n}$.
2. Choose a confidence level → critical value $z_{\alpha/2}$ (or $t_{df,\alpha/2}$).
3. CI = $\bar{x} \pm z_{\alpha/2}\,SE$.

**Bayesian Updating**
1. Prior $p(\theta)$ → likelihood $L(\theta \mid x)$ → posterior $p(\theta \mid x) \propto L(\theta \mid x)\,p(\theta)$.
2. Summarize the posterior (mean, median, credible interval).

**Randomization Test**
1. Compute the observed statistic $T_{\text{obs}}$.
2. Generate all (or many) permutations of treatment labels consistent with the randomization scheme.
3. Calculate $T$ for each permutation → the randomization distribution.
4. p-value = proportion of permutations with $|T| \ge |T_{\text{obs}}|$.

---

## 🔍 Key Comparisons

- **Confidence interval vs. credible interval**
  - CI: long-run frequency property; does not assign probability to the parameter.
  - Credible: posterior probability; directly answers "what is the probability the parameter lies in this range?"
- **Parametric vs. non-parametric models**
  - Parametric: assumes a specific distribution (e.g., Normal); efficient when correct.
  - Non-parametric: few shape assumptions; more robust but often less powerful.
- **Frequentist vs. Bayesian inference**
  - Frequentist: probability = long-run frequency; inference based on sampling distributions, p-values, CIs.
  - Bayesian: probability = degree of belief; updates the prior with data → posterior; uses credible intervals and Bayes factors.
- **MLE vs. method of moments**
  - MLE: maximizes the likelihood; asymptotically efficient under a correct model.
  - Method of moments: matches sample moments; simpler but generally less efficient.

---

## ⚠️ Common Misunderstandings

- "A 95% CI means there is a 95% chance the true parameter is inside." – Incorrect for frequentist CIs; a given interval either contains the parameter or it does not.
- "If the CLT holds, the original data must be normal." – The CLT works for any distribution with finite variance; normality of the data is not required.
- "A low p-value proves the null hypothesis is false." – It only indicates the data are unlikely under H₀; it does not prove H₀ false.
- "AIC can be compared across completely different data sets." – AIC values are only comparable between models fitted to the same data set.

---

## 🧠 Mental Models / Intuition

- **Sampling distribution as a "cloud"**: imagine repeatedly drawing samples; the cloud of point estimates shrinks as $n$ grows (variance ∝ $1/n$).
- **Likelihood surface**: think of a topographic map where height = likelihood; the highest point is the MLE.
- **Randomization as "fair dice"**: under H₀, every possible assignment is equally likely; your test statistic's distribution is just the set of dice-roll outcomes.

---

## 🚩 Exceptions & Edge Cases

- **Heavy-tailed populations**: the CLT may converge very slowly; $n \ge 10$ may be insufficient.
- **Small sample, unknown variance**: use the $t$-distribution, not the normal.
- **Model misspecification**: even with large $n$, biased estimates persist if the assumed model (e.g., normal) is wrong.
- **AIC with non-nested models**: AIC still works, but differences < 2 are considered negligible.

---

## 📍 When to Use Which

- Use a **parametric model** when you have a strong theoretical reason (or prior data) to assume a specific distribution; it gains efficiency.
- Switch to **non-parametric** methods if diagnostics (e.g., a Q–Q plot) show severe departure from the assumed form.
- Choose a **confidence interval** for regulatory or classical reporting; choose a **credible interval** when you need a probability statement about the parameter.
- Apply **randomization tests** when the randomization scheme is known and model assumptions are doubtful.
- Use **AIC** for model selection among candidates fitted to the same data set; prefer lower AIC unless theory dictates otherwise.

---

## 👀 Patterns to Recognize

- "n ≥ 30" or "n ≥ 10" appearing with normal approximations → check CLT applicability.
- "p-value < α" paired with "reject H₀" → verify the direction of the test (one- vs. two-tailed).
- Likelihood ratio > 1 → model 1 fits better than model 2; often leads to a chi-square test.
- AIC difference < 2 → models are essentially equivalent; consider parsimony or subject-matter reasoning.

---

## 🗂️ Exam Traps

- **Confusing a CI and a credible interval** – an exam may present a 95% interval and ask for its interpretation; answer with the frequentist coverage property, not a probability statement.
- **Using a $z$ critical value for small $n$** – if $n < 30$ and σ is unknown, $t_{df}$ is required; choosing $z$ yields overly narrow CIs.
- **Assuming normality of the sample mean regardless of tail heaviness** – heavy-tailed data may violate CLT approximations; the correct answer often mentions "check tail behavior".
- **Selecting the model with the lowest AIC without considering over-fitting** – traps may list a model with slightly lower AIC but unrealistic parameters; the right choice balances AIC with plausibility.
- **Interpreting a non-significant p-value as "evidence for H₀"** – the correct stance is "insufficient evidence to reject H₀".
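The randomization test described under Key Processes can be sketched as a Monte Carlo permutation test. The two groups and their values below are hypothetical; with larger samples, sampling many permutations (as here) stands in for enumerating all of them.

```python
import random
import statistics

random.seed(7)

# Hypothetical two-group experiment (e.g., treatment vs. control scores).
treatment = [12.1, 14.3, 11.8, 15.0, 13.6]
control = [10.2, 11.1, 9.8, 12.0, 10.7]

def mean_diff(a, b):
    return statistics.mean(a) - statistics.mean(b)

t_obs = mean_diff(treatment, control)

# Under H0 (no treatment effect), every relabeling of the pooled data
# is equally likely, so re-shuffle the labels many times.
pooled = treatment + control
n_perm = 10_000
count = 0
for _ in range(n_perm):
    random.shuffle(pooled)
    t_perm = mean_diff(pooled[:len(treatment)], pooled[len(treatment):])
    if abs(t_perm) >= abs(t_obs):
        count += 1

# Two-sided p-value: proportion of permuted |T| at least as extreme as |T_obs|.
p_value = count / n_perm
print(f"observed diff = {t_obs:.2f}, randomization p-value = {p_value:.4f}")
```

Because the groups here are almost fully separated, very few relabelings reproduce a difference as large as the observed 2.6, so the p-value comes out small.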
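AIC-based model selection on a single data set can likewise be sketched with closed-form maximum log-likelihoods. The data-generating distribution and the two candidate models below are illustrative assumptions; the point is only that both AIC values come from the same data, so their difference is meaningful.

```python
import math
import random
import statistics

random.seed(1)

# Hypothetical skewed data: 200 draws from an Exponential(rate=1) population.
x = [random.expovariate(1.0) for _ in range(200)]
n = len(x)

# Normal model (k = 2 parameters): max log-likelihood at the MLE has the
# closed form -n/2 * (ln(2*pi*sigma2_hat) + 1), with the 1/n variance.
sigma2 = statistics.pvariance(x)
loglik_norm = -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)

# Exponential model (k = 1 parameter): the MLE is rate = 1 / sample mean,
# and the max log-likelihood is n*ln(rate) - rate*sum(x).
rate = 1 / statistics.mean(x)
loglik_exp = n * math.log(rate) - rate * sum(x)

# AIC = 2k - 2*ln(L_max); lower is preferred.
aic_norm = 2 * 2 - 2 * loglik_norm
aic_exp = 2 * 1 - 2 * loglik_exp

print(f"AIC(normal) = {aic_norm:.1f}, AIC(exponential) = {aic_exp:.1f}")
```

Since the data are strongly skewed, the exponential model's AIC is far lower than the normal model's, and the gap well exceeds the "difference < 2 is negligible" threshold.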