Hypothesis Testing Study Guide
📖 Core Concepts
Statistical hypothesis test – a procedure that uses sample data to decide whether there is enough evidence to reject a stated null hypothesis (\(H_0\)).
Null hypothesis (\(H_0\)) – the default claim (often “no effect” or a specific parameter value).
Alternative hypothesis (\(H_1\)) – the claim researchers hope to support (an effect different from what \(H_0\) specifies).
Test statistic (\(T\)) – a single number calculated from the data; its sampling distribution under \(H0\) is known (e.g., Student’s t, normal).
p‑value – the probability, assuming \(H_0\) is true, of obtaining a test statistic at least as extreme as the observed value.
Significance level (\(\alpha\)) – pre‑chosen maximum tolerable Type I error rate (commonly 0.05).
Decision rule – reject \(H_0\) if the observed statistic falls in the critical (rejection) region (or, equivalently, if \(p \le \alpha\)).
Error types:
Type I error: falsely rejecting a true \(H_0\) (probability = \(\alpha\)).
Type II error: failing to reject a false \(H_0\) (probability = \(\beta\)).
Power – \(1-\beta\); the chance of correctly rejecting a false \(H_0\).
Fisher vs. Neyman–Pearson – Fisher emphasized p‑values as evidence, no explicit alternative; Neyman–Pearson added \(H1\), fixed \(\alpha\) and \(\beta\), and a decision‑theoretic framework.
NHST (null‑hypothesis significance testing) – the hybrid practice used today (Fisher’s p‑value + Neyman–Pearson error rates).
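The core quantities above can be sketched with a simple one‑sample z test. This is a hypothetical, stdlib‑only example (the numbers are illustrative, and σ is assumed known so the null distribution is the standard normal):

```python
from statistics import NormalDist

# Hypothetical example: H0: mu = 100 vs H1: mu != 100,
# with known sigma = 15 and n = 36 observations
sample_mean = 106.0
mu0, sigma, n = 100.0, 15.0, 36

# Test statistic: how many standard errors the sample mean is from mu0
z = (sample_mean - mu0) / (sigma / n ** 0.5)  # 2.4

# Two-sided p-value under the standard normal null distribution
p = 2 * (1 - NormalDist().cdf(abs(z)))

alpha = 0.05
reject = p <= alpha  # decision rule: here p < 0.05, so reject H0
```

Note the interpretation: rejecting \(H_0\) says the observed mean would be rare if \(H_0\) were true, not that \(H_1\) is proven.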
📌 Must Remember
p‑value ≠ probability that \(H_0\) is true.
Rejecting \(H_0\) does not prove \(H_1\) is true; it only shows the data are unlikely under \(H_0\).
Not rejecting \(H_0\) does not prove \(H_0\) is true; it indicates insufficient evidence.
\(\Pr(p \le \alpha \mid H_0) \le \alpha\) – under the null, the rejection rate never exceeds \(\alpha\).
One‑sided test is appropriate only when theory predicts the direction of an effect.
Multiple unadjusted tests inflate the overall Type I error rate (family‑wise error).
As n → ∞, even trivial differences can become statistically significant (paradox of large samples).
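The guarantee \(\Pr(p \le \alpha \mid H_0) \le \alpha\) can be checked by simulation: when \(H_0\) is true, p‑values from a well‑calibrated test are (approximately) uniform, so the rejection rate matches \(\alpha\). A minimal sketch using a z test with known σ:

```python
import random
from statistics import NormalDist, mean

random.seed(0)
nd = NormalDist()
alpha, n, reps = 0.05, 30, 2000

rejections = 0
for _ in range(reps):
    sample = [random.gauss(0, 1) for _ in range(n)]  # H0 true: mu = 0, sigma = 1
    z = mean(sample) * n ** 0.5                      # z statistic (sigma = 1 known)
    p = 2 * (1 - nd.cdf(abs(z)))
    rejections += p <= alpha

rate = rejections / reps  # should land close to 0.05
```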
🔄 Key Processes
Formulate hypotheses – write \(H_0\) (often “no difference”) and \(H_1\) (the effect of interest).
Choose test & statistic – e.g., two‑sample t test → statistic \(t\).
Derive null distribution – know the sampling distribution of \(T\) under \(H_0\).
Set \(\alpha\) – decide the acceptable Type I error rate (0.05, 0.01, etc.).
Compute observed statistic – calculate \(t_{\text{obs}}\) from the data.
Find p‑value or critical value –
p‑value: \(p = P(T \ge t_{\text{obs}} \mid H_0)\) for a one‑sided test; \(p = P(|T| \ge |t_{\text{obs}}| \mid H_0)\) for a two‑sided test.
Critical value: from \(\alpha\) and the null distribution.
Apply decision rule – reject \(H_0\) if \(p \le \alpha\) or \(t_{\text{obs}}\) lies in the rejection region.
Interpret – state the result in terms of evidence, not the truth of the hypotheses.
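The steps above can be sketched end to end. The data here are simulated (a hypothetical two‑group comparison), and to stay dependency‑free a large‑sample normal approximation stands in for the exact t distribution:

```python
import random
from statistics import NormalDist, mean, stdev

random.seed(1)
nd = NormalDist()

# Simulated data: two groups of n = 100 with a true mean difference of 1.0
a = [random.gauss(0.0, 1.0) for _ in range(100)]
b = [random.gauss(1.0, 1.0) for _ in range(100)]

# 1. H0: mu_a = mu_b  vs  H1: mu_a != mu_b
# 2-3. Statistic: Welch-style z (large-sample normal approximation to t)
se = (stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b)) ** 0.5
z_obs = (mean(a) - mean(b)) / se

# 4. Set alpha
alpha = 0.05

# 5-6. Two-sided p-value from the null distribution
p = 2 * (1 - nd.cdf(abs(z_obs)))

# 7-8. Decision rule and interpretation
reject = p <= alpha  # True here: the data are very unlikely under H0
```

For small samples the t distribution (e.g., `scipy.stats.ttest_ind`) should replace the normal approximation.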
🔍 Key Comparisons
Fisher significance testing vs. Neyman–Pearson decision theory
Fisher: no explicit \(H_1\), no Type II error; the p‑value is a continuous measure of evidence.
Neyman–Pearson: includes \(H1\), defines \(\alpha\) and \(\beta\), uses likelihood‑ratio test for optimality.
One‑sided vs. Two‑sided tests
One‑sided: critical region only on one tail; used when direction is pre‑specified.
Two‑sided: critical regions on both tails; default when direction is unknown.
Exact test vs. Approximate (asymptotic) test
Exact: computes true null distribution (e.g., Fisher’s exact test).
Approximate: relies on large‑sample theory (e.g., normal approximation).
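The one‑sided vs. two‑sided distinction is just which tail(s) count as “extreme.” A small illustrative sketch with a hypothetical observed z statistic, showing why the choice matters at the 0.05 boundary:

```python
from statistics import NormalDist

nd = NormalDist()
z_obs = 1.8  # hypothetical observed z statistic

# One-sided (H1: mu > mu0): probability in the upper tail only
p_one = 1 - nd.cdf(z_obs)

# Two-sided (H1: mu != mu0): both tails, i.e. twice the one-sided value
p_two = 2 * (1 - nd.cdf(abs(z_obs)))

# p_one ~ 0.036 is "significant" at 0.05, while p_two ~ 0.072 is not --
# which is exactly why the direction must be pre-specified, not chosen post hoc
```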
⚠️ Common Misunderstandings
“A p‑value of .03 means there is a 3 % chance the null hypothesis is true.” – false; the p‑value is computed conditional on \(H_0\) being true.
“If the result is not significant, the effect does not exist.” – false; could be low power.
“Statistical significance equals practical importance.” – false; effect size and confidence intervals are needed.
“The test’s α automatically controls the overall error when many tests are run.” – false; need Bonferroni, Holm, FDR, etc.
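The multiple‑testing point can be made concrete. With hypothetical p‑values from five related tests, Bonferroni compares each to \(\alpha/m\), and Holm steps down through \(\alpha/m, \alpha/(m-1), \dots\):

```python
# Hypothetical p-values from m = 5 related tests
pvals = [0.003, 0.021, 0.040, 0.18, 0.65]
alpha = 0.05
m = len(pvals)

# Unadjusted: three tests look "significant"
naive = [p <= alpha for p in pvals]

# Bonferroni: compare each p-value to alpha / m (controls family-wise error)
bonferroni = [p <= alpha / m for p in pvals]

# Holm step-down: sort p-values, test against alpha/m, alpha/(m-1), ...
holm = [False] * m
for rank, (i, p) in enumerate(sorted(enumerate(pvals), key=lambda t: t[1])):
    if p <= alpha / (m - rank):
        holm[i] = True
    else:
        break  # once one fails, all larger p-values fail too
```

Here three tests pass unadjusted but only one survives either correction; Holm is never more conservative than Bonferroni while still controlling the family‑wise error rate.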
🧠 Mental Models / Intuition
Evidence as a weight: think of the p‑value as the “weight” of evidence against \(H_0\); the smaller the p‑value, the stronger the push to reject.
Error trade‑off: raising \(\alpha\) makes it easier to reject (more false positives) but reduces \(\beta\) (fewer false negatives). Visualize a seesaw between Type I and Type II errors.
Critical region as a “danger zone”: if the test statistic lands in the danger zone, we conclude the data are too unlikely under \(H_0\) and reject it.
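The seesaw can be seen directly by simulation: tightening \(\alpha\) lowers the Type I rate but raises \(\beta\), i.e. power drops. A stdlib‑only sketch with an assumed true effect of 0.4 standard deviations:

```python
import random
from statistics import NormalDist, mean

random.seed(3)
nd = NormalDist()

def power(alpha, mu=0.4, n=25, reps=2000):
    """Estimate power of a two-sided z test of H0: mu = 0 when the true mean is mu."""
    hits = 0
    for _ in range(reps):
        sample = [random.gauss(mu, 1) for _ in range(n)]
        z = mean(sample) * n ** 0.5  # sigma = 1 assumed known
        hits += 2 * (1 - nd.cdf(abs(z))) <= alpha
    return hits / reps

power_05 = power(0.05)
power_01 = power(0.01)  # smaller alpha -> lower power (higher beta)
```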
🚩 Exceptions & Edge Cases
Composite null hypotheses – the null does not specify all parameters; the test’s size is the worst‑case Type I error over all null parameter values.
Conservative tests – actual \(\alpha\) is smaller than nominal; reduces false positives but also power.
Uniformly most powerful (UMP) tests – exist only for certain families of distributions (e.g., exponential family with monotone likelihood ratio).
Bootstrap hypothesis testing – useful when parametric null distribution is unknown; relies on resampling under the null.
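One common way to resample “under the null” for a difference in means is to center both groups at the grand mean, then bootstrap each centered group. A minimal sketch with hypothetical data (centering is one of several valid ways to impose \(H_0\)):

```python
import random
from statistics import mean

random.seed(2)

# Hypothetical two-group data
a = [5.1, 4.8, 6.0, 5.5, 5.9, 4.7, 5.3, 5.6]
b = [4.2, 4.9, 4.5, 4.0, 4.8, 4.4, 4.1, 4.6]
obs = mean(a) - mean(b)

# Impose H0 by shifting each group to the grand mean
grand = mean(a + b)
a0 = [x - mean(a) + grand for x in a]
b0 = [x - mean(b) + grand for x in b]

reps, extreme = 5000, 0
for _ in range(reps):
    ra = [random.choice(a0) for _ in a0]  # resample with replacement
    rb = [random.choice(b0) for _ in b0]
    extreme += abs(mean(ra) - mean(rb)) >= abs(obs)

p_boot = extreme / reps  # small p: the observed gap is rare under H0
```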
📍 When to Use Which
Student’s t vs. normal – use t when population variance is unknown and sample size is small; normal when variance known or n large.
Exact test vs. asymptotic – choose exact for small samples or discrete data (e.g., Fisher’s exact test).
One‑sided test – only when theory a priori predicts direction; otherwise default to two‑sided.
Bootstrap test – when assumptions (normality, equal variances) are questionable or sample size is moderate.
Likelihood‑ratio test (Neyman–Pearson) – optimal for simple vs. simple hypothesis comparison; extend to composite via generalized LR.
👀 Patterns to Recognize
p‑value tiny + large sample → may indicate a statistically significant but practically trivial effect.
Non‑significant result + small n → suspect low power; consider a power analysis.
Multiple related outcomes tested together → look for a pattern of inflated Type I error; expect a correction method in the question.
Reported “trend” (p ≈ 0.07) – often a hint that the test is under‑powered or that the author is stretching significance.
🗂️ Exam Traps
Distractor: “p = 0.04 means 4 % chance H₀ is true.” – wrong interpretation of p‑value.
Distractor: “Failing to reject H₀ proves there is no effect.” – confuses lack of evidence with evidence of no effect.
Distractor: “A one‑tailed test is always more powerful than a two‑tailed test.” – only true when the direction is truly known before looking at the data.
Distractor: “If α = 0.05, the probability of a Type I error is always 5 % regardless of the test.” – only holds when the test’s size equals the nominal α (conservative tests may be lower).
Distractor: “Multiple testing does not affect the α level if each test uses p < 0.05.” – ignores family‑wise error inflation.
---
Keep this guide handy; the bullet format makes it easy to scan quickly before the exam.