RemNote Community

Study Guide

📖 Core Concepts

- Statistics – the science of collecting, organizing, analyzing, interpreting, and presenting data.
- Population vs. Sample – a population is the full set of interest; a sample is a subset used to infer properties of the population.
- Measurement Scales
  - Nominal: categories only (e.g., gender).
  - Ordinal: ordered categories, unequal gaps (e.g., Likert rating).
  - Interval: equal gaps, arbitrary zero (e.g., Celsius).
  - Ratio: equal gaps, true zero (e.g., weight).
- Descriptive Statistics – summarise data: mean (center), standard deviation (spread), variance, range, IQR, percentiles.
- Inferential Statistics – draw conclusions about a population from a sample using estimators, hypothesis tests, and confidence intervals.
- Estimator Properties – unbiased, consistent, efficient; the UMVUE has the smallest variance among unbiased estimators.
- Hypothesis Testing – \(H_0\) (no effect) vs. \(H_1\) (effect); decide by test statistic, critical region, \(p\)-value, and significance level \(\alpha\).
- Errors & Power – Type I (false positive), Type II (false negative); power = \(1-\beta\).
- Confidence vs. Credible Intervals – a frequentist CI procedure covers the true parameter at the stated rate in repeated sampling; a Bayesian credible interval gives a direct probability statement about the parameter, given the data and prior.
- Experimental vs. Observational – experiments manipulate variables; observational studies only record them.
- Sampling Designs – census (full count), simple random, stratified, purposive (quota); representative sampling is essential for valid inference.

---

📌 Must Remember

- Mean: \(\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i\).
- Standard deviation: \(s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i-\bar{x})^2}\).
- Unbiased estimator → \(E[\hat\theta]=\theta\).
- \(p\)-value = probability, assuming \(H_0\) is true, of a test statistic at least as extreme as the one observed.
- Significance level \(\alpha\) = largest \(p\) that still leads to rejection.
- Type I error rate = \(\alpha\); Type II error rate = \(\beta\).
- Power = \(1-\beta\).
- 95 % CI = \(\hat\theta \pm z_{0.975}\, SE(\hat\theta)\) (large‑sample normal approximation).
- ANOVA tests equality of means across >2 groups; the t‑test compares two means.
- Chi‑squared test evaluates independence of categorical variables or goodness‑of‑fit.
- Pearson correlation measures linear association; Spearman measures monotonic rank association.

---

🔄 Key Processes

Designing a Study
1. Choose experimental (with manipulation) or observational (no manipulation).
2. Define the population; decide on census vs. sampling.
3. Select a representative sampling method (simple random, stratified).
4. Plan replicates, blocking, and randomized assignment to control confounding.

Conducting a Hypothesis Test
1. State \(H_0\) and \(H_1\).
2. Choose an appropriate test (t, ANOVA, chi‑squared, etc.).
3. Compute the test statistic.
4. Determine the critical value or \(p\)-value.
5. Compare to \(\alpha\); reject or fail to reject \(H_0\).

Building a Confidence Interval
1. Estimate the parameter \(\hat\theta\).
2. Obtain the standard error \(SE(\hat\theta)\).
3. Choose a confidence level \(1-\alpha\) (e.g., 95 %).
4. Compute bounds: \(\hat\theta \pm z_{1-\alpha/2}\, SE(\hat\theta)\) (or the corresponding \(t\) quantile for small samples).

Maximum Likelihood Estimation (MLE)
1. Write the likelihood \(L(\theta)=\prod_{i=1}^{n} f(x_i \mid \theta)\).
2. Take the log, differentiate, set the derivative to 0, and solve for \(\hat\theta\).

---

🔍 Key Comparisons

- Nominal vs. Ordinal – no order vs. ordered categories (but unequal intervals).
- Interval vs. Ratio – interval has an arbitrary zero; ratio has a true zero and allows ratio statements.
- Experimental vs. Observational – manipulation of variables vs. passive data collection; causal inference is stronger in experiments.
- t‑test vs. ANOVA – t compares two means; ANOVA compares three or more means simultaneously.
- Frequentist CI vs. Bayesian Credible Interval – CI: long‑run coverage; credible interval: direct probability given data and prior.

---

⚠️ Common Misunderstandings

- “p < 0.05 proves a real effect.” – It only indicates evidence against \(H_0\) at the chosen \(\alpha\); practical significance may be negligible.
- “Correlation = Causation.” – Correlation can arise from lurking variables; experiments are needed for causal claims.
- “A 95 % CI means there’s a 95 % chance the true value lies inside.” – After data are collected, the interval is fixed; the 95 % refers to the procedure’s long‑run success rate.
- “Sampling bias is harmless if the sample is large.” – Bias is systematic and does not disappear with size; a large biased sample can be more misleading than a small unbiased one.
- “Standard deviation = standard error.” – SD measures the spread of the data; SE measures the spread of the sampling distribution of a statistic (for the sample mean, \(SD/\sqrt{n}\)).

---

🧠 Mental Models / Intuition

- Sampling Distribution – Imagine repeatedly drawing samples; the distribution of an unbiased statistic (e.g., the sample mean) centers on the true parameter and narrows as \(n\) grows (Law of Large Numbers).
- Blocking – Think of blocking as “pairing” similar experimental units so that, within a block, the main remaining source of variation is the treatment.
- Power Curve – Visualize a curve rising as the true effect size grows; a higher \(\alpha\) or a larger \(n\) shifts the curve left, increasing power.

---

🚩 Exceptions & Edge Cases

- Non‑normal small samples – Use non‑parametric tests (Mann–Whitney, Wilcoxon) instead of the t‑test.
- Unequal variances – Apply Welch’s t‑test or transform the data.
- One‑sided vs. two‑sided tests – One‑sided tests are justified only when the direction of effect is pre‑specified.
- Zero‑inflated or heavily skewed data – Median, IQR, or robust estimators may be more informative than the mean/SD.

---

📍 When to Use Which

- Choose the t‑test when comparing means of two groups with roughly normal, equal‑variance data.
- Choose ANOVA for ≥3 groups; follow with a post‑hoc test (Tukey) if significant.
- Choose chi‑squared for categorical independence or goodness‑of‑fit with expected counts ≥5.
- Choose Mann–Whitney for two independent groups when the outcome is ordinal or the distribution is non‑normal.
- Use regression (OLS) for a continuous outcome with a linear relationship and independent errors.
- Use logistic regression for binary outcomes.
- Use the bootstrap when analytic SEs are unavailable or the sampling distribution is unknown.

---

👀 Patterns to Recognize

- “Mean ± SD” reported → the summary implicitly assumes rough normality; check histograms or Q‑Q plots.
- Large \(p\) but small effect size → may indicate low power, not lack of importance.
- Significant ANOVA but non‑significant pairwise tests → possible multiple‑comparison correction issue.
- Stratified sampling → look for subgroup rows in the data; variance is often reduced versus simple random sampling.
- Hawthorne effect → in any study where participants know they’re being observed, performance may be inflated.

---

🗂️ Exam Traps

- Mistaking “≤ α” for “≥ α” – Remember that rejection occurs when \(p \le \alpha\).
- Confusing standard deviation with standard error – For the sample mean, SE = SD/√n; using the wrong one mis‑scales confidence intervals.
- Applying a parametric test to heavily skewed data – The exam may present a small‑sample, skewed dataset; the correct answer is a non‑parametric alternative.
- Overlooking blocking/randomization – Questions about experimental design often penalize missing the need for random assignment to control confounding.
- Assuming a 95 % CI implies 95 % probability – The correct interpretation is about the method’s long‑run coverage, not the single interval.
- Choosing a one‑tailed test when not justified – Exams frequently include distractors that claim higher power for a one‑tailed test; the correct choice is “only if direction was pre‑specified.”

---
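Worked sketch: mean, SD, SE, and a 95 % CI. The Must Remember formulas and the “Building a Confidence Interval” steps can be tried out with Python’s standard library. This is a minimal sketch, not a reference implementation: the function name `mean_sd_se_ci` and the sample values are made up for illustration, and it uses the large‑sample normal quantile from the guide (for a sample this small, a \(t\) quantile would normally be preferred).

```python
import math
import statistics


def mean_sd_se_ci(data, confidence=0.95):
    """Return (mean, SD, SE, (lower, upper)) using the normal approximation.

    Hypothetical helper for illustration only.
    """
    n = len(data)
    mean = statistics.fmean(data)      # x-bar = (1/n) * sum(x_i)
    sd = statistics.stdev(data)        # sample SD, n-1 in the denominator
    se = sd / math.sqrt(n)             # standard error of the mean
    # z quantile, e.g. z_{0.975} for a 95 % interval
    z = statistics.NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return mean, sd, se, (mean - z * se, mean + z * se)


# Illustrative (made-up) measurements
sample = [2, 4, 4, 4, 5, 5, 7, 9]
mean, sd, se, (lower, upper) = mean_sd_se_ci(sample)
print(f"mean={mean:.3f} sd={sd:.3f} se={se:.3f} CI=({lower:.3f}, {upper:.3f})")
```

Note how the interval is built from SE, not SD: swapping them is the scaling mistake listed under Exam Traps.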
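Worked sketch: the five hypothesis‑test steps. The “Conducting a Hypothesis Test” steps can be traced with a one‑sample, two‑sided test of \(H_0: \mu = \mu_0\). This sketch assumes a large‑sample normal approximation (a z‑test) so that it runs on the standard library alone; for small samples the guide’s t‑test would be the usual choice. The function name `one_sample_z_test` and the data are hypothetical.

```python
import math
import statistics


def one_sample_z_test(data, mu0, alpha=0.05):
    """Two-sided large-sample test of H0: mu == mu0.

    Returns (z statistic, p-value, reject_H0). Illustrative helper only.
    """
    n = len(data)
    se = statistics.stdev(data) / math.sqrt(n)       # SE of the sample mean
    z = (statistics.fmean(data) - mu0) / se          # step 3: test statistic
    # step 4: two-sided p-value = P(|Z| >= |z|) under H0
    p = 2 * (1 - statistics.NormalDist().cdf(abs(z)))
    return z, p, p <= alpha                          # step 5: reject when p <= alpha


# Made-up measurements; H0 says the true mean is 9.0
data = [10.1, 9.9, 10.0, 10.2, 9.8] * 8
z, p, reject = one_sample_z_test(data, mu0=9.0)
print(f"z={z:.2f}, p={p:.4g}, reject H0: {reject}")
```

The decision rule `p <= alpha` mirrors the “≤ α” reminder under Exam Traps.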
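Worked example: MLE for a Bernoulli proportion. The two MLE steps under Key Processes can be followed for a simple case. Assume (for illustration only) that \(x_1,\dots,x_n\) are independent with \(x_i \in \{0,1\}\) and \(P(x_i = 1) = \theta\):

\[
L(\theta) = \prod_{i=1}^{n} \theta^{x_i}(1-\theta)^{1-x_i},
\qquad
\log L(\theta) = \Big(\sum_{i=1}^{n} x_i\Big)\log\theta + \Big(n - \sum_{i=1}^{n} x_i\Big)\log(1-\theta)
\]

\[
\frac{d}{d\theta}\log L(\theta) = \frac{\sum_{i=1}^{n} x_i}{\theta} - \frac{n - \sum_{i=1}^{n} x_i}{1-\theta} = 0
\quad\Longrightarrow\quad
\hat\theta = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x}
\]

So in this model the MLE is the sample proportion, which is also unbiased since \(E[\bar{x}] = \theta\).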
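Worked sketch: bootstrap standard error. “When to Use Which” suggests the bootstrap when no analytic SE is available. One common version (the nonparametric bootstrap) resamples the data with replacement, recomputes the statistic each time, and takes the SD of those estimates as the SE. The sketch below assumes that version; the name `bootstrap_se`, the default of 2000 resamples, and the data are illustrative choices, not prescribed by the guide.

```python
import random
import statistics


def bootstrap_se(data, stat=statistics.median, n_resamples=2000, seed=0):
    """Estimate the standard error of `stat` by resampling with replacement.

    Hypothetical helper; a fixed seed keeps the result reproducible.
    """
    rng = random.Random(seed)
    n = len(data)
    # Each resample has the same size as the original sample
    estimates = [stat(rng.choices(data, k=n)) for _ in range(n_resamples)]
    # SD of the bootstrap estimates approximates the SE of the statistic
    return statistics.stdev(estimates)


# Made-up, right-skewed data where the median is a natural summary
skewed = [1.2, 1.5, 1.7, 2.0, 2.1, 2.4, 3.0, 4.8, 9.5, 21.0]
print(f"bootstrap SE of the median: {bootstrap_se(skewed):.3f}")
```

The median is used here because, unlike the mean, it has no simple \(SD/\sqrt{n}\) formula for its SE.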