
Study Guide

📖 Core Concepts

- Biostatistics – application of statistical methods to biology, clinical medicine, and public health (design, collection, analysis, interpretation).
- Population vs. Sample – population = all units of interest; sample = randomly chosen subset used to infer population traits.
- Hypotheses – H₀: no association/effect; H₁: there is an association/effect.
- Type I error (α) – false positive (rejecting a true H₀).
- Type II error (β) & Power – false negative (failing to reject a false H₀); power = $1-\beta$.
- p‑value – probability of observing data at least as extreme as ours if H₀ is true; compare to α.
- Confidence Interval (CI) – range that likely contains the true population parameter at a chosen confidence level.
- Correlation (Pearson r/ρ) – strength of linear association; ranges from –1 to +1.
- Model selection (AIC, BIC) – trade‑off between goodness of fit and complexity; lower = better.
- Multiple‑testing corrections – Bonferroni (familywise error) vs. FDR (expected proportion of false discoveries).

---

📌 Must Remember

- Mean: $\displaystyle \text{Mean} = \frac{\sum x_i}{n}$
- Standard Error of the Mean (SEM): $\displaystyle \text{SEM} = \frac{s}{\sqrt{n}}$ (where s = sample SD).
- Bonferroni α: $\displaystyle \alpha_{\text{Bon}} = \frac{\alpha}{m}$ (m = number of tests).
- Power: $1-\beta$; increase with larger n, larger effect size, or higher α.
- AIC: $\displaystyle \text{AIC}=2k - 2\ln(L)$ (k = number of parameters, L = likelihood).
- BIC: $\displaystyle \text{BIC}=k\ln(n) - 2\ln(L)$.
- FDR control – e.g., the Benjamini‑Hochberg procedure ranks p‑values and sets a cutoff based on $\frac{i}{m}\alpha$.
- Randomized Controlled Trial (RCT) – gold standard for causal inference; randomization minimizes confounding.

---

🔄 Key Processes

- Formulating a Research Question → concise, novel, scientifically valuable.
- Defining Hypotheses → write H₀ (no effect) and H₁ (effect).
- Sampling
  - Define the target population.
  - Choose a random sampling method.
  - Determine sample size (consider scope, resources, trial type).
- Experimental Design Selection
  - Simple: completely randomized, randomized block, factorial.
  - Complex: lattice, split‑plot, augmented block, Latin square / row‑column.
- Descriptive Analysis
  - Build frequency tables → absolute & relative frequencies.
  - Create appropriate graphs (line, bar, histogram, scatter, box plot).
  - Compute mean, median, mode, quartiles.
- Inferential Steps
  - Estimate the SE, construct a CI, calculate the p‑value.
  - Decide significance (p < α).
  - If multiple tests → apply Bonferroni or FDR.
- Model Building & Selection
  - Fit candidate models.
  - Compute AIC/BIC → pick the lowest.
  - Check assumptions → run robustness checks.
- Validation for High‑Dimensional Data
  - Reduce dimensionality (PCA).
  - Split data → training & independent test set.
  - Compute the residual sum of squares and $R^{2}$ on the test set.

---

🔍 Key Comparisons

- H₀ vs. H₁ – H₀ = “no association”; H₁ = “association exists”.
- Type I vs. Type II Error – α = false positive; β = false negative.
- Bonferroni vs. FDR – Bonferroni = strict familywise control (more conservative); FDR = allows some false positives for greater power.
- AIC vs. BIC – AIC penalizes complexity less (better for a predictive focus); BIC penalizes more heavily (better for true‑model selection).
- Supervised vs. Unsupervised Learning – supervised uses labeled outcomes (e.g., classification, regression); unsupervised finds structure without labels (e.g., k‑means clustering).

---

⚠️ Common Misunderstandings

- “p‑value = probability H₀ is true” – false; it’s the probability of data at least this extreme given H₀.
- “Statistical significance = practical importance” – a tiny p‑value can correspond to a trivial effect size.
- “Correlation implies causation” – correlation only measures linear association; it says nothing about causal direction.
- “Higher R² always means a better model” – R² can be inflated by over‑fitting; check AIC/BIC and validation performance.
- “Bonferroni is always the safest correction” – it is overly conservative when tests are many, leading to many false negatives.
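The Bonferroni vs. FDR comparison above can be made concrete. Below is a minimal sketch of the Benjamini‑Hochberg procedure in plain Python; the function names and the toy p‑values are made up for illustration, not taken from any library.

```python
def benjamini_hochberg(pvals, alpha=0.05):
    """Indices rejected under Benjamini-Hochberg FDR control."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # Largest rank i with p_(i) <= (i/m)*alpha; reject that many smallest p-values.
    k = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank / m * alpha:
            k = rank
    return sorted(order[:k])

def bonferroni(pvals, alpha=0.05):
    """Indices rejected under the stricter familywise cutoff alpha/m."""
    cutoff = alpha / len(pvals)
    return [i for i, p in enumerate(pvals) if p <= cutoff]

pvals = [0.001, 0.008, 0.029, 0.041, 0.20]  # toy values
print(benjamini_hochberg(pvals))  # -> [0, 1, 2]: BH rejects three
print(bonferroni(pvals))          # -> [0, 1]: Bonferroni rejects only two
```

On these toy values BH gains one extra rejection, which is exactly the “greater power at the cost of some false positives” trade‑off described above.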
---

🧠 Mental Models / Intuition

- Sampling as a “microscope” – a random sample lets you see the whole population’s features without examining every individual.
- Confidence interval as a “net” – the net’s width reflects uncertainty; a narrow net (small SE) catches the true parameter more precisely.
- AIC/BIC as “price tags” – you pay for fit (lower residuals) but also for extra parameters; the cheapest (lowest score) balances both.
- Multicollinearity as a “crowded hallway” – when predictors are tightly packed (highly correlated), it’s hard to see each one’s individual effect.

---

🚩 Exceptions & Edge Cases

- Small sample sizes → SE estimates are unreliable; use exact tests (e.g., Fisher’s exact) or resampling (bootstrapping).
- Non‑normal data → Pearson correlation may be misleading; consider Spearman rank correlation.
- Zero‑inflated or highly skewed outcomes → use generalized linear models with appropriate link functions.
- High‑dimensional data (p ≫ n) → traditional regression fails; employ regularization (LASSO) or dimensionality reduction (PCA).

---

📍 When to Use Which

- R vs. Python – R for its extensive statistical packages & graphics; Python for integration with machine‑learning pipelines & image analysis.
- Bonferroni vs. FDR – Bonferroni when the cost of any false positive is high (e.g., clinical safety); FDR for exploratory ‘omics’ studies with many tests.
- Parametric vs. non‑parametric – parametric (t‑test, Pearson) when assumptions (normality, equal variance) hold; non‑parametric (Mann‑Whitney, Spearman) otherwise.
- Randomized block vs. completely randomized – block design when known sources of variation (e.g., batch, location) exist; completely randomized when there is no such structure.
- PCA vs. variable selection – PCA to reduce many correlated predictors while preserving variance; variable selection (stepwise, LASSO) when you need interpretable individual predictors.
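The small‑sample edge case above pairs naturally with the bootstrap. Here is a plain‑Python sketch of a bootstrap SEM; the helper name, seed, and toy data are invented for illustration.

```python
import math
import random
import statistics

def bootstrap_sem(data, n_boot=2000, seed=42):
    """Bootstrap SE of the mean: resample with replacement,
    recompute the mean each time, take the SD of those means."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    boot_means = [statistics.fmean(rng.choices(data, k=len(data)))
                  for _ in range(n_boot)]
    return statistics.stdev(boot_means)

data = [2.1, 2.4, 1.9, 2.8, 2.2]  # toy small sample (n = 5)
analytic = statistics.stdev(data) / math.sqrt(len(data))  # SEM = s / sqrt(n)
print(bootstrap_sem(data), analytic)  # the two estimates should be close
```

With only five observations both estimates are rough, which is the point of the edge case: resampling gives a sanity check on the analytic SEM rather than a guarantee.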
---

👀 Patterns to Recognize

- “Large n, tiny p‑value” → check the effect size; the result may be statistically but not clinically important.
- “High correlation + non‑linear scatter plot” → Pearson may underestimate the association; consider a transformation or non‑linear models.
- “Many tests, many borderline p‑values” → likely need FDR control rather than Bonferroni.
- “Model with lower AIC but higher BIC” → a modest improvement in fit that may not justify the added complexity.

---

🗂️ Exam Traps

- Calling a result “significant” because p < 0.05 without looking at the confidence interval – the CI may include values of no practical relevance.
- Assuming randomization eliminates all bias – selection bias, measurement error, or protocol deviations remain possible.
- Treating “mean = median” as evidence of normality – this also occurs in symmetric but non‑normal distributions; verify with plots or normality tests.
- Using a Bonferroni correction for a small number of tests – over‑conservative, reducing power unnecessarily.
- Interpreting a high R² from a model fitted on the training set as proof of predictive ability – look at the test‑set $R^{2}$ or cross‑validated metrics instead.

---
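The “lower AIC but higher BIC” pattern above can be checked numerically. This is a minimal sketch of both formulas with made‑up log‑likelihoods for two hypothetical nested models (the values are illustrative, not fitted to real data).

```python
import math

def aic(k, log_l):
    """AIC = 2k - 2 ln(L); log_l is already ln(L)."""
    return 2 * k - 2 * log_l

def bic(k, n, log_l):
    """BIC = k ln(n) - 2 ln(L); the penalty grows with sample size n."""
    return k * math.log(n) - 2 * log_l

# Hypothetical: model B adds one parameter for a small likelihood gain.
n = 50
aic_a, aic_b = aic(2, -100.0), aic(3, -98.5)
bic_a, bic_b = bic(2, n, -100.0), bic(3, n, -98.5)
# AIC prefers B (gain > 1 per extra parameter); BIC prefers A,
# because its per-parameter penalty is ln(50), roughly 3.9, not 2.
print(aic_a, aic_b, bic_a, bic_b)
```

The disagreement is the pattern itself: the extra parameter buys enough fit to pass AIC’s fixed penalty of 2 but not BIC’s sample‑size‑dependent penalty.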