Bias–Variance Tradeoff Study Guide
📖 Core Concepts
Bias–Variance Tradeoff – Balances model complexity, prediction accuracy, and generalization.
Bias – Systematic error from wrong assumptions; leads to under‑fitting (high bias, low variance).
Variance – Sensitivity to small data fluctuations; leads to over‑fitting (low bias, high variance).
Irreducible Error (σ²) – Noise inherent in the problem; a lower bound on achievable error.
Expected Generalization Error = Bias² + Variance + Irreducible Error.
Accuracy vs. Precision Analogy – Accuracy ≈ low bias; Precision ≈ low variance.
📌 Must Remember
Decomposition Formula
$$\mathbb{E}_D\!\big[(\hat f(x)-y)^2\big] = \underbrace{(\mathbb{E}_D[\hat f(x)]-f(x))^2}_{\text{Bias}^2} + \underbrace{\mathbb{E}_D\!\big[(\hat f(x)-\mathbb{E}_D[\hat f(x)])^2\big]}_{\text{Variance}} + \underbrace{\sigma^2}_{\text{Irreducible}}$$
Model Complexity Trend – ↑Complexity → ↓Bias, ↑Variance.
Training‑Set Size – Larger data ↓Variance, does not change bias.
Regularization – Adds penalty → ↑Bias, ↓Variance.
k‑NN – Larger k → high bias, low variance; k = 1 → low bias, high variance.
Decision Trees – Deeper trees → low bias, high variance; pruning → opposite.
Ensembles –
Bagging → reduces variance (averages many high‑variance learners trained on bootstrap samples).
Boosting → reduces bias (combines weak learners).
🔄 Key Processes
Bias‑Variance Decomposition (MSE)
Assume true function f(x), observation y = f(x) + ε, with E[ε]=0, Var(ε)=σ².
Compute average predictor over training sets: \(\bar f(x)=\mathbb{E}_D[\hat f(x)]\).
Separate error:
Bias² = \((\bar f(x)-f(x))^2\)
Variance = \(\mathbb{E}_D[(\hat f(x)-\bar f(x))^2]\)
Add irreducible σ².
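These steps can be checked numerically. The sketch below (all choices illustrative: a sine target, degree‑3 polynomial fits via `np.polyfit`, one fixed test point) retrains on many simulated training sets and verifies that Bias² + Variance + σ² matches the measured expected squared error:

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(2 * np.pi * x)   # true function (illustrative choice)
sigma = 0.3                           # noise std; sigma**2 is the irreducible error
x0 = 0.25                             # fixed test point
n, trials, degree = 30, 2000, 3

preds = np.empty(trials)
for t in range(trials):
    x = rng.uniform(0, 1, n)
    y = f(x) + rng.normal(0, sigma, n)   # y = f(x) + eps
    coef = np.polyfit(x, y, degree)      # one fitted model per training set D
    preds[t] = np.polyval(coef, x0)

bias2 = (preds.mean() - f(x0)) ** 2      # (E_D[f_hat(x0)] - f(x0))^2
variance = preds.var()                   # E_D[(f_hat(x0) - E_D[f_hat(x0)])^2]
# Measured expected squared error against fresh noisy observations at x0:
mse = np.mean((preds - (f(x0) + rng.normal(0, sigma, trials))) ** 2)
print(bias2 + variance + sigma**2, mse)  # the two numbers nearly agree
```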
Controlling Variance via Data Size
Gather more labeled examples.
Retrain model; observe reduction in variance (prediction becomes more stable across splits).
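This effect is quick to simulate. With an illustrative linear ground truth and `np.polyfit` line fits, refitting on fresh samples of growing size shrinks the spread of the prediction at a fixed point roughly like 1/n:

```python
import numpy as np

rng = np.random.default_rng(4)
f = lambda x: 2 * x + 1                  # hypothetical linear ground truth

variances = []
for n in (10, 100, 1000):
    preds = []
    for _ in range(300):
        x = rng.uniform(0, 1, n)
        y = f(x) + rng.normal(0, 0.5, n)      # noisy labels
        slope, intercept = np.polyfit(x, y, 1)  # refit on each fresh sample
        preds.append(slope * 0.5 + intercept)   # prediction at x = 0.5
    variances.append(np.var(preds))
print(variances)   # variance falls as n grows; bias is unaffected
```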
Regularization Tuning
Choose penalty (L1/Lasso or L2/Ridge).
Increase regularization strength λ → coefficients shrink → model simpler → bias up, variance down.
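The shrinkage is easy to see with closed‑form ridge regression on synthetic data (`ridge` is a hypothetical helper implementing \((X^\top X+\lambda I)^{-1}X^\top y\)):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + rng.normal(0, 0.1, 50)

def ridge(X, y, lam):
    # closed-form ridge solution: (X^T X + lam * I)^{-1} X^T y
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

norms = [np.linalg.norm(ridge(X, y, lam)) for lam in (0.0, 1.0, 10.0, 100.0)]
print(norms)   # coefficient norm shrinks monotonically as lambda grows
```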
k‑NN Bias‑Variance Adjustment
Pick k.
If model overfits → increase k (raise bias, lower variance).
If model underfits → decrease k (lower bias, raise variance).
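The k knob can be watched directly. This toy 1‑D k‑NN regressor (hand‑rolled, uniform weights, all names illustrative) is refit on many simulated training sets; the spread of its prediction at one point drops sharply as k grows:

```python
import numpy as np

rng = np.random.default_rng(2)
f = lambda x: x ** 2                     # illustrative smooth target

def knn_predict(x_train, y_train, x0, k):
    # average the targets of the k nearest training points (uniform weights)
    idx = np.argsort(np.abs(x_train - x0))[:k]
    return y_train[idx].mean()

x0, n, trials = 0.5, 40, 500
spread = {}
for k in (1, 15):
    preds = []
    for _ in range(trials):
        x = rng.uniform(0, 1, n)
        y = f(x) + rng.normal(0, 0.2, n)
        preds.append(knn_predict(x, y, x0, k))
    spread[k] = np.var(preds)
print(spread)   # variance at k = 15 is far below variance at k = 1
```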
Tree Pruning
Grow full tree.
Evaluate validation error.
Prune nodes that cause large variance spikes (increase bias, reduce variance).
🔍 Key Comparisons
k‑NN vs. Decision Tree Depth
k‑NN: Increase k → ↑Bias, ↓Variance.
Tree: Increase depth → ↓Bias, ↑Variance.
Bagging vs. Boosting
Bagging: Targets variance reduction by averaging many high‑variance learners (e.g., deep trees) fit on bootstrap samples.
Boosting: Targets bias reduction by sequentially focusing on errors of weak learners.
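The bagging side of this comparison fits in a minimal sketch: a hand‑rolled 1‑NN base learner (low bias, high variance) averaged over bootstrap resamples. This is a toy, not Random Forest, and boosting's bias reduction is harder to show this compactly:

```python
import numpy as np

rng = np.random.default_rng(3)
f = lambda x: np.sin(3 * x)              # illustrative target

def one_nn(x_tr, y_tr, x0):
    # 1-NN: a low-bias, high-variance base learner
    return y_tr[np.argmin(np.abs(x_tr - x0))]

def bagged(x_tr, y_tr, x0, B=25):
    # average B bootstrap-resampled 1-NN predictions
    n = len(x_tr)
    total = 0.0
    for _ in range(B):
        idx = rng.integers(0, n, n)      # bootstrap sample (with replacement)
        total += one_nn(x_tr[idx], y_tr[idx], x0)
    return total / B

x0, n, trials = 0.5, 40, 400
single, bag = [], []
for _ in range(trials):
    x = rng.uniform(0, 1, n)
    y = f(x) + rng.normal(0, 0.3, n)
    single.append(one_nn(x, y, x0))
    bag.append(bagged(x, y, x0))
print(np.var(single), np.var(bag))       # bagging shrinks the variance
```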
Regularization vs. Feature Reduction
Regularization: Keeps all features, shrinks coefficients → bias ↑, variance ↓.
Feature Reduction: Removes features → model simpler → similar bias‑variance shift but may discard useful information.
⚠️ Common Misunderstandings
“More data always improves accuracy.” – It mainly reduces variance; bias stays unchanged.
“Regularization only helps with over‑fitting.” – It increases bias; if over‑regularized you can under‑fit.
“Ensembles always outperform a single model.” – If base learners are already low‑variance, bagging adds little; boosting can over‑fit noisy data.
“Bias = error due to under‑fitting only.” – Bias also includes systematic error from model assumptions, not just under‑fitting.
🧠 Mental Models / Intuition
Target Shooting Analogy –
Bias: Aim point consistently off target (systematic).
Variance: Shots scatter widely around aim point (random).
Ideal: Aim point on bullseye (low bias) with tight cluster (low variance).
Bias‑Variance Slider – Visualize a horizontal slider where moving right adds model complexity: bias bar shrinks, variance bar grows.
🚩 Exceptions & Edge Cases
Neural Networks – Adding hidden units usually ↓bias, ↑variance, but modern deep nets can exhibit double‑descent where very large models reduce variance after a certain size.
High‑dimensional Small‑sample Regime – Feature reduction may increase variance if important features are removed.
Noisy Labels – Irreducible error σ² dominates; further reducing variance yields diminishing returns.
📍 When to Use Which
Choose k‑NN when data is low‑dimensional and you need a simple, non‑parametric model; tune k to balance bias/variance.
Use Decision Trees for interpretability; apply pruning if validation shows high variance.
Apply Bagging (e.g., Random Forest) when base learners have high variance (deep trees).
Apply Boosting (e.g., AdaBoost, Gradient Boosting) when base learners have high bias (shallow trees, stumps).
Add Regularization when model shows high variance on validation but acceptable bias.
Increase Training Set as first remedy for high variance before modifying model architecture.
👀 Patterns to Recognize
Training vs. Validation Error Gap – Large gap → high variance; both high → high bias.
k‑NN Curve – As k increases, training error rises, validation error first drops then rises (U‑shape).
Tree Depth Curve – Similar U‑shape: shallow → underfit (bias), deep → overfit (variance).
Ensemble Benefit – Bagging improves performance mainly when single model’s variance is high (e.g., deep trees).
🗂️ Exam Traps
Confusing “bias” with “variance” – Remember: bias = systematic offset; variance = spread of predictions.
Assuming more regularization always helps – Over‑regularization raises bias; look for validation error minimum, not the smallest variance.
Picking “large k” in k‑NN for all problems – Large k can oversmooth and cause high bias, especially when the decision boundary is complex.
Believing bagging reduces bias – It primarily reduces variance; a bagged high‑bias model stays high‑bias.
Ignoring irreducible error – Even a perfect model can’t beat σ²; questions may ask for the lower bound on error.
---
This guide condenses the essential bias‑variance material into bite‑size, exam‑ready nuggets. Review each bullet before the test, and you’ll spot the right trade‑off choices instantly.