Bayes' theorem Study Guide
📖 Core Concepts
Bayes’ theorem: a rule for reversing conditional probabilities – it tells you how to update the probability of a cause \(A\) after observing an effect \(B\).
Prior \(P(A)\): your initial degree‑of‑belief (or long‑run frequency) in the hypothesis before seeing data.
Likelihood \(P(B\mid A)\): how probable the new evidence is if the hypothesis is true.
Marginal (evidence) probability \(P(B)\): the overall chance of seeing the evidence, obtained by summing/integrating over all possible hypotheses.
Posterior \(P(A\mid B)\): the updated belief after incorporating the evidence.
Bayes factor / Likelihood ratio \(\displaystyle \frac{P(B\mid A)}{P(B\mid \neg A)}\): quantifies how strongly the evidence favors \(A\) over its complement.
Interpretations
Bayesian (epistemic) – probability = personal belief; priors can be subjective.
Frequentist (objective) – probability = long‑run relative frequency; Bayes’ theorem links two ways of partitioning the same outcome set.
---
📌 Must Remember
General two‑event formula
\[
P(A\mid B)=\frac{P(B\mid A)\,P(A)}{P(B)} .
\]
Mutually exclusive & exhaustive hypotheses
\[
P(A_i\mid B)=\frac{P(B\mid A_i)\,P(A_i)}{\displaystyle\sum_{j=1}^{n}P(B\mid A_j)\,P(A_j)} .
\]
Two‑hypothesis (complement) form
\[
P(A\mid B)=\frac{P(B\mid A)P(A)}{P(B\mid A)P(A)+P(B\mid \neg A)P(\neg A)} .
\]
Odds form
\[
\text{Odds}(A\mid B)=\text{Odds}(A)\times\frac{P(B\mid A)}{P(B\mid \neg A)} .
\]
Continuous‑variable form
\[
f_{X\mid Y}(x\mid y)=\frac{f_{Y\mid X}(y\mid x)\,f_X(x)}{f_Y(y)} .
\]
Three‑event (conditional) form
\[
P(A\mid B,C)=\frac{P(B\mid A,C)\,P(A\mid C)}{P(B\mid C)} .
\]
Key terminology: prior, likelihood, evidence (marginal), posterior, Bayes factor, odds.
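The two-hypothesis form and the odds form must always agree; a minimal Python sketch with hypothetical numbers (not from the text) verifies this numerically:

```python
# Hypothetical numbers for a sanity check that the probability
# and odds forms of Bayes' theorem give the same posterior.
p_A = 0.3             # prior P(A)
p_B_given_A = 0.8     # likelihood P(B|A)
p_B_given_notA = 0.2  # P(B|~A)

# Two-hypothesis (complement) form
p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)  # marginal P(B)
posterior = p_B_given_A * p_A / p_B

# Odds form: posterior odds = prior odds x Bayes factor
prior_odds = p_A / (1 - p_A)
posterior_odds = prior_odds * (p_B_given_A / p_B_given_notA)

# Converting the posterior odds back to a probability recovers the same value
assert abs(posterior - posterior_odds / (1 + posterior_odds)) < 1e-12
print(round(posterior, 4))  # 0.6316
```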
---
🔄 Key Processes
Identify hypotheses \(A_i\) (must be mutually exclusive & exhaustive if using the sum‑denominator form).
Assign priors \(P(A_i)\) based on background knowledge or prevalence.
Compute likelihoods \(P(B\mid A_i)\) from the model or test characteristics (sensitivity, specificity).
Find the marginal
\[
P(B)=\sum_i P(B\mid A_i)\,P(A_i) \quad\text{(or integrate for continuous)} .
\]
Apply Bayes’ theorem to obtain the posterior \(P(A_i\mid B)\).
(Optional) Convert to odds for sequential updating: multiply current odds by the Bayes factor each time new evidence arrives.
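The steps above can be sketched as a short Python function for \(n\) hypotheses; the three-coin numbers below are hypothetical, chosen only to illustrate the update:

```python
def posteriors(priors, likelihoods):
    """Bayes' theorem for n mutually exclusive, exhaustive hypotheses.
    priors[i] = P(A_i), likelihoods[i] = P(B | A_i)."""
    marginal = sum(p * l for p, l in zip(priors, likelihoods))  # P(B)
    if marginal == 0:
        raise ValueError("P(B) = 0: Bayes' theorem is undefined")
    return [p * l / marginal for p, l in zip(priors, likelihoods)]

# Hypothetical example: a drawn coin is fair, two-headed, or two-tailed;
# the evidence B is "one flip came up heads".
priors = [0.90, 0.05, 0.05]       # P(fair), P(two-headed), P(two-tailed)
likelihoods = [0.5, 1.0, 0.0]     # P(heads | each coin type)
print(posteriors(priors, likelihoods))  # [0.9, 0.1, 0.0]
```

Note that the returned posteriors always sum to 1, because the marginal in the denominator is exactly the weighted sum over all hypotheses.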
Example (Medical test):
Prior \(P(D)=\) disease prevalence.
Likelihood \(P(T\mid D)=\) sensitivity.
Marginal \(P(T)=P(T\mid D)P(D)+P(T\mid \neg D)P(\neg D)\).
Posterior \(P(D\mid T)=\dfrac{P(T\mid D)P(D)}{P(T)}\).
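With hypothetical (but typical textbook) test characteristics, the medical example works out as follows; the specific numbers are illustrative, not from the text:

```python
# Hypothetical test characteristics for the medical-test example.
prevalence = 0.01    # prior P(D)
sensitivity = 0.99   # P(T|D)
specificity = 0.95   # P(~T|~D), so false-positive rate = 1 - specificity

# Marginal P(T): true positives plus false positives
p_T = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)

# Posterior P(D|T) by Bayes' theorem
p_D_given_T = sensitivity * prevalence / p_T
print(round(p_D_given_T, 3))  # 0.167
```

Even with a 99 %-sensitive test, the posterior is only about 17 % here, because the 1 % prior is heavily outweighed by false positives from the healthy 99 %.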
---
🔍 Key Comparisons
Bayesian vs Frequentist interpretation
Bayesian: probability = belief; priors are allowed.
Frequentist: probability = long‑run frequency; priors are not part of the model.
Two‑hypothesis vs Mutually exclusive forms
Two‑hypothesis: only \(A\) and its complement \(\neg A\).
Mutually exclusive: handles \(n\) competing hypotheses simultaneously (denominator sums over all).
Probability form vs Odds form
Probability: gives a value in \([0,1]\).
Odds: useful for multiplicative updates; odds > 1 means probability > 50 %.
---
⚠️ Common Misunderstandings
“\(P(A\mid B)=P(B\mid A)\)” – they are generally different; you must include the prior and marginal.
Confusing sensitivity with \(P(T)\) – \(P(T)\) (the marginal) also depends on disease prevalence and specificity.
Neglecting the base‑rate (prior) – leads to over‑estimating rare‑disease probabilities (the classic false‑positive paradox).
Using the two‑hypothesis formula without the complement term – omitting \(P(B\mid \neg A)P(\neg A)\) gives a wrong denominator.
Assuming Bayes works when \(P(B)=0\) – the theorem is undefined; you must have observed evidence with non‑zero probability.
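The first misunderstanding is easy to check numerically; in this sketch (hypothetical numbers), \(P(B\mid A)\) and \(P(A\mid B)\) come out very different:

```python
# Hypothetical numbers showing P(A|B) != P(B|A) in general.
p_A = 0.1             # prior P(A)
p_B_given_A = 0.9     # likelihood P(B|A)
p_B_given_notA = 0.3  # P(B|~A)

p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)  # marginal P(B) = 0.36
p_A_given_B = p_B_given_A * p_A / p_B                 # posterior

print(p_B_given_A, round(p_A_given_B, 2))  # 0.9 0.25
```

The prior of 0.1 and the marginal of 0.36 are exactly the terms that the naive swap \(P(A\mid B)=P(B\mid A)\) ignores.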
---
🧠 Mental Models / Intuition
Evidence as a “scale”: the Bayes factor tells you how many “steps” the evidence moves the odds. A factor > 1 pushes belief toward the hypothesis; < 1 pushes it away.
Posterior = “prior weighted by evidence”: think of the prior as a base weight, then tilt the balance according to how likely the new data are under each hypothesis.
Odds multiplication: each new piece of data multiplies the current odds by its Bayes factor – like compounding interest on belief.
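Sequential updating by odds multiplication can be sketched in a few lines; the prior odds and Bayes factors here are hypothetical:

```python
def update_odds(odds, bayes_factor):
    """One multiplicative update: posterior odds = prior odds x Bayes factor."""
    return odds * bayes_factor

# Hypothetical: prior odds 1:9 (probability 0.1), then three independent
# pieces of evidence, each with Bayes factor 3.
odds = 1 / 9
for bf in [3, 3, 3]:
    odds = update_odds(odds, bf)

probability = odds / (1 + odds)
print(round(odds, 4), round(probability, 2))  # 3.0 0.75
```

Three factor-3 updates compound to a factor of 27, turning 1:9 odds into 3:1 odds, i.e. a 75 % posterior.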
---
🚩 Exceptions & Edge Cases
Continuous variables – replace probabilities with probability densities; the denominator is the marginal density \(f_Y(y)\).
Zero marginal probability \(P(B)=0\) – Bayes’ theorem cannot be applied; this occurs if the observed evidence is impossible under the model.
Rare events – even tests with high sensitivity/specificity can yield many false positives; the posterior may still be low if the prior (base‑rate) is tiny.
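The rare-event edge case can be made concrete by sweeping the prevalence while holding the test fixed; the 99 %/99 % test below is a hypothetical choice:

```python
def posterior_positive(prevalence, sensitivity=0.99, specificity=0.99):
    """P(D|T) for a positive result, by the two-hypothesis form of Bayes."""
    p_T = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
    return sensitivity * prevalence / p_T

# Same 99%-accurate test, three different base rates (hypothetical)
for prev in [0.5, 0.01, 0.0001]:
    print(prev, round(posterior_positive(prev), 4))
# 0.5    -> 0.99
# 0.01   -> 0.5
# 0.0001 -> ~0.0098
```

At a 1-in-10,000 base rate, a positive result from a 99 %-accurate test still leaves the posterior below 1 % – the false-positive paradox in action.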
---
📍 When to Use Which
Odds form – best for sequential updating (e.g., multiple independent test results).
Mutually exclusive form – when you have more than two competing hypotheses (e.g., which coin type was drawn).
Two‑hypothesis form – quick calculation when only a hypothesis and its complement matter (e.g., disease vs no disease).
Continuous form – whenever the variables are continuous (e.g., estimating a parameter’s density).
Three‑event form – useful when conditioning on an already‑known background variable \(C\) (e.g., adjusting for demographic subgroup).
---
👀 Patterns to Recognize
Base‑rate + likelihood = posterior – always look for a prior term multiplied by a likelihood term in the numerator.
Denominator as a weighted sum – the marginal \(P(B)\) is the sum (or integral) of each hypothesis’s likelihood × prior.
Bayes factor > 1 ⇒ posterior > prior – evidence that is more likely under the hypothesis pushes the belief up.
Symmetry in odds form – swapping hypothesis and complement swaps the numerator and denominator of the Bayes factor.
---
🗂️ Exam Traps
Distractor: \(P(A\mid B)=\frac{P(A)P(B)}{P(A\mid B)}\) – incorrect algebra; the correct denominator is \(P(B)\).
Choosing sensitivity instead of \(P(T)\) – forgetting to include the false‑positive rate (1 − specificity) and prevalence in the marginal.
Omitting the complement term in the two‑hypothesis denominator – the denominator then equals the numerator, forcing a “posterior” of 1 regardless of the evidence.
Assuming priors are 0.5 when the problem gives a different prevalence.
Treating odds as probabilities without converting: odds of 3 correspond to a probability of \(3/(1+3)=0.75\).
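The odds ↔ probability conversion in the last trap is worth memorizing; a two-line Python sketch:

```python
def odds_to_prob(odds):
    """Convert odds o to probability o / (1 + o)."""
    return odds / (1 + odds)

def prob_to_odds(p):
    """Convert probability p to odds p / (1 - p)."""
    return p / (1 - p)

print(odds_to_prob(3))     # 0.75  (odds of 3 are NOT a probability of 3)
print(prob_to_odds(0.75))  # 3.0
```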
---