Bayes' theorem Study Guide
📖 Core Concepts
Bayes’ theorem: a rule for reversing conditional probabilities – it tells you how to update the probability of a cause \(A\) after observing an effect \(B\).
Prior \(P(A)\): your initial degree‑of‑belief (or long‑run frequency) in the hypothesis before seeing data.
Likelihood \(P(B\mid A)\): how probable the new evidence is if the hypothesis is true.
Marginal (evidence) probability \(P(B)\): the overall chance of seeing the evidence, obtained by summing/integrating over all possible hypotheses.
Posterior \(P(A\mid B)\): the updated belief after incorporating the evidence.
Bayes factor / Likelihood ratio \(\displaystyle \frac{P(B\mid A)}{P(B\mid \neg A)}\): quantifies how strongly the evidence favors \(A\) over its complement.
Interpretations
Bayesian (epistemic) – probability = personal belief; priors can be subjective.
Frequentist (objective) – probability = long‑run relative frequency; Bayes’ theorem links two ways of partitioning the same outcome set.
---
📌 Must Remember
General two‑event formula
\[
P(A\mid B)=\frac{P(B\mid A)\,P(A)}{P(B)} .
\]
Mutually exclusive & exhaustive hypotheses
\[
P(A_i\mid B)=\frac{P(B\mid A_i)\,P(A_i)}{\displaystyle\sum_{j=1}^{n}P(B\mid A_j)\,P(A_j)} .
\]
Two‑hypothesis (complement) form
\[
P(A\mid B)=\frac{P(B\mid A)P(A)}{P(B\mid A)P(A)+P(B\mid \neg A)P(\neg A)} .
\]
Odds form
\[
\text{Odds}(A\mid B)=\text{Odds}(A)\times\frac{P(B\mid A)}{P(B\mid \neg A)} .
\]
Continuous‑variable form
\[
f_{X\mid Y}(x\mid y)=\frac{f_{Y\mid X}(y\mid x)\,f_X(x)}{f_Y(y)} .
\]
Three‑event (conditional) form
\[
P(A\mid B,C)=\frac{P(B\mid A,C)\,P(A\mid C)}{P(B\mid C)} .
\]
Key terminology: prior, likelihood, evidence (marginal), posterior, Bayes factor, odds.
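The two-hypothesis form and the odds form must always agree; a minimal Python sketch with hypothetical numbers (not from the text) verifies this numerically:

```python
# Hypothetical numbers for a sanity check that the probability
# and odds forms of Bayes' theorem give the same posterior.
p_A = 0.3             # prior P(A)
p_B_given_A = 0.8     # likelihood P(B|A)
p_B_given_notA = 0.2  # P(B|~A)

# Two-hypothesis (complement) form
p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)  # marginal P(B)
posterior = p_B_given_A * p_A / p_B

# Odds form: posterior odds = prior odds x Bayes factor
prior_odds = p_A / (1 - p_A)
posterior_odds = prior_odds * (p_B_given_A / p_B_given_notA)

# Converting the posterior odds back to a probability recovers the same value
assert abs(posterior - posterior_odds / (1 + posterior_odds)) < 1e-12
print(round(posterior, 4))  # 0.6316
```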
---
🔄 Key Processes
Identify hypotheses \(A_i\) (must be mutually exclusive & exhaustive if using the sum‑denominator form).
Assign priors \(P(A_i)\) based on background knowledge or prevalence.
Compute likelihoods \(P(B\mid A_i)\) from the model or test characteristics (sensitivity, specificity).
Find the marginal
\[
P(B)=\sum_i P(B\mid A_i)\,P(A_i) \quad\text{(or integrate for continuous)} .
\]
Apply Bayes’ theorem to obtain the posterior \(P(A_i\mid B)\).
(Optional) Convert to odds for sequential updating: multiply current odds by the Bayes factor each time new evidence arrives.
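The steps above can be sketched as a short Python function for \(n\) hypotheses; the three-coin numbers below are hypothetical, chosen only to illustrate the update:

```python
def posteriors(priors, likelihoods):
    """Bayes' theorem for n mutually exclusive, exhaustive hypotheses.
    priors[i] = P(A_i), likelihoods[i] = P(B | A_i)."""
    marginal = sum(p * l for p, l in zip(priors, likelihoods))  # P(B)
    if marginal == 0:
        raise ValueError("P(B) = 0: Bayes' theorem is undefined")
    return [p * l / marginal for p, l in zip(priors, likelihoods)]

# Hypothetical example: a drawn coin is fair, two-headed, or two-tailed;
# the evidence B is "one flip came up heads".
priors = [0.90, 0.05, 0.05]       # P(fair), P(two-headed), P(two-tailed)
likelihoods = [0.5, 1.0, 0.0]     # P(heads | each coin type)
print(posteriors(priors, likelihoods))  # [0.9, 0.1, 0.0]
```

Note that the returned posteriors always sum to 1, because the marginal in the denominator is exactly the weighted sum over all hypotheses.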
Example (Medical test):
Prior \(P(D)=\) disease prevalence.
Likelihood \(P(T\mid D)=\) sensitivity.
Marginal \(P(T)=P(T\mid D)P(D)+P(T\mid \neg D)P(\neg D)\).
Posterior \(P(D\mid T)=\dfrac{P(T\mid D)P(D)}{P(T)}\).
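With hypothetical (but typical textbook) test characteristics, the medical example works out as follows; the specific numbers are illustrative, not from the text:

```python
# Hypothetical test characteristics for the medical-test example.
prevalence = 0.01    # prior P(D)
sensitivity = 0.99   # P(T|D)
specificity = 0.95   # P(~T|~D), so false-positive rate = 1 - specificity

# Marginal P(T): true positives plus false positives
p_T = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)

# Posterior P(D|T) by Bayes' theorem
p_D_given_T = sensitivity * prevalence / p_T
print(round(p_D_given_T, 3))  # 0.167
```

Even with a 99 %-sensitive test, the posterior is only about 17 % here, because the 1 % prior is heavily outweighed by false positives from the healthy 99 %.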
---
🔍 Key Comparisons
Bayesian vs Frequentist interpretation
Bayesian: probability = belief; priors are allowed.
Frequentist: probability = long‑run frequency; priors are not part of the model.
Two‑hypothesis vs Mutually exclusive forms
Two‑hypothesis: only \(A\) and its complement \(\neg A\).
Mutually exclusive: handles \(n\) competing hypotheses simultaneously (denominator sums over all).
Probability form vs Odds form
Probability: gives a value in \([0,1]\).
Odds: useful for multiplicative updates; odds > 1 means probability > 50 %.
---
⚠️ Common Misunderstandings
“\(P(A\mid B)=P(B\mid A)\)” – they are generally different; you must include the prior and marginal.
Confusing sensitivity with \(P(T)\) – \(P(T)\) (the marginal) also depends on disease prevalence and specificity.
Neglecting the base‑rate (prior) – leads to over‑estimating rare‑disease probabilities (the classic false‑positive paradox).
Using the two‑hypothesis formula without the complement term – omitting \(P(B\mid \neg A)P(\neg A)\) gives a wrong denominator.
Assuming Bayes works when \(P(B)=0\) – the theorem is undefined; you must have observed evidence with non‑zero probability.
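The first misunderstanding is easy to check numerically; in this sketch (hypothetical numbers), \(P(B\mid A)\) and \(P(A\mid B)\) come out very different:

```python
# Hypothetical numbers showing P(A|B) != P(B|A) in general.
p_A = 0.1             # prior P(A)
p_B_given_A = 0.9     # likelihood P(B|A)
p_B_given_notA = 0.3  # P(B|~A)

p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)  # marginal P(B) = 0.36
p_A_given_B = p_B_given_A * p_A / p_B                 # posterior

print(p_B_given_A, round(p_A_given_B, 2))  # 0.9 0.25
```

The prior of 0.1 and the marginal of 0.36 are exactly the terms that the naive swap \(P(A\mid B)=P(B\mid A)\) ignores.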
---
🧠 Mental Models / Intuition
Evidence as a “scale”: the Bayes factor tells you how many “steps” the evidence moves the odds. A factor > 1 pushes belief toward the hypothesis; < 1 pushes it away.
Posterior = “prior weighted by evidence”: think of the prior as a base weight, then tilt the balance according to how likely the new data are under each hypothesis.
Odds multiplication: each new piece of data multiplies the current odds by its Bayes factor – like compounding interest on belief.
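Sequential updating by odds multiplication can be sketched in a few lines; the prior odds and Bayes factors here are hypothetical:

```python
def update_odds(odds, bayes_factor):
    """One multiplicative update: posterior odds = prior odds x Bayes factor."""
    return odds * bayes_factor

# Hypothetical: prior odds 1:9 (probability 0.1), then three independent
# pieces of evidence, each with Bayes factor 3.
odds = 1 / 9
for bf in [3, 3, 3]:
    odds = update_odds(odds, bf)

probability = odds / (1 + odds)
print(round(odds, 4), round(probability, 2))  # 3.0 0.75
```

Three factor-3 updates compound to a factor of 27, turning 1:9 odds into 3:1 odds, i.e. a 75 % posterior.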
---
🚩 Exceptions & Edge Cases
Continuous variables – replace probabilities with probability densities; the denominator is the marginal density \(f_Y(y)\).
Zero marginal probability \(P(B)=0\) – Bayes’ theorem cannot be applied; this occurs if the observed evidence is impossible under the model.
Rare events – even tests with high sensitivity/specificity can yield many false positives; the posterior may still be low if the prior (base‑rate) is tiny.
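The rare-event edge case can be made concrete by sweeping the prevalence while holding the test fixed; the 99 %/99 % test below is a hypothetical choice:

```python
def posterior_positive(prevalence, sensitivity=0.99, specificity=0.99):
    """P(D|T) for a positive result, by the two-hypothesis form of Bayes."""
    p_T = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
    return sensitivity * prevalence / p_T

# Same 99%-accurate test, three different base rates (hypothetical)
for prev in [0.5, 0.01, 0.0001]:
    print(prev, round(posterior_positive(prev), 4))
# 0.5    -> 0.99
# 0.01   -> 0.5
# 0.0001 -> ~0.0098
```

At a 1-in-10,000 base rate, a positive result from a 99 %-accurate test still leaves the posterior below 1 % – the false-positive paradox in action.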
---
📍 When to Use Which
Odds form – best for sequential updating (e.g., multiple independent test results).
Mutually exclusive form – when you have more than two competing hypotheses (e.g., which coin type was drawn).
Two‑hypothesis form – quick calculation when only a hypothesis and its complement matter (e.g., disease vs no disease).
Continuous form – whenever the variables are continuous (e.g., estimating a parameter’s density).
Three‑event form – useful when conditioning on an already‑known background variable \(C\) (e.g., adjusting for demographic subgroup).
---
👀 Patterns to Recognize
Base‑rate + likelihood = posterior – always look for a prior term multiplied by a likelihood term in the numerator.
Denominator as a weighted sum – the marginal \(P(B)\) is the sum (or integral) of each hypothesis’s likelihood × prior.
Bayes factor > 1 ⇒ posterior > prior – evidence that is more likely under the hypothesis pushes the belief up.
Symmetry in odds form – swapping hypothesis and complement swaps the numerator and denominator of the Bayes factor.
---
🗂️ Exam Traps
Distractor: \(P(A\mid B)=\frac{P(A)P(B)}{P(A\mid B)}\) – incorrect algebra; the correct denominator is \(P(B)\).
Choosing sensitivity instead of \(P(T)\) – forgetting to include the false‑positive rate (1 − specificity) and prevalence in the marginal.
Omitting the complement term in the two‑hypothesis denominator – the denominator then equals the numerator, forcing a “posterior” of 1 regardless of the evidence.
Assuming priors are 0.5 when the problem gives a different prevalence.
Treating odds as probabilities without converting: odds of 3 correspond to a probability of \(3/(1+3)=0.75\).
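The odds ↔ probability conversion in the last trap is worth memorizing; a two-line Python sketch:

```python
def odds_to_prob(odds):
    """Convert odds o to probability o / (1 + o)."""
    return odds / (1 + odds)

def prob_to_odds(p):
    """Convert probability p to odds p / (1 - p)."""
    return p / (1 - p)

print(odds_to_prob(3))     # 0.75  (odds of 3 are NOT a probability of 3)
print(prob_to_odds(0.75))  # 3.0
```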
---