Study Guide

📖 Core Concepts

- Probability space – Triple \((\Omega,\mathcal{F},P)\): sample space \(\Omega\), σ‑algebra \(\mathcal{F}\) (event space), and probability measure \(P\) with \(P(\Omega)=1\).
- Event – A member of \(\mathcal{F}\), i.e., a measurable subset of \(\Omega\); the whole space has probability 1, the empty set has probability 0.
- Random variable (RV) – Function \(X:\Omega\to\mathbb{R}\) that assigns a numeric value to each elementary outcome.
- Probability mass function (PMF) – For discrete RVs, \(p(x)=P(X=x)\); satisfies \(p(x)\ge0\) and \(\sum_x p(x)=1\).
- Probability density function (PDF) – For continuous RVs, \(f(x)=\frac{dF(x)}{dx}\) when the CDF \(F\) is absolutely continuous; probability on \([a,b]\) is \(\int_a^b f(x)\,dx\).
- Expectation – Mean value \(E[X]=\sum_x x\,p(x)\) (discrete) or \(E[X]=\int_{-\infty}^{\infty} x\,f(x)\,dx\) (continuous).
- Variance – \(Var(X)=E[(X-E[X])^2]\); measures spread around the mean.
- Independence – Events \(A,B\) are independent if \(P(A\cap B)=P(A)P(B)\); RVs are independent when all joint events factor.
- Law of Large Numbers (LLN) – Sample average converges to the true expectation as the number of i.i.d. trials grows.
- Central Limit Theorem (CLT) – Sum/average of many i.i.d. RVs with finite variance tends toward a normal distribution, regardless of the original distribution.

---

📌 Must Remember

- Kolmogorov’s axioms: (1) \(0\le P(A)\le1\); (2) \(P(\Omega)=1\); (3) for disjoint \(A_i\), \(P\big(\bigcup_i A_i\big)=\sum_i P(A_i)\).
- PMF conditions: \(0\le p(x)\le1\), \(\sum_x p(x)=1\).
- PDF condition: \(\int_{-\infty}^{\infty} f(x)\,dx = 1\).
- Key discrete distributions – Bernoulli(\(p\)), Binomial(\(n,p\)), Geometric(\(p\)), Negative Binomial(\(r,p\)), Poisson(\(\lambda\)), Discrete Uniform.
- Key continuous distributions – Uniform\([a,b]\), Normal(\(\mu,\sigma^2\)), Exponential(\(\lambda\)), Gamma(\(k,\theta\)), Beta(\(\alpha,\beta\)).
- LLN – \(\displaystyle \frac{1}{n}\sum_{i=1}^n X_i \xrightarrow{a.s.} E[X]\).
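The LLN statement above can be seen numerically; a minimal sketch in plain Python (the fair-coin setup and trial count are illustrative assumptions, not from the guide):

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

# Simulate i.i.d. fair-coin tosses (Bernoulli(0.5)) and compute the
# sample mean; the LLN says it converges to E[X] = 0.5 as n grows.
n = 100_000
tosses = [random.randint(0, 1) for _ in range(n)]
running_mean = sum(tosses) / n

print(f"mean after {n} tosses: {running_mean:.4f}")  # close to 0.5
```

The standard error here is \(0.5/\sqrt{n}\approx 0.0016\), so the sample mean lands within a few thousandths of 0.5.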
- CLT – \(\displaystyle \frac{1}{\sqrt{n}}\sum_{i=1}^{n}(X_i-\mu) \xrightarrow{d} N(0,\sigma^2)\).
- Convergence hierarchy – a.s. ⇒ in probability ⇒ in distribution (weak).

---

🔄 Key Processes

- Constructing a probability model: identify \(\Omega\); define the σ‑algebra \(\mathcal{F}\); assign probabilities using the axioms (or a known PMF/PDF).
- Finding expectation/variance: choose the discrete or continuous formula; compute sums or integrals; use linearity of expectation.
- Applying the Binomial model: verify a fixed number of independent Bernoulli trials; use \(P(X=k)=\binom{n}{k}p^{k}(1-p)^{n-k}\).
- Using the Poisson approximation: when \(n\) is large and \(p\) small, set \(\lambda = np\) and approximate the Binomial by \(P(X=k)=e^{-\lambda}\frac{\lambda^{k}}{k!}\).
- Executing a CLT approximation: compute the sample mean \(\bar X\) and standard error \(\sigma/\sqrt{n}\); approximate \(\bar X\) by \(N(\mu,\sigma^2/n)\).

---

🔍 Key Comparisons

- Discrete vs. continuous RV – Support: countable vs. uncountable. Function: PMF \(p(x)\) vs. PDF \(f(x)\). Probability of a point: \(p(x)>0\) is possible for a discrete RV; for a continuous RV, \(P(X=x)=0\) for any single point.
- Bernoulli vs. Binomial – Bernoulli: single trial, outcomes {0,1}. Binomial: sum of \(n\) independent Bernoulli trials.
- Poisson vs. Exponential – Poisson: counts events in a time/space interval. Exponential: waiting time between successive events.
- Almost sure vs. in probability vs. in distribution – a.s.: convergence for every outcome outside a null set. In probability: the probability of a large deviation → 0. In distribution: CDFs converge; no guarantee about individual outcomes.

---

⚠️ Common Misunderstandings

- “Probability = frequency” – Frequency approaches probability only under the LLN and for i.i.d. trials.
- “PDF value = probability” – \(f(x)\) is a density; only integrals over intervals give probabilities.
- Independence ≠ mutual exclusivity – Disjoint events cannot both occur, while independent events can occur together.
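The independence-vs-exclusivity point can be checked concretely; a small sketch with a fair six-sided die (the specific events are illustrative, not from the guide):

```python
from fractions import Fraction

# Fair six-sided die: A = "even", B = "at most 2".
omega = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}
B = {1, 2}

def prob(event):
    # Uniform measure: each face has probability 1/6.
    return Fraction(len(event), len(omega))

# A and B overlap (both contain 2), so they are NOT mutually exclusive...
assert A & B == {2}
# ...yet they ARE independent: P(A ∩ B) = 1/6 = (1/2)(1/3) = P(A)P(B).
assert prob(A & B) == prob(A) * prob(B)
```

Exact rational arithmetic via `Fraction` avoids any floating-point fuzz in the equality check.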
- “CLT works for any sample size” – Approximation quality improves with larger \(n\); for small \(n\) it may be poor.
- “All distributions have a mean” – Heavy‑tailed distributions (e.g., Cauchy) lack a finite expectation; the CLT requires finite variance.

---

🧠 Mental Models / Intuition

- Probability space as a “dice box” – \(\Omega\) is every face that could appear; \(\mathcal{F}\) holds the subsets you care about; \(P\) tells you how likely each subset is.
- LLN = “law of averages” – Think of repeatedly tossing a fair coin; the proportion of heads stabilizes around 0.5.
- CLT = “bell‑curve attractor” – No matter the original shape, piling many independent random “tiles” smooths the outline into a bell.
- Convergence hierarchy – Like nesting dolls: the strongest (a.s.) contains the weaker (in probability), which contains the weakest (in distribution).

---

🚩 Exceptions & Edge Cases

- Zero‑probability events – An event can have probability 0 yet be possible (e.g., picking an exact real number from \([0,1]\)).
- Discrete‑continuous mixtures – E.g., a distribution with a point mass at 0 plus a continuous part; requires measure‑theoretic treatment.
- Non‑i.i.d. data – The LLN and CLT need independence (or specific weak dependence); correlated data violate the standard forms.
- Infinite variance – The CLT does not apply; sums may converge to a stable (non‑normal) law.

---

📍 When to Use Which

- Use Bernoulli when modeling a single yes/no trial.
- Use Binomial for a fixed number of independent yes/no trials with constant success probability.
- Use Poisson when counting rare events over a large number of trials or a continuous interval with known rate \(\lambda\).
- Use the Normal approximation when \(n\ge30\) (rule of thumb) and the underlying distribution has finite variance.
- Use Exponential for modeling waiting times between independent Poisson events.
- Use Gamma when modeling the sum of multiple independent exponential waiting times (e.g., total service time of \(k\) stages).
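The last item, Gamma as a sum of exponential stages, can be checked by simulation; a minimal sketch where the rate and stage count are illustrative assumptions:

```python
import random

random.seed(1)  # fixed seed for a reproducible run

# Total time through k service stages, each stage an independent
# Exponential(rate) waiting time; the total is Gamma-distributed.
rate = 2.0   # hypothetical event rate λ
k = 3        # number of stages
trials = 50_000

totals = [sum(random.expovariate(rate) for _ in range(k)) for _ in range(trials)]
mean_total = sum(totals) / trials

# Gamma(k, 1/λ) has mean k/λ = 1.5 here; the simulated mean should be close.
print(f"simulated mean: {mean_total:.3f}  (theory: {k / rate})")
```

`random.expovariate(lambd)` draws Exponential waiting times directly, so no external libraries are needed.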
---

👀 Patterns to Recognize

- “Number of successes in fixed trials” → Binomial.
- “First success after \(k-1\) failures” → Geometric.
- “Counts in a fixed interval with constant rate” → Poisson.
- “Sum of many i.i.d. variables → bell shape” → the Central Limit Theorem applies.
- “Variance equal to the mean” → Poisson.
- “PDF constant on an interval” → Uniform distribution.

---

🗂️ Exam Traps

- Choosing Poisson vs. Binomial – A test may give large \(n\) and small \(p\); the correct shortcut is Poisson with \(\lambda = np\).
- Misreading “PDF at a point” – Selecting the density value as the probability of that exact outcome is wrong; integrate over an interval.
- Confusing independence with mutual exclusivity – An answer stating “if A and B are independent then \(P(A\cap B)=0\)” is a trap.
- Applying the CLT to heavy‑tailed data – If the variance is infinite, the normal approximation is invalid; look for a stable‑law option.
- Variance of a Bernoulli – Remember \(Var(X)=p(1-p)\); many distractors replace \(p\) with \(1-p\) or omit the product.
- “Almost sure convergence = convergence in probability” – The hierarchy is one‑way: a.s. ⇒ in probability, but not the reverse.

---
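Two of these traps can be verified with quick arithmetic; a sketch using only the standard library, with illustrative numbers not taken from the guide:

```python
from math import comb, exp, factorial

# Trap check 1: Bernoulli variance is p*(1-p), not p or (1-p) alone.
p = 0.3
var_bernoulli = p * (1 - p)  # 0.21
assert abs(var_bernoulli - 0.21) < 1e-12

# Trap check 2: Poisson shortcut for large n, small p, with lambda = n*p.
n, p_small = 1000, 0.002
lam = n * p_small  # 2.0
k = 3
binom = comb(n, k) * p_small**k * (1 - p_small)**(n - k)
poisson = exp(-lam) * lam**k / factorial(k)

# With n this large and p this small, the two agree to ~3 decimal places.
print(f"binomial: {binom:.5f}  poisson: {poisson:.5f}")
```

`math.comb` gives the exact binomial coefficient, so the only approximation on display is the Poisson shortcut itself.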