RemNote Community

Probability distribution - Core Distribution Families

Understand the distinction between discrete and absolutely continuous distributions, key distribution families and their properties, and how univariate and multivariate distributions relate through marginals and conditionals.


Summary

Probability Distributions Introduction

A probability distribution is a mathematical function that describes how probabilities are assigned to outcomes of a random variable. Understanding distributions is central to statistics because they allow us to model uncertainty, make predictions, and calculate probabilities for real-world phenomena. The key distinction we'll make is between discrete distributions (outcomes that are countable) and absolutely continuous distributions (outcomes spanning a continuum). We'll also explore how distributions behave when describing single variables versus multiple variables jointly.

Discrete Probability Distributions

What is a Discrete Distribution?

A discrete probability distribution assigns probabilities to a countable set of outcomes. "Countable" means the outcomes can be listed out, like the numbers 0, 1, 2, 3, ... or the outcomes {heads, tails}. The key property is that all probabilities must sum to one:

$$\sum_{x} P(X = x) = 1$$

This ensures that one of the outcomes will definitely occur.

Why this matters: Discrete distributions are used whenever outcomes are naturally finite or can be counted individually. Examples include the number of customer complaints received (0, 1, 2, ...), the result of a dice roll (1, 2, 3, 4, 5, 6), or pass/fail outcomes.

Common Discrete Distribution Families

Several discrete distributions appear repeatedly across applications. Here are the most important ones you should know:

Binomial distribution: Models the number of successes in a fixed number of independent trials, each with the same probability of success. Example: the number of heads in 10 coin flips.

Poisson distribution: Models the number of rare events occurring in a fixed interval of time or space. Example: the number of emails arriving in an hour.

Geometric distribution: Models the number of trials needed to get the first success. Example: how many times you must roll a die to see a 6.
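To make the sum-to-one property concrete for the families above, here is a minimal Python sketch of the binomial and Poisson probability mass functions, using only the standard library; the parameter values are just the coin-flip and rate examples from the text.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """P(X = k): probability of k successes in n independent trials,
    each succeeding with probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    """P(X = k): probability of k rare events in an interval with rate lam."""
    return exp(-lam) * lam**k / factorial(k)

# Number of heads in 10 fair coin flips: the probabilities over all
# 11 possible outcomes (0..10 heads) must sum to one.
probs = [binomial_pmf(k, 10, 0.5) for k in range(11)]
print(sum(probs))                          # 1.0
print(round(binomial_pmf(5, 10, 0.5), 4))  # 0.2461
```

The same check works for the Poisson family, except that its support is infinite, so summing the PMF over any finite range only approaches 1.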
Negative binomial distribution: Generalizes the geometric distribution by modeling the number of trials needed to achieve a fixed number of successes. Example: how many shots a basketball player takes to make 10 baskets.

Hypergeometric distribution: Models the number of successes when sampling without replacement from a finite population. Example: drawing 5 cards from a standard deck and counting how many are hearts.

Categorical distribution: Assigns probabilities to a set of distinct categories. Example: the outcome of a single roll of a die, where each face has probability 1/6.

The Empirical Distribution

When you collect sample data, you create an empirical (or sample) distribution, which is a discrete distribution that places equal probability on each observed data point. If you observe $n$ data points, each point gets probability $1/n$.

Why this matters: The empirical distribution represents what your data actually shows. It's the foundation for many statistical methods. For example, if you measure the heights of 100 students and observe each height once, the empirical distribution puts 1% probability on each observed height. This bridges the gap between your sample and the true underlying distribution.

Absolutely Continuous Probability Distributions

What is an Absolutely Continuous Distribution?

An absolutely continuous distribution is fundamentally different from a discrete distribution: instead of assigning probability to individual points, it assigns probability to intervals. It is characterized by a probability density function (PDF), denoted $f_X(x)$, with the property that:

$$P(a \le X \le b) = \int_{a}^{b} f_X(x)\,dx$$

This says: the probability that $X$ falls in an interval equals the area under the PDF curve over that interval.
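The "probability equals area under the PDF" idea can be checked numerically. This sketch uses the exponential distribution (rate $\lambda$, density $\lambda e^{-\lambda x}$ for $x \ge 0$), chosen here only because its interval probabilities have a simple closed form to compare against; the rate and interval are illustrative.

```python
from math import exp

def exponential_pdf(x, lam=2.0):
    """PDF of an exponential distribution with rate lam (zero for x < 0)."""
    return lam * exp(-lam * x) if x >= 0 else 0.0

def prob_interval(pdf, a, b, n=10_000):
    """Approximate P(a <= X <= b) as the area under the PDF (midpoint rule)."""
    h = (b - a) / n
    return sum(pdf(a + (i + 0.5) * h) * h for i in range(n))

# Compare the numeric area with the exact answer e^{-lam*a} - e^{-lam*b}.
a, b, lam = 0.5, 1.5, 2.0
approx = prob_interval(lambda x: exponential_pdf(x, lam), a, b)
exact = exp(-lam * a) - exp(-lam * b)
print(round(approx, 6), round(exact, 6))  # both 0.318092
```

The midpoint rule converges quickly here because the density is smooth on the interval; with 10,000 subintervals the two numbers agree to well beyond six decimal places.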
Important properties of a PDF:

$f_X(x) \ge 0$ for all $x$ (probability density is non-negative)

$\int_{-\infty}^{\infty} f_X(x)\,dx = 1$ (total area under the curve is one)

The image shows a PDF (left) with the area under the curve between two points labeled, and the corresponding CDF (right).

Zero Probability at a Single Point

Here's a property that often surprises students: for an absolutely continuous random variable, $P(X = x) = 0$ for any specific value $x$. Why? Because a single point has no width, so the integral over it is zero:

$$P(X = x) = \int_{x}^{x} f_X(t)\,dt = 0$$

This changes how we interpret probability. With discrete distributions, $P(X = 5)$ might be 0.2. With continuous distributions, we never ask "what's the probability of exactly 5?" Instead, we ask "what's the probability of being between 4.9 and 5.1?" This makes the distinction practical: with continuous data, the probability of any single precise value is zero because there are infinitely many possible values.

The Relationship Between PDF and CDF

The cumulative distribution function (CDF), denoted $F_X(x)$, tells us the probability that the random variable is less than or equal to $x$:

$$F_X(x) = P(X \le x) = \int_{-\infty}^{x} f_X(t)\,dt$$

In other words, the CDF is the integral of the PDF. Conversely, the PDF is the derivative of the CDF:

$$f_X(x) = \frac{d}{dx}F_X(x)$$

Why this relationship matters: The CDF is cumulative (it grows monotonically from 0 to 1), making it useful for calculating tail probabilities. The PDF directly shows where probability density is concentrated. They're two ways of describing the same distribution. The image above illustrates this: the left side shows a PDF with a shaded area between two points, and the right side shows the corresponding CDF with a marked step showing the same probability.

Example: The Normal Distribution

The normal (Gaussian) distribution is the most important continuous distribution in statistics.
Its PDF is:

$$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

where $\mu$ is the mean and $\sigma$ is the standard deviation.

Key features:

Bell-shaped and symmetric around $\mu$

Characterized by two parameters: mean ($\mu$) and standard deviation ($\sigma$)

Widely applicable because of the Central Limit Theorem (which states that averages of many independent measurements tend to be normally distributed)

The normal distribution is absolutely continuous: probability is spread over all real numbers, with zero probability at any single point.

<extrainfo> Technical Note: "Continuous" vs. "Absolutely Continuous"

In rigorous probability theory, "continuous distribution" and "absolutely continuous distribution" are technically different concepts. All absolutely continuous distributions are continuous, but the reverse is not true. A singular continuous distribution has a CDF that is continuous (so it assigns zero probability to any single point), but it has no probability density function. The classic example is the Cantor distribution, a pathological case that assigns all its probability to a set of measure zero.

For practical purposes in an introductory statistics course, you can treat "continuous" and "absolutely continuous" as the same thing. The distinction matters in advanced probability theory but rarely comes up in applications. </extrainfo>

Univariate Versus Multivariate Distributions

Univariate Distributions

A univariate distribution describes the probability behavior of a single random variable with outcomes in one dimension. All the distributions we've discussed so far (binomial, normal, Poisson) are univariate. They tell us how one quantity is distributed.
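The normal density given earlier, together with the PDF-CDF relationship, can be verified numerically in a few lines. This sketch uses the closed-form standard normal CDF via the error function ($F_X(x) = \tfrac{1}{2}(1 + \operatorname{erf}(\tfrac{x-\mu}{\sigma\sqrt{2}}))$); the interval $[-1, 1]$ is just the familiar "one standard deviation" example.

```python
from math import erf, exp, pi, sqrt

def normal_pdf(x, mu=0.0, sigma=1.0):
    """The Gaussian density from the text."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

def normal_cdf(x, mu=0.0, sigma=1.0):
    """F_X(x) in closed form via the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

# P(-1 <= X <= 1) computed two ways: area under the PDF (midpoint rule)
# versus a difference of CDF values. They should agree.
a, b, n = -1.0, 1.0, 10_000
h = (b - a) / n
area = sum(normal_pdf(a + (i + 0.5) * h) * h for i in range(n))
print(round(area, 4), round(normal_cdf(b) - normal_cdf(a), 4))  # 0.6827 0.6827
```

Both routes give the well-known result that about 68.27% of the probability lies within one standard deviation of the mean.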
Examples:

Height of a randomly selected person

Number of defective items in a batch of 100

Daily temperature

Multivariate (Joint) Distributions

A multivariate distribution (also called a joint distribution) describes the probability behavior of two or more random variables together. Instead of asking "how is $X$ distributed?", we ask "how are $X$ and $Y$ distributed jointly?" For continuous random variables, a multivariate distribution is characterized by a joint PDF $f_{X,Y}(x,y)$ where:

$$P(a \le X \le b, \, c \le Y \le d) = \int_{a}^{b} \int_{c}^{d} f_{X,Y}(x,y)\,dy\,dx$$

This is a double integral: you integrate over the rectangular region defined by the constraints on both variables.

Why this matters: In reality, variables are usually related. Your height and weight are correlated. Stock prices and interest rates move together. Multivariate distributions allow us to model these relationships.

Example: The Multivariate Normal Distribution

The multivariate normal distribution is the extension of the normal distribution to multiple variables. For a vector of random variables $\mathbf{X} = (X_1, X_2, \ldots, X_p)$, it is characterized by:

A mean vector $\boldsymbol{\mu}$ (containing the mean of each variable)

A covariance matrix $\boldsymbol{\Sigma}$ (containing variances on the diagonal and covariances off the diagonal)

The joint PDF is:

$$f_{\mathbf{X}}(\mathbf{x}) = \frac{1}{\sqrt{(2\pi)^p |\boldsymbol{\Sigma}|}} \exp\left(-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1}(\mathbf{x} - \boldsymbol{\mu})\right)$$

This is a natural generalization of the univariate normal distribution and is widely used in multivariate statistics.

Marginal Distributions

A marginal distribution is obtained from a multivariate distribution by "collapsing" over the unwanted variables.
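The multivariate normal density shown earlier can be evaluated directly in the bivariate case ($p = 2$), where the determinant and inverse of $\boldsymbol{\Sigma}$ can be written out by hand instead of using a linear algebra library. This is a sketch with illustrative parameter values, not a production implementation.

```python
from math import exp, pi, sqrt

def bivariate_normal_pdf(x, y, mu, cov):
    """Joint density of a bivariate normal, with the 2x2 determinant
    and inverse of the covariance matrix written out explicitly."""
    m1, m2 = mu
    (a, b), (c, d) = cov              # covariance matrix [[a, b], [c, d]]
    det = a * d - b * c               # |Sigma|
    inv = [[d / det, -b / det],       # Sigma^{-1} for the 2x2 case
           [-c / det, a / det]]
    dx, dy = x - m1, y - m2
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu)
    quad = dx * (inv[0][0] * dx + inv[0][1] * dy) \
         + dy * (inv[1][0] * dx + inv[1][1] * dy)
    return exp(-0.5 * quad) / (2 * pi * sqrt(det))

# Standard bivariate normal (zero means, identity covariance):
# at the origin the density is 1 / (2*pi).
p = bivariate_normal_pdf(0.0, 0.0, (0.0, 0.0), ((1.0, 0.0), (0.0, 1.0)))
print(round(p, 6))  # 0.159155
```

With zero off-diagonal covariance the joint density factorizes into a product of two univariate normal densities, which is one way to see that uncorrelated jointly normal variables are independent.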
If you have a joint distribution of $(X, Y)$ and you want just the distribution of $X$, you compute:

For discrete variables: $$P(X = x) = \sum_{y} P(X = x, Y = y)$$

For continuous variables: $$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy$$

Intuitive interpretation: The marginal distribution of $X$ tells you how $X$ is distributed if you ignore information about $Y$. If you know the joint distribution of height and weight, the marginal distribution of height is found by integrating over all possible weights.

Conditional Distributions

A conditional distribution specifies the probability distribution of some variables given that you know the values of other variables. For continuous variables:

$$f_{X|Y}(x|y) = \frac{f_{X,Y}(x,y)}{f_Y(y)}$$

This reads as: "the PDF of $X$ given $Y = y$ equals the joint PDF divided by the marginal PDF of $Y$."

Intuitive interpretation: Conditional distributions capture how knowing one piece of information changes our beliefs about another. If you learn that someone's weight is 180 pounds, the conditional distribution of their height is different from (and more concentrated than) the marginal distribution of height alone.

For discrete variables:

$$P(X = x | Y = y) = \frac{P(X = x, Y = y)}{P(Y = y)}$$

Why this matters: Conditional distributions are essential for understanding relationships between variables and are foundational to regression, Bayesian inference, and causal reasoning.
Flashcards
How does a discrete probability distribution assign probabilities to its set of outcomes?
It assigns probabilities to a countable set of outcomes such that the probabilities sum to one.
How does the empirical distribution of a sample assign probability to observed data points?
It places equal probability on each observed data point.
How is the probability of a random variable $X$ falling in the interval $[a, b]$ calculated using its probability density function $f_X(x)$?
$P(a \le X \le b) = \int_{a}^{b} f_X(x) \, dx$
What is the probability $P(X = x)$ for any specific value $x$ in an absolutely continuous distribution?
Zero ($0$)
What is the mathematical relationship between the cumulative distribution function $F_X(x)$ and the probability density function $f_X(t)$?
$F_X(x) = \int_{-\infty}^{x} f_X(t) \, dt$
Is every continuous distribution necessarily an absolutely continuous distribution?
No; while all absolutely continuous distributions are continuous, a continuous distribution (such as the singular Cantor distribution) may lack a density.
What is the probability density function $f_X(x)$ for a normal (Gaussian) distribution?
$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}$ (where $\mu$ is the mean and $\sigma$ is the standard deviation)
What does a univariate distribution describe?
The probabilities of a single random variable taking values in a one-dimensional sample space.
What is the primary function of a multivariate (joint) distribution?
To assign probabilities to vectors of two or more random variables, describing their joint behavior.
What parameters are used to specify the joint density of a multivariate normal distribution?
A mean vector and a covariance matrix.
How are marginal distributions derived from a multivariate distribution?
By integrating or summing over the unwanted variables.
What information is provided by a conditional distribution within a joint distribution?
The probability law of some variables given fixed values of other variables.

Key Concepts
Discrete Distributions
Discrete probability distribution
Binomial distribution
Poisson distribution
Hypergeometric distribution
Empirical distribution
Continuous Distributions
Absolutely continuous distribution
Probability density function (PDF)
Cumulative distribution function (CDF)
Normal distribution
Multivariate normal distribution
Joint and Marginal Distributions
Marginal distribution
Conditional distribution