Sampling (statistics) - Probability Sampling and Methods

Understand the core principles of probability sampling, the major probability sampling techniques, and how they compare to non‑probability methods.

Summary

Probability Sampling and Sampling Methods

Introduction

Sampling is fundamental to research when it is impractical or impossible to survey an entire population. The two broad categories of sampling approaches are probability sampling and nonprobability sampling.

Probability sampling methods use randomization to ensure that every element in the population has a known, non-zero chance of being selected. This property makes probability sampling the more scientifically rigorous approach, because it lets researchers estimate population parameters without systematic bias and quantify the uncertainty of those estimates.

This guide covers the core principles of probability sampling and then explores the major sampling techniques used in practice.

What Makes Sampling Probabilistic?

Probability sampling requires two essential conditions:

- Known, non-zero selection probabilities: Every element in the population must have an identifiable probability of selection that is greater than zero. This means no element is systematically excluded.
- Random selection process: The actual selection of elements must incorporate randomization. This prevents bias from researcher judgment or convenience.

When these conditions are met, researchers can use statistical theory to make unbiased inferences about the population and calculate how much sampling error to expect. This is the fundamental advantage of probability sampling over nonprobability approaches.

Sampling Weights and Representation

An important concept in probability sampling is the sampling weight. Each sampled unit has a weight equal to the inverse of its selection probability:

$$\text{Weight} = \frac{1}{P(\text{selection})}$$

Sampling weights ensure that each sampled unit represents the correct share of the population. For example, if an element had a 1-in-10 chance of being selected (probability = 0.1), its weight would be 10, meaning it represents 10 units in the population.
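The inverse-probability weight can be sketched in a few lines of Python. This is a minimal illustration, not a full survey-estimation routine: the function names `sampling_weight` and `estimate_total` and the sample values are hypothetical, and the weighted sum shown is the estimator usually called the Horvitz-Thompson estimator of a population total.

```python
def sampling_weight(selection_probability: float) -> float:
    """Return the design weight 1 / P(selection)."""
    if not 0.0 < selection_probability <= 1.0:
        raise ValueError("selection probability must be in (0, 1]")
    return 1.0 / selection_probability


def estimate_total(values, selection_probabilities):
    """Estimate a population total as the weighted sum of sampled values."""
    return sum(
        y * sampling_weight(p) for y, p in zip(values, selection_probabilities)
    )


# Hypothetical sample: three units drawn with different selection probabilities.
values = [4.0, 10.0, 6.0]
probabilities = [0.1, 0.5, 0.25]

weights = [sampling_weight(p) for p in probabilities]  # weights 10, 2 and 4
total = estimate_total(values, probabilities)  # 4*10 + 10*2 + 6*4 = 84
```

The unit drawn with probability 0.1 counts ten times in the total, matching the 1-in-10 example above.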
Weights become particularly important in advanced sampling designs like stratified or cluster sampling, where different elements may have different selection probabilities.

Advantages and Limitations of Probability Sampling

Advantages

- Unbiased estimation: Known selection probabilities allow estimation of population parameters without systematic bias.
- Calculable sampling error: Researchers can determine the precision of their estimates and construct confidence intervals.
- Scientific rigor: Results can be generalized to the population with quantifiable confidence.

Limitations

- Sampling frame requirements: A complete, accurate list of the population is necessary, and such lists are not always available.
- Cost: Probability sampling methods can be more expensive than nonprobability alternatives, especially if the population is geographically dispersed.
- Complexity: Some probability designs (like stratified or cluster sampling) require more planning and statistical expertise than simple approaches.

Simple Random Sampling

Simple random sampling is the most straightforward probability sampling method. In a simple random sample, every possible subset of the given size has an equal probability of being selected. If you are drawing a sample of size $n$ from a population of size $N$, every possible combination of $n$ elements has the same chance of being chosen.

How it works: Assign each element a number, then use a random number generator (or random number table) to select your sample. For example, a draw of four elements might select elements 2, 5, 8, and 10.
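The procedure just described can be sketched with Python's standard library. This is a minimal sketch under stated assumptions: the population is small enough to hold in a list, elements are numbered 1 to 10 for illustration, and the helper name `simple_random_sample` is hypothetical.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct elements so that every size-n subset is equally likely."""
    if not 0 <= n <= len(population):
        raise ValueError("n must be between 0 and the population size")
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(population, n)  # sampling without replacement


# Hypothetical population of 10 numbered elements.
population = list(range(1, 11))
sample = simple_random_sample(population, 4, seed=42)

# Under this design each element is included with probability n / N,
# so every sampled unit carries the same weight N / n.
weight = len(population) / len(sample)  # 10 / 4 = 2.5
```

`random.sample` draws without replacement, which is what gives every subset of size $n$ the same chance of selection.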
Advantages

- Conceptually simple and easy to understand.
- Unbiased estimation with well-understood statistical properties.
- No assumptions required about the population structure.

Disadvantages

- Unrepresentative samples can occur by chance: If the draw happens to over-represent or under-represent certain subgroups, the sample will not accurately reflect the population's composition.
- No guarantees for subgroups: You cannot ensure that important subpopulations are adequately represented.
- Impractical for large populations: Obtaining a truly random sample from millions of elements can be cumbersome and time-consuming.
- Less efficient: When the population naturally divides into relevant subgroups, other methods can produce more precise estimates.

Simple random sampling works best when the population is relatively small and homogeneous (not divided into distinct subgroups that differ on the variables of interest).

Systematic Sampling

Systematic sampling provides a practical alternative to simple random sampling, especially for large populations. The method works by arranging the population in an ordered list and selecting every $k$th element after choosing a random starting point.

The interval $k$ is calculated as:

$$k = \frac{N}{n} = \frac{\text{Population Size}}{\text{Sample Size}}$$

How it works:

1. Calculate $k$ based on your population and desired sample size.
2. Randomly select a starting point between element 1 and element $k$ (inclusive).
3. Select every $k$th element from that starting point.

For example, with a population of 12 and a desired sample size of 4, $k = 3$. If the random start is element 2, you select elements 2, 5, 8, and 11.
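The three steps above can be sketched as follows. This is an illustrative sketch, not a production routine: the helper name `systematic_sample` and its optional `start` argument are hypothetical, and the code assumes the population size is a multiple of the sample size so that $k$ is a whole number.

```python
import random


def systematic_sample(population, n, start=None, seed=None):
    """Select every k-th element of an ordered list after a start in 1..k."""
    N = len(population)
    if not 0 < n <= N:
        raise ValueError("n must be between 1 and the population size")
    k = N // n  # sampling interval; exact when N is a multiple of n
    if start is None:
        start = random.Random(seed).randint(1, k)  # random start, 1..k inclusive
    elif not 1 <= start <= k:
        raise ValueError("start must lie between 1 and k")
    # Convert the 1-based start to a 0-based index, then step through by k.
    return population[start - 1::k][:n]


# Reproduce the worked example: N = 12, n = 4, so k = 3; start at element 2.
population = list(range(1, 13))
sample = systematic_sample(population, 4, start=2)  # → [2, 5, 8, 11]
```

Passing `start` explicitly is only for reproducing the example; in practice the start should be left to the random draw so that every element has a chance of selection.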
Advantages

- Easy to implement: Only one random draw (the starting point) is needed.
- Efficient: Can be applied to ordered lists without needing complete numerical codes for all elements.
- Effective when the ordering correlates with the variable of interest: If your list is ordered by a characteristic related to what you are studying, systematic sampling can be more precise than simple random sampling.

Critical Disadvantage: Periodic Patterns

Systematic sampling is vulnerable to hidden periodic patterns in the population list. If the population has a repeating pattern whose period matches your sampling interval $k$, the result can be systematically biased.

Example: Imagine a factory supervisor checking quality by systematically sampling every 10th item produced. If the machines have a defect that occurs every 10 items due to a repeating mechanical cycle, the systematic sample might consistently miss (or always catch) the defect. This is why the random starting point is essential: it helps mitigate (though it does not completely eliminate) this risk.

Stratified Sampling

Stratified sampling divides the population into homogeneous subgroups called strata, then samples from each stratum independently. Strata are mutually exclusive groups based on characteristics relevant to the research question.
Common stratification variables include:

- Age group
- Geographic region
- Income level
- Education level
- Any characteristic that divides your population into meaningful subgroups

How it works:

1. Divide the population into strata based on a relevant characteristic.
2. Determine how many units to sample from each stratum.
3. Randomly sample from within each stratum independently.

The Sampling Fraction

The sampling fraction is the ratio of the sample size to the population size within each stratum:

$$\text{Sampling Fraction} = \frac{n_{\text{stratum}}}{N_{\text{stratum}}}$$

In proportional allocation (the most common approach), this fraction is the same for all strata, meaning each stratum contributes to the sample in proportion to its size in the population.

Advantages

- Enables subgroup analysis: You can make reliable estimates for specific strata and compare them between groups.
- Increased precision: When strata are relevant to your study variable, stratified sampling produces more precise estimates than simple random sampling with the same overall sample size.
- Guaranteed representation: You control the representation of important subgroups, avoiding the chance under-representation of small groups that can occur in simple random sampling.

Disadvantages

- Increased cost and complexity: Requires knowing the population's composition and managing multiple sampling processes.
- Difficulty defining strata: Choosing appropriate strata requires subject-matter knowledge; poor choices reduce the benefits.
- Potentially larger sample sizes needed: Using many strata can require larger overall samples to maintain adequate precision within each group.

Post-Stratification

Sometimes researchers do not have stratification information when designing the study, but it becomes available during or after data collection. Post-stratification is a statistical technique applied after sampling to improve estimates by using stratifying variables discovered later.
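One common form of post-stratification reweights the sample so that each stratum counts in proportion to its known population share. The sketch below illustrates that idea only; the helper name `post_stratified_mean`, the stratum labels, and the population shares are all hypothetical, and it assumes every stratum appears at least once in the sample.

```python
from collections import defaultdict


def post_stratified_mean(sample, population_shares):
    """Weight each stratum's sample mean by its known population share.

    sample: iterable of (stratum_label, value) pairs from an already drawn sample
    population_shares: dict mapping stratum_label -> N_stratum / N
    """
    groups = defaultdict(list)
    for label, value in sample:
        groups[label].append(value)
    missing = set(population_shares) - set(groups)
    if missing:
        raise ValueError(f"no sampled units in strata: {sorted(missing)}")
    return sum(
        share * (sum(groups[label]) / len(groups[label]))
        for label, share in population_shares.items()
    )


# Hypothetical sample in which urban units happen to be over-represented.
sample = [("urban", 10.0), ("urban", 14.0), ("urban", 12.0), ("rural", 20.0)]
shares = {"urban": 0.5, "rural": 0.5}  # assumed known from an external source

unweighted = sum(v for _, v in sample) / len(sample)  # 56 / 4 = 14
adjusted = post_stratified_mean(sample, shares)  # 0.5*12 + 0.5*20 = 16
```

The adjustment pulls the estimate toward the under-represented rural stratum, which is the correction post-stratification is meant to provide.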
While useful, post-stratification is less effective than pre-planned stratified sampling.

Probability-Proportional-to-Size Sampling

Probability-proportional-to-size (PPS) sampling assigns selection probabilities to elements based on an auxiliary variable: a size measure that you believe is correlated with your variable of interest.

How it works: Instead of every element having an equal chance of selection, larger elements (according to your size measure) have a higher probability of being selected.

Example: In a survey of business establishments, instead of giving each business an equal selection probability, you might make each business's selection probability proportional to its number of employees. Large employers would have a higher selection probability than small ones, because they likely contribute more to aggregate statistics like total employment or payroll.

Advantages

- Improved accuracy for aggregate estimates: Because large elements have greater impact on population totals, concentrating the sample on them increases precision.
- Efficient when size is correlated with the variable of interest: If your size measure is a good proxy for what you are measuring, PPS is more precise than simple random sampling.

The choice of size variable is critical: it must genuinely correlate with your variable of interest for PPS to provide benefits.

Cluster Sampling

Cluster sampling works differently from the methods discussed so far. Instead of sampling individual elements directly, cluster sampling first selects groups of elements (clusters), then samples within the selected clusters.

How it works:

1. Divide the population into mutually exclusive clusters.
2. Randomly select some clusters.
3. Either include all elements from the selected clusters, or randomly sample within the selected clusters (multistage sampling).

Clusters are often defined by geography (city blocks, counties, schools) or temporal units (days, weeks, time periods).
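The steps above can be sketched as a single helper that covers both options in step 3. This is a minimal sketch assuming clusters are selected with equal probability; the function name `cluster_sample`, the `n_per_cluster` parameter, and the school data are hypothetical.

```python
import random


def cluster_sample(clusters, n_clusters, n_per_cluster=None, seed=None):
    """Randomly select clusters, then take all or some of their elements.

    clusters: dict mapping cluster_id -> list of elements
    n_per_cluster: None keeps every element of a selected cluster (one stage);
                   an integer subsamples within each selected cluster (two stages)
    """
    rng = random.Random(seed)
    chosen = rng.sample(list(clusters), n_clusters)  # stage 1: pick clusters
    result = {}
    for cluster_id in chosen:
        members = clusters[cluster_id]
        if n_per_cluster is None:
            result[cluster_id] = list(members)
        else:
            # stage 2: simple random subsample inside the selected cluster
            size = min(n_per_cluster, len(members))
            result[cluster_id] = rng.sample(members, size)
    return result


# Hypothetical population of students grouped by school.
schools = {
    "school_a": ["a1", "a2", "a3", "a4"],
    "school_b": ["b1", "b2", "b3"],
    "school_c": ["c1", "c2", "c3", "c4", "c5"],
}

one_stage = cluster_sample(schools, n_clusters=2, seed=1)
two_stage = cluster_sample(schools, n_clusters=2, n_per_cluster=2, seed=1)
```

Only the selected schools need a list of students, which is where the cost savings of this design come from.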
Advantages

- Dramatically reduced costs: When population elements are geographically dispersed, clustering eliminates the need for detailed lists of all population members. You need detailed information only about the selected clusters.
- Practical feasibility: Many populations naturally divide into clusters, which sometimes makes cluster sampling the only practical approach.

Important Disadvantage: Increased Variance

Elements within a cluster tend to be more similar to each other than to elements in other clusters. This internal similarity increases the variance of estimates compared with a simple random sample of the same size. For example, students in the same school are more similar (in socioeconomic status, academic achievement, etc.) than a random sample of all students would be.

As a consequence, cluster sampling typically requires larger overall sample sizes than simple random sampling to achieve the same level of precision.

Multistage Sampling

Multistage sampling is an extension of cluster sampling where randomization occurs at multiple stages. After selecting clusters randomly, you then randomly subsample within the selected clusters, rather than including all elements.

This approach can dramatically reduce data collection costs because detailed lists of population elements are required only for the clusters you actually select. Large-scale government surveys frequently use multistage designs for this reason.

Nonprobability Sampling Methods

Not all sampling methods are probability-based. The following approaches use non-random selection, which means they cannot guarantee that every population element has a known, non-zero chance of selection. This limits the scientific validity of findings and prevents reliable generalization to the population.
Quota Sampling

Quota sampling resembles stratified sampling superficially, but differs fundamentally in execution:

1. Divide the population into mutually exclusive subgroups (similar to creating strata).
2. Use judgment to select participants from each subgroup until meeting pre-specified quotas.

The critical difference: the selection of individuals within each quota is non-random. A researcher might be told "get 50 women aged 18-25" and then approach shoppers at a mall until filling the quota.

Why it's nonprobability: Because researchers select specific individuals based on judgment, some population members have no chance of being selected, leading to potential bias. Someone who avoids the mall will never be included.

<extrainfo>

Additional Nonprobability Methods

Accidental (Convenience) Sampling

Accidental sampling draws participants from those readily available and convenient to the researcher, for example by surveying students in the cafeteria or passersby on a street corner.

Limitations: Without randomization, you cannot reliably generalize findings to the broader population. The sample is likely biased toward people with time available, geographic proximity, or other characteristics correlated with convenience.

Voluntary Sampling

Voluntary sampling invites people to participate through advertisements or announcements, relying on self-selection. Common examples include online surveys advertised on social media or signup sheets.

Key bias: Volunteers often have a stronger interest in the survey topic than the general population, creating systematic bias. For example, people who volunteer for a survey about food quality likely have stronger opinions (positive or negative) than average consumers.

Panel Sampling

Panel sampling begins with a probability-based selection of participants, then surveys the same participants repeatedly over time. This is distinct from the nonprobability methods above because the initial selection is random.
Unique advantage: Repeated measurement of identical individuals enables analysis of change over time and can eliminate confounding from time-invariant individual differences.

Challenge: Attrition (participants dropping out) can introduce bias if those remaining differ systematically from those who left.

</extrainfo>

Summary of Sampling Methods

You now understand the landscape of major sampling approaches. Probability sampling methods (simple random, systematic, stratified, PPS, and cluster sampling) allow scientific generalization to the population when properly executed. Nonprobability methods (quota, accidental, and voluntary sampling) are useful for exploratory research or when probability sampling is impossible, but they cannot provide unbiased population estimates. Panel sampling differs from these because its initial selection is random; its purpose is to follow the same participants over time.

The choice of method depends on your research question, population structure, available resources, and desired precision. Stratified sampling works well for populations with relevant subgroups. Cluster sampling is often the practical choice for geographically dispersed populations. Systematic sampling offers practical efficiency for ordered populations. Understanding when each approach is appropriate is key to successful research design.

Flashcards

What are the two core requirements for a sampling process to be considered probability sampling?

Known, non-zero selection probabilities and randomization.

What are the common techniques used in probability sampling?

Simple random sampling, systematic sampling, stratified sampling, probability-proportional-to-size sampling, and cluster or multistage sampling.

What two statistical advantages are provided by knowing the selection probabilities in probability sampling?

Unbiased estimation of population parameters and calculation of sampling errors.

In probability sampling, how is the weight of a sampled element calculated?

The weight is the inverse of the element’s selection probability.

What are the primary limitations often associated with probability sampling compared to nonprobability approaches?

Availability/quality of the sampling frame and higher costs.

What is the defining characteristic of a simple random sample regarding subset selection?

Every possible subset of the given size has the same probability of being selected.

Why might simple random sampling produce unrepresentative samples despite its random nature?

Random draws may happen to over- or under-represent specific subgroups.

How are elements selected in systematic sampling after a random start is chosen?

Every $k$th element is selected from an ordered list.

In systematic sampling, what is the formula for calculating the interval $k$?

$k = \frac{N}{n}$ (where $N$ is population size and $n$ is sample size).

To avoid systematic bias in systematic sampling, from which range must the random start be chosen?

From the first to the $k$th element.

Under what condition is systematic sampling particularly efficient?

When the ordering variable is correlated with the variable of interest.

What specific risk does systematic sampling face if the population list contains periodic patterns?

The sample may become unrepresentative if the pattern period matches the interval $k$.

How is the population organized before sampling occurs in stratified sampling?

It is divided into homogeneous sub-groups called strata.

What term describes the ratio of the sample size to the population size within a specific stratum?

The sampling fraction.

Under what condition is stratified sampling at least as efficient as simple random sampling?

When each stratum is sampled in proportion to its size in the population (proportional allocation).

What technique is used when stratifying variables only become available after the sampling process is complete?

Post-stratification.

On what basis are selection probabilities assigned in probability-proportional-to-size sampling?

Proportional to an auxiliary size measure correlated with the variable of interest.

How does probability-proportional-to-size sampling improve the accuracy of population estimates?

By concentrating the sample on large elements that have a greater impact on estimates.

What is the two-step process involved in cluster sampling?

Select groups of elements (clusters) first, then sample elements within those clusters.

How does the internal similarity of clusters affect the variance of estimates compared to simple random sampling?

It usually increases the variance of estimates.

What is multistage sampling?

A form of cluster sampling involving several successive stages of selecting random subsamples.

Why can multistage sampling dramatically reduce costs compared to other methods?

Detailed lists are only required for the clusters that are selected.

In order to achieve the same accuracy as simple random sampling, what is typically required of a cluster sample size?

A larger overall sample size.

How does the selection process in quota sampling differ from stratified sampling?

Selection within subgroups is non-random and based on researcher judgment.

Why is quota sampling classified as a nonprobability sampling method?

The selection of individuals within each quota is non-random.

What is the primary statistical risk of using quota sampling?

It can produce biased estimates because some individuals have no chance of selection.

What is the defining characteristic of accidental (convenience) sampling?

Participants are selected based on being readily available and convenient to the researcher.

Why is reliable generalization to the total population impossible with accidental sampling?

Because the sample selection is not random.

How are participants recruited in voluntary sampling?

They self-select, typically by responding to advertisements or open invitations.

What is the basic procedure for conducting panel sampling?

Select a group randomly and survey the same participants at multiple time points.

What is the primary analytical advantage of using panel sampling?

It allows for the analysis of changes over time within the same individuals.

Key Concepts

Probability Sampling Methods

Probability sampling

Simple random sampling

Systematic sampling

Stratified sampling

Probability‑proportional‑to‑size sampling

Cluster sampling

Multistage sampling

Non-Probability Sampling Methods

Quota sampling

Convenience sampling

Voluntary sampling

Longitudinal Sampling

Panel sampling

Definitions

Probability sampling

A sampling method where every element has a known, non‑zero chance of selection and the selection itself is made by a random process.

Simple random sampling

A technique in which each possible subset of a given size has an equal probability of being chosen from the population.

Systematic sampling

A method that selects every $k$th element from an ordered list after a random start, using a fixed interval.

Stratified sampling

An approach that divides the population into homogeneous sub‑groups (strata) and samples each stratum independently.

Probability‑proportional‑to‑size sampling

A design where selection probabilities are proportional to an auxiliary size measure correlated with the study variable.

Cluster sampling

A strategy that first selects groups (clusters) and then samples all or a subset of elements within those clusters.

Multistage sampling

A form of cluster sampling that applies several successive random sampling stages, selecting subsamples at each level.

Quota sampling

A non‑probability technique that fills pre‑specified subgroup quotas using judgmental selection rather than random choice.

Convenience sampling

Also called accidental sampling, it draws participants from those readily available to the researcher.

Voluntary sampling

A non‑random method where individuals self‑select to participate, often through advertisements or invitations.

Panel sampling

A longitudinal design that repeatedly surveys the same randomly selected participants over multiple time points.