
Sampling (statistics) Study Guide

📖 Core Concepts
- Sampling – Selecting a subset of a statistical population to estimate characteristics of the whole population.
- Population – All individuals or items possessing the traits of interest (physical, temporal, conceptual).
- Sampling Frame – A list or mechanism that lets every target‑population element be identified and potentially chosen.
- Probability Sample – Every unit has a known, non‑zero chance of selection; selection is random.
- Non‑Probability Sample – Selection probabilities are unknown; units are chosen by judgment, convenience, etc.
- Survey Weight – The inverse of a unit’s selection probability, i.e., the number of population units that the sampled unit represents; used to make the sample “look like” the population.
- Sampling Error – Random variation in an estimate that arises because a sample, not the whole population, is observed; it is the variation we would see across repeated samples.
- Selection / Non‑Sampling Bias – Systematic distortion from the way units are chosen or from measurement, processing, non‑response, etc.

---

📌 Must Remember
- Equal‑probability (self‑weighting) design: every element has the same selection probability, so all weights are equal (e.g., $N/n$ under simple random sampling) and unweighted means and proportions need no correction.
- Weight formula: $w_i = \dfrac{1}{\pi_i}$, where $\pi_i$ is the selection probability for unit $i$.
- Systematic interval: $k = \dfrac{N}{n}$ (population size ÷ sample size), rounded to an integer when $N/n$ is not whole.
- Stratified sampling efficiency: gains are largest when strata are internally homogeneous and differ from one another.
- Cluster sampling cost vs. variance trade‑off: cheaper, but usually higher variance than simple random sampling of the same size.
- Non‑response can effectively turn a probability design into a non‑probability one, because the realized response probabilities are unknown, unless adjustments are made.
- Quota & convenience sampling = non‑probability → sampling error cannot be estimated from the design.

---

🔄 Key Processes

Design a Probability Sample
1. Define the target population and build a complete sampling frame.
2. Choose a technique (simple random sampling (SRS), systematic, stratified, PPS, cluster, multistage).
3. Compute each unit’s selection probability $\pi_i$.
4. Derive weights $w_i = 1/\pi_i$.

Systematic Sampling Steps
1. Compute $k = N/n$.
2. Randomly pick a start $r$ from $\{1,\dots,k\}$.
3. Select units $r,\, r+k,\, r+2k,\dots$ until the sample size $n$ is reached.
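
The systematic sampling steps above, together with the weight formula $w_i = 1/\pi_i$, can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `systematic_sample` and the frame of 100 unit IDs are hypothetical, and the code assumes $N$ is an exact multiple of $n$ so that $k$ is an integer and every unit has selection probability $1/k$.

```python
import random

def systematic_sample(frame, n, rng=None):
    """Draw a 1-in-k systematic sample from an ordered frame.

    Assumption of this sketch: len(frame) is an exact multiple of n,
    so k = N / n is an integer and each unit has selection probability 1/k.
    """
    rng = rng or random.Random()
    N = len(frame)
    if n <= 0 or N % n != 0:
        raise ValueError("this sketch requires N to be a positive multiple of n")
    k = N // n                       # sampling interval k = N / n
    r = rng.randint(1, k)            # random start in {1, ..., k}, both ends inclusive
    # 1-based positions r, r+k, r+2k, ... are 0-based indices r-1, r-1+k, ...
    sample = [frame[i] for i in range(r - 1, N, k)]
    # pi_i = 1/k for every unit, so the base weight is w_i = 1/pi_i = k
    weights = [k] * len(sample)
    return sample, weights

frame = list(range(1, 101))          # hypothetical frame of 100 unit IDs
sample, weights = systematic_sample(frame, n=10, rng=random.Random(42))
```

Because the design is equal‑probability, every weight equals $k$ and the weights sum to $N$. Handling a non‑integer $N/n$ (for example with a fractional interval) is deliberately left out of the sketch.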
Stratified Sampling Steps
1. Partition the frame into strata (e.g., urban/rural).
2. Decide the allocation (proportional or disproportional).
3. Sample within each stratum using SRS or systematic sampling.

Cluster / Multistage Sampling Steps
1. Identify clusters (e.g., schools).
2. Randomly select a subset of clusters.
3. Within the chosen clusters, either sample all units (one‑stage) or draw a subsample (second stage).

Weighting & Non‑Response Adjustment
1. Compute base weights (inverse selection probabilities).
2. Adjust for non‑response: inflate the weights of respondents in groups with lower response rates so that they also stand in for the non‑respondents.
3. Post‑stratify if external benchmarks are available.

---

🔍 Key Comparisons

Probability vs. Non‑Probability
- Selection probabilities known vs. unknown → only the former supports design‑based unbiased estimates and measurable sampling errors.

Stratified vs. Cluster
- Stratified: sample within every homogeneous subgroup → lower variance.
- Cluster: sample whole groups → cheaper, but higher variance because units in the same cluster tend to be similar (intra‑cluster similarity).

Systematic vs. Simple Random
- Systematic: easy to carry out, needs an ordered list; vulnerable to periodic patterns in that list.
- Simple Random: fully random; harder to implement for huge $N$.

Quota vs. Stratified
- Quota: non‑random selection within quotas → prone to selection bias.
- Stratified: random selection within strata → unbiased.

With‑Replacement vs. Without‑Replacement
- With: the same unit can appear multiple times → simplifies variance formulas.
- Without: each unit appears at most once → more efficient use of information.

---

⚠️ Common Misunderstandings
- “Random = representative.” Random selection removes systematic bias on average, but any single draw can still be unrepresentative by chance (e.g., an SRS that under‑samples a minority).
- “Larger sample always fixes bias.” Bias from frame problems or non‑response persists regardless of sample size.
- “Systematic sampling eliminates randomness.” The sample is still random because the start is chosen at random; only the interval is fixed.
- “Weights always make a sample perfect.” Weights correct known probability imbalances; they cannot fix selection bias from a faulty frame.
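
To make the point about weights correcting known probability imbalances concrete, here is a minimal Python sketch of stratified sampling with disproportional allocation. Everything in it is hypothetical and chosen for illustration: the helper names `stratified_sample` and `weighted_mean`, the urban/rural frame, and the artificial choice that every unit in a stratum has the same value (so the result does not depend on which units are drawn).

```python
import random

def stratified_sample(strata, allocation, rng=None):
    """SRS without replacement inside each stratum.

    strata:     dict mapping stratum name -> list of values on the frame
    allocation: dict mapping stratum name -> sample size n_h
    Returns a list of (stratum, value, weight) with weight = N_h / n_h,
    the inverse of the within-stratum selection probability n_h / N_h.
    """
    rng = rng or random.Random()
    rows = []
    for name, units in strata.items():
        n_h = allocation[name]
        N_h = len(units)
        weight = N_h / n_h
        for value in rng.sample(units, n_h):
            rows.append((name, value, weight))
    return rows

def weighted_mean(rows):
    """Weighted mean: sum(w * y) / sum(w)."""
    total_weight = sum(w for _, _, w in rows)
    return sum(v * w for _, v, w in rows) / total_weight

# Hypothetical frame: 900 urban units with value 10, 100 rural units with value 50.
strata = {"urban": [10] * 900, "rural": [50] * 100}
true_mean = sum(sum(units) for units in strata.values()) / 1000   # 14.0

# Disproportional allocation: rural is heavily oversampled.
rows = stratified_sample(strata, {"urban": 20, "rural": 20}, rng=random.Random(0))

unweighted = sum(v for _, v, _ in rows) / len(rows)   # 30.0, distorted by oversampling
weighted = weighted_mean(rows)                        # 14.0, matches the true mean
```

The unweighted mean is pulled toward the oversampled rural stratum, while the weighted mean recovers the population value. The weights can only do this because the selection probabilities are known; they would not help if rural units were missing from the frame altogether.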
---

🧠 Mental Models / Intuition
- “Sampling as a scaled‑up photograph.” Think of the population as a huge picture; the sample is a smaller, proportionally scaled snapshot. Weights tell you how many real‑world pixels each sampled pixel represents.
- “Clusters = boxes of marbles.” If each box (cluster) contains similar marbles, picking a few boxes gives a rough idea of the whole but adds little information per extra marble and misses the boxes you did not open.
- “Strata = layers of a cake.” Each layer (stratum) is distinct; sampling a slice from each layer ensures you taste every flavor.

---

🚩 Exceptions & Edge Cases
- Periodic ordering in systematic sampling that matches the interval $k$ → the sample becomes biased (e.g., every 10th house lies on the same street).
- Very small strata → proportional allocation may yield zero units; disproportional allocation (oversampling) with later weighting may be needed.
- Non‑response that is not random → simple weighting may be insufficient; modeling or imputation may be needed.
- Probability‑proportional‑to‑size (PPS) sampling gives unequal inclusion probabilities by design, and without replacement they may not be exactly proportional to size (very large units can be selected with certainty); weight by the inverse of the actual inclusion probability.

---

📍 When to Use Which
- Simple Random – Small, well‑defined populations; when you can list every unit.
- Systematic – Large lists with a reliable ordering variable correlated with the outcome; when simplicity matters.
- Stratified – When subpopulations are of analytic interest or when strata differ markedly in variance.
- PPS – When an auxiliary size measure (e.g., store sales) strongly predicts the variable of interest.
- Cluster / Multistage – When the population is geographically dispersed and travel costs dominate.
- Quota / Convenience – Only for exploratory, qualitative work where representativeness is not required.
- Panel – When tracking change over time on the same respondents is essential.

---

👀 Patterns to Recognize
- “Higher variance → cluster sampling.” Questions that mention geographic or school‑based sampling point to cluster sampling and its higher variance.
- “Weight = 1 / prob.” Whenever a weighting problem appears, look for the inverse of the selection probability.
- “Non‑response = missing‑data bias.” If a survey reports low response rates, anticipate the need for weighting or imputation.
- “Periodicity in list → systematic trap.” If the ordering variable has a known cycle (e.g., weekly sales), systematic sampling may be dangerous.

---

🗂️ Exam Traps
- Choosing “simple random” when the frame is ordered. Test‑writers may present an ordered list and expect systematic sampling.
- Confusing quota with stratified sampling. Both involve subgroups, but only stratified sampling selects at random within them.
- Assuming PPS eliminates the need for weights. PPS still requires weighting by the inverse of the size‑based selection probability.
- Ignoring the effect of non‑response on a probability design. An answer that treats the design as still probability‑based after high non‑response is wrong.
- Mixing up with‑ vs. without‑replacement variance formulas. Remember that the with‑replacement variance is slightly larger because repeats are possible.
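
The last trap can be checked by brute force. The sketch below, on a hypothetical toy population of five values, enumerates every possible sample of size 2 and compares the exact variance of the sample mean with and without replacement. It relies on the standard results $\sigma^2/n$ for sampling with replacement and $\frac{\sigma^2}{n}\cdot\frac{N-n}{N-1}$ for sampling without replacement, where $\sigma^2$ is the population variance; the helper name `pvar` is an illustrative choice.

```python
from itertools import combinations, product

def pvar(values):
    """Population variance (divide by the number of values)."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values) / len(values)

population = [1, 2, 3, 4, 5]     # hypothetical toy population, N = 5
N, n = len(population), 2

# With replacement: all N**n ordered samples are equally likely.
means_wr = [sum(s) / n for s in product(population, repeat=n)]
# Without replacement: all C(N, n) unordered samples are equally likely.
means_wor = [sum(s) / n for s in combinations(population, n)]

var_wr = pvar(means_wr)          # exact variance of the mean, with replacement
var_wor = pvar(means_wor)        # exact variance of the mean, without replacement

sigma2 = pvar(population)
formula_wr = sigma2 / n
formula_wor = sigma2 / n * (N - n) / (N - 1)   # finite population correction
```

For this population the enumeration gives a variance of 1.0 with replacement and 0.75 without, matching both formulas. The gap shrinks as $N$ grows relative to $n$, which is why the with‑replacement variance is only slightly larger in large populations.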
