Introduction to Epidemiology
Understand key epidemiologic concepts, major study designs, and how findings inform public‑health actions.
Summary
Foundations of Epidemiology
Introduction
Epidemiology is the scientific study of how diseases and health-related events are distributed in populations and what factors influence their occurrence. Rather than focusing on individual patients, as clinical medicine does, epidemiology takes a population-level approach to understand disease patterns and causes.
The ultimate goal of epidemiology is to prevent disease, guide public-health interventions, and improve health policy. Epidemiologists ask questions like: Why is disease X more common in this geographic area? Which groups are at highest risk? What causes are modifiable? By answering these questions systematically, epidemiology provides the evidence base for public health action.
Measuring Disease in Populations: Incidence and Prevalence
One of the most fundamental skills in epidemiology is measuring disease frequency. There are two main ways to do this, and it's crucial to understand the difference because they answer different questions and are used in different study designs.
Incidence: Measuring New Cases and Risk
Incidence counts the number of new cases of disease that arise in a defined population during a specific time period. Think of it as capturing people who were disease-free at the start and then developed the disease.
The key insight is that incidence directly reflects risk—the probability that a healthy person will develop the disease. This makes incidence particularly valuable for understanding how likely someone is to become ill.
For example, if you follow 1,000 disease-free people for one year and 50 develop the disease, the incidence is 50 per 1,000 people per year, meaning each person had a 5% risk of developing the disease that year.
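As a minimal sketch of the arithmetic in this example, the calculation could look like this in Python. The function name `incidence_proportion` is illustrative and not taken from any standard library.

```python
def incidence_proportion(new_cases: int, at_risk_at_start: int) -> float:
    """Risk over the follow-up period: new cases / people disease-free at the start."""
    if at_risk_at_start <= 0:
        raise ValueError("at_risk_at_start must be positive")
    if not 0 <= new_cases <= at_risk_at_start:
        raise ValueError("new_cases must be between 0 and at_risk_at_start")
    return new_cases / at_risk_at_start


# Numbers from the example above: 1,000 disease-free people, 50 new cases in one year.
risk = incidence_proportion(new_cases=50, at_risk_at_start=1000)
print(f"Risk over one year: {risk:.1%}")
print(f"Incidence: {risk * 1000:.0f} per 1,000 people per year")
```

This sketch assumes everyone is followed for the full year. When follow-up time differs between people, incidence is usually expressed as a rate with person-time in the denominator instead.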
Prevalence: Measuring All Existing Cases
Prevalence captures all existing cases of disease—both newly diagnosed cases and cases that have existed for a while—at a particular point in time or averaged over a period.
Prevalence answers the question: "Right now, how many people in my population have this disease?" This reflects the overall burden of disease in the community. It's useful for planning healthcare resources or understanding public health priorities.
Importantly, prevalence is influenced both by how many people get the disease (incidence) and how long they survive with it. A disease with low incidence but high survival time could still have high prevalence. Conversely, a disease with high incidence but quick recovery will have low prevalence.
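The relationship described here is often summarized by the approximation prevalence ≈ incidence rate × average disease duration. That shorthand is not stated in the text above and holds only under assumptions: a roughly steady state and a disease that is uncommon in the population. The sketch below uses made-up numbers purely for illustration, and `approx_prevalence` is a hypothetical helper name.

```python
def approx_prevalence(incidence_rate_per_person_year: float,
                      mean_duration_years: float) -> float:
    """Steady-state, low-prevalence approximation: prevalence ~ incidence x duration."""
    if incidence_rate_per_person_year < 0 or mean_duration_years < 0:
        raise ValueError("inputs must be non-negative")
    return incidence_rate_per_person_year * mean_duration_years


# Hypothetical chronic disease: 2 new cases per 1,000 person-years, lasting 20 years on average.
chronic = approx_prevalence(0.002, 20)

# Hypothetical acute illness: 100 new cases per 1,000 person-years, lasting about one week.
acute = approx_prevalence(0.100, 7 / 365)

print(f"Chronic disease prevalence: about {chronic:.1%}")
print(f"Acute illness prevalence:   about {acute:.2%}")
```

Even though the acute illness is 50 times more frequent in this invented scenario, its short duration leaves far fewer people ill at any one moment than the long-lasting chronic disease.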
Why This Distinction Matters
This is a common source of confusion: incidence and prevalence measure different things and are used differently. Incidence tells you about risk and is essential for cohort studies (which we'll discuss later). Prevalence tells you about burden and is used in cross-sectional studies. Understanding which measure you're looking at is critical for interpreting research correctly.
Using Comparison to Identify Patterns
Epidemiologists generate hypotheses about possible causes by comparing disease frequency across different groups. By stratifying the population by characteristics like age, sex, geographic location, or exposure status, patterns emerge that suggest what might be causing disease.
For instance, if disease incidence is much higher in an exposed group than an unexposed group, this suggests the exposure may be causally related to the disease. If incidence varies dramatically by geography, this might point to an environmental cause. These patterns then become the basis for more rigorous testing of causal hypotheses.
Types of Epidemiologic Studies
Understanding study design is essential because each design has different strengths and answers different types of questions. The outline in the image below shows how different studies relate to one another:
Descriptive Studies: Defining the Problem
Descriptive studies form the foundation of epidemiologic inquiry. They answer the basic questions: Who is affected? Where does the disease occur? When does it occur?
Descriptive studies typically use surveillance data (ongoing systematic collection of disease data) to produce rates, trends, and geographic patterns. For example, a descriptive study might document that heart disease mortality has been declining nationally but remains elevated in certain counties, with higher rates in men than women. This information is often presented as trends over time or maps showing geographic variation.
While descriptive studies don't test hypotheses about causation, they are invaluable for identifying patterns that suggest possible causes and for determining where public health action is most needed. They're also relatively quick and inexpensive to conduct, making them ideal for understanding emerging health issues.
<extrainfo>
Descriptive studies might use historical data analysis or ecological studies (which examine disease rates in geographic areas and correlate them with area-level exposures). While useful for hypothesis generation, ecological studies are prone to a special type of bias called the "ecological fallacy"—inferring that patterns at the group level apply to individuals within those groups.
</extrainfo>
Analytic Studies: Testing Hypotheses
Once descriptive studies have identified patterns, analytic studies are used to test specific hypotheses about causal relationships. These studies compare exposure status between groups to determine whether differences in exposure are associated with differences in disease occurrence.
There are three main types of analytic studies, and each has distinct advantages and disadvantages:
Cohort Studies: Following People Forward in Time
Cohort studies are the most straightforward conceptually: they follow a group of exposed and unexposed individuals over time and observe who develops the disease.
Here's how they work: You start by identifying a cohort—a group of disease-free people—and classify them based on whether they were exposed to some factor of interest (like smoking, dietary intake, an occupational chemical, etc.). Then you follow both the exposed and unexposed group forward in time, tracking who develops the outcome.
The key strength of cohort studies is that they provide a direct measure of risk. Because you start disease-free and observe who develops disease, you can calculate incidence in each group and compute the relative risk (RR)—the ratio of disease risk in exposed people to disease risk in unexposed people.
For example, if 20% of smokers develop lung cancer over 10 years but only 1% of non-smokers do, the relative risk is 20/1 = 20, meaning smokers are 20 times more likely to develop lung cancer.
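The same calculation can be sketched from raw cohort counts. The cohort sizes below (1,000 people per group) are an assumption chosen only to reproduce the 20% and 1% risks in the example, and the function names are illustrative.

```python
def risk(cases: int, total: int) -> float:
    """Incidence proportion in one group."""
    if total <= 0:
        raise ValueError("total must be positive")
    return cases / total


def relative_risk(exposed_cases: int, exposed_total: int,
                  unexposed_cases: int, unexposed_total: int) -> float:
    """Ratio of risk in the exposed group to risk in the unexposed group."""
    unexposed_risk = risk(unexposed_cases, unexposed_total)
    if unexposed_risk == 0:
        raise ZeroDivisionError("relative risk is undefined when unexposed risk is 0")
    return risk(exposed_cases, exposed_total) / unexposed_risk


# Assumed counts: 200 of 1,000 smokers and 10 of 1,000 non-smokers develop lung cancer.
rr = relative_risk(200, 1000, 10, 1000)
print(f"Risk in smokers:     {risk(200, 1000):.0%}")
print(f"Risk in non-smokers: {risk(10, 1000):.0%}")
print(f"Relative risk:       {rr:.1f}")
```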
The weaknesses are significant: cohort studies can be very costly and time-consuming, especially for diseases that take years to develop or that are relatively rare. You must maintain contact with participants over many years, and loss to follow-up can bias results. Despite these drawbacks, cohort studies are generally regarded as the strongest observational design for assessing causation when they're feasible to conduct; randomized trials, where they are ethical and practical, provide stronger evidence still.
Case-Control Studies: Working Backward from Disease
Case-control studies work in the opposite direction. Rather than starting with exposed and unexposed people and following them forward, you start with people who have the disease (cases) and people who don't (controls), then look backward at their prior exposures.
The logic is straightforward: If an exposure causes disease, people with the disease should have been exposed more frequently in the past than people without the disease.
Here's a concrete example: You identify 100 people newly diagnosed with a certain cancer (cases) and 100 similar people without the cancer (controls). You then interview both groups about past exposures—say, childhood residence near a factory. If 60% of cases but only 30% of controls lived near a factory, this suggests living near the factory may increase cancer risk.
The major strength of case-control studies is efficiency for rare diseases. Cohort studies require following vast numbers of people when disease is rare (because few will develop it). Case-control studies solve this by identifying cases and controls up front—you're guaranteed to have disease cases to study.
Case-control studies also provide a measure of association called the odds ratio (OR), which estimates the association between exposure and disease. For rare diseases, the odds ratio approximates the relative risk closely.
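To make the odds ratio concrete, here is a small sketch that applies it to the factory example above (60 of 100 cases exposed, 30 of 100 controls exposed). The helper names are illustrative.

```python
def odds(exposed: int, unexposed: int) -> float:
    """Odds of exposure within one group: exposed / unexposed."""
    if unexposed <= 0:
        raise ValueError("unexposed count must be positive")
    return exposed / unexposed


def exposure_odds_ratio(cases_exposed: int, cases_unexposed: int,
                        controls_exposed: int, controls_unexposed: int) -> float:
    """Odds of exposure among cases divided by odds of exposure among controls."""
    controls_odds = odds(controls_exposed, controls_unexposed)
    if controls_odds == 0:
        raise ZeroDivisionError("odds ratio is undefined when no controls are exposed")
    return odds(cases_exposed, cases_unexposed) / controls_odds


# Factory example: 60 exposed / 40 unexposed cases, 30 exposed / 70 unexposed controls.
or_estimate = exposure_odds_ratio(60, 40, 30, 70)
print(f"Odds of exposure in cases:    {odds(60, 40):.2f}")
print(f"Odds of exposure in controls: {odds(30, 70):.2f}")
print(f"Odds ratio:                   {or_estimate:.2f}")
```

With these counts the odds ratio works out to 3.5: the odds of having lived near the factory are 3.5 times as high among cases as among controls.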
The limitation is that you cannot directly measure incidence or risk (since you start with people who already have disease). Additionally, case-control studies are vulnerable to recall bias—people with disease may remember past exposures differently (or more carefully) than people without disease, biasing results.
Cross-Sectional Studies: A Snapshot in Time
Cross-sectional studies assess both exposure and disease status at a single point in time, like a snapshot. You survey a population, identifying who is exposed and who has disease, all at the same moment.
The advantage is that cross-sectional studies are quick and relatively inexpensive. They're useful for examining prevalence in relation to exposures and are commonly used in public health surveillance.
The limitation is that because exposure and disease are measured simultaneously, you cannot establish temporal sequence—you cannot determine whether exposure preceded disease, which is essential for inferring causation. This makes cross-sectional studies primarily useful for hypothesis generation rather than testing causal hypotheses.
<extrainfo>
There are some specialized analytic study designs worth knowing about:
Ecological studies compare disease rates across geographic areas (or populations) and correlate them with area-level exposures. While hypothesis-generating, they're prone to ecological fallacy.
Natural experiments exploit naturally occurring variations in exposure (like a factory opening in one town but not another) to estimate effects, combining some advantages of observational and experimental studies.
</extrainfo>
Interpretation and Validity: Understanding What Findings Really Mean
Finding an association between exposure and disease doesn't automatically mean the exposure causes the disease. Three critical concepts help epidemiologists distinguish real causal relationships from spurious ones:
Confounding: When a Third Variable Creates False Associations
Confounding occurs when a third variable (the confounder) creates or distorts the apparent relationship between an exposure and outcome.
Here's the key mechanism: A confounder must be associated with both the exposure and the outcome independently. When this happens, the confounder creates a false association (or masks a true one) that you observe when you compare exposed to unexposed groups.
Example: Suppose you find that coffee drinkers have higher rates of heart disease than non-coffee drinkers. But coffee consumption is associated with smoking (many coffee drinkers also smoke), and smoking independently causes heart disease. Smoking is a confounder—the elevated risk in coffee drinkers may actually be due to their higher smoking rates, not the coffee.
To address confounding, epidemiologists can:
Restrict the study to only people in one stratum of the confounder (e.g., study only non-smokers)
Match cases and controls on the confounder so both groups have equal distribution
Stratify the analysis to examine the exposure-disease relationship separately within each level of the confounder
Statistically adjust for the confounder in multivariable analysis
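To make the stratification and adjustment strategies concrete, here is a small sketch using invented counts for the coffee, smoking, and heart disease example. The numbers are hypothetical and were chosen so that coffee has no effect within either smoking stratum. The Mantel-Haenszel summary risk ratio is used as one common way of pooling across strata; the text above does not name a specific adjustment method.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Stratum:
    name: str
    exposed_cases: int     # coffee drinkers with heart disease
    exposed_total: int     # all coffee drinkers
    unexposed_cases: int   # non-drinkers with heart disease
    unexposed_total: int   # all non-drinkers


def risk_ratio(s: Stratum) -> float:
    return (s.exposed_cases / s.exposed_total) / (s.unexposed_cases / s.unexposed_total)


def crude_risk_ratio(strata: Sequence[Stratum]) -> float:
    """Ignore the confounder: collapse all strata into one table."""
    pooled = Stratum(
        "all",
        sum(s.exposed_cases for s in strata),
        sum(s.exposed_total for s in strata),
        sum(s.unexposed_cases for s in strata),
        sum(s.unexposed_total for s in strata),
    )
    return risk_ratio(pooled)


def mantel_haenszel_risk_ratio(strata: Sequence[Stratum]) -> float:
    """Weighted summary of the stratum-specific risk ratios."""
    numerator = 0.0
    denominator = 0.0
    for s in strata:
        n = s.exposed_total + s.unexposed_total
        numerator += s.exposed_cases * s.unexposed_total / n
        denominator += s.unexposed_cases * s.exposed_total / n
    return numerator / denominator


# Hypothetical data: smokers drink more coffee and have higher baseline risk.
strata = [
    Stratum("smokers", 160, 800, 40, 200),
    Stratum("non-smokers", 10, 200, 40, 800),
]

crude = crude_risk_ratio(strata)
adjusted = mantel_haenszel_risk_ratio(strata)
print(f"Crude RR: {crude:.2f}")
for s in strata:
    print(f"RR among {s.name}: {risk_ratio(s):.2f}")
print(f"Mantel-Haenszel adjusted RR: {adjusted:.2f}")
```

In this invented dataset the crude risk ratio is about 2.1, yet the ratio is 1.0 within each smoking stratum and in the pooled estimate. The apparent coffee effect comes entirely from the uneven distribution of smokers.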
Bias: Systematic Errors in Measurement or Selection
Bias refers to systematic (non-random) errors in measurement, selection of participants, or data collection that lead to incorrect estimates of association. Unlike random error, which tends to average out as the sample grows, bias consistently pushes results in one direction and is not reduced by collecting more data.
Common types include:
Selection bias: When the choice of study participants is related to both exposure and outcome, distorting the association. Example: In a case-control study, if controls are recruited from hospital patients whose conditions are themselves related to the exposure, the controls will not reflect exposure in the source population, and the apparent exposure-disease association is distorted.
Information bias: When exposure or disease is measured with systematic error. Example: If cases of a disease recall past exposures more completely than controls do (recall bias), or if one group tends to over-report exposure while another under-reports it, comparisons between groups become unreliable.
Observer bias: When knowledge of exposure status influences how disease is assessed, or vice versa. This is why blinding—keeping observers unaware of participants' exposure status—is critical.
Effect Modification: When Effects Differ Across Groups
Effect modification (also called interaction) occurs when the effect of an exposure differs across subgroups of the population.
This is conceptually different from confounding: with confounding, there's a "true" effect that's being distorted. With effect modification, there's no single true effect—the effect genuinely differs by subgroup.
Example: The effect of a medication on blood pressure might be stronger in men than in women. Sex is an effect modifier—the effect modification is real and interesting, not a problem to be controlled for. You'd report results separately by sex.
Recognizing effect modification is important because it means you cannot summarize results with a single number for the entire population—you must describe how effects vary.
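A minimal sketch of reporting stratum-specific effects, following the blood pressure example above. All values are invented for illustration, and the effect is measured here as the difference in mean change between treated and control groups, which is one simple choice among several.

```python
from statistics import mean

# Hypothetical change in systolic blood pressure (mmHg); negative values are reductions.
bp_change = {
    "men": {
        "treated": [-12, -10, -14, -11, -13],
        "control": [-2, -1, -3, -2, -2],
    },
    "women": {
        "treated": [-6, -5, -7, -6, -6],
        "control": [-2, -1, -3, -2, -2],
    },
}


def treatment_effect(group: dict) -> float:
    """Difference in mean change: treated minus control."""
    return mean(group["treated"]) - mean(group["control"])


# Report one effect per subgroup instead of a single pooled number.
effects = {sex: treatment_effect(group) for sex, group in bp_change.items()}
for sex, effect in effects.items():
    print(f"Effect in {sex}: {effect:.1f} mmHg")
```

In these made-up data the medication lowers blood pressure by 10 mmHg more than control in men but only 4 mmHg more in women. A real analysis would also check whether such a difference between subgroups exceeds what chance alone could produce.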
Simple Measures of Association: Quantifying Relationships
Once you've identified an association and controlled for confounding, you need to quantify the strength of the relationship. Three measures are most common:
Relative Risk (RR)
Relative Risk is calculated as:
$$RR = \frac{I_{e}}{I_{u}}$$
where $I_{e}$ is the incidence (or risk) of disease in the exposed group and $I_{u}$ is the incidence in the unexposed group.
Relative risk tells you how many times more (or less) likely the exposed group is to develop disease compared to the unexposed group. An RR of 1.0 means no association. An RR > 1.0 means increased risk with exposure. An RR < 1.0 means decreased risk (protective effect).
Who calculates it: Cohort studies and clinical trials directly observe incidence, so they can calculate RR.
Odds Ratio (OR)
Odds Ratio is calculated as:
$$OR = \frac{\text{odds}_{\text{cases}}}{\text{odds}_{\text{controls}}}$$
This is the ratio of the odds of exposure among people with disease to the odds of exposure among people without disease. It's harder to interpret intuitively than RR, but it approximates RR well when disease is rare.
Who calculates it: Case-control studies use OR because they identify cases and controls and look backward at exposure, so they can't directly measure incidence. However, when disease is uncommon, OR ≈ RR.
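The rare-disease approximation can be checked numerically. The sketch below computes both measures from a full 2×2 table for two hypothetical populations, one where disease is rare and one where it is common; the counts are invented so that the relative risk is 2 in both.

```python
def relative_risk(a: int, b: int, c: int, d: int) -> float:
    """a, b = exposed with/without disease; c, d = unexposed with/without disease."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Cross-product ratio of the same 2x2 table."""
    return (a * d) / (b * c)


# Hypothetical tables: (exposed cases, exposed non-cases, unexposed cases, unexposed non-cases)
rare = (20, 9_980, 10, 9_990)     # risks of 0.2% and 0.1%
common = (400, 600, 200, 800)     # risks of 40% and 20%

for label, table in (("rare disease", rare), ("common disease", common)):
    print(f"{label}: RR = {relative_risk(*table):.2f}, OR = {odds_ratio(*table):.2f}")
```

When the disease is rare the two measures nearly coincide. When it is common the odds ratio (about 2.67 here) overstates the relative risk of 2, which is why the approximation should only be relied on for uncommon outcomes.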
Risk Difference (RD)
Risk Difference is calculated as:
$$RD = I_{e} - I_{u}$$
This is the absolute difference in risk between exposed and unexposed groups. While RR and OR are useful for comparing relative effects across populations, RD is useful for public health planning—it tells you how many cases you'd prevent per unit population by eliminating the exposure.
Example: If exposed individuals have 20% risk and unexposed have 5% risk, the RR is 4 (exposed are 4 times as likely to develop disease), but the RD is 15 percentage points (if the association is causal, eliminating the exposure would prevent 15 cases per 100 exposed people).
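The contrast between relative and absolute measures is easier to see side by side. This sketch compares the example above with a second, hypothetical scenario that has the same relative risk but a much rarer outcome.

```python
def relative_risk(risk_exposed: float, risk_unexposed: float) -> float:
    return risk_exposed / risk_unexposed


def risk_difference(risk_exposed: float, risk_unexposed: float) -> float:
    return risk_exposed - risk_unexposed


# (risk in exposed, risk in unexposed); the second scenario is invented for contrast.
scenarios = {
    "common outcome": (0.20, 0.05),
    "rare outcome": (0.004, 0.001),
}

results = {}
for label, (risk_exposed, risk_unexposed) in scenarios.items():
    rr = relative_risk(risk_exposed, risk_unexposed)
    rd = risk_difference(risk_exposed, risk_unexposed)
    results[label] = (rr, rd)
    print(f"{label}: RR = {rr:.1f}, RD = {rd:.3f} "
          f"({rd * 1000:.0f} excess cases per 1,000 exposed)")
```

Both scenarios have a relative risk of 4, but the excess is 150 cases per 1,000 exposed people in the first and only 3 per 1,000 in the second. This is why the risk difference is often the more informative number for planning.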
Applications of Epidemiology
Outbreak Investigations and Epidemic Control
When a disease outbreak occurs, epidemiologists use systematic methods to locate the source, understand transmission, and implement control measures.
A classic historical example is John Snow's investigation of the 1854 cholera outbreak in London:
Snow mapped cholera deaths and compared them by water source. He found that deaths clustered around the Broad Street pump and that people who drew water from that pump had much higher death rates than those using other water sources. His investigation pointed to contaminated water as the source and led to removal of the pump handle, a control measure credited with helping to end the outbreak.
This investigation exemplifies epidemiologic methods: describing the outbreak geographically and temporally, forming hypotheses about cause (contaminated water), testing the hypothesis systematically, and implementing control measures based on evidence.
Screening Programs and Disease Detection
Screening programs aim to identify disease in asymptomatic people before they develop symptoms. Epidemiologic data directly inform these programs.
Specifically, epidemiologic estimates of disease prevalence and individual risk guide which populations should be screened. For a screening program to be worthwhile, disease must be common enough in the target population that screening efficiently identifies cases. Additionally, epidemiologic research documents the natural history of disease (how it develops and progresses) to determine whether early detection improves outcomes—a prerequisite for beneficial screening.
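One way to see why prevalence matters for screening is the positive predictive value: the share of positive screening results that are true cases. Sensitivity and specificity are not discussed in the text above, so the test characteristics below (90% and 95%) are assumptions chosen only for illustration.

```python
def positive_predictive_value(prevalence: float,
                              sensitivity: float,
                              specificity: float) -> float:
    """Probability that a person who screens positive truly has the disease."""
    for name, value in (("prevalence", prevalence),
                        ("sensitivity", sensitivity),
                        ("specificity", specificity)):
        if not 0 <= value <= 1:
            raise ValueError(f"{name} must be between 0 and 1")
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    if true_positives + false_positives == 0:
        raise ZeroDivisionError("no positive results are possible with these inputs")
    return true_positives / (true_positives + false_positives)


# Same hypothetical test applied to a low-prevalence and a higher-prevalence population.
ppv_low = positive_predictive_value(prevalence=0.01, sensitivity=0.90, specificity=0.95)
ppv_high = positive_predictive_value(prevalence=0.10, sensitivity=0.90, specificity=0.95)
print(f"PPV at 1% prevalence:  {ppv_low:.0%}")
print(f"PPV at 10% prevalence: {ppv_high:.0%}")
```

Under these assumptions, roughly 15% of positive results are true cases when prevalence is 1%, compared with about 67% when prevalence is 10%. Targeting higher-risk groups therefore yields far fewer false alarms per case found.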
Vaccination Campaigns and Immunization Strategy
Epidemiologic data guide vaccination decisions by identifying which populations face the highest disease risk and monitoring trends in vaccine-preventable diseases.
By tracking disease incidence and prevalence before and after vaccination, epidemiologists assess vaccine effectiveness and guide decisions about coverage targets, vaccine schedules, and booster recommendations. Outbreak investigations often reveal gaps in vaccination coverage that public health agencies then target with interventions.
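A common way to express vaccine effectiveness is one minus the relative risk of disease in vaccinated versus unvaccinated people. The text above does not specify a formula, so treat this as one standard formulation, shown with invented counts; a real estimate would also need to account for confounding between the two groups.

```python
def attack_rate(cases: int, population: int) -> float:
    """Proportion of a group that develops the disease during the period."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases / population


def vaccine_effectiveness(rate_vaccinated: float, rate_unvaccinated: float) -> float:
    """VE = 1 - (attack rate in vaccinated / attack rate in unvaccinated)."""
    if rate_unvaccinated <= 0:
        raise ValueError("attack rate in the unvaccinated must be positive")
    return 1 - rate_vaccinated / rate_unvaccinated


# Hypothetical season: 5 cases per 1,000 vaccinated, 50 cases per 1,000 unvaccinated.
ve = vaccine_effectiveness(attack_rate(5, 1000), attack_rate(50, 1000))
print(f"Estimated vaccine effectiveness: {ve:.0%}")
```

With these made-up numbers, vaccinated people have one tenth the attack rate of unvaccinated people, giving an estimated effectiveness of 90%.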
Flashcards
What is the scientific definition of epidemiology?
The study of how diseases and health-related events are distributed in populations and the factors influencing their occurrence.
What are the ultimate goals of epidemiology?
Prevent disease
Guide public-health interventions
Improve health policy
In epidemiology, what does the measure of incidence represent?
The number of new cases that arise in a defined period, reflecting the risk of developing the disease.
What does prevalence capture in a population?
All existing cases (both new and old) at a particular point or over a period, reflecting the overall disease burden.
How do epidemiologists typically generate hypotheses about disease causes?
By comparing incidence or prevalence across groups defined by factors like age, sex, geography, or exposure status.
What primary questions do descriptive studies explore regarding a disease?
Who is affected, where the disease occurs, and when it occurs.
What is the primary purpose of analytic studies in epidemiology?
To test specific hypotheses about causal relationships between exposures and outcomes.
What is the basic design of a cohort study?
Following a group of exposed and unexposed individuals over time to see who develops the outcome.
What measure of association is directly estimated by cohort studies?
Relative risk.
How is a case-control study designed?
It starts with people who have the disease (cases) and compares them to similar people without the disease (controls) to examine prior exposures.
For what type of diseases are case-control studies particularly efficient?
Rare diseases.
What measure of association is yielded by a case-control study?
Odds ratio.
At what point in time does a cross-sectional study assess exposure and disease status?
At a single point in time.
When does confounding occur in an epidemiologic study?
When a third variable distorts the true relationship between an exposure and an outcome.
In epidemiology, what does bias refer to?
Systematic errors in measurement or selection that lead to incorrect estimates of association.
What is effect modification?
When the effect of an exposure differs across various subgroups of a population.
What is the formula for calculating Relative Risk ($RR$)?
$RR = \dfrac{I_{e}}{I_{u}}$ (where $I_{e}$ is incidence among the exposed and $I_{u}$ is incidence among the unexposed).
What is the formula for calculating an Odds Ratio ($OR$)?
$OR = \dfrac{\text{odds}_{\text{cases}}}{\text{odds}_{\text{controls}}}$.
What is the formula for calculating Risk Difference ($RD$)?
$RD = I_{e} - I_{u}$ (where $I_{e}$ is incidence in the exposed and $I_{u}$ is incidence in the unexposed).
What epidemiologic estimates are used to determine target populations for screening programs?
Disease prevalence and risk.
Quiz
Introduction to Epidemiology Quiz Question 1: What does a cross‑sectional study assess?
- Exposure and disease status at a single point in time (correct)
- Participants over several years to observe incidence
- Cases based on disease status and then looks back at exposures
- Randomly assigns exposures to participants
Introduction to Epidemiology Quiz Question 2: How is the odds ratio (OR) calculated?
- $OR = \dfrac{\text{odds}_{\text{cases}}}{\text{odds}_{\text{controls}}}$ (correct)
- $OR = \text{odds}_{\text{cases}} + \text{odds}_{\text{controls}}$
- $OR = \dfrac{\text{exposed}_{\text{cases}}}{\text{exposed}_{\text{controls}}}$
- $OR = \dfrac{I_{e}}{I_{u}}$
Introduction to Epidemiology Quiz Question 3: How is risk difference (RD) calculated?
- $RD = I_{e} - I_{u}$ (correct)
- $RD = \dfrac{I_{e}}{I_{u}}$
- $RD = \dfrac{I_{e} + I_{u}}{2}$
- $RD = I_{u} - I_{e}$
Key Concepts
Epidemiological Concepts
Epidemiology
Incidence
Prevalence
Confounding
Bias (epidemiology)
Study Designs
Cohort study
Case‑control study
Outbreak investigation
Measures of Association
Relative risk
Odds ratio
Definitions
Epidemiology
The scientific study of the distribution and determinants of health‑related events in populations.
Incidence
The count of new disease cases occurring in a defined population during a specific time period, reflecting risk.
Prevalence
The total number of existing cases of a disease in a population at a particular point or over a period, indicating overall burden.
Cohort study
An observational design that follows exposed and unexposed groups over time to compare incidence of outcomes.
Case‑control study
An observational design that starts with individuals who have a disease and compares their prior exposures to those without the disease.
Confounding
A distortion of the true exposure‑outcome relationship caused by a third variable associated with both.
Bias (epidemiology)
Systematic errors in study design, data collection, or analysis that lead to incorrect estimates of association.
Relative risk
A measure of association calculated as the incidence among the exposed divided by the incidence among the unexposed.
Odds ratio
A measure of association calculated as the odds of exposure among cases divided by the odds of exposure among controls.
Outbreak investigation
The application of epidemiologic methods to identify the source, transmission, and control measures for a sudden increase in disease cases.