Population genetics - Practical Applications and Inference
Understand how population genetics explains genetic variation, detects selection, infers demographic history, and predicts the evolution of genetic systems.
Summary
Applications of Population Genetics
Population genetics provides powerful tools for understanding real biological systems. Beyond the theoretical principles of evolution, population genetics helps us answer practical questions: How much genetic variation should we expect in a population? What genomic regions are under selection? How have populations changed through history? And how do populations evolve new traits? This section explores four major applications of population genetic theory.
Explaining Levels of Genetic Variation
One of the most important applications of population genetics is explaining why genetic variation differs so dramatically across species.
The Neutral Theory Prediction
The neutral theory of evolution makes a clear prediction about how much genetic variation (specifically, nucleotide diversity) we should observe in populations. The prediction is simple: nucleotide diversity is proportional to the product of the effective population size ($N_e$) and the neutral mutation rate ($\mu$).
$$\text{Nucleotide diversity} \propto N_e \times \mu$$
This makes intuitive sense. Larger populations should maintain more variation because they're less affected by genetic drift. Higher mutation rates should introduce more new variation. So genetic diversity should scale predictably with these two factors.
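As a rough illustration of this scaling, the sketch below uses the standard diploid expectation $\pi \approx 4 N_e \mu$. The factor of 4, the function name, and the example values are assumptions made for illustration; the text above states only the proportionality.

```python
def expected_diversity(n_e: float, mu: float) -> float:
    """Expected neutral nucleotide diversity per site in a diploid population.

    Assumes the standard neutral approximation pi ~ 4 * N_e * mu,
    which is reasonable when 4 * N_e * mu is much smaller than 1.
    """
    return 4.0 * n_e * mu


# Hypothetical values chosen only to show the scaling.
mu = 1e-8                      # neutral mutations per site per generation
small = expected_diversity(1e4, mu)
large = expected_diversity(1e6, mu)

print(f"N_e = 1e4: pi ~ {small:.4f}")
print(f"N_e = 1e6: pi ~ {large:.4f}")
print(f"fold difference: {large / small:.0f}")
```

Under this model, a hundredfold difference in effective population size translates directly into a hundredfold difference in expected diversity.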
The Paradox of Variation
Here's where things get interesting—and confusing. When biologists measure genetic variation across many species, they find something surprising: genetic diversity varies far less than population size does.
For example, humans and fruit flies (Drosophila melanogaster) differ in population size by many orders of magnitude. Humans have a small effective population size (roughly 10,000), while fruit flies have much larger populations (millions or more). Yet their nucleotide diversity levels differ by far less, roughly one order of magnitude. If the neutral theory prediction held exactly, fruit flies should have dramatically more variation than they actually do.
This contradiction is called the "paradox of variation": Why don't large-population species have vastly more genetic diversity than small-population species?
Solutions to the Paradox
Population geneticists have identified several factors that help explain this paradox:
Selection at linked sites: The key insight is that neutral sites do not evolve independently of their selected neighbors. When deleterious mutations occur, natural selection removes them, and neutral variants on the same chromosome segment are removed with them because they are physically linked. This process is called background selection. Its counterpart, genetic hitchhiking, occurs when a beneficial mutation rises in frequency and carries linked neutral variants along with it. Both processes reduce variation across chromosomal regions, even at truly neutral sites. In large populations, this effect is stronger because selection is more efficient. Counter-intuitively, this means that large-population species might not accumulate as much neutral variation as predicted, because selection is constantly removing or fixing blocks of DNA that also carry neutral variants.
Variation in recombination rates: Recombination breaks linkage between sites, allowing neutral variants to escape the effects of selection at nearby sites. Species or regions with high recombination rates should maintain more variation; regions with low recombination should lose more variation to selection at linked sites. Differences in recombination rate across the genome and between species can therefore significantly affect overall diversity levels.
Life-history effects: Species with shorter generation times and simpler life histories experience stronger genetic drift relative to selection, while long-lived species with complex life histories experience weaker drift. These differences can offset the effects of population size on genetic variation.
Detecting Selection
A central goal in population genetics is identifying which parts of the genome are under natural selection. This matters because selected regions reveal what traits are evolutionarily important. Population genetics provides several tools for this.
Selective Sweeps and Linkage Disequilibrium
When a beneficial allele rises to fixation through positive selection, it creates a characteristic pattern in the genome called a selective sweep. A sweeping allele increases in frequency rapidly, taking nearby neutral alleles along with it (due to linkage). This creates two key signatures:
High linkage disequilibrium: Alleles at nearby sites are statistically associated more strongly than expected by chance, because the selective sweep hasn't allowed time for recombination to shuffle the alleles.
Low genetic variation: The region has unusually low nucleotide diversity because the sweeping allele has recently gone to fixation and eliminated alternative variants in that region.
Researchers look for genomic windows with this combination of patterns—high LD and low variation relative to the genome-wide average—to identify regions likely affected by selective sweeps.
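The low-diversity half of this scan can be sketched as follows. This is a minimal illustration rather than a production method: the haplotypes are toy data, the window size and the "half the average" cutoff are arbitrary assumptions, and linkage disequilibrium is not computed here.

```python
from itertools import combinations


def pairwise_diversity(haplotypes: list[str]) -> float:
    """Average number of pairwise differences per site among haplotypes."""
    n_sites = len(haplotypes[0])
    pairs = list(combinations(haplotypes, 2))
    total_diffs = sum(
        sum(a != b for a, b in zip(h1, h2)) for h1, h2 in pairs
    )
    return total_diffs / (len(pairs) * n_sites)


def scan_windows(haplotypes: list[str], window: int) -> list[float]:
    """Diversity in consecutive non-overlapping windows of `window` sites."""
    n_sites = len(haplotypes[0])
    return [
        pairwise_diversity([h[start:start + window] for h in haplotypes])
        for start in range(0, n_sites - window + 1, window)
    ]


# Toy haplotypes (hypothetical data): the middle window has no variation,
# the kind of pattern expected shortly after a sweep.
haps = [
    "0101" "0000" "1100",
    "1001" "0000" "0110",
    "0110" "0000" "1010",
    "1010" "0000" "0101",
]
pi = scan_windows(haps, window=4)
mean_pi = sum(pi) / len(pi)
# Flag windows with less than half the average diversity (arbitrary cutoff).
candidates = [i for i, value in enumerate(pi) if value < 0.5 * mean_pi]
print("diversity per window:", pi)
print("candidate sweep windows:", candidates)
```

In real data, a flagged window would be only a candidate, since low recombination or background selection can also reduce local diversity.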
The McDonald–Kreitman Test
A more sophisticated tool for detecting selection is the McDonald–Kreitman test. This test compares patterns of polymorphism within a species to divergence between species, carefully distinguishing between neutral and selected sites.
The key insight is this: if all mutations were neutral, the ratio of non-synonymous to synonymous changes should be the same for polymorphism (variation within species) and for divergence (fixed differences between species). Selection breaks this equality because it changes the probability that a mutation becomes fixed.
The test works as follows:
Sequence the same gene in multiple individuals within a species (detecting polymorphisms)
Sequence the same gene in a closely related species (detecting divergence)
Classify sites as synonymous (silent mutations, typically neutral) or non-synonymous (amino acid-changing, often selected)
Create a 2×2 table comparing polymorphism and divergence at each site class
If positive selection has been acting on non-synonymous sites, there will be an excess of fixed non-synonymous differences relative to polymorphic non-synonymous sites. This is quantified by the parameter $\alpha$, which estimates the proportion of non-synonymous substitutions fixed by positive selection rather than by genetic drift.
$$\alpha = 1 - \frac{\text{Neutral divergence}}{\text{Selected divergence}} \times \frac{\text{Selected polymorphisms}}{\text{Neutral polymorphisms}}$$
An $\alpha$ value near 0 suggests little positive selection; an $\alpha$ value near 1 suggests most substitutions were driven by positive selection.
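As a small worked example, the sketch below computes $\alpha$ from the four cells of the 2×2 table using the usual estimator $\alpha = 1 - (D_s P_n)/(D_n P_s)$, where $D$ counts fixed differences, $P$ counts polymorphisms, and the subscripts $n$ and $s$ mark non-synonymous and synonymous sites. The symbols, function name, and counts are introduced here for illustration and are not taken from the text.

```python
def mk_alpha(d_n: int, d_s: int, p_n: int, p_s: int) -> float:
    """Estimate alpha from the four cells of a McDonald-Kreitman table.

    d_n, d_s: fixed non-synonymous / synonymous differences between species
    p_n, p_s: non-synonymous / synonymous polymorphisms within species
    """
    if d_n == 0 or p_s == 0:
        raise ValueError("alpha is undefined when d_n or p_s is zero")
    return 1.0 - (d_s * p_n) / (d_n * p_s)


# Hypothetical counts, chosen only for illustration.
alpha = mk_alpha(d_n=7, d_s=17, p_n=2, p_s=42)
print(f"alpha = {alpha:.2f}")
```

With these counts, non-synonymous changes are far more common among fixed differences than among polymorphisms, so the estimate is well above 0. The estimator can also come out negative, for instance when slightly deleterious non-synonymous variants are segregating within the species.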
Demographic Inference
Population geneticists often want to reconstruct the demographic history of populations—how population sizes have changed, whether populations have mixed, whether there's been inbreeding. These factors all leave genetic signatures that we can measure and interpret.
Hardy–Weinberg Equilibrium and Population Structure
The Hardy–Weinberg equilibrium (HWE) is a null hypothesis: if a population is at HWE, then genotype frequencies match the expected proportions based on allele frequencies.
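For a locus with two alleles $A$ and $a$ at frequencies $p$ and $q = 1 - p$ (symbols introduced here for illustration), the expected genotype proportions under HWE are:

$$f(AA) = p^2, \qquad f(Aa) = 2pq, \qquad f(aa) = q^2$$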
When a sample deviates from HWE, one common cause is population structure: the population is subdivided, with different allele frequencies in different groups. Other possible causes include inbreeding, selection, and genotyping error. For example, if you sample individuals from two populations that have been isolated from each other, the combined sample will show an excess of homozygotes compared to HWE predictions (the Wahlund effect), because individuals are more likely to be homozygous for the alleles common in their own population.
Testing for deviations from HWE is therefore a way to detect population structure and assess whether a sample comes from a single, randomly mating population.
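The homozygote excess in a pooled sample can be checked with a few lines of arithmetic. The sketch below assumes two equally sized subpopulations, each in HWE on its own, with hypothetical allele frequencies of 0.2 and 0.8.

```python
def hwe_expected(p: float) -> tuple[float, float, float]:
    """Expected genotype frequencies (AA, Aa, aa) for allele frequency p."""
    q = 1.0 - p
    return p * p, 2.0 * p * q, q * q


# Two equally sized subpopulations (hypothetical allele frequencies),
# each in Hardy-Weinberg equilibrium on its own.
p1, p2 = 0.2, 0.8
het_observed = (hwe_expected(p1)[1] + hwe_expected(p2)[1]) / 2

# Expectation if the pooled sample were one randomly mating population.
p_pooled = (p1 + p2) / 2
het_expected = hwe_expected(p_pooled)[1]

print(f"observed heterozygosity in pooled sample: {het_observed:.2f}")
print(f"HWE expectation from pooled frequency:   {het_expected:.2f}")
```

The pooled sample contains fewer heterozygotes, and therefore more homozygotes, than a single randomly mating population with the same overall allele frequency would.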
The Inbreeding Coefficient
The inbreeding coefficient ($F$) quantifies excess homozygosity in a population. It measures the probability that two alleles at a locus in an individual are identical by descent (inherited from a common ancestor).
$F$ ranges from 0 (no inbreeding, random mating) to 1 (complete inbreeding). An individual with $F > 0$ has an elevated probability of being homozygous at any given locus compared to a random-mating population.
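One common way to estimate $F$ from genotype data is to compare observed heterozygosity with the HWE expectation, $F = 1 - H_{obs}/H_{exp}$. The sketch below applies that estimator to a single biallelic locus; the estimator choice, function name, and genotype counts are assumptions made for illustration.

```python
def inbreeding_coefficient(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Estimate F as 1 - H_obs / H_exp from genotype counts at one locus."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)      # frequency of allele A
    h_exp = 2.0 * p * (1.0 - p)          # heterozygosity expected under HWE
    if h_exp == 0.0:
        raise ValueError("locus is monomorphic; F is undefined")
    h_obs = n_ab / n                     # observed heterozygote frequency
    return 1.0 - h_obs / h_exp


# Hypothetical genotype counts with a deficit of heterozygotes.
f_hat = inbreeding_coefficient(n_aa=40, n_ab=20, n_bb=40)
print(f"F = {f_hat:.2f}")
```

Note that this sample estimate can fall slightly below 0 when heterozygotes are in excess, even though $F$ as a probability lies between 0 and 1.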
$F_{ST}$ and Population Differentiation
While $F$ describes inbreeding within a population, $F_{ST}$ describes differentiation between populations. Specifically, $F_{ST}$ measures the proportion of genetic variance explained by population structure.
$$F_{ST} = \frac{\text{Variance among populations}}{\text{Total genetic variance}}$$
$F_{ST}$ ranges from 0 (no differentiation; populations are genetically identical) to 1 (complete differentiation; populations have no alleles in common).
In practice, $F_{ST}$ tells us how much genetic variation is "used up" in distinguishing populations from each other versus variation within populations. A high $F_{ST}$ means populations are very different genetically; a low $F_{ST}$ means most variation exists within populations rather than between them.
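For a single biallelic locus, one standard way to compute $F_{ST}$ is $(H_T - H_S)/H_T$, where $H_S$ is the average expected heterozygosity within subpopulations and $H_T$ is the expected heterozygosity of the pooled population. The sketch below assumes equally sized subpopulations; the function name and allele frequencies are hypothetical.

```python
def fst_biallelic(freqs: list[float]) -> float:
    """F_ST for one biallelic locus from subpopulation allele frequencies.

    Uses F_ST = (H_T - H_S) / H_T with equally weighted subpopulations,
    where H_S is the mean within-population expected heterozygosity and
    H_T is the expected heterozygosity of the pooled population.
    """
    p_bar = sum(freqs) / len(freqs)
    h_t = 2.0 * p_bar * (1.0 - p_bar)
    if h_t == 0.0:
        raise ValueError("locus is monomorphic overall; F_ST is undefined")
    h_s = sum(2.0 * p * (1.0 - p) for p in freqs) / len(freqs)
    return (h_t - h_s) / h_t


# Hypothetical allele frequencies in two subpopulations.
print(f"identical populations:      {fst_biallelic([0.5, 0.5]):.2f}")
print(f"moderately differentiated:  {fst_biallelic([0.2, 0.8]):.2f}")
print(f"fixed for different alleles: {fst_biallelic([0.0, 1.0]):.2f}")
```

The three cases span the full range: no differentiation, partial differentiation, and populations fixed for different alleles.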
Coalescent Theory and Historical Inference
The most powerful approach to demographic inference comes from coalescent theory, which connects patterns of genetic variation in present-day samples to the historical events that created those patterns.
The key idea: if we trace the ancestry of a sample of DNA sequences backward in time, the pattern of how they coalesce (share common ancestors) depends on the historical size and structure of the population.
For example:
Population bottlenecks (sudden reductions in size) create a pattern where, tracing lineages backward in time, many lineages coalesce rapidly during the bottleneck period, while any lineages that pass through it coalesce more slowly in the larger population that existed before the bottleneck
Population expansions create a pattern of slow coalescence in the recent past, when the population was already large, and rapid coalescence further back in time, when the population was still small
Population structure creates patterns where sequences from different populations coalesce slowly to each other but rapidly within populations
By comparing the observed coalescent patterns in genetic data to predictions from different demographic models, researchers can infer population history: when did bottlenecks occur? How fast did populations expand? When did populations split?
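The link between population size and coalescence speed can be made concrete with the standard constant-size coalescent. The sketch below assumes a diploid population of constant effective size, in which $k$ lineages coalesce at rate $k(k-1)/(4N_e)$ per generation; the function name and the sample and population sizes are hypothetical.

```python
def expected_coalescent_times(n_samples: int, n_e: float) -> dict[int, float]:
    """Expected generations spent with k lineages, for k = n_samples..2.

    Assumes a constant-size diploid population, where k lineages
    coalesce at rate k * (k - 1) / (4 * n_e) per generation.
    """
    return {
        k: 4.0 * n_e / (k * (k - 1))
        for k in range(n_samples, 1, -1)
    }


times = expected_coalescent_times(n_samples=10, n_e=10_000)
tmrca = sum(times.values())
print(f"expected time with 10 lineages: {times[10]:.0f} generations")
print(f"expected time with 2 lineages:  {times[2]:.0f} generations")
print(f"expected time to common ancestor: {tmrca:.0f} generations")
```

Every waiting time scales linearly with `n_e`, which is why periods of small population size show up as bursts of rapid coalescence and periods of large size as long stretches with few coalescent events.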
Evolution of Genetic Systems
Population genetics also helps us understand why different species have such different genomes—why some have large, complex genomes filled with introns and transposable elements, while others have compact, streamlined genomes. The answers often involve population size and the effectiveness of selection.
The Drift-Barrier Hypothesis
A key principle in evolutionary genomics is the drift-barrier hypothesis: selection can effectively purge deleterious mutations from a population only when the selection coefficient $s$ exceeds a threshold related to population size.
The critical threshold is roughly $s > 1/N_e$. When a deleterious mutation has a small effect ($s < 1/N_e$), the effect of selection is weak compared to genetic drift, and the mutation may fix by chance anyway. When $s > 1/N_e$, selection is strong enough to reliably purge the mutation.
This has a surprising consequence: species with small effective population sizes tolerate more deleterious mutations because selection cannot efficiently remove them. Mutations that would be purged in large-population species accumulate in small populations.
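To see the threshold in action, the sketch below evaluates Kimura's diffusion approximation for the fixation probability of a new mutation. This formula is not given in the text; it is brought in here as an assumption, along with additive fitness effects, a census size equal to $N_e$, and a hypothetical selection coefficient.

```python
import math


def fixation_probability(s: float, n_e: float) -> float:
    """Kimura's approximate fixation probability of a new mutation.

    Assumes a diploid population with census size equal to n_e, additive
    fitness effects (heterozygote fitness 1 + s), and initial frequency
    1 / (2 * n_e). Negative s means the mutation is deleterious.
    May overflow for very strongly deleterious mutations in huge populations.
    """
    if s == 0.0:
        return 1.0 / (2.0 * n_e)
    return math.expm1(-2.0 * s) / math.expm1(-4.0 * n_e * s)


s = -1e-5  # hypothetical weakly deleterious mutation
for n_e in (1e4, 1e6):
    # Fixation probability relative to a neutral mutation (1 / (2 * n_e)).
    relative = fixation_probability(s, n_e) * 2.0 * n_e
    regime = "drift dominates" if abs(s) < 1.0 / n_e else "selection dominates"
    print(f"N_e = {n_e:.0e}: {regime}; "
          f"fixation probability relative to neutral = {relative:.2e}")
```

In the smaller population the mutation fixes almost as often as a neutral one, while in the larger population its fixation probability is vanishingly small compared with the neutral expectation.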
Genome Structure and Population Size
This principle explains major differences in genome organization across species:
Large-population species tend to have streamlined genomes: Organisms like bacteria and some parasites with huge effective population sizes have very compact genomes. Introns are rare, transposable elements are scarce, and non-coding DNA is minimal. This is because selection has been effective enough to remove "unnecessary" DNA elements that provide no benefit.
Small-population species accumulate complex genomes: Species with small $N_e$ (like mammals, including humans) tolerate more non-functional DNA. Their genomes contain:
Abundant introns within genes
Numerous transposable elements that can copy themselves
Large amounts of non-coding repetitive DNA
Pseudogenes and other non-functional sequences
These elements aren't actively maintained by selection; they persist because selection is too weak to purge them in small populations. In large-population species, the same mutations might be efficiently removed.
<extrainfo>
Population-Genetic Models of Other Traits
Beyond genome structure, population genetics provides models for understanding the evolution of:
Dominance: Why are most harmful mutations recessive rather than dominant? Selection against dominant deleterious mutations is more efficient, so they're rapidly purged. Only recessive mutations persist at measurable frequencies.
Sexual reproduction: Why do many organisms reproduce sexually despite the "cost" of sex (you only pass on half your genes, not all of them)? Sexual reproduction generates variation that selection can act upon and may purge genetic parasites more effectively.
Recombination rates: Populations can evolve higher or lower recombination rates; the optimal rate depends on the balance between generating beneficial genetic combinations and breaking up favorable combinations by too much recombination.
Mutation rates: Paradoxically, mutation rates themselves can evolve. Error-correcting mechanisms in DNA replication impose metabolic costs, so organisms with small populations may tolerate higher mutation rates, while large-population species invest more in mutation-reducing mechanisms.
Aging and senescence: Why do organisms age? Some aging traits may persist because their harmful effects appear mainly late in life, after most reproduction has already occurred, making selection against them inefficient.
</extrainfo>
Summary: Population genetics has evolved from a theoretical framework into a practical toolkit for understanding real biological systems. By predicting levels of genetic variation, detecting selection in genomes, reconstructing population history from DNA sequences, and explaining the diversity of genetic systems across species, population genetics connects evolutionary theory to the genomic data we can now easily measure. These applications have revolutionized fields from conservation biology to medicine to understanding human evolution.
Flashcards
What two factors determine predicted nucleotide diversity according to Neutral Theory?
Effective population size ($N_e$) and neutral mutation rate ($\mu$)
What observation defines the “paradox of variation” in population genetics?
Genetic diversity varies much less than population size across different species.
What are the primary proposed solutions to the paradox of variation?
Selection at linked sites
Variation in recombination rates
Life-history effects
Which two genomic characteristics are used to identify selective sweeps?
Regions of high linkage disequilibrium
Regions of low genetic variation
What two types of data does the McDonald–Kreitman test compare to detect selection?
Polymorphism within species and divergence between species at neutral and selected sites.
In a McDonald–Kreitman test, what does an excess of fixed non-synonymous differences relative to non-synonymous polymorphisms signify?
Positive selection
What does the parameter $\alpha$ represent in the context of the McDonald–Kreitman test?
The proportion of substitutions fixed by positive selection.
How are Hardy–Weinberg equilibrium tests used in demographic inference?
They assess whether genotype frequencies match expected proportions to identify population structure.
What does the inbreeding coefficient $F$ quantify in a population?
Excess homozygosity
What does the $F_{ST}$ statistic measure in population genetics?
The proportion of genetic variance explained by population structure.
What is the primary application of Coalescent Theory in demographic inference?
Relating sampled genetic diversity to historical events like bottlenecks and expansions.
According to the drift-barrier hypothesis, under what condition can selection purge deleterious mutations?
When the selection coefficient $s$ is greater than $1 / N_e$ (where $N_e$ is effective population size).
How does effective population size ($N_e$) typically influence the presence of introns and transposable elements?
Small populations tend to accumulate them, while large populations tend to have streamlined genomes.
Quiz
Question 1: According to neutral theory, nucleotide diversity ($\pi$) is expected to be proportional to which of the following?
- $N_e \times \mu$ (correct)
- $N_e + \mu$
- $N_e / \mu$
- $N_e^{2} \times \mu$
Question 2: Which statement best describes the typical genome architecture of species with large effective population sizes?
- They tend to have streamlined genomes with few introns and transposable elements (correct)
- They possess large, repeat‑rich genomes filled with many transposable elements
- They exhibit high intron density and extensive noncoding DNA
- They show low coding density due to accumulation of neutral sequences
Key Concepts
Evolutionary Mechanisms
Neutral theory of molecular evolution
Selective sweep
Drift‑barrier hypothesis
McDonald–Kreitman test
Population Genetics
Hardy–Weinberg equilibrium
Inbreeding coefficient (F)
F_ST
Effective population size (Ne)
Genetic Diversity and History
Paradox of variation
Coalescent theory
Definitions
Neutral theory of molecular evolution
A hypothesis that most evolutionary changes at the molecular level are caused by random drift of neutral mutations, predicting nucleotide diversity proportional to effective population size times mutation rate.
Paradox of variation
The observation that genetic diversity varies far less across species than their census population sizes would predict.
Selective sweep
The process by which a beneficial mutation rapidly rises to fixation, reducing genetic variation and creating a region of high linkage disequilibrium.
McDonald–Kreitman test
A comparative method that contrasts polymorphism within a species to divergence between species to infer the proportion of substitutions driven by positive selection.
Hardy–Weinberg equilibrium
The principle that allele and genotype frequencies remain constant from generation to generation in an idealized, non‑evolving population.
Inbreeding coefficient (F)
A measure of the probability that two alleles at a locus are identical by descent, quantifying excess homozygosity.
F_ST
A statistic that quantifies the proportion of total genetic variance attributable to differences among subpopulations.
Coalescent theory
A retrospective stochastic model that relates the genealogical history of sampled alleles to past demographic events such as bottlenecks and expansions.
Drift‑barrier hypothesis
The idea that the efficacy of natural selection in eliminating deleterious mutations is limited by genetic drift, especially when selection coefficients are smaller than 1/Ne.
Effective population size (Ne)
The size of an idealized population that would experience the same amount of genetic drift as the actual population under study.