Phylogenetics Study Guide
📖 Core Concepts
Phylogenetics – study of evolutionary history using heritable traits (DNA, proteins, morphology).
Phylogenetic tree – diagram of hypothesized relationships; tips = observed taxa, internal nodes = inferred ancestors.
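Rooted tree – has a designated root node representing the most recent common ancestor of all taxa in the tree; shows the direction of change.
Unrooted tree – shows the branching relationships among taxa but has no root, so the direction of change is unspecified.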
Clade – an ancestor plus all its descendants (monophyletic group).
Cladistics – reconstructs relationships using shared derived characters (synapomorphies).
Parsimony – prefers the tree requiring the fewest evolutionary changes.
Maximum Likelihood (ML) – evaluates a tree by the probability of the observed data given a model of character change.
Bayesian inference – combines prior probabilities with a likelihood to produce a posterior distribution of trees; explored via Markov chain Monte Carlo (MCMC).
Phenetic distance methods – build trees from overall similarity matrices (e.g., neighbor‑joining).
Long‑branch attraction (LBA) – artifact where rapidly evolving lineages that are not each other's closest relatives are grouped together because of chance convergent (homoplastic) changes.
Bootstrap – resampling technique that provides a statistical support value for each node.
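To make the parsimony idea concrete, here is a minimal sketch of Fitch's counting procedure for a single character on a rooted binary tree. The nested-tuple tree encoding, the function name `fitch`, and the four-taxon site are illustrative assumptions, not part of any particular software package.

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimum changes) for one character.

    `tree` is a tip name (str) or a 2-tuple of subtrees; `states` maps
    tip names to observed character states. Both encodings are hypothetical.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch(left, states)
    right_set, right_changes = fitch(right, states)
    shared = left_set & right_set
    if shared:
        # Children can agree on a state: no extra change needed here.
        return shared, left_changes + right_changes
    # Children disagree: take the union and count one change.
    return left_set | right_set, left_changes + right_changes + 1


site = {"A": "G", "B": "G", "C": "T", "D": "T"}
tree1 = (("A", "B"), ("C", "D"))
tree2 = (("A", "C"), ("B", "D"))

score1 = fitch(tree1, site)[1]
score2 = fitch(tree2, site)[1]
print(score1, score2)  # prints: 1 2
```

For this site, the tree that groups A with B needs one change and the alternative needs two, so parsimony prefers the first. A real analysis sums such scores over all characters and searches over many candidate trees.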
📌 Must Remember
Rooted vs. Unrooted – rooted → ancestry and direction of change; unrooted → branching pattern only, no direction.
Parsimony principle – “simplest explanation wins” → fewest character changes.
ML & Bayesian both require an explicit substitution model (e.g., Jukes‑Cantor, GTR).
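Neighbor‑joining builds a tree by repeatedly joining the pair that minimizes the Q‑criterion (pairwise distance corrected for each taxon's total distance to all others), which is not always the pair with the smallest raw distance.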
Bootstrap ≥ 70 % is commonly taken as moderate support; ≥ 95 % as strong support.
Dollo’s Law – once a complex character is lost, it is unlikely to be re‑evolved.
Phenetics ignores phylogeny; cladistics requires synapomorphies; evolutionary taxonomy blends both.
Robinson–Foulds metric quantifies topological distance between two trees.
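The Robinson–Foulds metric can be illustrated with a short sketch that counts the non-trivial splits (bipartitions) found in one tree but not the other. Trees are encoded here as nested tuples of tip names, and the helper names `leaf_set`, `splits`, and `rf_distance` are assumptions made for this example only.

```python
def leaf_set(tree):
    """All tip names under a node (tip = str, internal node = tuple)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree):
    """Non-trivial bipartitions of a tree, each stored as the side
    that does NOT contain a fixed reference taxon."""
    all_taxa = leaf_set(tree)
    reference = min(all_taxa)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return
        side = leaf_set(node)
        if reference in side:
            side = all_taxa - side
        # Skip trivial splits (single tips, or everything on one side).
        if 1 < len(side) < len(all_taxa) - 1:
            found.add(side)
        for child in node:
            visit(child)

    visit(tree)
    return found


def rf_distance(tree_a, tree_b):
    """Number of splits present in exactly one of the two trees."""
    return len(splits(tree_a) ^ splits(tree_b))


t1 = (("A", "B"), "C", ("D", "E"))
t2 = (("A", "C"), "B", ("D", "E"))
print(rf_distance(t1, t2))  # prints: 2
```

The two trees share the split separating D and E from the rest but disagree on the other split, giving a distance of 2. Identical topologies give 0.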
🔄 Key Processes
Data collection – obtain DNA/protein sequences or morphological characters.
Alignment – line up homologous positions; trim ambiguous regions.
Model selection – use information‑theoretic criteria (AIC, BIC) to pick the best substitution model.
Tree inference
Parsimony: enumerate (or heuristically search) trees → choose with minimum steps.
ML: compute likelihood $L = P(\text{data} \mid \text{tree}, \text{model})$ for each candidate → pick highest $L$.
Bayesian: define priors, run MCMC, discard burn‑in → obtain posterior sample; summarize with majority‑rule consensus.
Neighbor‑joining: calculate pairwise distances → iteratively join the pair that minimizes the rate‑corrected Q‑criterion and recompute distances → produce unrooted tree.
Support assessment – bootstrap (or jackknife) replicates → assign % support to nodes.
Tree visualisation & interpretation – root (if needed), annotate branch lengths (time vs. change), identify clades.
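As a sketch of the neighbor-joining pair selection in the tree inference step, the code below computes the standard criterion $Q(i,j) = (n-2)\,d(i,j) - \sum_k d(i,k) - \sum_k d(j,k)$ and picks the pair that minimizes it. The five-taxon distance matrix and the function names are illustrative assumptions; only the first joining step is shown, not the full algorithm.

```python
from itertools import combinations


def nj_pair(dist):
    """Pair of indices minimizing the neighbor-joining Q-criterion."""
    n = len(dist)
    row_sums = [sum(row) for row in dist]

    def q(i, j):
        return (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]

    return min(combinations(range(n), 2), key=lambda pair: q(*pair))


def closest_pair(dist):
    """Pair of indices with the smallest raw distance (for comparison)."""
    return min(combinations(range(len(dist)), 2),
               key=lambda pair: dist[pair[0]][pair[1]])


# Hypothetical symmetric distance matrix for taxa a, b, c, d, e.
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]

print(nj_pair(dist))       # prints: (0, 1)
print(closest_pair(dist))  # prints: (3, 4)
```

Here d and e have the smallest raw distance, yet the Q-criterion joins a and b first, because a and b are both far from everything else. This correction is what distinguishes neighbor-joining from simple closest-pair clustering.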
🔍 Key Comparisons
Parsimony vs. ML – Parsimony counts changes; ML uses explicit probabilistic model of change.
Rooted vs. Unrooted – Rooted = direction + ancestor; Unrooted = only relationship pattern.
Phenetics vs. Cladistics – Phenetics = overall similarity; Cladistics = shared derived traits only.
Bootstrap vs. Posterior Probability – Bootstrap = resampling frequency; Posterior = probability given data & priors.
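The "resampling frequency" meaning of bootstrap support can be sketched as follows. Alignment columns are drawn with replacement, and the grouping of interest is re-evaluated on each pseudo-replicate. The toy alignment is invented, and the "A is strictly closest to B" test is only a stand-in for a full tree inference step; a real analysis would rebuild the tree for every replicate.

```python
import random


def bootstrap_columns(alignment, rng):
    """Resample alignment columns with replacement (same length as input)."""
    names = list(alignment)
    length = len(alignment[names[0]])
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][c] for c in columns)
            for name in names}


def hamming(a, b):
    """Number of differing positions between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b))


def a_groups_with_b(aln):
    """Stand-in for tree inference: is B strictly closer to A than C or D?"""
    d_ab = hamming(aln["A"], aln["B"])
    return d_ab < min(hamming(aln["A"], aln["C"]), hamming(aln["A"], aln["D"]))


# Hypothetical toy alignment.
alignment = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTTC",
    "C": "ACGAACCTTG",
    "D": "TCGAACCTTG",
}

rng = random.Random(42)
replicates = 1000
hits = sum(a_groups_with_b(bootstrap_columns(alignment, rng))
           for _ in range(replicates))
support = hits / replicates
print(f"bootstrap support for A+B: {support:.1%}")
```

The resulting percentage is the fraction of resampled datasets that recover the grouping. It measures how consistently the data support the grouping, not the probability that the grouping is true.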
⚠️ Common Misunderstandings
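“Long branch = ancient” – when branch lengths show amount of change, a long branch may simply reflect a fast rate of evolution, not great age; LBA is a separate methodological artifact that can misplace such branches.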
“Higher bootstrap = correct tree” – high support can still be misleading if model is wrong or taxon sampling is poor.
“Bayesian = always better” – Bayesian results depend heavily on priors; poor priors can bias the posterior.
“Unrooted tree shows evolution” – without a root, direction of change cannot be inferred.
🧠 Mental Models / Intuition
Tree as a family diagram – think of the root as the “great‑grandparent” and each internal node as an ancestor that gives rise to all the descendants below it.
Parsimony as “Occam’s razor” – the simplest story (fewest mutations) is preferred unless data demand complexity.
Likelihood as “fit score” – higher likelihood = the model predicts the observed data more accurately, like a better‑fitting curve.
MCMC as “random walk” through tree space – the chain spends more time in high‑probability regions, giving a natural weighting to likely trees.
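The random-walk intuition can be seen in a toy Metropolis sampler over three candidate topologies. The unnormalized posterior weights below are made-up numbers chosen for illustration, and real tree space is vastly larger; the sketch only shows that visit frequencies approach the normalized weights.

```python
import random
from collections import Counter

# Hypothetical unnormalized posterior weights for three topologies.
weights = {"((A,B),C,D)": 6.0, "((A,C),B,D)": 3.0, "((A,D),B,C)": 1.0}


def metropolis(weights, steps, rng):
    """Visit frequencies from a Metropolis walk with a symmetric proposal."""
    states = list(weights)
    current = states[0]
    visits = Counter()
    for _ in range(steps):
        # Propose one of the other topologies uniformly at random.
        proposal = rng.choice([s for s in states if s != current])
        accept_prob = min(1.0, weights[proposal] / weights[current])
        if rng.random() < accept_prob:
            current = proposal
        visits[current] += 1
    return {s: visits[s] / steps for s in states}


frequencies = metropolis(weights, steps=50_000, rng=random.Random(1))
for topology, freq in frequencies.items():
    print(topology, round(freq, 3))
```

With these weights the chain should spend roughly 60 %, 30 %, and 10 % of its time on the three topologies, which is the posterior distribution after normalization.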
🚩 Exceptions & Edge Cases
Homoplasy (convergent evolution) can make parsimony misleading; ML/Bayesian can accommodate it via substitution models.
Dollo’s Law is a rule of thumb; rare cases of re‑evolution of complex traits have been documented.
Very short internal branches → low resolution; may produce a “star” phylogeny regardless of method.
Sparse taxon sampling → increases risk of LBA and can inflate branch lengths.
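One simple way a substitution model accounts for hidden or repeated changes is the Jukes‑Cantor distance correction, $d = -\tfrac{3}{4}\ln\!\left(1 - \tfrac{4}{3}p\right)$, where $p$ is the observed proportion of differing sites. The sketch below compares observed and corrected distances; the function names are assumptions for this example.

```python
import math


def p_distance(seq_a, seq_b):
    """Observed proportion of differing sites between aligned sequences."""
    return sum(x != y for x, y in zip(seq_a, seq_b)) / len(seq_a)


def jc69_distance(p):
    """Jukes-Cantor corrected distance (expected substitutions per site)."""
    if p >= 0.75:
        raise ValueError("sequences are saturated; JC69 distance is undefined")
    return -0.75 * math.log(1 - 4 * p / 3)


for p in (0.1, 0.3, 0.5):
    print(p, round(jc69_distance(p), 4))
```

The corrected distance is always larger than the observed one, and the gap widens as sequences diverge, because more sites have been hit by multiple substitutions that the raw count cannot see.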
📍 When to Use Which
Parsimony – small morphological datasets, when computational speed matters, and homoplasy is expected to be low.
Maximum Likelihood – medium‑size DNA datasets, need for model‑based inference, and when branch‑length estimates are required.
Bayesian – large molecular datasets, desire for posterior probabilities, or when incorporating prior knowledge (e.g., fossil calibrations).
Neighbor‑joining – quick exploratory analysis, large numbers of taxa, or when only distance matrix is available.
Bootstrap – to assess node stability for any method; use ≥ 1000 replicates for reliable estimates.
👀 Patterns to Recognize
Star‑like topology → very rapid radiation or insufficient data.
Long branch attached to short internal branch → red flag for possible LBA.
High bootstrap on shallow nodes but low on deep nodes → recent divergences well resolved, ancient splits ambiguous.
Consistent clade across methods → robust evolutionary signal.
🗂️ Exam Traps
Choosing “rooted” vs. “unrooted” – a question may ask which tree type shows relationships without implying a direction of change; answer: unrooted.
Confusing phenetic distance with phylogenetic distance – similarity ≠ ancestry; phenetic methods ignore character polarity.
Assuming bootstrap = probability of correctness – it is a frequency under resampling, not a direct probability.
Mixing up “clade” and “grade” – a grade is a paraphyletic group (missing some descendants); a clade is monophyletic.
Misinterpreting posterior probabilities as p‑values – they are conditional on the model and priors, not frequentist significance.
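The clade versus grade distinction can be checked mechanically: a group is monophyletic only if some node's complete set of descendant tips equals the group. The sketch below uses a deliberately simplified rooted tree in nested-tuple form; the tree and the helper names are illustrative assumptions.

```python
def leaf_set(tree):
    """All tip names under a node (tip = str, internal node = tuple)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def is_monophyletic(tree, group):
    """True if some node's descendant tips are exactly `group`."""
    group = frozenset(group)
    if leaf_set(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, group) for child in tree)


# Simplified rooted tree for illustration only.
tree = ("mammal", ("lizard", ("crocodile", "bird")))

print(is_monophyletic(tree, {"crocodile", "bird"}))            # True: a clade
print(is_monophyletic(tree, {"lizard", "crocodile", "bird"}))  # True: a clade
print(is_monophyletic(tree, {"lizard", "crocodile"}))          # False: a grade
```

The last group leaves out one descendant (the bird) of the ancestor it shares, which makes it paraphyletic, the defining feature of a grade.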
---
Keep this guide handy – it condenses the high‑yield concepts you’ll need to ace any phylogenetics exam.