Core Foundations of the Genome
Understand the structure, types, components, and size variation of genomes across organisms.
Summary
Definition and Overview of the Genome
What is a Genome?
A genome is the complete set of genetic information contained within an organism or cell. It consists of the nucleotide sequences of DNA (or RNA in the case of RNA viruses) that encode all the instructions necessary for building and maintaining that organism. Think of it as the complete instruction manual for life—every gene, regulatory sequence, and structural element that makes an organism what it is.
In most eukaryotes (organisms with nucleated cells), the genome is distributed across multiple locations. The nuclear genome resides in the nucleus and contains protein-coding genes, non-coding genes, regulatory sequences, and often substantial amounts of non-functional DNA. However, eukaryotes also contain genetic material outside the nucleus: the mitochondrial genome is found in mitochondria, and in plants and algae, the chloroplast genome (also called the plastome) is located in chloroplasts. These organellar genomes are separate from—and evolutionarily distinct from—the nuclear genome.
Ploidy: How Many Copies?
Most eukaryotes are diploid, meaning each chromosome exists in two copies within the nucleus, one inherited from each parent. A human diploid cell, for example, carries 22 pairs of autosomes (non-sex chromosomes) plus one pair of sex chromosomes (XX or XY), for 46 chromosomes in total. The human reference genome, by contrast, includes one copy of each of the 22 autosomes plus both X and Y, giving 24 distinct chromosomes. When we refer to genome size, we typically mean the size of one complete copy (the haploid genome) rather than the full diploid complement.
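The chromosome counts above come down to simple arithmetic. The sketch below is only a bookkeeping aid; the function names are hypothetical and chosen for illustration.

```python
# Chromosome bookkeeping for humans, using the counts given in the text.
AUTOSOME_TYPES = 22                # chromosomes 1 through 22
SEX_CHROMOSOME_TYPES = ("X", "Y")  # both appear in the reference genome


def diploid_chromosome_count(autosome_types: int = AUTOSOME_TYPES) -> int:
    """Chromosomes in one diploid nucleus: two copies of every autosome
    plus two sex chromosomes (XX or XY)."""
    return 2 * autosome_types + 2


def reference_chromosome_types(autosome_types: int = AUTOSOME_TYPES) -> int:
    """Distinct chromosomes in a reference that lists X and Y separately."""
    return autosome_types + len(SEX_CHROMOSOME_TYPES)


print(diploid_chromosome_count())    # 46
print(reference_chromosome_types())  # 24
```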
Types of Genomes
Viral Genomes
Viral genomes are remarkably diverse and don't follow the patterns of cellular life. Some viruses have RNA genomes while others have DNA genomes.
RNA virus genomes can be single-stranded or double-stranded. Some RNA viruses package their genome as a single RNA molecule, while others are segmented, meaning the genome is divided into multiple separate RNA molecules. Segmentation matters biologically because a complete set of segments must reach the host cell for a productive infection, which in many segmented viruses means packaging one copy of every segment into the same virion.
DNA virus genomes are similarly variable. They can be single-stranded or double-stranded, and while many are linear molecules, some are circular (more like bacterial DNA).
Prokaryotic Genomes
Bacteria and archaea typically have a single, circular chromosome located in the nucleoid region (not membrane-bound). However, some prokaryotic species have linear chromosomes or even multiple chromosomes.
An important feature of prokaryotic cells is the presence of plasmids—small, circular DNA molecules that exist independently of the main chromosome. Plasmids carry auxiliary genetic material (often genes for antibiotic resistance or metabolic capabilities) but are not considered part of the organism's core genome.
Eukaryotic Genomes
Eukaryotic genomes consist of one or more linear DNA chromosomes packaged in the nucleus (in contrast to the typically circular bacterial chromosome). The number of chromosomes varies enormously across species. Humans have 23 pairs, but some ant species have as few as one pair, while certain fern species have over 700 pairs.
Organellar Genomes
Mitochondria and chloroplasts retain their own, typically circular, chromosomes, a legacy of their bacterial ancestors that is consistent with the endosymbiotic theory. These organellar genomes are much smaller than nuclear genomes and encode only a subset of the proteins needed for organellar function; the rest are encoded in the nucleus.
Genome Composition: What's Inside?
Coding vs. Noncoding Sequences
The genome is not entirely filled with gene instructions. Coding sequences are DNA regions that carry the instructions to synthesize proteins. However, the proportion of a genome occupied by coding sequences varies dramatically among species. Bacteria are relatively "efficient," with 85-95% coding sequence, while humans use only about 1-2% of their nuclear genome for protein-coding genes.
Noncoding sequences include introns (non-coding portions within genes), genes for non-coding RNA molecules, regulatory regions that control gene expression, and repetitive DNA. Notably, approximately 98% of the human genome consists of noncoding sequences. This doesn't mean 98% is "junk"—many noncoding regions have important regulatory or structural functions—but much of it remains poorly understood.
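The coding and noncoding percentages quoted above are two views of the same ratio. A minimal sketch follows, using hypothetical round numbers (a 5 Mbp bacterium-like genome and a 3.1 Gbp human-like genome) that are illustrative assumptions rather than measurements.

```python
def coding_fraction(coding_bp: int, genome_bp: int) -> float:
    """Fraction of a genome occupied by protein-coding sequence."""
    if genome_bp <= 0:
        raise ValueError("genome_bp must be positive")
    if not 0 <= coding_bp <= genome_bp:
        raise ValueError("coding_bp must lie between 0 and genome_bp")
    return coding_bp / genome_bp


def noncoding_percent(coding_bp: int, genome_bp: int) -> float:
    """Percentage of the genome that is not protein-coding."""
    return 100.0 * (1.0 - coding_fraction(coding_bp, genome_bp))


# Hypothetical inputs chosen to land inside the ranges quoted in the text.
print(f"{coding_fraction(4_400_000, 5_000_000):.0%}")             # 88%
print(f"{noncoding_percent(46_500_000, 3_100_000_000):.1f}%")     # 98.5%
```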
Repetitive DNA: Tandem Repeats
Tandem repeats are short DNA sequences repeated multiple times in a head-to-tail fashion at the same location. Two important categories are:
Microsatellites: Short tandem repeats consisting of 2–5 base-pair repeat units. For example, the sequence CACACACA contains four repeats of the dinucleotide CA.
Minisatellites: Longer tandem repeats consisting of 30–35 base-pair repeat units.
Both types are highly variable among individuals, making them valuable for DNA fingerprinting and forensic analysis.
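To make the idea of a tandem repeat concrete, here is a small sketch that scans a DNA string for microsatellite-like runs using a regular expression with a backreference. The function name, the default thresholds, and the choice to skip single-base runs are illustrative assumptions, not a standard tool.

```python
import re


def find_microsatellites(seq, min_unit=2, max_unit=5, min_repeats=3):
    """Return (start, unit, repeat_count) for head-to-tail runs of a short unit.

    The lazy quantifier makes the regex try the shortest unit first, so
    CACACACA is reported as CA x 4 rather than CACA x 2.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        if len(set(unit)) == 1:
            continue  # skip runs of a single base such as AAAAAA
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits


print(find_microsatellites("CACACACA"))          # [(0, 'CA', 4)]
print(find_microsatellites("TTGATAGATAGATACC"))  # [(2, 'GATA', 3)]
```

Real repeat finders also tolerate imperfect copies; this sketch only detects exact repeats.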
Transposable Elements: Mobile DNA
Transposable elements are DNA sequences with the remarkable ability to move around within the genome. This "jumping" of genetic elements was discovered by Barbara McClintock, and we now know they comprise a significant fraction of many eukaryotic genomes—about 45% of the human genome.
There are two main classes, based on their mechanism of movement:
Retrotransposons (copy-and-paste elements) operate through an RNA intermediate. They are transcribed into RNA, then reverse-transcribed back into DNA, and this new copy inserts elsewhere in the genome. The original copy remains in place, so retrotransposons increase in copy number over time. They include:
LINEs (Long Interspersed Nuclear Elements): Can be several kilobases long and encode their own machinery for transposition
SINEs (Short Interspersed Nuclear Elements): Shorter elements that typically cannot transpose independently; they rely on proteins encoded by LINEs
DNA transposons (cut-and-paste elements) operate differently. The element is cut out of one location and pasted into another by a transposase enzyme. The transposase gene typically lies between the element's inverted terminal repeats, the short sequences at each end that are reverse complements of one another and that the enzyme recognizes. Because the original copy is excised rather than duplicated, cut-and-paste transposition does not by itself increase the element's copy number. Most DNA transposons in mammals are now inactive due to accumulated mutations.
The key distinction: retrotransposons copy themselves (increasing in number), while DNA transposons move themselves (maintaining number).
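The difference in copy-number behavior can be seen in a toy simulation. This is a deliberately simplified sketch: the genome is modeled as a list of insertion coordinates, every event succeeds, and the function names and the 1,000,000-position genome are hypothetical. It ignores deletions, inactivation, and any replication-linked effects.

```python
import random


def transpose_once(sites, mechanism, rng, genome_length=1_000_000):
    """Apply one transposition event to a list of insertion coordinates."""
    sites = list(sites)
    source = rng.choice(sites)              # element that mobilizes
    target = rng.randrange(genome_length)   # new insertion coordinate
    if mechanism == "cut_and_paste":
        sites.remove(source)                # DNA transposon: original is excised
    elif mechanism != "copy_and_paste":
        raise ValueError(f"unknown mechanism: {mechanism}")
    sites.append(target)                    # both mechanisms insert at the target
    return sites


def copy_number_after(mechanism, events, seed=0):
    """Copy number after a given number of events, starting from one element."""
    rng = random.Random(seed)
    sites = [0]
    for _ in range(events):
        sites = transpose_once(sites, mechanism, rng)
    return len(sites)


print(copy_number_after("copy_and_paste", 10))  # 11
print(copy_number_after("cut_and_paste", 10))   # 1
```

The counts do not depend on the random seed: each copy-and-paste event adds one element, while each cut-and-paste event only relocates one.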
Genome Size and Variation
Defining Genome Size
Genome size is defined as the total number of DNA base pairs in one complete copy of a haploid genome. For humans, this is approximately 3.1 billion base pairs (often quoted as 3.1 billion nucleotides of sequence) distributed across 24 distinct chromosomes: 22 autosomes plus X and Y. Individual human chromosomes range from about 45 million base pairs (chromosome 21, the smallest) to about 248 million base pairs (chromosome 1).
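In practice, this definition amounts to summing the lengths of one copy of each chromosome. The sketch below uses a made-up three-chromosome genome; the chromosome names and lengths are hypothetical, and the diploid estimate is only approximate for XY individuals because X and Y differ in length.

```python
from typing import Mapping


def haploid_genome_size(chromosome_lengths: Mapping[str, int]) -> int:
    """Genome size: total base pairs across one copy of each chromosome."""
    return sum(chromosome_lengths.values())


def approx_diploid_content(haploid_bp: int) -> int:
    """Rough DNA content of a diploid nucleus: two haploid sets."""
    return 2 * haploid_bp


# Hypothetical toy genome, not real data.
toy_genome = {"chrA": 1200, "chrB": 800, "chrC": 500}
print(haploid_genome_size(toy_genome))          # 2500

# Using the approximate human figure quoted in the text.
print(approx_diploid_content(3_100_000_000))    # 6200000000
```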
What Determines Genome Size?
Interestingly, genome size does not correlate well with organism complexity, a phenomenon called the C-value paradox. Some single-celled amoebae, for example, are reported to have genomes far larger than the human genome. The main reason is that genome size is largely determined by the expansion and contraction of repetitive DNA, particularly transposable elements.
Organisms with compact genomes, such as many invertebrates (like fruit flies and nematodes), typically have few transposable elements and less repetitive DNA. In contrast, genomes bloated with transposable elements can become enormous. This variation is not primarily driven by gene number but by how much "extra" DNA an organism has accumulated.
<extrainfo>
The variation in genome size also reflects differences in how effectively organisms can eliminate unnecessary DNA. Some organisms have mechanisms that actively remove transposable elements or excess DNA, while others accumulate it passively over evolutionary time.
</extrainfo>
Flashcards
What is the definition of a genome?
All of the genetic information of an organism or cell.
What types of nucleotide sequences can compose a genome?
DNA or RNA (in the case of RNA viruses).
Besides the nucleus, which two organelles in eukaryotes may contain their own genomes?
Mitochondria and chloroplasts.
What does it mean for a eukaryote to be diploid?
Each chromosome is present in two copies in the nucleus.
Which chromosomes are included in the human reference genome?
One copy of each of the 22 autosomes, one X chromosome, and one Y chromosome.
In what physical forms can RNA virus genomes exist?
Single-stranded or double-stranded, and may be segmented.
What is the standard structure of a chromosome in most bacteria and archaea?
A single circular chromosome.
What is the term for auxiliary genetic material in prokaryotes that is not part of the main chromosome?
Plasmids.
What is the shape of the mitochondrial genome?
Circular.
What is the specific name given to the circular chromosome found in chloroplasts?
Plastome.
What is the primary function of coding sequences?
To carry instructions for synthesizing proteins.
Approximately what percentage of the human genome is composed of noncoding sequences?
98%.
What is the definition of tandem repeats?
Short DNA sequences repeated head-to-tail at the same location.
What is the repeat unit size for microsatellites vs. minisatellites?
Microsatellites: 2–5 base pairs; Minisatellites: 30–35 base pairs.
What are transposable elements?
DNA sequences that can change their location within a genome.
What are the two functional classifications of transposable elements?
Copy-and-paste (retrotransposons) and cut-and-paste (DNA transposons).
What is the mechanism used by retrotransposons to move?
DNA is transcribed into RNA, then reverse-transcribed back into DNA for insertion.
What are the two types of non-long terminal repeat retrotransposons?
Long interspersed nuclear elements (LINEs) and short interspersed nuclear elements (SINEs).
What specific enzyme is typically encoded by DNA transposons?
Transposase.
How is genome size defined?
The total number of DNA base pairs in one copy of a haploid genome.
What factor is primarily responsible for the expansion and contraction of genome size?
Repetitive DNA elements (especially transposable elements).
What is the total number of nucleotides and chromosomes in the human nuclear genome?
Approximately 3.1 billion nucleotides distributed among 24 linear chromosomes.
Quiz
Question 1: What does the term genome refer to?
- All of the genetic information of an organism or cell (correct)
- Only the protein‑coding genes of an organism
- The set of chromosomes visible during mitosis
- The mitochondrial DNA only
Question 2: Which statement accurately describes viral genomes?
- They may be composed of either RNA or DNA (correct)
- All viral genomes are single‑stranded RNA
- Viral genomes are always circular DNA molecules
- Viruses do not contain genetic material
Question 3: How many nucleotides are in the human nuclear genome?
- Approximately 3.1 billion nucleotides (correct)
- Approximately 1.5 billion nucleotides
- Approximately 5.0 billion nucleotides
- Approximately 6.4 billion nucleotides
Question 4: What is the most common structural form of the chromosome in bacteria and archaea?
- A single circular chromosome (correct)
- Multiple linear chromosomes
- A single linear chromosome
- Several circular plasmids only
Question 5: What structural form do eukaryotic chromosomes in the nucleus typically have?
- Linear DNA chromosomes (correct)
- Circular DNA chromosomes
- Protein filaments
- RNA‑based chromosomes
Question 6: How do microsatellites differ from minisatellites in repeat unit length?
- Microsatellites have 2–5 bp repeats; minisatellites have 30–35 bp repeats (correct)
- Microsatellites have 10–15 bp repeats; minisatellites have 100–200 bp repeats
- Both have 2–5 bp repeat units
- Microsatellites have longer repeat units than minisatellites
Question 7: What is the structural form of the mitochondrial genome in most eukaryotes?
- A circular chromosome (correct)
- A linear chromosome
- Multiple separate plasmids
- A segmented RNA genome
Question 8: Genome size is defined as the total number of base pairs in which version of the genome?
- One haploid set of chromosomes (correct)
- One diploid set of chromosomes
- The mitochondrial genome only
- The combined nuclear and organellar DNA
Question 9: What term is used for the large portion of the nuclear genome that does not code for proteins and has no known function?
- Junk DNA (correct)
- Intron
- Exon
- Promoter region
Question 10: Which of the following is NOT a noncoding sequence in the genome?
- Exon (correct)
- Intron
- tRNA gene
- Regulatory region
Question 11: What is the initial molecular step that retrotransposons undergo before inserting into a new genomic location?
- Transcription of their DNA into RNA (correct)
- Reverse‑transcription of RNA into DNA
- Translation of a transposase protein
- Excising from the original DNA site
Question 12: Copy‑and‑paste transposable elements are called ___, while cut‑and‑paste elements are called ___.
- retrotransposons; DNA transposons (correct)
- DNA transposons; retrotransposons
- microsatellites; minisatellites
- LINEs; SINEs
Question 13: What term describes a cell that contains a single copy of each chromosome?
- haploid (correct)
- diploid
- tetraploid
- polyploid
Question 14: Coding sequences in the genome primarily serve to produce which type of cellular molecule?
- Proteins (correct)
- Carbohydrates
- Lipids
- Polysaccharides
Question 15: True or false: The percentage of a genome that consists of coding sequences is nearly identical across all species.
- False (correct)
- True
- Cannot be determined
- Depends on the organism's size
Question 16: Which enzyme is encoded by DNA transposons to enable their movement?
- Transposase (correct)
- Reverse transcriptase
- DNA polymerase
- RNA polymerase
Key Concepts
Genomic Structures
Genome
Nuclear genome
Mitochondrial genome
Plasmid
Genomic Elements
Transposable element
Retrotransposon
DNA transposon
Tandem repeat
Genomic Metrics
Ploidy
Genome size
Definitions
Genome
The complete set of genetic material, including all DNA or RNA sequences, present in an organism or cell.
Nuclear genome
The collection of chromosomes located in the cell nucleus that contains protein‑coding genes, non‑coding genes, and regulatory elements.
Mitochondrial genome
A circular DNA molecule residing in mitochondria that encodes genes essential for oxidative phosphorylation.
Plasmid
An extrachromosomal, often circular, DNA molecule in prokaryotes that can replicate independently and carry auxiliary genes.
Transposable element
A DNA sequence capable of moving to new positions within a genome, either by copy‑and‑paste (retrotransposon) or cut‑and‑paste (DNA transposon) mechanisms.
Retrotransposon
A type of transposable element that transcribes its DNA into RNA, then reverse‑transcribes the RNA back into DNA for insertion elsewhere in the genome.
DNA transposon
A cut‑and‑paste transposable element that encodes a transposase enzyme and moves directly as DNA.
Tandem repeat
A short DNA motif repeated consecutively head‑to‑tail, including microsatellites (2–5 bp) and minisatellites (30–35 bp).
Ploidy
The number of complete sets of chromosomes in a cell, with diploid organisms having two copies of each chromosome.
Genome size
The total number of base pairs in a haploid set of an organism’s DNA, often measured in gigabases for eukaryotes.