Transcriptomics Study Guide

📖 Core Concepts

Transcriptome – the complete set of RNA transcripts present in a cell at a given moment; provides a functional snapshot of gene activity.
mRNA vs. ncRNA – messenger RNA carries coding information to ribosomes; non‑coding RNAs (lncRNA, miRNA, siRNA, etc.) perform regulatory or structural roles.
Differential regulation – the same genome can produce many cell types by turning different sets of genes on or off.
Hybridisation – annealing of a single‑stranded nucleic acid to its complementary strand; the basis of microarray detection.
RNA‑Seq workflow – isolate RNA → convert to cDNA (or sequence the RNA directly) → library prep → high‑throughput sequencing → bioinformatic analysis (QC, alignment, quantification, DE).

---

📌 Must Remember

RNA isolation: RNase‑free handling, chaotropic salts, DNase treatment, then mRNA enrichment (poly‑A capture) or rRNA depletion.
Microarray principle: fluorescently labeled RNA hybridises to fixed probes; fluorescence intensity ≈ transcript abundance.
RNA‑Seq read types:
  Single‑end – reads one end of each fragment; cheaper, sufficient for expression levels.
  Paired‑end – reads both ends; improves alignment, splice‑junction detection, isoform identification.
Strand‑specific sequencing preserves transcription direction → better for overlapping genes.
Normalization: adjusts for library size and composition before differential expression (DE) testing.
Key tools:
  SAMtools – manipulate SAM/BAM alignment files.
  Kallisto – pseudo‑alignment for rapid transcript quantification.
  RSEM – transcript quantification with or without a reference genome (a reference transcriptome or de novo assembly is still required).
  DESeq2 – moderated fold‑change and dispersion estimation for DE analysis.
Quality metrics: per‑base quality scores, GC content, over‑represented k‑mers, duplication rate.

---

🔄 Key Processes

RNA Extraction
  Disrupt tissue → add chaotropic salts → separate RNA from DNA/proteins → precipitate or column‑purify → DNase treat → (optional) poly‑A capture or rRNA depletion.
Library Preparation
  (a) Size‑select (e.g., small RNAs on gel).
  (b) Fragment mRNA (chemical, sonication, transposase).
  (c) Reverse‑transcribe with adaptor‑bearing primers.
  (d) PCR amplify; Unique Molecular Identifiers (UMIs), attached before amplification, allow PCR duplicates to be corrected for later.
  (e) Spike‑in controls → assess GC bias, fragment length, positional bias.
Sequencing Strategies
  Choose single‑end for cost‑effective expression profiling; paired‑end for isoform/alternative splicing studies; strand‑specific when overlapping transcripts matter.
Data Analysis Pipeline
  QC (FastQC/FaQCs) → trim low‑quality bases.
  Alignment (splice‑aware aligner) → generate SAM/BAM.
  Quantification – gene‑level counts (HTSeq) or transcript‑level (Kallisto, RSEM, StringTie2).
  Normalization – library‑size scaling, composition adjustment.
  Differential Expression – statistical testing (DESeq2) → p‑value & FDR.
  Validation – qRT‑PCR with isoform‑specific primers; compare Ct values with RNA‑Seq counts (a lower Ct means higher abundance, so the two should be inversely related).

---

🔍 Key Comparisons

Microarray vs. RNA‑Seq
  Microarray: probes for known sequences, limited dynamic range, hybridisation‑based intensity.
  RNA‑Seq: sequence‑agnostic, detects novel transcripts, dynamic range of more than five orders of magnitude, quantifies low‑abundance RNAs.
Single‑end vs. Paired‑end reads
  Single‑end: cheaper, adequate for gene‑level expression.
  Paired‑end: more expensive, resolves splice junctions, improves isoform reconstruction.
Poly‑A capture vs. rRNA depletion
  Poly‑A: enriches mRNA, misses non‑polyadenylated ncRNAs.
  rRNA depletion: retains poly‑A mRNA plus many non‑polyadenylated RNAs (e.g., lncRNA); small RNAs such as miRNA still need a size‑selected library.

---

⚠️ Common Misunderstandings

“RNA‑Seq only measures mRNA.” – Library prep can retain ncRNAs; direct RNA nanopore sequencing reads native RNAs, including modifications.
“Higher read count always means higher expression.” – Counts must be normalized for library size and composition; raw counts are not comparable across samples.
“Microarray intensity is linear across all genes.” – Signal saturates for high‑abundance transcripts; low‑abundance genes may be below detection.
“All aligners treat multi‑mapped reads the same.” – Some discard them, others assign them probabilistically; the choice affects quantification of gene families.

---

🧠 Mental Models / Intuition

“Transcriptome as a city’s traffic map.” – Roads (genes) may be open (high expression) or closed (silenced); RNA‑Seq is a live traffic camera capturing every car (read) passing by.
“UMIs are barcodes on each car.” – They let you count each original molecule, not the duplicates produced by PCR.
“Paired‑end reads are a two‑person search party.” – Knowing both ends narrows down where a fragment belongs, especially across complex intersections (splice sites).

---

🚩 Exceptions & Edge Cases

Degraded RNA → loss of 5′ coverage → 3′‑biased profiles (especially with poly‑A capture); snap‑freeze samples and assess RIN before library prep.
Highly GC‑rich transcripts → may show low coverage; spike‑ins help detect the bias.
Single‑cell RNA‑Seq – extremely low input; requires cDNA amplification and UMIs to control amplification bias.
Direct RNA nanopore sequencing – bypasses cDNA conversion and captures modifications, but has higher error rates than Illumina.

---

📍 When to Use Which

Goal: Known gene expression profiling → Microarray (if budget is tight and the genome is well annotated).
Goal: Discover novel transcripts / splice isoforms → RNA‑Seq with a paired‑end, strand‑specific library.
Goal: Study small ncRNAs (miRNA, siRNA) → Size‑selected library, short‑read sequencing.
Goal: Quantify absolute molecule numbers (e.g., single‑cell) → UMIs + spike‑in controls.
Goal: Detect RNA modifications → Direct RNA nanopore sequencing.

---

👀 Patterns to Recognize

High GC + low coverage → likely sequencing bias → check QC plots.
Many reads mapping to multiple loci → gene families or repetitive elements; consider methods that assign such reads probabilistically (e.g., pseudo‑alignment).
Sharp peaks at transcript 5′ ends → Cap Analysis of Gene Expression (CAGE) data → transcription start sites.
Consistent differential expression across replicates → robust signal; single outlier samples often have QC warnings.
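
The "UMIs are barcodes on each car" idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular tool works: each read is assumed to be already reduced to a hypothetical (gene, UMI) pair, the function name `count_molecules` and the toy reads are invented for the example, and only exact UMI matches are collapsed. Real deduplication tools also tolerate sequencing errors in the UMI and take mapping position into account.

```python
from collections import Counter, defaultdict


def count_molecules(reads):
    """Count distinct UMIs per gene.

    `reads` is an iterable of (gene, umi) tuples; this input format is a
    hypothetical simplification used only for illustration.
    """
    umis_per_gene = defaultdict(set)
    for gene, umi in reads:
        # A set keeps each UMI once, so PCR copies of one molecule collapse.
        umis_per_gene[gene].add(umi)
    return {gene: len(umis) for gene, umis in umis_per_gene.items()}


# Toy reads (invented): three PCR copies of one GeneA molecule, plus two others.
reads = [
    ("GeneA", "ACGT"), ("GeneA", "ACGT"), ("GeneA", "ACGT"),
    ("GeneA", "TTAG"),
    ("GeneB", "ACGT"),
]

raw_counts = dict(Counter(gene for gene, _ in reads))  # counts every read
molecule_counts = count_molecules(reads)               # counts original molecules

print(raw_counts)
print(molecule_counts)
```

In this toy case GeneA has four reads but only two distinct UMIs, so its raw read count overstates the number of original molecules.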
---

🗂️ Exam Traps

“RNA‑Seq always uses paired‑end reads.” – Not required; single‑end is acceptable for simple expression studies.
“Microarray probes must be exactly 25‑mers.” – Only high‑density Affymetrix arrays use 25‑mers; other platforms use longer probes.
“All splice‑aware aligners are equally fast.” – Speed varies; some prioritize sensitivity (e.g., TopHat) while others (e.g., STAR) are faster.
“Normalization eliminates the need for biological replicates.” – Normalization corrects technical variation; replicates are still essential for statistical power.
“UMIs replace the need for spike‑in controls.” – UMIs correct amplification bias; spike‑ins assess library quality and technical variance.
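
Why normalization has to account for composition as well as library size can be seen in a small Python sketch. It compares simple library‑size scaling (counts per million) with a from‑scratch version of the median‑of‑ratios idea that DESeq2‑style size factors are based on. This is an illustration only, not DESeq2's code: the function names `cpm` and `size_factors` and the toy counts are assumptions made for the example.

```python
import math
from statistics import median


def cpm(counts):
    """Counts per million for one sample: scales by library size only."""
    total = sum(counts)
    return [c * 1e6 / total for c in counts]


def size_factors(samples):
    """Median-of-ratios size factors for a list of per-sample count lists.

    Genes with a zero count in any sample are skipped, because their
    geometric mean across samples would be zero.
    """
    n_genes = len(samples[0])
    log_geo_means = []
    for g in range(n_genes):
        gene_counts = [s[g] for s in samples]
        if all(c > 0 for c in gene_counts):
            mean_log = sum(math.log(c) for c in gene_counts) / len(samples)
            log_geo_means.append((g, mean_log))
    # Each sample's factor is the median ratio of its counts to the
    # per-gene geometric mean (computed in log space).
    return [
        math.exp(median(math.log(s[g]) - lgm for g, lgm in log_geo_means))
        for s in samples
    ]


# Toy counts (invented): sample_b is sequenced twice as deeply as sample_a,
# and only the last gene is truly up-regulated (10-fold).
sample_a = [10, 20, 30, 40]
sample_b = [20, 40, 60, 800]

factors = size_factors([sample_a, sample_b])
normalized = [[c / f for c in s] for s, f in zip([sample_a, sample_b], factors)]

print(factors)
print(normalized)
print(cpm(sample_a)[0], cpm(sample_b)[0])
```

With these toy numbers the size factors differ by a factor of two, so the three unchanged genes line up after normalization and the last gene keeps its 10‑fold difference. CPM, by contrast, makes the first gene look lower in `sample_b`, because the one highly expressed gene inflates that sample's total.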
