Biological database Study Guide
📖 Core Concepts
Biological database – A curated library of experimental, literature‑derived, or computational biological information (genes, proteins, metabolites, images, specimens).
Purpose – Enables analysis of structures, pathways, evolution; supports disease research, drug development, and species discovery.
Classification – Grouped by the type of content:
Molecular (sequences, structures)
Functional (physiology, phenotypes)
Taxonomic (species hierarchies)
Image/Media (microscopy, radiographs)
Specimen (museum samples)
Technical foundation – Relational database theory & information‑retrieval methods give the schema and query capabilities.
Data representation – Usually semi‑structured: tables, key‑delimited records, or XML.
Cross‑reference – Stable accession numbers link the same entity across multiple databases despite naming changes.
---
📌 Must Remember
Key databases: GenBank (DNA/RNA sequences), UniProt (protein sequences and annotation), Protein Data Bank (3‑D structures), SCOP & CATH (structure classification), PubMed (bibliographic medical literature).
Major challenges: data distribution, naming inconsistencies, interoperability, redundancy.
Catalogue of Life – aims to list every accepted species on Earth.
Nucleic Acids Research (NAR) Database Issue – annual journal issue that catalogs public biological databases.
---
🔄 Key Processes
Submitting a new sequence
Prepare FASTA file → submit to GenBank → receive accession number → accession used for cross‑references.
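The first step above, preparing a FASTA file, can be sketched in a few lines. This is a minimal illustration only: the identifier `seq1`, the description, and the 60-character line width are assumptions chosen for the example, and an actual GenBank submission goes through NCBI's own submission tools and requires more metadata than shown here.

```python
def to_fasta(seq_id: str, description: str, sequence: str, width: int = 60) -> str:
    """Format one sequence as a FASTA record: a '>' header line, then wrapped residues."""
    # Drop any whitespace inside the sequence and normalise to upper case.
    cleaned = "".join(sequence.split()).upper()
    header = f">{seq_id} {description}".rstrip()
    # Wrap the residues into fixed-width lines (width is an assumed convention).
    body = "\n".join(cleaned[i:i + width] for i in range(0, len(cleaned), width))
    return f"{header}\n{body}\n"


# Hypothetical 80-nucleotide sequence, used only to show the layout.
record = to_fasta("seq1", "hypothetical example gene", "atgc" * 20)
print(record)
```

The header line starts with `>`; everything after it, up to the next `>`, is sequence data.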
Linking across databases
Entry receives accession → other databases store the same accession → stable linkage despite species‑name updates.
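A toy model of this linkage, assuming two in-memory "databases" with made-up contents: the accession `ACC0001`, the species names, and the record fields are all hypothetical. It shows why a link stored as an accession survives a species-name update.

```python
# Toy sequence database keyed by a hypothetical accession.
sequence_db = {
    "ACC0001": {"organism": "Examplea prima", "sequence": "ATGGCCTAA"},
}

# Toy functional database: each row stores the accession as its cross-reference.
function_db = [
    {"id": "F-17", "xref": "ACC0001", "note": "hypothetical enzyme activity"},
]


def find_annotations(accession: str) -> list:
    """Return every functional record that cross-references the given accession."""
    return [row for row in function_db if row["xref"] == accession]


before = find_annotations("ACC0001")

# A taxonomic revision renames the species; the accession itself is untouched.
sequence_db["ACC0001"]["organism"] = "Examplea secunda"

after = find_annotations("ACC0001")
print(before == after)  # the link is unaffected by the rename
```

Had the functional record stored the species name instead of the accession, the rename would have broken the link.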
Retrieving information
Define query (e.g., gene name) → information‑retrieval engine searches indexed records → returns matching entries from relational tables or XML stores.
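The retrieval flow can be illustrated with Python's built-in `sqlite3` module standing in for a relational store. The table name `entries`, its columns, and every row are assumptions invented for this sketch; real biological databases expose their own schemas and search interfaces.

```python
import sqlite3

# In-memory relational table with an index on the searchable field.
store = sqlite3.connect(":memory:")
store.execute(
    "CREATE TABLE entries (accession TEXT PRIMARY KEY, gene_name TEXT, description TEXT)"
)
store.execute("CREATE INDEX idx_gene_name ON entries (gene_name)")
store.executemany(
    "INSERT INTO entries VALUES (?, ?, ?)",
    [
        ("ACC0001", "geneA", "hypothetical record one"),
        ("ACC0002", "geneB", "hypothetical record two"),
        ("ACC0003", "geneA", "hypothetical record three"),
    ],
)


def search_by_gene(name: str) -> list:
    """Return (accession, description) pairs for all entries with this gene name."""
    cursor = store.execute(
        "SELECT accession, description FROM entries "
        "WHERE gene_name = ? ORDER BY accession",
        (name,),
    )
    return cursor.fetchall()


print(search_by_gene("geneA"))
```

The `?` placeholder passes the query term as a parameter rather than pasting it into the SQL string, which is the safer habit when the term comes from a user.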
---
🔍 Key Comparisons
Molecular vs. Functional databases
Molecular: store raw sequences/structures (e.g., GenBank, PDB).
Functional: store physiological or phenotype data (e.g., enzyme activity, ecological interactions).
Taxonomic vs. Specimen databases
Taxonomic: catalog species and hierarchical classification (Catalogue of Life).
Specimen: document physical samples and museum holdings.
GenBank vs. UniProt
GenBank: DNA/RNA sequences, broader organism coverage.
UniProt: curated protein sequences with functional annotation.
---
⚠️ Common Misunderstandings
“All protein information lives in UniProt.” – Experimentally determined 3‑D structures are archived separately in PDB and classified by SCOP and CATH; redundancy can occur.
“Accession numbers change with species names.” – Accession numbers are stable identifiers; they protect against naming updates.
“A single database contains every type of biological data.” – Databases are specialized; integration relies on cross‑references.
---
🧠 Mental Models / Intuition
Library analogy: Think of each database as a section of a large library (DNA → GenBank shelf, proteins → UniProt shelf, 3‑D structures → PDB shelf). The catalog number (accession) lets you locate the same book across shelves.
Puzzle pieces: Each database provides a piece of the biological “puzzle.” To see the full picture, you must fit pieces together using accession numbers.
---
🚩 Exceptions & Edge Cases
Redundant storage – The same protein sequence may appear in both a sequence database (UniProt) and a structure database (PDB). Redundancy is intentional for specialized queries but can inflate storage.
Naming changes – When taxonomy is revised, species names change; only the accession remains reliable.
---
📍 When to Use Which
Sequence lookup → Use GenBank for DNA/RNA, UniProt for proteins.
3‑D structural analysis → Query PDB, SCOP, or CATH.
Evolutionary relationships → Turn to taxonomic databases (Catalogue of Life) or phylogenetic resources.
Literature search → Use PubMed for biomedical articles.
Cross‑disciplinary integration → Rely on accession numbers to join molecular, functional, and taxonomic data.
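Joining data by accession number is, in relational terms, a join on a shared key. The sketch below uses `sqlite3` with three hypothetical tables (`molecular`, `functional`, `taxonomic`); the table layout, accessions, and values are assumptions for illustration, not the schema of any real resource.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE molecular  (accession TEXT PRIMARY KEY, sequence TEXT);
    CREATE TABLE functional (accession TEXT, phenotype TEXT);
    CREATE TABLE taxonomic  (accession TEXT, accepted_name TEXT);
    INSERT INTO molecular  VALUES ('ACC0001', 'ATGGCCTAA'), ('ACC0002', 'ATGTTTTAG');
    INSERT INTO functional VALUES ('ACC0001', 'hypothetical enzyme activity');
    INSERT INTO taxonomic  VALUES ('ACC0001', 'Examplea prima'),
                                  ('ACC0002', 'Examplea secunda');
    """
)


def integrated_view() -> list:
    """Combine the three tables into one row per accession."""
    query = """
        SELECT m.accession, t.accepted_name, f.phenotype
        FROM molecular AS m
        JOIN taxonomic AS t ON t.accession = m.accession
        LEFT JOIN functional AS f ON f.accession = m.accession
        ORDER BY m.accession
    """
    return db.execute(query).fetchall()


for row in integrated_view():
    print(row)
```

The `LEFT JOIN` keeps entries that have no functional annotation yet (the phenotype comes back as `None`), which mirrors the common case where databases cover different subsets of entities.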
---
👀 Patterns to Recognize
Accession‑number pattern (e.g., “NM_001256789”, the prefix‑underscore‑digits form used for NCBI RefSeq mRNA records) signals a stable cross‑reference.
File‑format clues: FASTA → sequence data; PDB → 3‑D coordinates; XML tags → structured metadata.
Redundancy flag: Same identifier appearing in multiple database tables suggests overlapping content (e.g., protein sequence in UniProt and PDB).
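These recognition cues can be turned into rough heuristics. The sketch below is an assumption-laden illustration: the regular expression only matches the "two letters, underscore, digits, optional version" shape of identifiers such as NM_001256789, and the format guesser looks only at the first non-blank line. Neither is a validator for any official specification.

```python
import re

# Assumed shape: two uppercase letters, underscore, digits, optional ".version".
REFSEQ_LIKE = re.compile(r"^[A-Z]{2}_\d+(\.\d+)?$")


def looks_like_refseq(identifier: str) -> bool:
    """Heuristic check for a RefSeq-style accession such as NM_001256789."""
    return bool(REFSEQ_LIKE.match(identifier))


def guess_format(text: str) -> str:
    """Guess a file format from the first non-blank line of its contents."""
    first = next((line.strip() for line in text.splitlines() if line.strip()), "")
    if first.startswith(">"):
        return "fasta"  # FASTA header lines begin with '>'
    if first.startswith("<"):
        return "xml"  # XML declarations and tags begin with '<'
    if first.startswith(("HEADER", "ATOM", "HETATM")):
        return "pdb"  # common PDB record names
    return "unknown"


print(looks_like_refseq("NM_001256789"))
print(guess_format(">seq1 hypothetical example\nATGC"))
```

Treat a match as a hint about where an identifier or file came from, not as proof that the record exists.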
---
🗂️ Exam Traps
Distractor: “Protein structure data are stored in UniProt.” – Wrong; UniProt holds protein sequences and annotation; structures are archived in PDB and classified in SCOP/CATH.
Trap: Assuming “accession numbers change with taxonomy.” – Incorrect; they remain constant.
Near‑miss: Confusing “Catalogue of Life” (taxonomic catalog) with “GenBank” (sequence repository). – Remember their distinct focus: species classification vs. genetic sequences.
Misleading choice: Selecting “PubMed” as a source for raw DNA sequences. – PubMed indexes literature, not primary sequence data.