Biological database Study Guide

📖 Core Concepts
- Biological database: a curated library of experimental, literature‑derived, or computational biological information (genes, proteins, metabolites, images, specimens).
- Purpose: enables analysis of structures, pathways, and evolution; supports disease research, drug development, and species discovery.
- Classification: grouped by the type of content.
  - Molecular (sequences, structures)
  - Functional (physiology, phenotypes)
  - Taxonomic (species hierarchies)
  - Image/Media (microscopy, radiographs)
  - Specimen (museum samples)
- Technical foundation: relational database theory and information‑retrieval methods supply the schema and query capabilities.
- Data representation: usually semi‑structured, as tables, key‑delimited records, or XML.
- Cross‑reference: stable accession numbers link the same entity across multiple databases despite naming changes.

---

📌 Must Remember
- Key databases: GenBank (nucleotide sequences, DNA and RNA), UniProt (protein sequences and annotation), Protein Data Bank (3‑D structures), SCOP and CATH (structure classification), PubMed (biomedical literature citations).
- Major challenges: data distribution, naming inconsistencies, interoperability, redundancy.
- Catalogue of Life: aims to list every accepted species on Earth.
- Nucleic Acids Research (NAR) Database Issue: annual journal issue that catalogs public biological databases.

---

🔄 Key Processes
- Submitting a new sequence: prepare FASTA file → submit to GenBank → receive accession number → accession used for cross‑references.
- Linking across databases: entry receives accession → other databases store the same accession → linkage stays stable despite species‑name updates.
- Retrieving information: define query (e.g., gene name) → information‑retrieval engine searches indexed records → returns matching entries from relational tables or XML stores.

---

🔍 Key Comparisons
- Molecular vs. Functional databases
  - Molecular: store raw sequences and structures (e.g., GenBank, PDB).
  - Functional: store physiological or phenotype data (e.g., enzyme activity, ecological interactions).
- Taxonomic vs. Specimen databases
  - Taxonomic: catalog species and their hierarchical classification (e.g., Catalogue of Life).
  - Specimen: document physical samples and museum holdings.
- GenBank vs. UniProt
  - GenBank: DNA/RNA sequences, broader organism coverage.
  - UniProt: curated protein sequences with functional annotation.

---

⚠️ Common Misunderstandings
- “All protein information lives in UniProt.” Protein 3‑D structures are archived separately in PDB and classified in SCOP and CATH; some content overlaps between databases.
- “Accession numbers change with species names.” Accession numbers are stable identifiers; they protect against naming updates.
- “A single database contains every type of biological data.” Databases are specialized; integration relies on cross‑references.

---

🧠 Mental Models / Intuition
- Library analogy: think of each database as a section of a large library (DNA → GenBank shelf, proteins → UniProt shelf, 3‑D structures → PDB shelf). The catalog number (accession) lets you locate the same book across shelves.
- Puzzle pieces: each database provides one piece of the biological “puzzle.” To see the full picture, you fit the pieces together using accession numbers.

---

🚩 Exceptions & Edge Cases
- Redundant storage: the same protein sequence may appear in both a sequence database (UniProt) and a structure database (PDB). The redundancy is intentional, since it supports specialized queries, but it can inflate storage.
- Naming changes: when taxonomy is revised, species names change; only the accession remains reliable.

---

📍 When to Use Which
- Sequence lookup → use GenBank for DNA/RNA, UniProt for proteins.
- 3‑D structural analysis → query PDB for coordinates; use SCOP or CATH for structure classification.
- Evolutionary relationships → turn to taxonomic databases (Catalogue of Life) or phylogenetic resources.
- Literature search → use PubMed for biomedical articles.
- Cross‑disciplinary integration → rely on accession numbers to join molecular, functional, and taxonomic data.
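
The idea of joining molecular, functional, and taxonomic data on a shared accession can be sketched in a few lines of Python. This is a minimal illustration, not how any real database is queried: the three dictionaries, the `integrate` helper, and the accession `ACC0001` are all invented for the example.

```python
# Hypothetical in-memory "databases"; every record and the accession
# "ACC0001" are invented placeholders, not real entries.
sequences = {"ACC0001": {"sequence": "ATGGCC", "submitted_name": "Old species name"}}
functions = {"ACC0001": {"phenotype": "example enzyme activity"}}
taxonomy = {"ACC0001": {"accepted_name": "Revised species name"}}


def integrate(accession, *sources):
    """Merge every record that shares the accession.

    Sources that lack the accession are skipped; if two sources use the
    same field name, the later source overwrites the earlier one.
    """
    merged = {"accession": accession}
    for source in sources:
        merged.update(source.get(accession, {}))
    return merged


record = integrate("ACC0001", sequences, functions, taxonomy)
print(record)
```

Because the join key is the accession rather than the species name, the merged record still lines up after the name is revised: both the submitted and the accepted name end up attached to the same entry.
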

---

👀 Patterns to Recognize
- Accession‑number pattern (e.g., “NM_001256789”, an NCBI RefSeq‑style mRNA accession in which the prefix and underscore are part of the identifier) signals a stable cross‑reference.
- File‑format clues: FASTA → sequence data; PDB → 3‑D coordinates; XML tags → structured metadata.
- Redundancy flag: the same identifier appearing in multiple database tables suggests overlapping content (e.g., a protein sequence in both UniProt and PDB).

---

🗂️ Exam Traps
- Distractor: “Protein structure data are stored in UniProt.” Wrong: UniProt holds protein sequences and annotation; structures are archived in PDB and classified in SCOP/CATH.
- Trap: assuming “accession numbers change with taxonomy.” Incorrect: they remain constant.
- Near‑miss: confusing “Catalogue of Life” (taxonomic catalog) with “GenBank” (sequence repository). Remember their distinct focus: species classification vs. genetic sequences.
- Misleading choice: selecting “PubMed” as a source for raw DNA sequences. PubMed indexes literature, not primary sequence data.
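
The file‑format clues listed under Patterns to Recognize can be turned into a rough heuristic. The sketch below is an assumption‑laden illustration: `guess_format` is a hypothetical helper, it inspects only the first non‑blank line, and it checks just three PDB record names (`HEADER`, `ATOM`, `HETATM`), so it should not be read as a real format validator.

```python
def guess_format(text: str) -> str:
    """Guess a record's format from its first non-blank line (heuristic only)."""
    for line in text.splitlines():
        if not line.strip():
            continue  # skip leading blank lines
        if line.startswith(">"):
            return "FASTA"  # FASTA headers begin with '>'
        if line.lstrip().startswith("<"):
            return "XML"  # an opening tag or XML declaration
        if line.startswith(("HEADER", "ATOM  ", "HETATM")):
            return "PDB"  # a few common PDB record names
        return "unknown"
    return "unknown"  # empty or all-blank input


print(guess_format(">demo_sequence placeholder header\nATGGCC"))
print(guess_format("<entry><name>demo</name></entry>"))
```

A real pipeline would normally rely on a dedicated parser for each format; the point here is only that the first few characters of a record already hint at which kind of database it came from.
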
