Database management system Study Guide
Study Guide
📖 Core Concepts
Database – Organized collection of data managed by a Database Management System (DBMS) that stores, retrieves, and analyzes data.
Database System – DBMS + the actual database + applications that use it.
Three‑level Architecture
External view – How individual users see the data.
Conceptual view – Global logical schema for developers/administrators.
Internal view – Physical storage layout, indexes, files.
Data Independence – Changes in one level (e.g., storage) do not require changes in higher‑level views.
Relational Model – Data in tables (rows = records, columns = attributes). Primary keys uniquely identify rows; foreign keys create relationships.
ACID Properties – Guarantees for a transaction:
Atomicity – All‑or‑nothing.
Consistency – Database stays valid.
Isolation – Transactions don’t interfere.
Durability – Committed changes survive failures.
CAP Theorem – In a distributed system you can have at most two of Consistency, Availability, Partition tolerance.
Normalization – Decompose tables so each fact is stored once, eliminating insert/update/delete anomalies.
Materialized View – Physically stored query result for fast reads; must be refreshed.
Replication – Copies of data for higher availability and load balancing.
📌 Must Remember
Primary key = unique row identifier; foreign key links tables.
SQL = single language that covers DDL, DML, DQL, DCL.
Data definition = CREATE / ALTER / DROP objects.
Data manipulation = INSERT, UPDATE, DELETE.
Data query = SELECT with WHERE, JOIN, GROUP BY, HAVING.
Data control = GRANT / REVOKE permissions.
Indexing speeds look‑ups; typical types: B‑tree (row‑oriented) and column‑oriented indexes.
Three major DBMS architectures:
Shared‑memory – common RAM, multiple CPUs.
Shared‑disk – each CPU has own memory, shares disks.
Shared‑nothing – each node has its own memory and disk (eliminates contention).
Backup vs. Restore – Backup captures full state; Restore re‑creates a prior state (point‑in‑time recovery).
Eventual consistency – Guarantees that replicas will converge eventually, used by many NoSQL systems.
🔄 Key Processes
Design → Logical → Physical
Conceptual model (ER diagram) → Logical schema (tables, keys) → Physical design (storage format, indexes).
Transaction Execution
Begin → Perform reads/writes → Commit (make durable) or Rollback (undo).
Query Optimization
Parse → Generate logical plan → Apply rewrite rules → Choose physical plan (indexes, join order) → Execute.
Backup Procedure
Initiate full or incremental backup → Store backup files → Verify integrity.
Restore Procedure
Select backup set → Apply logs (if point‑in‑time) → Bring database online.
🔍 Key Comparisons
Relational vs. NoSQL (Not‑Only‑SQL)
Schema: Fixed vs. flexible.
Consistency: ACID vs. eventual.
Query: SQL joins vs. key/value or document queries.
Shared‑memory vs. Shared‑disk vs. Shared‑nothing
Resource sharing: RAM only vs. disk only vs. both private.
Scalability: Shared‑nothing scales best, shared‑memory limited by bus contention.
Row‑oriented vs. Column‑oriented storage
Workload: OLTP (many writes) vs. OLAP (large scans, aggregates).
Compression: Column stores compress better due to homogeneous data.
⚠️ Common Misunderstandings
“SQL = only SELECT” – SQL also defines schemas (DDL) and controls access (DCL).
“NoSQL = no structure” – Document and column‑family stores still enforce a schema at the application level.
“Normalization always best” – Over‑normalization can hurt performance; denormalization is common in OLAP and NoSQL.
“Replication = backup” – Replication provides live copies for availability; backups are immutable snapshots for recovery.
🧠 Mental Models / Intuition
Table as a spreadsheet – rows = records, columns = attributes; primary key = row label.
Transaction as a bank transfer – money leaves one account and appears in another or nothing changes (atomic).
CAP Triangle – imagine a triangle; you can only pick two corners at a time for a distributed system.
Index as a book’s table of contents – lets you jump directly to the data without scanning every page.
🚩 Exceptions & Edge Cases
Materialized view staleness – must be refreshed after base‑table updates, otherwise queries return outdated data.
Eventual consistency windows – reads may return stale data shortly after a write.
Shared‑nothing failures – loss of a node removes both compute and its data slice; replication needed to avoid data loss.
Normalization anomalies – only apply up to 3NF/BCNF for typical OLTP; higher normal forms rarely needed.
📍 When to Use Which
Choose Relational DB when you need strong ACID guarantees, complex joins, and a fixed schema (e.g., financial systems).
Choose Document Store for semi‑structured JSON‑like data with frequent schema changes (e.g., content management).
Choose Key‑Value Store for simple look‑ups by a unique key with massive scale (caching, session storage).
Choose Column‑Family for wide tables with many columns and analytical workloads (time‑series, log analytics).
Use Shared‑nothing architecture for horizontally scalable workloads; use Shared‑disk for smaller clusters needing easy failover.
👀 Patterns to Recognize
JOIN on primary‑foreign key → typical relational relationship.
Frequent “SELECT … WHERE indexedcolumn = …” → candidate for an index.
High write‑to‑read ratio with simple look‑ups → key‑value or document DB may be more efficient.
Repeated aggregation over many rows → column‑oriented storage or materialized view improves performance.
CAP trade‑off clues – if the question mentions network partitions, decide which two guarantees the system favors.
🗂️ Exam Traps
“All NoSQL systems are eventually consistent.” – Some (e.g., NewSQL) provide strong consistency.
“Normalization eliminates all redundancy.” – Denormalization is deliberately used for performance; not all redundancy is bad.
“Replication automatically provides backup.” – Replicas are live copies; they do not protect against logical errors or accidental deletes.
“Shared‑nothing means no communication between nodes.” – Nodes still coordinate (e.g., for distributed transactions) but they do not share memory/disk.
“Materialized view can be updated directly.” – Usually read‑only; updates must be propagated to base tables.
---
Focus on these high‑yield points during your final review; they cover the core terminology, guarantees, architectures, and decision‑making that most exam questions test.
or
Or, immediately create your own study flashcards:
Upload a PDF.
Master Study Materials.
Master Study Materials.
Start learning in seconds
Drop your PDFs here or
or