Normalization (database) Study Guide
Study Guide
📖 Core Concepts
Database Normalization – Organizing tables/columns to enforce dependencies, cut redundancy, and keep data integrity.
Anomalies – Undesirable side‑effects when data is inserted, updated, or deleted (insertion, update, deletion anomalies).
Normal Forms – Successive levels of table design quality: 1NF → 2NF → 3NF → BCNF → 4NF → 5NF.
Lossless Join Decomposition – Splitting a relation into tables that can be recombined (joined) without losing any rows.
Denormalization – Deliberate re‑introduction of redundancy for performance, opposite of normalization.
---
📌 Must Remember
1NF: Every field holds a single atomic value; no repeating groups or sets.
2NF: All non‑key attributes depend on the entire candidate key (no partial dependency).
3NF: No transitive dependencies; non‑key attributes depend only on candidate keys.
BCNF: Every determinant is a candidate key (automatically satisfied if 3NF with only a simple primary key).
4NF: No multi‑valued dependencies (MVDs) that are not implied by a key.
5NF: Decompositions are lossless and dependency‑preserving for join‑project‑join cycles.
Primary objectives: eliminate insertion, update, deletion anomalies; make future extensions easy.
Denormalization tip: use only when query performance testing shows a clear bottleneck; keep redundancy minimal.
---
🔄 Key Processes
Identify candidate keys → list all minimal sets of attributes that uniquely identify rows.
Check 1NF: Split multi‑valued columns into separate rows or tables.
Check 2NF: For each non‑key attribute, verify it depends on the whole candidate key; move partial dependencies to new tables.
Check 3NF: Locate any non‑key → non‑key dependencies (transitive); place the dependent attribute in its own table.
BCNF test: For each functional dependency \(X \rightarrow Y\), ensure \(X\) is a candidate key; if not, decompose.
4NF test: Detect MVDs \(X \twoheadrightarrow Y\); if \(X\) is not a superkey, decompose.
5NF test: Ensure that joining the decomposed tables recreates the original relation without loss (lossless join).
---
🔍 Key Comparisons
1NF vs 2NF – 1NF stops multi‑valued fields; 2NF stops partial dependencies on a composite key.
2NF vs 3NF – 2NF still allows transitive non‑key dependencies; 3NF removes them.
3NF vs BCNF – Both eliminate transitive dependencies; BCNF is stricter: every determinant must be a candidate key.
BCNF vs 4NF – BCNF handles functional dependencies; 4NF adds protection against multi‑valued dependencies.
Normalization vs Denormalization – Normalization reduces redundancy for integrity; denormalization adds redundancy for speed.
---
⚠️ Common Misunderstandings
“3NF = fully normalized” – True for most practical systems, but BCNF can still be violated in edge cases.
“Denormalization is always bad” – Not true; it can be a purposeful optimization if used judiciously.
“Lossless join is automatic after 3NF” – Not guaranteed; higher forms (4NF/5NF) explicitly address lossless join concerns.
“Composite key ⇒ automatically BCNF” – Only if there are no non‑key attributes; otherwise BCNF may still be violated.
---
🧠 Mental Models / Intuition
“Dependency as a chain” – Visualize each attribute pointing to the attribute(s) that determine it. Break the chain at the candidate key; any break before the key signals a violation.
“Atoms in a molecule” – 1NF forces every column to be an indivisible atom; higher forms group atoms into stable molecules (tables) without dangling bonds (anomalies).
---
🚩 Exceptions & Edge Cases
BCNF violations with overlapping candidate keys – May require a trade‑off between BCNF and dependency preservation.
Multi‑valued dependencies in 4NF – Rare in typical business apps; appear in scenarios like “students ↔ courses ↔ semesters”.
5NF relevance – Mostly academic; only needed when complex join dependencies exist (e.g., projection‑join cycles).
---
📍 When to Use Which
Designing a new schema: aim for 3NF (covers most anomalies) → upgrade to BCNF if functional dependencies cause key‑determinant conflicts.
Encountering many‑to‑many relationships: consider 4NF to eliminate MVDs.
Complex join requirements with many tables: test for 5NF to guarantee lossless reconstruction.
Performance tuning: evaluate query plans; if joins become costly, apply controlled denormalization (add redundant columns or summary tables).
---
👀 Patterns to Recognize
Repeating groups → 1NF violation (e.g., a column storing “Math, English”).
Partial key → 2NF violation (e.g., attribute depends only on part of a composite PK).
Non‑key → non‑key → key chain → 3NF violation (transitive).
Determinant not a key → BCNF violation (functional dependency with non‑key left side).
Same key determines two independent attribute sets → 4NF violation (MVD).
---
🗂️ Exam Traps
“If a table is in 3NF, it must be in BCNF.” – False when a non‑key determinant exists.
“Denormalization is never acceptable.” – Wrong; many high‑performance designs deliberately denormalize.
“Lossless join is guaranteed after 3NF.” – Not always; need to verify with join dependency analysis.
“Only primary key matters for dependencies.” – Overlooks candidate keys and composite keys; partial dependencies can hide behind non‑primary candidate keys.
---
or
Or, immediately create your own study flashcards:
Upload a PDF.
Master Study Materials.
Master Study Materials.
Start learning in seconds
Drop your PDFs here or
or