Index (database) Study Guide
Study Guide
📖 Core Concepts
Database index – a separate data structure that stores copies of one or more column values (or expressions) together with pointers to the original rows, enabling fast look‑ups.
Key vs. pointer – the key is the indexed value; the pointer (row ID, page number, offset) lets the engine fetch the full row instantly.
Clustered vs. non‑clustered – clustered index dictates the physical order of rows on disk; non‑clustered index stores keys in a separate structure that points back to rows.
Composite index column order – only a leading column (or left‑most prefix) can be used independently; later columns are useful only when all preceding columns are constrained.
Functional / partial index – stores results of an expression (e.g., UPPER(name)) or only rows satisfying a predicate, shrinking size and supporting special queries.
Uniqueness & primary key – a unique index enforces that indexed columns contain no duplicates; every primary key automatically gets a unique index.
Foreign‑key support – both referencing and referenced columns are usually indexed to keep INSERT/UPDATE/DELETE checks fast.
Exclusion constraint – prevents any two rows from satisfying a given predicate; requires an index that can quickly locate matching rows.
Covering index – contains every column a query needs, so the engine never reads the base table.
---
📌 Must Remember
Lookup complexity:
Full table scan → $O(N)$
Typical B‑tree index → $O(\log N)$
Hash index (exact match) → $O(1)$ average
Only one clustered index per table.
Composite index usage – can use the index for queries that filter on the first k columns (prefix).
Leading wildcard (LIKE '%foo') = not sargable → forces full scan.
Reverse index trick – index reverse(col) to make a leading‑wildcard search sargable.
Bitmap index best for low‑cardinality columns (few distinct values).
Hash index works only for equality (=) predicates; not for range (>, <).
Including columns (INCLUDE (col)) makes an index covering without affecting key order.
---
🔄 Key Processes
Creating a basic index
sql
CREATE INDEX idxname ON table (col1, col2);
Creating a functional (expression) index
sql
CREATE INDEX idxupperlast ON users (UPPER(lastname));
Creating a partial index
sql
CREATE INDEX idxactive ON orders (customerid) WHERE status = 'active';
Creating a covering index with included columns
sql
CREATE INDEX idxcov ON sales (orderdate) INCLUDE (totalamount, customerid);
Enforcing a uniqueness constraint
sql
CREATE UNIQUE INDEX idxuniqemail ON users (email);
Using a reverse index for leading wildcards
sql
CREATE INDEX idxrevemail ON customers (reverse(emailaddress));
SELECT emailaddress
FROM customers
WHERE reverse(emailaddress) LIKE reverse('%@wikipedia.org');
---
🔍 Key Comparisons
Clustered vs. Non‑clustered
Clustered: rows stored in key order; one per table; ideal for range scans.
Non‑clustered: separate structure; many per table; good for look‑ups on non‑PK columns.
Bitmap vs. B‑tree index
Bitmap: bit‑maps per distinct value; excels on low‑cardinality columns; read‑heavy workloads.
B‑tree: balanced tree; works for high‑cardinality and range queries.
Hash vs. B‑tree for equality
Hash: constant‑time $O(1)$ but no ordering; only equality.
B‑tree: $O(\log N)$, supports range and ordering.
Partial vs. Full index
Partial: indexes only rows meeting a condition → smaller, faster for targeted queries.
Full: indexes every row → larger, broader applicability.
---
⚠️ Common Misunderstandings
“All indexes speed up queries.” – Indexes help only when the query predicate matches the index key order and is sargable; otherwise they add overhead.
“More indexes always improve performance.” – Each index costs storage and slows INSERT/UPDATE/DELETE because the index must be maintained.
“Composite index can be used starting from any column.” – Only the left‑most prefix can be used; a query filtering on the second column alone cannot use the index efficiently.
“Hash indexes support LIKE or range scans.” – They do not; hash indexes work only for exact match (=).
“Bitmap indexes are good for any column.” – They become inefficient on high‑cardinality columns (many distinct values).
---
🧠 Mental Models / Intuition
“Index = table shortcut map.” – Think of the index as a phone book: you look up a name (key) to get the page number (pointer) instead of scanning every page.
“Clustered = physical arrangement of the book.” – When the book is already sorted by the key, flipping to a page is a single, fast motion (sequential read).
“Composite index = layered filter.” – The first layer (first column) decides the main bucket; subsequent layers further narrow within that bucket.
---
🚩 Exceptions & Edge Cases
Leading wildcard (LIKE '%x') – never uses a normal index; must rewrite or use a reverse index.
Full‑text search – requires an inverted index, not a B‑tree.
Unique index on nullable columns – some DBMS allow multiple NULL values because NULL is not considered equal.
Sparse index – only useful when data is stored in fixed‑size blocks and many rows share the same leading key.
---
📍 When to Use Which
Range scans / ORDER BY → clustered index (or non‑clustered with appropriate ordering).
Exact match on high‑cardinality column → hash index (if supported) or B‑tree.
Low‑cardinality column (e.g., gender, status) → bitmap index.
Full‑text search → inverted index.
Queries need only a few columns → covering index with INCLUDE.
Frequent INSERT/UPDATE, many indexes → limit indexes; prefer non‑clustered over clustered for write‑heavy tables.
Searches with functions or partial data → functional or partial index.
---
👀 Patterns to Recognize
Predicate starts with indexed column → index likely used.
WHERE clause uses = or range on leading column(s) of a composite index → index hit.
LIKE 'text%' → index usable; LIKE '%text' or %text% → index not usable.
Multiple columns in SELECT but not in WHERE → consider covering index with INCLUDE.
Foreign‑key column appears in join condition → ensure both sides have indexes.
---
🗂️ Exam Traps
Choosing a bitmap index for a high‑cardinality column – will be marked wrong; bitmap excels only for few distinct values.
Assuming any index makes a LIKE '%abc' fast – leading wildcard defeats standard indexes; only reverse or full‑text solves it.
Creating a clustered index on a column that is frequently updated – hurts write performance; exam may ask why this is a bad idea.
Believing a hash index can satisfy ORDER BY – hash indexes have no ordering; a B‑tree or clustered index is needed.
Omitting the unique constraint when primary key is defined – primary key automatically creates a unique index; forgetting this may lead to duplicate‑key errors.
---
or
Or, immediately create your own study flashcards:
Upload a PDF.
Master Study Materials.
Master Study Materials.
Start learning in seconds
Drop your PDFs here or
or