Subjects/Technology/Data and AI/Database/Index (database)

Index (database) Study Guide

Study Guide

📖 Core Concepts Database index – a separate data structure that stores copies of one or more column values (or expressions) together with pointers to the original rows, enabling fast look‑ups. Key vs. pointer – the key is the indexed value; the pointer (row ID, page number, offset) lets the engine fetch the full row instantly. Clustered vs. non‑clustered – clustered index dictates the physical order of rows on disk; non‑clustered index stores keys in a separate structure that points back to rows. Composite index column order – only a leading column (or left‑most prefix) can be used independently; later columns are useful only when all preceding columns are constrained. Functional / partial index – stores results of an expression (e.g., UPPER(name)) or only rows satisfying a predicate, shrinking size and supporting special queries. Uniqueness & primary key – a unique index enforces that indexed columns contain no duplicates; every primary key automatically gets a unique index. Foreign‑key support – both referencing and referenced columns are usually indexed to keep INSERT/UPDATE/DELETE checks fast. Exclusion constraint – prevents any two rows from satisfying a given predicate; requires an index that can quickly locate matching rows. Covering index – contains every column a query needs, so the engine never reads the base table. --- 📌 Must Remember Lookup complexity: Full table scan → $O(N)$ Typical B‑tree index → $O(\log N)$ Hash index (exact match) → $O(1)$ average Only one clustered index per table. Composite index usage – can use the index for queries that filter on the first k columns (prefix). Leading wildcard (LIKE '%foo') = not sargable → forces full scan. Reverse index trick – index reverse(col) to make a leading‑wildcard search sargable. Bitmap index best for low‑cardinality columns (few distinct values). Hash index works only for equality (=) predicates; not for range (>, <). Including columns (INCLUDE (col)) makes an index covering without affecting key order. --- 🔄 Key Processes Creating a basic index sql CREATE INDEX idxname ON table (col1, col2); Creating a functional (expression) index sql CREATE INDEX idxupperlast ON users (UPPER(lastname)); Creating a partial index sql CREATE INDEX idxactive ON orders (customerid) WHERE status = 'active'; Creating a covering index with included columns sql CREATE INDEX idxcov ON sales (orderdate) INCLUDE (totalamount, customerid); Enforcing a uniqueness constraint sql CREATE UNIQUE INDEX idxuniqemail ON users (email); Using a reverse index for leading wildcards sql CREATE INDEX idxrevemail ON customers (reverse(emailaddress)); SELECT emailaddress FROM customers WHERE reverse(emailaddress) LIKE reverse('%@wikipedia.org'); --- 🔍 Key Comparisons Clustered vs. Non‑clustered Clustered: rows stored in key order; one per table; ideal for range scans. Non‑clustered: separate structure; many per table; good for look‑ups on non‑PK columns. Bitmap vs. B‑tree index Bitmap: bit‑maps per distinct value; excels on low‑cardinality columns; read‑heavy workloads. B‑tree: balanced tree; works for high‑cardinality and range queries. Hash vs. B‑tree for equality Hash: constant‑time $O(1)$ but no ordering; only equality. B‑tree: $O(\log N)$, supports range and ordering. Partial vs. Full index Partial: indexes only rows meeting a condition → smaller, faster for targeted queries. Full: indexes every row → larger, broader applicability. --- ⚠️ Common Misunderstandings “All indexes speed up queries.” – Indexes help only when the query predicate matches the index key order and is sargable; otherwise they add overhead. “More indexes always improve performance.” – Each index costs storage and slows INSERT/UPDATE/DELETE because the index must be maintained. “Composite index can be used starting from any column.” – Only the left‑most prefix can be used; a query filtering on the second column alone cannot use the index efficiently. “Hash indexes support LIKE or range scans.” – They do not; hash indexes work only for exact match (=). “Bitmap indexes are good for any column.” – They become inefficient on high‑cardinality columns (many distinct values). --- 🧠 Mental Models / Intuition “Index = table shortcut map.” – Think of the index as a phone book: you look up a name (key) to get the page number (pointer) instead of scanning every page. “Clustered = physical arrangement of the book.” – When the book is already sorted by the key, flipping to a page is a single, fast motion (sequential read). “Composite index = layered filter.” – The first layer (first column) decides the main bucket; subsequent layers further narrow within that bucket. --- 🚩 Exceptions & Edge Cases Leading wildcard (LIKE '%x') – never uses a normal index; must rewrite or use a reverse index. Full‑text search – requires an inverted index, not a B‑tree. Unique index on nullable columns – some DBMS allow multiple NULL values because NULL is not considered equal. Sparse index – only useful when data is stored in fixed‑size blocks and many rows share the same leading key. --- 📍 When to Use Which Range scans / ORDER BY → clustered index (or non‑clustered with appropriate ordering). Exact match on high‑cardinality column → hash index (if supported) or B‑tree. Low‑cardinality column (e.g., gender, status) → bitmap index. Full‑text search → inverted index. Queries need only a few columns → covering index with INCLUDE. Frequent INSERT/UPDATE, many indexes → limit indexes; prefer non‑clustered over clustered for write‑heavy tables. Searches with functions or partial data → functional or partial index. --- 👀 Patterns to Recognize Predicate starts with indexed column → index likely used. WHERE clause uses = or range on leading column(s) of a composite index → index hit. LIKE 'text%' → index usable; LIKE '%text' or %text% → index not usable. Multiple columns in SELECT but not in WHERE → consider covering index with INCLUDE. Foreign‑key column appears in join condition → ensure both sides have indexes. --- 🗂️ Exam Traps Choosing a bitmap index for a high‑cardinality column – will be marked wrong; bitmap excels only for few distinct values. Assuming any index makes a LIKE '%abc' fast – leading wildcard defeats standard indexes; only reverse or full‑text solves it. Creating a clustered index on a column that is frequently updated – hurts write performance; exam may ask why this is a bad idea. Believing a hash index can satisfy ORDER BY – hash indexes have no ordering; a B‑tree or clustered index is needed. Omitting the unique constraint when primary key is defined – primary key automatically creates a unique index; forgetting this may lead to duplicate‑key errors. ---

Or, immediately create your own study flashcards:

Upload a PDF.
Master Study Materials.

Start learning in seconds

Drop your PDFs here or