Index (database) - Fundamentals of Indexes

Understand what indexes are, how they boost query performance and enforce constraints, and the differences between clustered and non‑clustered indexes.

Summary

Understanding Database Indexes

Introduction

A database index is one of the most important tools for improving query performance in relational databases. Just as an index in a book lets you find topics without reading every page, a database index lets the system locate and retrieve data far more quickly than scanning every row. However, indexes come with trade-offs: they consume storage space and require maintenance when data changes. Understanding how indexes work and when to use each type is essential for writing efficient queries and designing performant databases.

What Is a Database Index?

At its core, a database index is a data structure that stores a copy of selected columns from a table in a way that optimizes searching. Rather than storing full rows, an index maintains keys (the indexed column values) along with pointers or references that link back to the actual rows. When you search for a value using an index, the database can jump directly to matching rows without examining every row in the table.

Think of it this way: in a phone book, names are sorted alphabetically. To find someone's number, you don't flip through every page; you jump to the section where their last name starts. The ordering tells you where to look.

Types of Values in Indexes

Indexes can be built on one or more columns. An index on multiple columns is called a composite index. Beyond simple column values, databases support specialized indexes:

- Functional indexes store the results of expressions applied to column values. For example, instead of indexing a last name as stored, a functional index might store the uppercase version. Queries that search for names regardless of capitalization can then use the index, provided they apply the same expression.
- Partial indexes contain entries only for rows meeting a specific condition. If most of your queries filter for active customers (where status = 'active'), a partial index on that subset is smaller, and typically faster to search and maintain, than an index on all rows.

The Performance Impact

Why Indexes Matter

Without an index, finding rows matching a condition requires scanning the table, an operation with linear time complexity $O(N)$, where $N$ is the number of rows. In a table with a million rows, locating a single unique value means examining 500,000 rows on average, and finding every matching row means examining all million.

Most database indexes provide dramatically better performance:

- Logarithmic complexity $O(\log N)$: typical for tree-based indexes (B-trees). In a million-row table, finding a row takes roughly 20 key comparisons ($\log_2 10^6 \approx 20$) instead of a scan over hundreds of thousands of rows.
- Constant complexity $O(1)$: hash indexes can achieve this under ideal conditions, retrieving a row in roughly the same time regardless of table size.

However, this speed comes at a cost. Indexes consume disk space, and every INSERT, UPDATE, or DELETE must update all affected indexes. The larger and more numerous your indexes, the slower write operations become. This is the fundamental trade-off in index design.

Index Architecture: Clustered vs. Non-Clustered

There are two fundamental types of indexes, distinguished by how they relate to the physical organization of the data.

Non-Clustered Indexes

In a non-clustered index, the physical arrangement of rows on disk is independent of the index order. The index stores the key values together with pointers (such as page number and row offset) that tell the database where to find the actual data rows.

Think of a library with books arranged on shelves by call number (physical order) but with a card catalog organized alphabetically by author (index). The catalog doesn't change where books sit; it just helps you find them.
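The key-plus-pointer arrangement described above can be sketched in a few lines of Python. This is only an illustration, not how any particular database implements its indexes: `rows`, `index`, and `lookup` are hypothetical names, and a sorted list searched with `bisect` stands in for a B-tree.

```python
import bisect

# Hypothetical table: rows are stored in insertion order, not sorted by name.
rows = [
    {"id": 1, "last_name": "Wilson"},
    {"id": 2, "last_name": "Adams"},
    {"id": 3, "last_name": "Smith"},
    {"id": 4, "last_name": "Baker"},
    {"id": 5, "last_name": "Smith"},
]

# The index: (key, row position) pairs kept sorted by key.
# The row position plays the role of the page/offset pointer.
index = sorted((row["last_name"], pos) for pos, row in enumerate(rows))


def lookup(name):
    """Return all rows whose last_name equals name, via binary search on the index."""
    # First index entry with key >= name; takes O(log N) comparisons.
    # Positions are never negative, so (name, -1) sorts before every real entry for name.
    i = bisect.bisect_left(index, (name, -1))
    matches = []
    # Entries with equal keys are adjacent in the index, so walk forward over them.
    while i < len(index) and index[i][0] == name:
        _, pos = index[i]
        matches.append(rows[pos])  # follow the pointer back to the table row
        i += 1
    return matches


smiths = lookup("Smith")
print([row["id"] for row in smiths])
```

Here `lookup("Smith")` reaches both Smith rows through the index while `rows` keeps its original order, much as the card catalog leaves the shelves untouched.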
Key characteristics:

- A table can have multiple non-clustered indexes
- Each index optimizes searches on different columns
- Non-clustered indexes are typically created on columns appearing in WHERE, JOIN, and ORDER BY clauses
- The database must follow the pointer to retrieve the full row, which adds a small cost per matching row

Clustered Indexes

A clustered index is fundamentally different: it determines the physical order of rows on disk. The data rows themselves are stored in the order of the clustered index key. This has significant performance implications:

- Only one clustered index can exist per table (since rows can only be physically ordered one way)
- Retrieval of ranges of values is fast because adjacent index keys correspond to adjacent rows on disk
- Queries that request results ordered by the clustered key benefit because the data is already physically sorted
- Sequential scans (retrieving all rows in key order) are highly efficient

In practice, the clustered index is often the primary key. Some systems, such as SQL Server (by default) and MySQL's InnoDB engine, cluster the table on the primary key when you define one. Others, such as PostgreSQL, store rows in an unordered heap and back the primary key with an ordinary unique index.

The difference between these types matters in practice. A query retrieving all customers with last names between "Smith" and "Wilson" can run much faster with a clustered index on last name than with a non-clustered index, because the matching rows sit next to each other on disk instead of requiring a separate lookup for each one.

Strategic Index Design: Column Order in Composite Indexes

When indexing multiple columns, the column order is critical for query performance. This is one of the trickiest aspects of index design, and it confuses many developers.

Consider an index on (department, hiredate). The first (leftmost) column is the most significant: entries are sorted by department first, and by hiredate only within each department. The database can use this index efficiently for queries filtering on department. But what about queries filtering only on hiredate, ignoring department?
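One way to explore that question is to ask a database for its query plan. The sketch below assumes SQLite through Python's standard `sqlite3` module; the `employees` table, the `idx_dept_hire` index name, and the `plan` helper are made up for illustration, and other engines or planner versions may choose differently (some can skip-scan a composite index when statistics favor it).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees ("
    " id INTEGER PRIMARY KEY, name TEXT, department TEXT, hiredate TEXT)"
)
# Composite index with department as the leading column.
conn.execute("CREATE INDEX idx_dept_hire ON employees (department, hiredate)")


def plan(sql):
    """Return SQLite's query-plan description for a statement as one string."""
    plan_rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each plan row holds the human-readable detail text.
    return " | ".join(row[-1] for row in plan_rows)


# Each query selects name, which is not in the index, so the index alone
# cannot answer the query and the planner must decide how to find the rows.
by_department = plan("SELECT name FROM employees WHERE department = 'Sales'")
by_both = plan(
    "SELECT name FROM employees"
    " WHERE department = 'Sales' AND hiredate >= '2020-01-01'"
)
by_hiredate = plan("SELECT name FROM employees WHERE hiredate >= '2020-01-01'")

for label, text in [
    ("department only:", by_department),
    ("department + hiredate:", by_both),
    ("hiredate only:", by_hiredate),
]:
    print(label, text)
```

On a fresh database with no collected statistics, the first two plans should mention `idx_dept_hire`, while the hiredate-only plan should fall back to scanning the table. The exact wording of the plan text differs between SQLite versions.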
Here's the key principle: the database can use the leading columns of a composite index on their own, but it cannot efficiently seek on a non-leading column unless the query also constrains the columns before it.

With a (department, hiredate) index:

✓ Queries filtering on department use the index efficiently
✓ Queries filtering on both department AND hiredate use it efficiently
✗ Queries filtering only on hiredate cannot use this index efficiently; the database still has to scan the entries, since hiredate values are interleaved across different departments

To maximize index usefulness, order columns by how often they appear in search conditions. If you frequently search by hiredate alone but rarely by department alone, a (hiredate, department) index would be better. The same principle applies to WHERE and JOIN clauses: columns that appear most frequently in filtering conditions should come first in the index.

Indexes for Database Constraints

Beyond search performance, indexes serve another critical purpose: enforcing database constraints.

Uniqueness Constraints

When you declare an index as unique, you create an implicit uniqueness constraint on those columns. The database uses the index to verify quickly that no duplicate values exist whenever data is inserted or updated. Without an index, checking uniqueness would require scanning the entire table.

Most relational databases automatically create a unique index on primary key columns. This accomplishes two goals with one structure: it enforces uniqueness and provides fast lookups.

Supporting Relationships

Foreign key constraints, which ensure that referenced rows actually exist, typically rely on indexes on both sides. The referenced columns must be unique (a primary key or unique constraint) and are therefore indexed. Indexing the referencing columns as well is generally recommended, and some systems create that index automatically. These indexes speed up the validation of inserts, updates, and deletes in related tables, so that constraint checks do not become a bottleneck.
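The constraint behavior described in this section can be tried out with a small script. This is a minimal sketch assuming SQLite through Python's standard `sqlite3` module; the table names, index names, and the `try_insert` helper are hypothetical. Note that SQLite only enforces foreign keys once `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE employees ("
    " id INTEGER PRIMARY KEY,"
    " email TEXT,"
    " department_id INTEGER REFERENCES departments(id))"
)
# Unique index: doubles as a uniqueness constraint on email.
conn.execute("CREATE UNIQUE INDEX idx_employees_email ON employees (email)")
# Index on the referencing column, so checks triggered by changes
# to departments do not have to scan employees.
conn.execute("CREATE INDEX idx_employees_department ON employees (department_id)")

conn.execute("INSERT INTO departments (id, name) VALUES (1, 'Engineering')")
conn.execute("INSERT INTO employees (email, department_id) VALUES ('a@example.com', 1)")


def try_insert(email, department_id):
    """Attempt an insert and report whether a constraint rejected it."""
    try:
        conn.execute(
            "INSERT INTO employees (email, department_id) VALUES (?, ?)",
            (email, department_id),
        )
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


duplicate_result = try_insert("a@example.com", 1)  # email already exists
orphan_result = try_insert("b@example.com", 99)    # department 99 does not exist
valid_result = try_insert("c@example.com", 1)      # satisfies both constraints

print(duplicate_result)
print(orphan_result)
print(valid_result)
```

The first two inserts should be rejected with an integrity error and the third accepted. The error message text is SQLite-specific and is not something to rely on in application code.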
<extrainfo>
Exclusion Constraints

Some advanced databases support exclusion constraints, which ensure that a specified condition doesn't hold for any pair of rows. For example, a constraint might prevent two reservations from overlapping in time. Enforcing such constraints efficiently requires an index that can quickly locate rows satisfying the predicate.
</extrainfo>
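To make the reservation example concrete, here is a small in-memory sketch of a no-overlap rule. It is an illustration under stated assumptions, not a database feature: `ReservationBook` is a hypothetical class, intervals are treated as half-open `[start, end)`, and a sorted Python list searched with `bisect` stands in for the index that locates potentially conflicting rows.

```python
import bisect


class ReservationBook:
    """Hypothetical stand-in for a table with a no-overlap exclusion constraint."""

    def __init__(self):
        # Accepted reservations as (start, end) pairs, kept sorted by start.
        # Intervals are half-open: a reservation covers [start, end).
        self._intervals = []

    def add(self, start, end):
        """Insert the reservation if it overlaps no existing one; return True on success."""
        if start >= end:
            raise ValueError("start must be before end")
        # Position the new interval would take in start order: O(log N) comparisons.
        i = bisect.bisect_left(self._intervals, (start, end))
        # Stored intervals never overlap each other, so only the
        # immediate neighbours can conflict with the new one.
        if i > 0 and self._intervals[i - 1][1] > start:
            return False  # previous reservation ends after the new one starts
        if i < len(self._intervals) and self._intervals[i][0] < end:
            return False  # next reservation starts before the new one ends
        self._intervals.insert(i, (start, end))
        return True


book = ReservationBook()
print(book.add(9, 11))   # accepted: the book is empty
print(book.add(10, 12))  # rejected: overlaps the 9-11 reservation
print(book.add(11, 13))  # accepted: starts exactly when 9-11 ends
```

The search for conflicting neighbours is logarithmic, which is the point of using an index here. The list insertion itself is linear in this sketch, whereas a real index would use a tree structure.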

Flashcards

What is the primary purpose of a database index?

To improve the speed of data retrieval operations on a table.

What does a database index store to enable efficient searching?

A copy of selected columns from a table.

What information does each entry in a database index include to allow for full row retrieval?

A key (the indexed value) together with a pointer or reference to the original row of data.

What types of record access do indexes support by being created on one or more columns?

Rapid random lookups and ordered record access.

What is the average-case time complexity of retrieving a row based on a column value without an index?

$O(N)$ (linear scan).

What are the three main trade-offs involved in database index design?

Lookup performance, index size, and maintenance cost during writes.

What do functional indexes store instead of raw column values?

The results of functions or expressions applied to column values.

Which rows are included in a partial index?

Only rows that satisfy a specified conditional expression.

What is the effect of declaring a database index as unique?

It creates an implicit uniqueness constraint on the indexed columns.

Which index type is automatically created by an RDBMS for columns defined as a primary key?

A unique index.

Why are both the referencing and the referenced columns of a foreign key constraint typically indexed?

To speed up inserts, updates, and deletes involving the related tables.

What is required to enforce an exclusion constraint that ensures a predicate does not hold for any two rows?

An index that can quickly locate rows satisfying that predicate.

How does the physical order of rows in a table relate to the logical order of keys in a non-clustered index?

They are unrelated.

On which types of columns are non-clustered indexes usually built?

Non-primary-key columns used in JOIN, WHERE, or ORDER BY clauses.

How does a clustered index affect the physical storage of data rows on a disk?

It determines the physical order of rows to match the index key order.

How many clustered indexes can exist on a single table?

Only one.

For which types of operations do clustered indexes provide the most dramatic performance improvement?

Sequential range scans or ordered retrievals.

What determines which columns in a composite index can be used independently for a search?

The order of the columns defined in the index.

Which column in a composite index must be specified in a query for the index to be used efficiently?

The first column.

Key Concepts

Index Types

Database index

Functional index

Partial index

Non‑clustered index

Clustered index

Composite index

Constraints

Unique constraint

Primary key

Foreign key constraint

Exclusion constraint

Definitions

Database index

A data structure that speeds up data retrieval by storing selected column values with pointers to the original rows.

Functional index

An index that stores the results of a function or expression applied to column values.

Partial index

An index that includes entries only for rows meeting a specified condition, reducing its size.

Non‑clustered index

An index whose logical order is independent of the physical order of table rows, containing pointers to the data.

Clustered index

An index that determines the physical storage order of rows, aligning data layout with the index key order.

Composite index

An index built on multiple columns where the column order affects which parts can be used for queries.

Unique constraint

A rule that ensures all values in the indexed column(s) are distinct, often enforced by a unique index.

Primary key

A column or set of columns uniquely identifying each row, automatically backed by a unique index.

Foreign key constraint

A rule requiring that values in the referencing columns match an existing row in the referenced table; the columns on both sides are typically indexed.

Exclusion constraint

A constraint that prohibits any two rows from jointly satisfying a specified predicate (for example, overlapping time ranges), enforced using an index.