Relational database - Design and Integrity of Relational Schemas
Understand keys and relationships, constraints and normalization, and relational operations and indexing.
Summary
Read Summary
Flashcards
Save Flashcards
Quiz
Take Quiz
Quick Practice
How is a Primary Key defined in a database table?
1 of 29
Summary
Database Design: Keys, Relationships, and Operations
Keys and Relationships
Understanding Primary Keys
A primary key is a column (or set of columns) that uniquely identifies each row in a table. Think of it as a permanent ID number assigned to every record—no two rows can have the same primary key value, and no primary key value can ever be null. When you need to retrieve, update, or delete a specific row, the primary key is how the database finds it instantly.
When a new row is inserted into a table, a new unique primary key value is generated automatically. Most databases use either auto-incrementing integers (1, 2, 3...) or universally unique identifiers (UUIDs) for this purpose.
Example: In a Students table, StudentID might be the primary key. Each student gets a unique StudentID, and the database uses this to distinguish one student from another.
Alternate Keys
An alternate key (or natural key) is another column or combination of columns that could also serve as a unique identifier, even though it's not chosen as the primary key. While a table has only one primary key, it can have multiple alternate keys.
Example: In a Students table, both StudentID and Email might be unique. StudentID is the primary key, but Email is an alternate key because each student's email is also unique.
Understanding Foreign Keys
A foreign key is a column that contains a primary key value from another table. It creates a link between two tables, establishing relationships in your database.
When you want to connect information from two tables—such as linking students to their courses—you use a foreign key. The foreign key in one table references the primary key in another table.
Example: In a Registrations table, you might have StudentID (which references the Students table) and CourseID (which references the Courses table). These are foreign keys that link registrations to the correct student and course.
One-to-One and One-to-Many Relationships
When a primary key from one table appears as a foreign key in another table, the relationship can be structured in two main ways:
One-to-One Relationships: Each row in table A is associated with exactly one row in table B, and vice versa. For example, each employee has exactly one assigned office, and each office is assigned to exactly one employee.
One-to-Many Relationships: Each row in table A is associated with multiple rows in table B, but each row in table B is associated with only one row in table A. For example, each department has many employees, but each employee works in only one department.
Resolving Many-to-Many Relationships
A many-to-many relationship occurs when each row in table A can be associated with multiple rows in table B, and vice versa. For example, students can enroll in multiple courses, and each course has multiple students.
Relational databases cannot directly represent many-to-many relationships. Instead, you create a junction table (also called a bridge or join table) that contains the primary keys from both tables. Each row in the junction table represents a specific pairing.
Example: To represent the relationship between Students and Courses:
Students table: StudentID, Name
Courses table: CourseID, CourseName
Registrations table (junction table): StudentID, CourseID
This design allows any student to be linked to multiple courses, and any course to be linked to multiple students.
<extrainfo>
Global Unique Identifiers
Some systems use globally unique identifiers (GUIDs) or UUIDs instead of simple auto-incrementing numbers. These identifiers are designed to be unique not just within one database, but across multiple databases and applications. While this adds complexity, it's useful for distributed systems or when merging data from different sources.
</extrainfo>
Relations and Tables
Base Relations and Derived Relations
A relation in database terminology is essentially a table—a collection of rows and columns. However, database theory distinguishes between two types:
Base relations (or base tables) are the actual tables that physically store data in the database. When you create a table and insert data, you're creating a base relation.
Derived relations are computed from base relations using relational operations. They exist virtually and are often implemented as views. A view is a saved query that looks and behaves like a table but doesn't store data itself—instead, it retrieves data from one or more base relations whenever it's accessed.
Example: A base relation might be Employees with columns for EmployeeID, Name, and Salary. A derived relation could be a view called HighEarners that shows only employees with salaries above $100,000.
Domains and Constraints
A domain specifies the set of allowed values for a given attribute (column). It's essentially a rule that restricts what data can be stored in that column.
Example: An Age attribute might have a domain of integers between 0 and 150. A Status attribute might have a domain of only three values: 'Active', 'Inactive', or 'Pending'.
Constraints enforce these restrictions and ensure data integrity. They prevent invalid data from being stored in your database. Constraints can be applied at three levels:
Single attribute constraints (like domain restrictions)
Tuple constraints (restrictions on one row)
Relation constraints (restrictions across the whole table)
Entity Integrity and Referential Integrity
Entity integrity is a fundamental rule: the primary key of a table can never contain a null value. Every row must have a valid, unique primary key. This ensures you can always uniquely identify every row in the table.
Referential integrity ensures that relationships between tables remain valid. Specifically: if a foreign key references a primary key in another table, that primary key value must exist. You cannot create a registration for a student that doesn't exist, or reference a department that has been deleted.
Example: If a StudentID is used as a foreign key in a Registrations table, that StudentID must exist in the Students table. The database will prevent you from creating a registration with a non-existent StudentID.
Check Constraints
A check constraint uses a Boolean expression (true/false condition) to enforce valid values for an attribute or combination of attributes. It's more flexible than a domain because it can enforce complex rules.
Example: A check constraint might specify that Price must be greater than 0, or that StartDate must be before EndDate. These constraints are evaluated whenever data is inserted or updated.
Normalization
Why Normalization Matters
Normalization is a design process that organizes data to eliminate redundancy and prevent data anomalies. When data is poorly organized, you can run into serious problems:
Insertion anomalies: You can't insert certain data because related data doesn't exist yet.
Update anomalies: Changing one piece of information requires changing it in many places.
Deletion anomalies: Removing one row accidentally removes related data you wanted to keep.
Normalization solves these problems by breaking down tables into smaller, more focused tables that follow specific rules.
The Normal Forms
Normal forms are a series of standards that define increasingly strict levels of data organization. A table in a higher normal form is more normalized than one in a lower normal form. Most practical databases aim for third normal form (3NF).
First Normal Form (1NF)
First normal form requires that each attribute contain only atomic (indivisible) values. In other words, each cell in a table should contain a single value, not a collection of values.
Problem example: A Students table where the Phone attribute stores multiple phone numbers: "555-1234, 555-5678"
Solution: Create a separate StudentPhones table where each row contains one student and one phone number.
Second Normal Form (2NF)
Second normal form requires that every non-key attribute be fully functionally dependent on the entire primary key. This means each non-primary-key column must depend on the whole primary key, not just part of it.
This rule is mainly important when your primary key consists of multiple columns (a composite key).
Problem example: A StudentCourses table with a composite primary key of (StudentID, CourseID) that also contains InstructorOffice. The instructor's office depends only on which course it is, not on which student is taking it. This violates 2NF because InstructorOffice depends on only part of the primary key (CourseID).
Solution: Move InstructorOffice to the Courses table where it properly depends on CourseID.
Third Normal Form (3NF)
Third normal form requires that no non-key attribute be transitively dependent on the primary key. In simpler terms: non-key columns should depend directly on the primary key, not on other non-key columns.
Problem example: An Employees table contains EmployeeID, EmployeeName, DepartmentID, and DepartmentName. DepartmentName transitively depends on EmployeeID (EmployeeID → DepartmentID → DepartmentName) rather than directly depending on it.
Solution: Create a separate Departments table. The Employees table keeps DepartmentID as a foreign key, but removes DepartmentName. You can look up the department name when needed.
This design prevents the update anomalies that occur when you have to change a department name in multiple employee rows.
Relational Operations
Relational algebra provides a set of operations for querying and manipulating relations. Understanding these operations is essential because SQL (the language used to query databases) is built on these concepts.
Set Operations
These operations treat relations as mathematical sets:
Union ($\cup$) combines the tuples from two relations and removes duplicates. Both relations must have the same structure.
$$R1 \cup R2 = \text{all unique tuples that appear in } R1 \text{ or } R2$$
Intersection ($\cap$) returns only the tuples that appear in both relations.
$$R1 \cap R2 = \text{tuples that appear in both } R1 \text{ and } R2$$
Set Difference ($-$) returns tuples that are in the first relation but not in the second.
$$R1 - R2 = \text{tuples in } R1 \text{ that don't appear in } R2$$
Cartesian Product ($\times$) pairs every tuple from the first relation with every tuple from the second relation, creating all possible combinations.
$$R1 \times R2 = \text{every tuple from } R1 \text{ paired with every tuple from } R2$$
Row and Column Operations
Selection (or Restriction) ($\sigma$) retrieves rows that satisfy a specified condition. It filters rows based on criteria you specify.
$$\sigma{\text{condition}}(R) = \text{rows from } R \text{ where the condition is true}$$
Example: $\sigma{\text{Age > 30}}(\text{Employees})$ returns all employees older than 30.
Projection ($\pi$) extracts specific columns from a relation, effectively choosing which attributes you want to see.
$$\pi{\text{column1, column2}}(R) = \text{only the specified columns from } R$$
Example: $\pi{\text{Name, Email}}(\text{Customers})$ returns just the name and email for each customer.
Join Operations
Natural Join ($\bowtie$) combines two relations based on their common attributes. It finds rows where the common columns have matching values and merges them into a single row, keeping the common column only once.
$$R1 \bowtie R2 = \text{rows from } R1 \text{ and } R2 \text{ that match on common attributes}$$
Example: If Students has StudentID and Name, and Registrations has StudentID and CourseID, the natural join combines them on StudentID to show which courses each student is taking.
<extrainfo>
Division Operator
Division ($\div$) finds rows that are associated with all rows of the divisor relation. This is a more advanced operation used to answer questions like "Which students are enrolled in all required courses?"
$$R1 \div R2 = \text{tuples from } R1 \text{ that are associated with every tuple in } R2$$
</extrainfo>
Indexes
Why Indexes Matter
An index is a data structure that enables the database to locate matching rows quickly without scanning every single row in a table. When a table contains millions of rows, an index can mean the difference between a query that returns results instantly and one that takes minutes.
Without an index, the database performs a full table scan, examining every row. With an index, it can jump directly to the relevant rows.
Example: Searching for a student by StudentID in a table of 1 million students. Without an index, the database might need to check all 1 million rows. With an index, it can find the student in just a few lookups.
Index Structures and Performance
Different index structures offer different performance characteristics:
B-Tree Indexes are the most common type. They organize data in a balanced tree structure that keeps data sorted. The query time is proportional to the logarithm of the number of rows ($\log n$). This means that even with millions of rows, you need only about 20 comparisons. B-tree indexes are versatile and work well for most queries.
<extrainfo>
Other Index Structures:
Hash Indexes provide constant-time lookup ($O(1)$) when the index fits in memory, making them very fast for equality searches. However, they don't support range queries and can have performance issues if the index becomes larger than memory.
R-Tree Indexes are specialized for spatial data like geographic coordinates.
Bitmap Indexes are efficient for columns with few distinct values (like Status: Active/Inactive).
</extrainfo>
Strategic Index Placement
Indexes are most beneficial on:
Primary Keys: Always indexed automatically, enabling fast row lookups and join operations.
Foreign Keys: Indexing foreign keys dramatically improves join performance when you need to connect data from multiple tables.
Frequently Searched Columns: If you often query by a particular column, indexing it will speed up those queries significantly.
The tradeoff is that indexes consume storage space and slow down insert/update/delete operations (because the index must be updated). The key is to create indexes strategically on columns that are frequently used in searches and joins.
Flashcards
How is a Primary Key defined in a database table?
A unique identifier assigned to each row used for efficient data access.
When is a new unique Primary Key value typically generated?
When a new row is inserted.
What is an Alternate Key?
A natural key that can uniquely identify a row and may consist of multiple columns.
What is the function of a Foreign Key?
To create a link between tables by containing the primary key value from another table.
Which relationship types can be represented when a primary key appears as a foreign key in another table?
One-to-one or one-to-many relationships.
How are many-to-many relationships resolved in a relational database?
By creating a junction table containing the primary keys from the two related tables.
What is the purpose of applying a Global Unique Identifier (GUID)?
To ensure uniqueness across multiple databases and applications.
What are Base Relations?
Tables that physically store data in the database.
What are Derived Relations and how are they often implemented?
Relations computed from base relations using operations, often implemented as views.
What does a Domain specify for a given attribute?
The set of allowed values, acting as a constraint.
What is the primary purpose of Constraints in a database?
To restrict stored values to enforce business rules.
On what levels can Constraints be defined within a relation?
Single attribute
Entire tuple
Whole relation
What does the Entity Integrity Rule require regarding primary keys?
The primary key must never contain a null value.
What does the Referential Integrity Rule require of foreign key values?
They must match an existing primary key value in the referenced table.
How do Check Constraints enforce valid attribute values?
By using Boolean expressions to enforce ranges or patterns.
What are the main purposes of performing Normalization?
Eliminate non-atomic values
Eliminate redundant data
Prevent anomalies
Preserve data integrity
What is required for a table to be in First Normal Form (1NF)?
Each attribute must contain only atomic (indivisible) values.
What is the requirement for Second Normal Form (2NF)?
Every non-key attribute must be fully functionally dependent on the primary key.
What dependency is prohibited in Third Normal Form (3NF)?
Transitive dependency of non-key attributes on the primary key.
What is the result of the Union operator $\cup$?
Combines tuples of two relations and removes duplicates.
What does the Intersection operator $\cap$ return?
The set of tuples that appear in both relations.
How does the Set Difference operator $-$ function?
Returns tuples in the first relation that are not in the second.
What does the Cartesian Product operator $\times$ do?
Pairs each tuple of the first relation with each tuple of the second relation.
What is the purpose of the Selection (Restriction) operator $\sigma$?
Retrieves rows that satisfy a specified condition.
What is the function of the Projection operator $\pi$?
Extracts specific columns from a relation.
How does a Natural Join operator $\bowtie$ combine relations?
Based on their common attributes.
How does an Index provide faster access to rows?
By locating matching tuples directly without scanning the entire table.
What is the typical query time performance of a B-tree index?
Proportional to the logarithm of the number of rows ($\log n$).
Under what condition does a Hash Index provide constant-time lookup?
When the index fits into memory.
Quiz
Relational database - Design and Integrity of Relational Schemas Quiz Question 1: What is a primary key in a database table?
- A unique identifier for each row (correct)
- A column that references another table
- A non‑unique attribute used for sorting
- A default value assigned to missing data
Relational database - Design and Integrity of Relational Schemas Quiz Question 2: What are base relations in a database?
- Tables that physically store data (correct)
- Views computed from other tables
- Temporary tables used for queries
- Indexes that speed up data retrieval
Relational database - Design and Integrity of Relational Schemas Quiz Question 3: What does the union operator (∪) do in relational algebra?
- Combines tuples from two relations, removing duplicates (correct)
- Selects rows that satisfy a condition
- Finds common tuples in both relations
- Pairs each tuple from one relation with each from another
Relational database - Design and Integrity of Relational Schemas Quiz Question 4: Which of the following is an example of a check constraint condition?
- Salary > 0 (correct)
- EmployeeID is a primary key
- DepartmentID references Departments(DeptID)
- Email must be unique
Relational database - Design and Integrity of Relational Schemas Quiz Question 5: What does the natural join operator (⨝) do when combining two relations?
- It merges rows that have equal values in all common attribute names (correct)
- It returns only rows that appear in both relations
- It concatenates every row of the first relation with every row of the second
- It extracts a specified set of columns from a relation
Relational database - Design and Integrity of Relational Schemas Quiz Question 6: In database design, what does the term “domain” refer to?
- The set of allowable values for a given attribute (correct)
- A lookup table that stores reference data
- A constraint that enforces uniqueness of a column
- A rule that defines foreign‑key relationships between tables
Relational database - Design and Integrity of Relational Schemas Quiz Question 7: What requirement must a relation satisfy to be in First Normal Form (1NF)?
- Each attribute must contain only atomic (indivisible) values (correct)
- The relation must have a primary key defined
- All non‑key attributes must be fully functionally dependent on the primary key
- No attribute may contain duplicate values within the relation
Relational database - Design and Integrity of Relational Schemas Quiz Question 8: What is the primary performance characteristic of a hash index when the index fits entirely in memory?
- Constant‑time lookup for searched keys (correct)
- Lookup time proportional to the logarithm of the row count
- Improved performance only for range‑based queries
- Requires a full table scan for each lookup
Relational database - Design and Integrity of Relational Schemas Quiz Question 9: What result does the intersection operator (∩) produce when applied to two relations?
- Only the tuples that appear in both relations (correct)
- All tuples from the first relation
- All tuples from the second relation
- Tuples that are in either relation but not both
Relational database - Design and Integrity of Relational Schemas Quiz Question 10: Which of the following is a common index structure used in relational databases?
- B‑plus tree (correct)
- Hash map
- Linked list
- Binary heap
Relational database - Design and Integrity of Relational Schemas Quiz Question 11: What condition must a relation satisfy to be in Third Normal Form (3NF)?
- No non‑key attribute is transitively dependent on the primary key (correct)
- All attributes must be atomic values
- Every non‑key attribute must be fully functionally dependent on the whole primary key
- The primary key must be a surrogate key
Relational database - Design and Integrity of Relational Schemas Quiz Question 12: Why are indexes commonly created on primary key and foreign key columns?
- They speed up joins and lookup operations involving those keys (correct)
- They enforce uniqueness constraints automatically
- They compress the stored data to save space
- They automatically generate foreign key relationships
Relational database - Design and Integrity of Relational Schemas Quiz Question 13: Normalization primarily helps to prevent which of the following problems?
- Data anomalies caused by redundant or non‑atomic values (correct)
- Slow query performance due to lack of indexing
- Unauthorized data access through weak security
- Loss of referential links between tables
Relational database - Design and Integrity of Relational Schemas Quiz Question 14: In relational algebra, what does the selection (σ) operator accomplish?
- Retrieves rows that satisfy a specified condition (correct)
- Eliminates duplicate rows from a relation
- Joins two relations based on a common attribute
- Extracts specific columns from a relation
Relational database - Design and Integrity of Relational Schemas Quiz Question 15: What result does the set difference operator ( - ) produce?
- All tuples that appear in the first relation but not in the second (correct)
- Only the tuples that appear in both relations
- All possible pairs of tuples from the two relations
- All tuples from the second relation that are not in the first
Relational database - Design and Integrity of Relational Schemas Quiz Question 16: In relational algebra, what does the projection operator π do?
- It extracts specified columns from a relation (correct)
- It selects rows that satisfy a given condition
- It joins two relations on common attributes
- It computes the Cartesian product of two relations
Relational database - Design and Integrity of Relational Schemas Quiz Question 17: What is the main benefit of applying higher normal forms to a database schema?
- They reduce data redundancy and update anomalies (correct)
- They automatically improve query execution speed
- They encrypt sensitive data by default
- They increase the amount of required storage space
Relational database - Design and Integrity of Relational Schemas Quiz Question 18: Which relational operation would you use to find customers who have purchased every product in a given set?
- Division (correct)
- Selection
- Projection
- Union
Relational database - Design and Integrity of Relational Schemas Quiz Question 19: When a new row is inserted into a table, what must be true about the primary key value assigned to that row?
- It is a new unique value not used by any other row (correct)
- It is the same as the primary key of the previous row
- It can be null if no data is provided
- It must match a foreign key in another table
Relational database - Design and Integrity of Relational Schemas Quiz Question 20: If a primary key value appears multiple times as a foreign key in another table, what type of relationship does this indicate?
- One-to-many (correct)
- One-to-one
- Many-to-many
- No relationship
Relational database - Design and Integrity of Relational Schemas Quiz Question 21: Why does the entity integrity rule prohibit null values in a primary‑key column?
- Because a primary key must uniquely identify each row (correct)
- Because null values improve storage efficiency
- Because nulls are required for foreign keys
- Because nulls enable faster indexing
Relational database - Design and Integrity of Relational Schemas Quiz Question 22: In a relational schema, a foreign key is used to create a link between tables by storing what kind of value?
- The primary key value from another table (correct)
- A unique identifier generated for the current table
- A checksum of the row’s data
- A computed column based on other attributes
Relational database - Design and Integrity of Relational Schemas Quiz Question 23: When a many‑to‑many relationship needs to be represented, which type of table is introduced?
- Junction table containing the two foreign keys (correct)
- Single table that merges the related entities
- View that joins the related tables on demand
- Temporary table used only during query execution
Relational database - Design and Integrity of Relational Schemas Quiz Question 24: What key characteristic makes a globally unique identifier (GUID) valuable for rows that may be stored in different databases?
- It guarantees uniqueness across all systems (correct)
- It reduces the size of the stored row
- It improves index search speed
- It automatically enforces referential integrity
Relational database - Design and Integrity of Relational Schemas Quiz Question 25: What kind of relation is produced by applying relational operators to base tables and is often realized as a view?
- Derived relation (correct)
- Base relation
- Temporary relation
- Materialized relation
Relational database - Design and Integrity of Relational Schemas Quiz Question 26: The main purpose of defining a constraint on a table is to:
- Restrict the values that can be stored to enforce business rules (correct)
- Speed up query execution by creating indexes automatically
- Enable automatic generation of primary‑key values
- Compress the stored data to save space
Relational database - Design and Integrity of Relational Schemas Quiz Question 27: If relation R contains m tuples and relation S contains n tuples, how many tuples are in the Cartesian product R × S?
- m × n (correct)
- m + n
- max(m, n)
- min(m, n)
Relational database - Design and Integrity of Relational Schemas Quiz Question 28: Which of the following column combinations could serve as an alternate key in a table?
- CountryCode together with PhoneNumber (correct)
- EmployeeID (a surrogate identifier)
- FirstName alone
- DepartmentName plus Salary
Relational database - Design and Integrity of Relational Schemas Quiz Question 29: Which type of constraint validates a condition that involves multiple columns within the same row?
- Tuple‑level constraint (correct)
- Attribute‑level constraint
- Relation‑level constraint
- Database‑level constraint
Relational database - Design and Integrity of Relational Schemas Quiz Question 30: Which functional dependency would cause a relation to violate Second Normal Form?
- A → C, where (A, B) is the composite primary key and C is a non‑key attribute (correct)
- AB → C, where (A, B) is the composite primary key and C is a non‑key attribute
- A → B, where B is part of the primary key
- AB → CD, where CD are non‑key attributes
Relational database - Design and Integrity of Relational Schemas Quiz Question 31: According to the referential integrity rule, what must a foreign‑key value correspond to in the referenced table?
- It must equal an existing primary‑key value (correct)
- It must be unique across the entire database
- It must be non‑NULL
- It must be a numeric data type
What is a primary key in a database table?
1 of 31
Key Concepts
Database Integrity Concepts
Primary key
Foreign key
Entity integrity
Referential integrity
Check constraint
Domain (database)
Database Design and Structure
Normalization
Many‑to‑many relationship
B‑tree index
Data Manipulation Techniques
Relational algebra
Definitions
Primary key
A unique identifier for each row in a relational table, ensuring entity integrity.
Foreign key
A column that references the primary key of another table to establish a relational link.
Normalization
The process of organizing database tables to reduce redundancy and improve data integrity, defined by a series of normal forms.
Relational algebra
A formal language of set‑based operations (e.g., selection, projection, join) used to query and manipulate relational data.
B‑tree index
A balanced tree data structure that enables logarithmic‑time search, insertion, and deletion of rows in a database table.
Check constraint
A rule expressed as a Boolean expression that restricts the permissible values of a column or set of columns.
Entity integrity
The rule that a table’s primary key must contain no null values, guaranteeing each row’s uniqueness.
Referential integrity
The rule that a foreign key value must match an existing primary key value in the referenced table.
Many‑to‑many relationship
A type of association between two entities that is implemented via a junction table containing foreign keys from both sides.
Domain (database)
The set of allowable values for a given attribute, defining its data type and any associated constraints.