RemNote Community
Community

Relational database - Design and Integrity of Relational Schemas

Understand keys and relationships, constraints and normalization, and relational operations and indexing.
Summary
Read Summary
Flashcards
Save Flashcards
Quiz
Take Quiz

Quick Practice

How is a Primary Key defined in a database table?
1 of 29

Summary

Database Design: Keys, Relationships, and Operations Keys and Relationships Understanding Primary Keys A primary key is a column (or set of columns) that uniquely identifies each row in a table. Think of it as a permanent ID number assigned to every record—no two rows can have the same primary key value, and no primary key value can ever be null. When you need to retrieve, update, or delete a specific row, the primary key is how the database finds it instantly. When a new row is inserted into a table, a new unique primary key value is generated automatically. Most databases use either auto-incrementing integers (1, 2, 3...) or universally unique identifiers (UUIDs) for this purpose. Example: In a Students table, StudentID might be the primary key. Each student gets a unique StudentID, and the database uses this to distinguish one student from another. Alternate Keys An alternate key (or natural key) is another column or combination of columns that could also serve as a unique identifier, even though it's not chosen as the primary key. While a table has only one primary key, it can have multiple alternate keys. Example: In a Students table, both StudentID and Email might be unique. StudentID is the primary key, but Email is an alternate key because each student's email is also unique. Understanding Foreign Keys A foreign key is a column that contains a primary key value from another table. It creates a link between two tables, establishing relationships in your database. When you want to connect information from two tables—such as linking students to their courses—you use a foreign key. The foreign key in one table references the primary key in another table. Example: In a Registrations table, you might have StudentID (which references the Students table) and CourseID (which references the Courses table). These are foreign keys that link registrations to the correct student and course. One-to-One and One-to-Many Relationships When a primary key from one table appears as a foreign key in another table, the relationship can be structured in two main ways: One-to-One Relationships: Each row in table A is associated with exactly one row in table B, and vice versa. For example, each employee has exactly one assigned office, and each office is assigned to exactly one employee. One-to-Many Relationships: Each row in table A is associated with multiple rows in table B, but each row in table B is associated with only one row in table A. For example, each department has many employees, but each employee works in only one department. Resolving Many-to-Many Relationships A many-to-many relationship occurs when each row in table A can be associated with multiple rows in table B, and vice versa. For example, students can enroll in multiple courses, and each course has multiple students. Relational databases cannot directly represent many-to-many relationships. Instead, you create a junction table (also called a bridge or join table) that contains the primary keys from both tables. Each row in the junction table represents a specific pairing. Example: To represent the relationship between Students and Courses: Students table: StudentID, Name Courses table: CourseID, CourseName Registrations table (junction table): StudentID, CourseID This design allows any student to be linked to multiple courses, and any course to be linked to multiple students. <extrainfo> Global Unique Identifiers Some systems use globally unique identifiers (GUIDs) or UUIDs instead of simple auto-incrementing numbers. These identifiers are designed to be unique not just within one database, but across multiple databases and applications. While this adds complexity, it's useful for distributed systems or when merging data from different sources. </extrainfo> Relations and Tables Base Relations and Derived Relations A relation in database terminology is essentially a table—a collection of rows and columns. However, database theory distinguishes between two types: Base relations (or base tables) are the actual tables that physically store data in the database. When you create a table and insert data, you're creating a base relation. Derived relations are computed from base relations using relational operations. They exist virtually and are often implemented as views. A view is a saved query that looks and behaves like a table but doesn't store data itself—instead, it retrieves data from one or more base relations whenever it's accessed. Example: A base relation might be Employees with columns for EmployeeID, Name, and Salary. A derived relation could be a view called HighEarners that shows only employees with salaries above $100,000. Domains and Constraints A domain specifies the set of allowed values for a given attribute (column). It's essentially a rule that restricts what data can be stored in that column. Example: An Age attribute might have a domain of integers between 0 and 150. A Status attribute might have a domain of only three values: 'Active', 'Inactive', or 'Pending'. Constraints enforce these restrictions and ensure data integrity. They prevent invalid data from being stored in your database. Constraints can be applied at three levels: Single attribute constraints (like domain restrictions) Tuple constraints (restrictions on one row) Relation constraints (restrictions across the whole table) Entity Integrity and Referential Integrity Entity integrity is a fundamental rule: the primary key of a table can never contain a null value. Every row must have a valid, unique primary key. This ensures you can always uniquely identify every row in the table. Referential integrity ensures that relationships between tables remain valid. Specifically: if a foreign key references a primary key in another table, that primary key value must exist. You cannot create a registration for a student that doesn't exist, or reference a department that has been deleted. Example: If a StudentID is used as a foreign key in a Registrations table, that StudentID must exist in the Students table. The database will prevent you from creating a registration with a non-existent StudentID. Check Constraints A check constraint uses a Boolean expression (true/false condition) to enforce valid values for an attribute or combination of attributes. It's more flexible than a domain because it can enforce complex rules. Example: A check constraint might specify that Price must be greater than 0, or that StartDate must be before EndDate. These constraints are evaluated whenever data is inserted or updated. Normalization Why Normalization Matters Normalization is a design process that organizes data to eliminate redundancy and prevent data anomalies. When data is poorly organized, you can run into serious problems: Insertion anomalies: You can't insert certain data because related data doesn't exist yet. Update anomalies: Changing one piece of information requires changing it in many places. Deletion anomalies: Removing one row accidentally removes related data you wanted to keep. Normalization solves these problems by breaking down tables into smaller, more focused tables that follow specific rules. The Normal Forms Normal forms are a series of standards that define increasingly strict levels of data organization. A table in a higher normal form is more normalized than one in a lower normal form. Most practical databases aim for third normal form (3NF). First Normal Form (1NF) First normal form requires that each attribute contain only atomic (indivisible) values. In other words, each cell in a table should contain a single value, not a collection of values. Problem example: A Students table where the Phone attribute stores multiple phone numbers: "555-1234, 555-5678" Solution: Create a separate StudentPhones table where each row contains one student and one phone number. Second Normal Form (2NF) Second normal form requires that every non-key attribute be fully functionally dependent on the entire primary key. This means each non-primary-key column must depend on the whole primary key, not just part of it. This rule is mainly important when your primary key consists of multiple columns (a composite key). Problem example: A StudentCourses table with a composite primary key of (StudentID, CourseID) that also contains InstructorOffice. The instructor's office depends only on which course it is, not on which student is taking it. This violates 2NF because InstructorOffice depends on only part of the primary key (CourseID). Solution: Move InstructorOffice to the Courses table where it properly depends on CourseID. Third Normal Form (3NF) Third normal form requires that no non-key attribute be transitively dependent on the primary key. In simpler terms: non-key columns should depend directly on the primary key, not on other non-key columns. Problem example: An Employees table contains EmployeeID, EmployeeName, DepartmentID, and DepartmentName. DepartmentName transitively depends on EmployeeID (EmployeeID → DepartmentID → DepartmentName) rather than directly depending on it. Solution: Create a separate Departments table. The Employees table keeps DepartmentID as a foreign key, but removes DepartmentName. You can look up the department name when needed. This design prevents the update anomalies that occur when you have to change a department name in multiple employee rows. Relational Operations Relational algebra provides a set of operations for querying and manipulating relations. Understanding these operations is essential because SQL (the language used to query databases) is built on these concepts. Set Operations These operations treat relations as mathematical sets: Union ($\cup$) combines the tuples from two relations and removes duplicates. Both relations must have the same structure. $$R1 \cup R2 = \text{all unique tuples that appear in } R1 \text{ or } R2$$ Intersection ($\cap$) returns only the tuples that appear in both relations. $$R1 \cap R2 = \text{tuples that appear in both } R1 \text{ and } R2$$ Set Difference ($-$) returns tuples that are in the first relation but not in the second. $$R1 - R2 = \text{tuples in } R1 \text{ that don't appear in } R2$$ Cartesian Product ($\times$) pairs every tuple from the first relation with every tuple from the second relation, creating all possible combinations. $$R1 \times R2 = \text{every tuple from } R1 \text{ paired with every tuple from } R2$$ Row and Column Operations Selection (or Restriction) ($\sigma$) retrieves rows that satisfy a specified condition. It filters rows based on criteria you specify. $$\sigma{\text{condition}}(R) = \text{rows from } R \text{ where the condition is true}$$ Example: $\sigma{\text{Age > 30}}(\text{Employees})$ returns all employees older than 30. Projection ($\pi$) extracts specific columns from a relation, effectively choosing which attributes you want to see. $$\pi{\text{column1, column2}}(R) = \text{only the specified columns from } R$$ Example: $\pi{\text{Name, Email}}(\text{Customers})$ returns just the name and email for each customer. Join Operations Natural Join ($\bowtie$) combines two relations based on their common attributes. It finds rows where the common columns have matching values and merges them into a single row, keeping the common column only once. $$R1 \bowtie R2 = \text{rows from } R1 \text{ and } R2 \text{ that match on common attributes}$$ Example: If Students has StudentID and Name, and Registrations has StudentID and CourseID, the natural join combines them on StudentID to show which courses each student is taking. <extrainfo> Division Operator Division ($\div$) finds rows that are associated with all rows of the divisor relation. This is a more advanced operation used to answer questions like "Which students are enrolled in all required courses?" $$R1 \div R2 = \text{tuples from } R1 \text{ that are associated with every tuple in } R2$$ </extrainfo> Indexes Why Indexes Matter An index is a data structure that enables the database to locate matching rows quickly without scanning every single row in a table. When a table contains millions of rows, an index can mean the difference between a query that returns results instantly and one that takes minutes. Without an index, the database performs a full table scan, examining every row. With an index, it can jump directly to the relevant rows. Example: Searching for a student by StudentID in a table of 1 million students. Without an index, the database might need to check all 1 million rows. With an index, it can find the student in just a few lookups. Index Structures and Performance Different index structures offer different performance characteristics: B-Tree Indexes are the most common type. They organize data in a balanced tree structure that keeps data sorted. The query time is proportional to the logarithm of the number of rows ($\log n$). This means that even with millions of rows, you need only about 20 comparisons. B-tree indexes are versatile and work well for most queries. <extrainfo> Other Index Structures: Hash Indexes provide constant-time lookup ($O(1)$) when the index fits in memory, making them very fast for equality searches. However, they don't support range queries and can have performance issues if the index becomes larger than memory. R-Tree Indexes are specialized for spatial data like geographic coordinates. Bitmap Indexes are efficient for columns with few distinct values (like Status: Active/Inactive). </extrainfo> Strategic Index Placement Indexes are most beneficial on: Primary Keys: Always indexed automatically, enabling fast row lookups and join operations. Foreign Keys: Indexing foreign keys dramatically improves join performance when you need to connect data from multiple tables. Frequently Searched Columns: If you often query by a particular column, indexing it will speed up those queries significantly. The tradeoff is that indexes consume storage space and slow down insert/update/delete operations (because the index must be updated). The key is to create indexes strategically on columns that are frequently used in searches and joins.
Flashcards
How is a Primary Key defined in a database table?
A unique identifier assigned to each row used for efficient data access.
When is a new unique Primary Key value typically generated?
When a new row is inserted.
What is an Alternate Key?
A natural key that can uniquely identify a row and may consist of multiple columns.
What is the function of a Foreign Key?
To create a link between tables by containing the primary key value from another table.
Which relationship types can be represented when a primary key appears as a foreign key in another table?
One-to-one or one-to-many relationships.
How are many-to-many relationships resolved in a relational database?
By creating a junction table containing the primary keys from the two related tables.
What is the purpose of applying a Global Unique Identifier (GUID)?
To ensure uniqueness across multiple databases and applications.
What are Base Relations?
Tables that physically store data in the database.
What are Derived Relations and how are they often implemented?
Relations computed from base relations using operations, often implemented as views.
What does a Domain specify for a given attribute?
The set of allowed values, acting as a constraint.
What is the primary purpose of Constraints in a database?
To restrict stored values to enforce business rules.
On what levels can Constraints be defined within a relation?
Single attribute Entire tuple Whole relation
What does the Entity Integrity Rule require regarding primary keys?
The primary key must never contain a null value.
What does the Referential Integrity Rule require of foreign key values?
They must match an existing primary key value in the referenced table.
How do Check Constraints enforce valid attribute values?
By using Boolean expressions to enforce ranges or patterns.
What are the main purposes of performing Normalization?
Eliminate non-atomic values Eliminate redundant data Prevent anomalies Preserve data integrity
What is required for a table to be in First Normal Form (1NF)?
Each attribute must contain only atomic (indivisible) values.
What is the requirement for Second Normal Form (2NF)?
Every non-key attribute must be fully functionally dependent on the primary key.
What dependency is prohibited in Third Normal Form (3NF)?
Transitive dependency of non-key attributes on the primary key.
What is the result of the Union operator $\cup$?
Combines tuples of two relations and removes duplicates.
What does the Intersection operator $\cap$ return?
The set of tuples that appear in both relations.
How does the Set Difference operator $-$ function?
Returns tuples in the first relation that are not in the second.
What does the Cartesian Product operator $\times$ do?
Pairs each tuple of the first relation with each tuple of the second relation.
What is the purpose of the Selection (Restriction) operator $\sigma$?
Retrieves rows that satisfy a specified condition.
What is the function of the Projection operator $\pi$?
Extracts specific columns from a relation.
How does a Natural Join operator $\bowtie$ combine relations?
Based on their common attributes.
How does an Index provide faster access to rows?
By locating matching tuples directly without scanning the entire table.
What is the typical query time performance of a B-tree index?
Proportional to the logarithm of the number of rows ($\log n$).
Under what condition does a Hash Index provide constant-time lookup?
When the index fits into memory.

Quiz

What is a primary key in a database table?
1 of 31
Key Concepts
Database Integrity Concepts
Primary key
Foreign key
Entity integrity
Referential integrity
Check constraint
Domain (database)
Database Design and Structure
Normalization
Many‑to‑many relationship
B‑tree index
Data Manipulation Techniques
Relational algebra