RemNote Community
Community

Database management system - Data Modeling and Design Principles

Learn the major DBMS classifications, core database models, and the end‑to‑end data modeling and design process.
Summary
Read Summary
Flashcards
Save Flashcards
Quiz
Take Quiz

Quick Practice

What functional group of DBMS capabilities involves the creation, modification, and removal of data definitions?
1 of 27

Summary

Classification and Design of Database Management Systems Introduction to Database Management Systems A Database Management System (DBMS) is software that allows organizations to store, organize, retrieve, and manage data efficiently. To understand databases, it's helpful to first recognize what functions all database systems must perform. Every DBMS, regardless of its underlying model or architecture, handles three fundamental functional groups of capabilities. Functional Groups of DBMS Capabilities Data Definition covers the creation, modification, and removal of the definitions that describe how data is organized. When you define a database structure—for instance, specifying that a table has columns for customer names, addresses, and phone numbers—you're using the data definition function. Update involves the insertion, modification, and deletion of the actual data itself. After defining the structure, you populate it with real information. This includes adding new records, changing existing data, and removing obsolete entries. Retrieval involves selecting data according to specified criteria and providing it to users or making it available for further processing. This is often what people think of as "querying" the database—asking questions like "Show me all customers from New York" or "Calculate total sales for Q4." Major Database Models Different DBMS designs use different underlying models—different fundamental ways of organizing and thinking about data. Understanding these models is essential because each model has distinct advantages and use cases. The Relational Database Model The relational database model organizes data as rows and columns in tables. Each row represents a record, and each column represents a specific attribute. This model is the most widely adopted for general-purpose databases because it's intuitive, flexible, and well-understood. The example above shows a typical relational database query returning film data with columns for title, release year, length, and replacement cost. Notice how the data naturally fits into rows and columns. Relational databases typically use Structured Query Language (SQL) for both data definition and queries. SQL provides a standardized way to interact with the database. NoSQL Database Models As databases grew more diverse, new models emerged collectively called NoSQL (Not Only Structured Query Language) databases. These don't rely on the traditional relational table structure and often use different query languages. Document-oriented databases store semi-structured documents (often in JSON format) rather than structured rows and columns. This flexibility makes them ideal for data that varies in structure. Key-value stores map unique keys directly to values, similar to a dictionary. They're extremely fast for retrieving data when you know the key, but less suitable for complex queries. Column-family stores organize data by column families rather than rows, optimizing for situations where you frequently access specific columns across many records. Graph databases represent data as nodes (entities), edges (relationships), and properties. They excel at capturing and querying complex relationships—think social networks or recommendation systems. Other Important Database Models Object-oriented databases store data as objects that encapsulate both state (data) and behavior (methods). This approach mirrors how programmers think in object-oriented languages but hasn't achieved the widespread adoption of the relational model. Distributed databases store both the data and the database management system across multiple computers. This enables scalability and fault tolerance but introduces complexity in maintaining consistency. Cloud databases host the database and most of the DBMS remotely in a cloud environment, managed by a cloud service provider. This eliminates the need for organizations to manage their own database infrastructure. Data warehouses archive data from operational databases and external sources, specifically designed for analytical processing rather than day-to-day transactions. They support complex analytical queries on large historical datasets. Database System Architectures Beyond the logical model of data organization, DBMS systems also differ in how they distribute processing and storage across computer hardware. These architectural choices profoundly affect performance and scalability. Shared-memory architecture allows multiple processors to access the same main memory space. While this provides simple data sharing, it's limited by the amount of memory a single machine can have and creates contention as more processors compete for the same resources. Shared-disk architecture gives each processing unit its own main memory while all units share common storage devices (like disk arrays). This balances data sharing with reduced memory contention, but disk I/O becomes a bottleneck. Shared-nothing architecture provides each processing unit with its own memory and storage. This completely eliminates resource contention and scales well, but requires careful management of how data is distributed across the nodes. Parallel database architecture improves performance by parallelizing independent tasks such as data loading, index building, and query evaluation across multiple processors. Combined with any of the above architectures, it can dramatically improve throughput. Detailed Database Models We've mentioned several models briefly. Let's examine the most important ones more closely. The Relational Model The relational model's strength lies in its simplicity and power. Data is organized in tables where each row is a distinct record and each column is a specific attribute. The model defines relationships between tables through shared key values. This example shows a simple relational structure with a "login" key that connects a main customer table with a related phone number table. Hierarchical and Network Models <extrainfo> Hierarchical databases organize data in a tree-like structure with a single root and parent-child relationships between records. Each child can have only one parent. Network databases allow more flexible graph-like relationships among records, where each record can have multiple parents. While more flexible than hierarchical models, they're more complex to navigate. These diagrams illustrate how hierarchical and network models structure data differently from the relational model, with explicit pointers (shown as arrows) connecting parent and child records. These models were historically important but are rarely used in modern systems compared to relational databases. </extrainfo> Object-Relational Model Object-relational databases combine relational tables with object-oriented features, allowing storage of complex data types (arrays, user-defined types) that pure relational databases handle awkwardly. Document, Key-Value, and Column-Family Models We've briefly described these under NoSQL, but they deserve emphasis: document databases store semi-structured data that doesn't fit neatly into rows, key-value stores sacrifice query flexibility for extreme speed, and column-family stores optimize for specific analytical access patterns. Database Design and Modeling Building a database requires a structured approach moving from high-level business requirements down to implementation details. Conceptual Data Modeling The first design step is to create a conceptual data model that accurately represents the information to be stored, independent of any specific technology. This model should capture what data exists and how it relates to other data, without worrying about how it will be implemented. Common tools for conceptual modeling include entity-relationship diagrams and Unified Modeling Language (UML). An entity-relationship diagram shows entities (things like "Customer" or "Order") and their relationships (like "Customer places Order"). Logical Database Design After creating a conceptual model, the next step is logical database design: translating the conceptual model into a logical schema expressed in the data model of your chosen DBMS. For example, if you've chosen a relational database, you'd translate entities into tables and attributes into columns. Normalization A crucial aspect of logical design is normalization, a process that restructures data to ensure each elementary fact is recorded only once. This prevents anomalies—problems that arise during insertion, update, or deletion of data. For example, if customer information and order information are stored in the same table row, updating a customer's address requires modifying multiple rows (one for each order). Normalization separates this into different tables, solving the problem. Physical Database Design Physical database design focuses on implementation details: performance, scalability, recovery, and security. Design decisions at this stage include which columns to index, how to partition data across storage, and how to configure caching. The result is a physical data model that specifies actual storage structures. This flowchart shows how the design process flows from conceptual modeling through logical and physical design, with considerations like data independence and security throughout. The Three-Schema Architecture and Data Independence A key principle in database design is separating different perspectives of the same data. This separation is formalized in the three-schema architecture. External View (User View) The external view defines how a particular group of end users perceives the organization of data. Different users may need different perspectives of the same database. A payroll department might see employee tables with salary information, while a customer service department sees the same database but with customer information instead. Multiple external views can exist simultaneously, each customized for specific users or applications, all drawn from the same underlying database. Conceptual View (Logical View) The conceptual view unifies all external views into a single global view that represents the complete logical structure of the database. This is of primary interest to database developers and administrators. It's the "official" schema that defines all entities, attributes, and relationships, independent of how specific users see subsets of the data. Internal View (Physical View) The internal view describes the physical storage layout, indexes, file structures, and other implementation details used by the DBMS to store and retrieve data efficiently. This includes decisions about disk organization, buffer management, and access paths. This diagram illustrates how the external schema (user view) relates to the internal schema (computer view), with database systems translating between them. Data Independence Data independence is the principle that changes at one level should not affect the higher levels. This is essential for database flexibility. Logical data independence means that changes to the conceptual schema (like adding a new attribute or restructuring tables) shouldn't require changes to external views or user applications. The DBMS handles translating the changed conceptual schema to the various external views. Physical data independence means that changes to the internal schema (like reorganizing files, changing indexes, or moving data to different disks) shouldn't require changes to the conceptual or external schemas. Applications and users are completely insulated from these implementation details. This independence is powerful: a database administrator can restructure storage to improve performance or change the physical organization for recovery purposes without affecting any running applications. Similarly, new external views can be added without disrupting existing systems.
Flashcards
What functional group of DBMS capabilities involves the creation, modification, and removal of data definitions?
Data definition
What functional group of DBMS capabilities involves selecting data based on criteria and providing it to the user?
Retrieval
How does the relational database model organize data?
As rows and columns in tables
Which language is typically used by the relational model for data definition and queries?
Structured Query Language (SQL)
Which database model is currently the most widely used for general-purpose databases?
The relational model
How does the Not Only Structured Query Language (NoSQL) model differ from the relational model regarding structure?
It does not rely on the relational table structure
How are data units stored in the object-oriented database model?
As objects that encapsulate both state and behavior
What specific category of databases does the document-oriented model fall under?
Not Only Structured Query Language (NoSQL)
What type of data content is primarily stored by document-oriented databases?
Semi-structured documents
What three elements does the graph database model use to represent data and capture relationships?
Nodes Edges Properties
Where are the data and the DBMS stored in a distributed database model?
Across multiple computers
Which architecture allows multiple processors to share the same main memory space?
Shared-memory architecture
In shared-disk architecture, what resource do all processing units share?
Common storage devices
Which architecture eliminates resource contention by giving each unit its own memory and storage?
Shared-nothing architecture
What is the primary goal of parallel database architecture?
To improve performance by parallelizing tasks
What type of structure is used by hierarchical databases to organize data?
Tree-like structure
What type of relationship structure is permitted in network databases?
Flexible graph-like relationships
Which two models are combined in object-relational databases to support complex data types?
Relational and object-oriented models
What are two common methods used to represent a conceptual data model?
Entity-relationship (ER) diagrams Unified Modeling Language (UML)
What is a conceptual model translated into during the logical database design phase?
A logical schema
What is the primary goal of restructuring data through normalization?
To ensure each elementary fact is recorded only once
What does normalization help prevent during data operations?
Anomalies (insertion, update, or deletion)
What four factors are the primary focus of physical database design decisions?
Performance Scalability Recovery Security
Which database view defines how specific end users perceive the organization of data?
External view
Which database view unifies all external views into a single global view for administrators?
Conceptual view
Which database view describes the physical storage layout and indexing structures?
Internal view
What principle ensures that changes to the internal storage layout do not affect the external or conceptual views?
Data independence

Quiz

Which DBMS capability involves creating, altering, and removing the definitions that describe how data is organized?
1 of 7
Key Concepts
Database Types
Relational Database Model
NoSQL Database
Object‑oriented Database
Document‑oriented Database
Graph Database
Distributed Database
Database Concepts
Database Management System
Data Warehouse
Normalization (Database)
Data Independence