Data model - Core Concepts and Practices
Understand core data modeling concepts, essential data properties and structures, and common model types and patterns.
Summary
Core Topics in Data Modeling
Introduction: What is Data Modeling?
Data modeling is the process of creating a structured representation of how an organization's data is organized, stored, and used. Think of it as creating a blueprint for a database—just as architects design buildings before construction begins, data modelers design data structures before a system is built.
At its core, data modeling serves two purposes. First, it captures business requirements by defining what data an organization needs and how that data relates to other data. Second, it guides implementation by providing a formal specification that developers and database administrators use to build actual systems.
The result of data modeling is a data model—a formal description of the structure, constraints, and relationships within a specific domain of data. This model acts like a grammar for how we organize and understand information in that domain.
Data Architecture: The Big Picture
Before diving into models, it's important to understand data architecture, which provides the larger context for all data modeling work.
Data architecture is the overall design that defines a target state for how an organization manages its data and plans how to achieve that state. It's a foundational pillar of enterprise architecture—the master plan for how an organization's technology works.
Data architecture describes:
Data structures used by the business and its applications
Data in different states: data at rest (stored), data in motion (being transferred), and data being processed
Data components: data stores where information lives, data groups (collections of related data), and individual data items
Data flows: how data moves between systems, and the criteria that govern how it is processed along the way
In essence, data architecture answers the question: "What data does our organization need, where does it live, and how does it flow through our systems?"
Data Organization: From Concept to Implementation
Once we understand data architecture, we need to actually organize data using technology. This happens through multiple levels of abstraction, a concept called the three-schema architecture.
The Three Levels
The conceptual model is the highest level of abstraction. It describes what data the business needs without any concern for how it will be stored or accessed. This model uses business-friendly terms and focuses on entities (like "Customer" or "Order") and how they relate to each other.
The logical model translates the conceptual model into the structures of a particular class of database technology. For a relational database, this means organizing data into tables and columns with defined relationships. The logical model remains independent of any specific database product: the same relational design could be implemented in many different database systems.
The physical model describes the actual storage implementation: how tables are stored on disk, which indexes exist, how data is partitioned across storage devices, and other technical details specific to a particular database system.
The logical model is typically derived from the conceptual model, but it may be adjusted based on performance requirements and usage patterns. For example, you might restructure how data is organized to make common queries run faster.
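The split between the logical and physical levels can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a prescribed design: the customer and customer_order tables, their columns, and the index name are all assumptions made up for the example.

```python
import sqlite3

# Conceptual level (assumed for illustration): a Customer places Orders.
# Logical level: the same idea expressed as relational tables and columns.
LOGICAL_SCHEMA = """
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    order_date  TEXT NOT NULL
);
"""

# Physical level: storage and access choices for one particular DBMS
# (here SQLite), such as an index that speeds up lookups by customer.
PHYSICAL_TUNING = """
CREATE INDEX idx_order_customer ON customer_order(customer_id);
"""


def build_database() -> sqlite3.Connection:
    """Create an in-memory database from the logical and physical definitions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(LOGICAL_SCHEMA)
    conn.executescript(PHYSICAL_TUNING)
    return conn


def list_objects(conn: sqlite3.Connection, kind: str) -> list:
    """Return the names of schema objects of the given kind ('table' or 'index')."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = ? ORDER BY name", (kind,)
    ).fetchall()
    return [row[0] for row in rows]
```

Dropping or changing the index alters only how queries are executed, not what the tables mean, which is the point of keeping the two levels separate.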
Data Properties: What Makes Data Good?
Not all data is equally useful. Understanding data properties (also called data quality attributes) helps you assess whether your data model captures data appropriately. These properties fall into several categories:
Definition-Related Properties
Relevance means the data actually serves the intended purpose. Data about customer complaints is relevant for improving service quality, but irrelevant for predicting warehouse inventory needs.
Clarity requires that there's a clear, shared understanding of what the data means. For example, "customer age" could mean age at registration, current age, or age at last purchase—without clarity, people will misinterpret it.
Consistency ensures that the same type of data from different sources is compatible and comparable. If one system records dates as "MM/DD/YYYY" and another as "YYYY-MM-DD", they're inconsistent.
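One way to restore consistency is to convert every source format into a single canonical type before comparing values. The sketch below assumes only the two formats mentioned above; the function name normalize_date and the KNOWN_FORMATS tuple are hypothetical.

```python
from datetime import date, datetime

# Hypothetical source formats: "MM/DD/YYYY" and "YYYY-MM-DD".
KNOWN_FORMATS = ("%m/%d/%Y", "%Y-%m-%d")


def normalize_date(raw: str) -> date:
    """Parse a date string in any known source format into a date object."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(raw, fmt).date()
        except ValueError:
            continue  # Not this format; try the next one.
    raise ValueError(f"unrecognized date format: {raw!r}")
```

Once both systems' values pass through the same function, "03/15/2024" and "2024-03-15" compare as equal dates.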
Content-Related Properties
Timeliness means data is available when you need it. What counts as timely depends on the use: fraud detection needs transaction data within seconds, while an annual report can be built from data that is months old.
Accuracy measures how closely data reflects reality. A customer's recorded phone number is accurate if it's actually their correct number.
Additional Properties
Completeness describes how much of the required data is actually available. If you're missing customer email addresses for 30% of your records, completeness is low.
Accessibility defines where, how, and to whom the data is available. Some data might exist but be locked in legacy systems or restricted by privacy regulations.
Cost is the practical expense of obtaining and maintaining the data. It's a trade-off: highly accurate, real-time data usually costs more to produce than approximate data.
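Completeness is the easiest of these properties to measure directly. A minimal sketch, assuming records are plain dictionaries and that a missing key, None, or an empty string all count as "not available" (the function name is hypothetical):

```python
def completeness(records: list, field: str) -> float:
    """Return the fraction of records that have a non-empty value for `field`."""
    if not records:
        return 0.0
    present = sum(1 for record in records if record.get(field) not in (None, ""))
    return present / len(records)
```

For ten customer records of which three lack an email address, completeness(records, "email") returns 0.7, matching the 30% gap described above.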
Data Model Theory: The Three Components
To understand data models deeply, you need to know the theoretical framework they're built on. Every data model has three components:
The Structural Part
The structural part defines the collection of data structures that represent entities. These are the building blocks—the types of objects your model recognizes and the properties they have. In a customer management system, the structural part includes structures for "Customer," "Order," and "Payment," along with what properties each has (customer name, order date, payment amount, etc.).
The Integrity Part
The integrity part defines rules and constraints that maintain data quality and structural correctness. These rules ensure that:
Required data is present (you can't have an order without a customer)
Data values are valid (a discount percentage can't be more than 100)
Relationships are maintained (you can't delete a customer if they still have active orders)
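The three kinds of rule above map onto declarative constraints in a relational database. The sketch below uses Python's sqlite3 module; the table and column names are assumptions for illustration. It is also stricter than the rule as stated: it blocks deleting a customer who has any orders, not only active ones.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database whose schema encodes the integrity rules."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE customer (
            customer_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL
        );
        CREATE TABLE customer_order (
            order_id     INTEGER PRIMARY KEY,
            -- Required data: an order must name a customer.
            -- Relationship: that customer must exist and cannot be deleted
            -- while the order still references it.
            customer_id  INTEGER NOT NULL REFERENCES customer(customer_id),
            -- Valid values: a discount is between 0 and 100 percent.
            discount_pct REAL NOT NULL DEFAULT 0
                CHECK (discount_pct BETWEEN 0 AND 100)
        );
    """)
    return conn


def violates(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> bool:
    """Return True if the database rejects the statement as a constraint violation."""
    try:
        conn.execute(sql, params)
        return False
    except sqlite3.IntegrityError:
        return True
```

Declaring the rules in the schema means every application that writes to the database is held to them, rather than each one re-implementing the checks.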
The Manipulation Part
The manipulation part specifies operators and methods for updating, querying, and transforming data. These operations let you insert new data, update existing data, delete data, and retrieve information through queries.
The Relational Model Example
The most common data model used today is the relational model. Here's how its three components work:
Structural part: Based on the mathematical concept of a "relation" (which appears as a table with rows and columns)
Integrity part: Expressed using first-order logic, defining constraints like primary keys and foreign keys
Manipulation part: Uses relational algebra, tuple calculus, and domain calculus for querying and updating
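To make the manipulation part concrete, three relational algebra operators can be sketched in plain Python, treating a relation as a list of dictionaries that share the same keys. This is a teaching sketch under that assumption, not how a database engine implements them, and the function names are chosen for the example.

```python
def select(relation: list, predicate) -> list:
    """Selection: keep the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation: list, attributes: list) -> list:
    """Projection: keep only the named attributes and drop duplicate rows."""
    seen, result = set(), []
    for row in relation:
        key = tuple(row[a] for a in attributes)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result


def natural_join(left: list, right: list) -> list:
    """Natural join: combine rows that agree on every shared attribute name."""
    result = []
    for l_row in left:
        for r_row in right:
            shared = l_row.keys() & r_row.keys()
            if all(l_row[a] == r_row[a] for a in shared):
                result.append({**l_row, **r_row})
    return result
```

Composing the operators answers a question such as "which orders did Ada place?": join customers to orders, select the rows where name is "Ada", then project onto order_id.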
When you take this theoretical model and apply it to a specific business problem, you create a data model instance. For example, applying the relational model to a customer management domain yields a concrete set of tables, keys, and constraints for customers, orders, and payments.
Entity-Relationship Models: Visualizing Structure
One of the most practical ways to represent a data model is through an entity-relationship model (ER model), which is usually drawn as an entity-relationship diagram (ERD).
What is an Entity-Relationship Model?
An ER model is a visual representation of a conceptual data model. It shows:
Entities: Objects or concepts the business cares about (represented as boxes)
Attributes: Properties of those entities (listed inside the boxes)
Relationships: Connections between entities (represented as lines connecting boxes)
Cardinality: Rules about how many instances of one entity can relate to another
Understanding Cardinality
Cardinality is one of the trickier concepts in ER modeling. It describes the multiplicity of relationships—that is, how many instances of one entity can be associated with instances of another.
For example:
One customer can place multiple orders (one-to-many relationship)
One order belongs to exactly one customer (cardinality constraint)
One product can appear on many orders, and one order can contain many products (many-to-many relationship)
Different notations represent cardinality on ER diagrams:
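Arrowheads mark one end of a relationship line; which end they mark differs between notations, so check the convention the diagram follows
Crow's foot notation uses a three-pronged fork (an inverted arrowhead) to mark the "many" side of a relationship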
Numerical notations explicitly show cardinality (1:1, 1:N, M:N)
The key is understanding that cardinality constraints are business rules. They reflect real-world limitations. For instance, a business rule might state "each order must have at least one product" (minimum cardinality of 1) and "each order can have at most 100 line items" (maximum cardinality of 100).
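Minimum and maximum cardinality can be checked like any other business rule. A minimal sketch, assuming a hypothetical Order class and using the limits from the example rules above:

```python
from dataclasses import dataclass, field

MIN_ITEMS = 1    # "each order must have at least one product"
MAX_ITEMS = 100  # "each order can have at most 100 line items"


@dataclass
class Order:
    order_id: int
    customer_id: int  # A single field: each order belongs to exactly one customer.
    line_items: list = field(default_factory=list)


def cardinality_errors(order: Order) -> list:
    """Return a message for each cardinality rule the order breaks."""
    errors = []
    count = len(order.line_items)
    if count < MIN_ITEMS:
        errors.append(f"order {order.order_id} has {count} line items; minimum is {MIN_ITEMS}")
    if count > MAX_ITEMS:
        errors.append(f"order {order.order_id} has {count} line items; maximum is {MAX_ITEMS}")
    return errors
```

The "exactly one customer" rule needs no runtime check here because the structure itself allows only one customer_id per order.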
Semantic Data Models: Capturing Meaning
While an ER model shows structure, a semantic data model goes deeper by capturing the meaning of data and how it relates to the real world.
A semantic data model defines the significance of data within its context. Rather than just saying "customer record has a phone number field," a semantic model explains what that phone number represents: the customer's primary contact number, when it was last verified, what type of number it is, and its relationship to other customer information.
This matters because:
Real-world accuracy: The model ensures that stored data actually represents real-world entities, ideas, events, and resources. Without semantic clarity, data can become meaningless or misleading.
Integration across systems: When organizations have multiple databases with similar information, semantic models help identify which data really represents the same thing.
Constraint specification: Understanding semantics helps define the rules that keep data valid. A "purchase date" should always be before or equal to a "delivery date"—this rule only makes sense when you understand the semantic meaning of both fields.
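The date rule above relates two fields to each other, so it has to be checked on the record as a whole. A sketch, assuming a hypothetical Shipment record in which a missing delivery date means "not delivered yet":

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Shipment:
    purchase_date: date
    delivery_date: Optional[date] = None  # Assumption: None means not yet delivered.

    def __post_init__(self):
        # Semantic rule: goods cannot be delivered before they are purchased.
        if self.delivery_date is not None and self.purchase_date > self.delivery_date:
            raise ValueError("purchase_date must be on or before delivery_date")
```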
Data Structures: Abstract vs. Concrete
When designing a data model, one critical distinction is choosing between abstract and concrete entity classes.
Abstract entity classes represent general categories that are independent of any particular role. The class "Person" is abstract: it doesn't assume any specific role. This approach is robust because:
It adapts to change: if someone switches from being an employee to a vendor, they're still a person
It avoids redundancy: you don't duplicate person information across different role-specific classes
It's reusable: applications can access person data regardless of the role
Concrete entity classes represent specific roles. "Employee," "Vendor," and "Customer" are concrete. While specific, they have limitations:
If someone becomes both an employee and a vendor, you need complex rules to manage this
Information about the person might be duplicated across classes
Changing roles requires moving data between classes
Best practice: Start with abstract entity classes and add concrete specializations when you need role-specific properties or behaviors.
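One possible shape for this practice is a single Person class that holds role records, rather than separate Employee and Vendor classes that each copy the person's details. The class names, fields, and role kinds below are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Role:
    kind: str                                    # e.g. "employee" or "vendor"
    details: dict = field(default_factory=dict)  # Role-specific properties.


@dataclass
class Person:
    person_id: int
    name: str
    roles: dict = field(default_factory=dict)    # Maps role kind to Role.

    def assign(self, role: Role) -> None:
        """Add a role, or replace an existing role of the same kind."""
        self.roles[role.kind] = role

    def revoke(self, kind: str) -> None:
        """Remove a role; the person's own data is left untouched."""
        self.roles.pop(kind, None)

    def has_role(self, kind: str) -> bool:
        return kind in self.roles
```

A person can hold the employee and vendor roles at once, and switching roles changes only the roles dictionary; the name and identifier are stored once and never move.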
<extrainfo>
Modeling Patterns
Patterns are common data modeling structures that recur across many different data models. Think of them as templates for solving typical modeling problems. For example, there are established patterns for modeling hierarchies (like organizational charts), time-varying data (like price history), and many-to-many relationships.
Understanding patterns helps you model faster and more consistently, as you can apply proven solutions rather than inventing new approaches each time.
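As one example, a common way to model a hierarchy is an adjacency list, where each item records only its parent. The sketch below uses a made-up organizational chart; the unit names and function names are hypothetical.

```python
# Adjacency list: each unit maps to its parent (None marks the root).
REPORTS_TO = {
    "ceo": None,
    "cto": "ceo",
    "cfo": "ceo",
    "dev_lead": "cto",
    "engineer": "dev_lead",
}


def chain_of_command(unit: str, parent_of: dict) -> list:
    """Return the ancestors of a unit, nearest first."""
    chain = []
    current = parent_of[unit]
    while current is not None:
        chain.append(current)
        current = parent_of[current]
    return chain


def descendants(unit: str, parent_of: dict) -> list:
    """Return everything below a unit, depth-first, siblings in name order."""
    result = []
    for child in sorted(u for u, p in parent_of.items() if p == unit):
        result.append(child)
        result.extend(descendants(child, parent_of))
    return result
```

The same structure works for product categories, folder trees, or bills of materials, which is what makes it a pattern rather than a one-off design.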
</extrainfo>
Key Types of Data Models
Beyond the ER model and semantic model, several other model types serve specific purposes:
Database Models
A database model is a general specification for how a database is structured and used. Different database management systems implement different models—relational databases implement the relational model, document databases implement document models, graph databases implement graph models, and so on.
Data Structure Diagrams
A data structure diagram is similar to an ER diagram but may include additional detail about how entities and relationships are constrained. It uses boxes for entities and arrows for relationships, with annotations describing constraints.
Generic Data Models
A generic data model takes conventional data models and generalizes them by defining standardized relationship types that can connect different types of entities. These are useful when you're trying to integrate multiple different data models of the same domain—they provide a common abstraction level that bridges different modeling approaches.
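A rough sketch of the idea: instead of one table per entity type, every statement is stored as a fact that links two things through one of a small set of standardized relation types. The relation type names and the sample facts below are assumptions for illustration, not a published standard.

```python
from collections import namedtuple

Fact = namedtuple("Fact", ["left", "relation_type", "right"])

# Hypothetical standardized relation types shared by every model being integrated.
RELATION_TYPES = {"is_classified_as", "is_part_of"}


def add_fact(facts: list, left: str, relation_type: str, right: str) -> None:
    """Record a fact, rejecting relation types outside the standardized set."""
    if relation_type not in RELATION_TYPES:
        raise ValueError(f"unknown relation type: {relation_type!r}")
    facts.append(Fact(left, relation_type, right))


def related(facts: list, relation_type: str, right: str) -> list:
    """Return everything linked to `right` by the given relation type."""
    return [f.left for f in facts if f.relation_type == relation_type and f.right == right]
```

Because a pump and a customer are both just things connected by the same relation types, data from models with different entity definitions can be stored and queried side by side.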
Bringing It Together: The Data Modeling Workflow
Understanding all these concepts, you can see how they fit together in practice:
Start with data architecture to understand organizational needs and data flows
Create a conceptual data model capturing entities, attributes, and relationships without technical concerns
Build an entity-relationship model to visualize and validate the structure
Consider semantic properties to ensure the model captures real-world meaning
Design the logical model, adapting the conceptual model to a class of database technology such as relational
Implement the physical model in an actual database system
At each level, you apply data properties (relevance, clarity, completeness, accuracy, timeliness) as quality checks. And you use data model theory—structural, integrity, and manipulation components—as the framework ensuring your model is complete and consistent.
This structured approach transforms business requirements into a working system that reliably stores and retrieves information.
Flashcards
Which pillar of enterprise or solution architecture describes the data structures used by a business and its applications?
Data architecture.
Which data properties are considered content-related?
Timeliness (availability when required)
Accuracy (closeness to the truth)
Besides definition and content, what are the additional properties used to evaluate data?
Completeness
Accessibility
Cost
In the three-schema architecture, which model describes how data is arranged using a DBMS (e.g., tables and columns)?
Logical model.
In the three-schema architecture, what does the physical model describe?
Storage media (such as cylinders, tracks, and tablespaces).
In a robust data model, what kind of entity classes should be identified instead of concrete role-specific classes?
Abstract entity classes (e.g., "Person").
What are the three components of data model theory?
Structural part (data structures representing entities)
Integrity part (rules governing constraints)
Manipulation part (operators for updating and querying)
In the relational model, how are the three components of data model theory expressed?
Structural: Mathematical relation
Integrity: First-order logic
Manipulation: Relational algebra and calculus
What occurs when a data model theory is applied to solve a specific business requirement?
A data model instance.
What term describes reusable data modeling structures that solve common problems across many models?
Patterns.
Where are attributes specified within an Entity-Relationship Diagram (ERD)?
Inside entity boxes.
What are common notations used to represent cardinality in an ERD?
Arrow heads
Crow's feet (inverted arrow heads)
Numerical representations
How do generic data models address the difficulty of integrating different conventional models of the same domain?
By providing a common abstraction level and standardized relation types.
Which type of conceptual model defines data meaning based on interrelationships to ensure it truly represents real-world entities?
Semantic data model.
Quiz
Question 1: What does a database model describe?
- How a database is structured and used (correct)
- The physical hardware layout of storage devices
- Network security protocols for database access
- User authentication mechanisms for the database
Key Concepts
Data Modeling Concepts
Data Modeling Process
Data Model Theory
Modeling Patterns
Entity‑Relationship Model
Generic Data Model
Semantic Data Model
Data Organization and Structure
Data Architecture
Data Organization
Data Structure
Database Model
Data Structure Diagram
Data Quality Attributes
Data Properties
Definitions
Data Architecture
The discipline of designing and planning the structure, storage, movement, and processing of data within an enterprise or solution architecture.
Data Modeling Process
The systematic creation of a data model that captures business requirements using formal modeling techniques.
Data Properties
Qualities of data such as relevance, clarity, consistency, timeliness, accuracy, completeness, accessibility, and cost.
Data Organization
The arrangement of data within a database management system, encompassing logical and physical models.
Data Structure
The formal representation of how data elements are organized and related within a domain.
Data Model Theory
The theoretical framework comprising structural definitions, integrity constraints, and manipulation operators for data models.
Modeling Patterns
Reusable, recurring data modeling structures that provide standard solutions to common design problems.
Entity‑Relationship Model
A conceptual diagrammatic approach that represents entities, attributes, and relationships to model structured data.
Generic Data Model
An abstract model that defines standardized relation types to integrate and unify diverse conventional data models.
Semantic Data Model
A conceptual model that captures the meaning of data by describing its interrelationships and real‑world context.
Database Model
A specification that describes the logical and physical structure of a database and how it is used.
Data Structure Diagram
A visual representation that documents entities, their relationships, and constraints using boxes and arrows.