RemNote Community

Data model - Core Concepts and Practices

Understand core data modeling concepts, essential data properties and structures, and common model types and patterns.

Summary

Core Topics in Data Modeling

Introduction: What is Data Modeling?

Data modeling is the process of creating a structured representation of how an organization's data is organized, stored, and used. Think of it as creating a blueprint for a database: just as architects design buildings before construction begins, data modelers design data structures before a system is built.

At its core, data modeling serves two purposes. First, it captures business requirements by defining what data an organization needs and how that data relates to other data. Second, it guides implementation by providing a formal specification that developers and database administrators use to build actual systems.

The result of data modeling is a data model: a formal description of the structure, constraints, and relationships within a specific domain of data. This model acts like a grammar for how we organize and understand information in that domain.

Data Architecture: The Big Picture

Before diving into models, it's important to understand data architecture, which provides the larger context for all data modeling work.

Data architecture is the overall design that defines a target state for how an organization manages its data and plans how to achieve that state. It's a foundational pillar of enterprise architecture, the master plan for how an organization's technology works.

Data architecture describes:

- Data structures used by the business and its applications
- Data in different states: data at rest (stored), data in motion (being transferred), and data being processed
- Data components: data stores where information lives, data groups (collections of related data), and individual data items
- Data flows: how data moves through systems, with criteria for processing operations

In essence, data architecture answers the question: "What data does our organization need, where does it live, and how does it flow through our systems?"

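To make the components listed above more concrete, here is a minimal Python sketch that records data stores, data groups, data items, and one data flow as plain dataclasses. Every name in it (`DataStore`, `DataGroup`, `DataFlow`, `DataState`, `crm_db`, `analytics_warehouse`, the nightly sync) is a hypothetical illustration chosen for this example, not part of any standard or tool.

```python
from dataclasses import dataclass, field
from enum import Enum


class DataState(Enum):
    """The three data states named in the text."""
    AT_REST = "stored"
    IN_MOTION = "being transferred"
    IN_PROCESS = "being processed"


@dataclass
class DataGroup:
    """A collection of related data items (items are just names here)."""
    name: str
    items: list[str] = field(default_factory=list)


@dataclass
class DataStore:
    """A place where data lives, holding one or more data groups."""
    name: str
    groups: list[DataGroup] = field(default_factory=list)


@dataclass
class DataFlow:
    """Movement of one data group between two stores."""
    source: DataStore
    target: DataStore
    group: DataGroup
    criteria: str  # assumed free-text condition under which the flow runs
    state: DataState = DataState.IN_MOTION


def describe(flow: DataFlow) -> str:
    """Answer 'what data, where does it live, how does it flow' in one line."""
    items = ", ".join(flow.group.items)
    return (f"{flow.group.name} ({items}): "
            f"{flow.source.name} -> {flow.target.name} [{flow.criteria}]")


# Hypothetical example inventory.
crm = DataStore("crm_db", [DataGroup("customer", ["name", "email", "phone"])])
warehouse = DataStore("analytics_warehouse")
nightly_sync = DataFlow(crm, warehouse, crm.groups[0],
                        criteria="nightly, changed rows only")

print(describe(nightly_sync))
```

An inventory like this is only a bookkeeping aid; a real data architecture would also capture ownership, retention, and security, which the sketch leaves out.
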
Data Organization: From Concept to Implementation

Once we understand data architecture, we need to actually organize data using technology. This happens through multiple levels of abstraction, a concept called the three-schema architecture.

The Three Levels

The conceptual model is the highest level of abstraction. It describes what data the business needs without any concern for how it will be stored or accessed. This model uses business-friendly terms and focuses on entities (like "Customer" or "Order") and how they relate to each other.

The logical model translates the conceptual model into structures that a given kind of database technology can implement. For a relational database, this means organizing data into tables and columns with defined relationships. The logical model stays neutral about the specific product: the same relational design could be implemented in many different database systems.

The physical model describes the actual storage implementation: how tables are stored on disk, which indexes exist, how data is partitioned across storage devices, and other technical details specific to a particular database system.

The logical model is typically derived from the conceptual model, but it may be adjusted based on performance requirements and usage patterns. For example, you might restructure how data is organized to make common queries run faster.

Data Properties: What Makes Data Good?

Not all data is equally useful. Understanding data properties (also called data quality attributes) helps you assess whether your data model captures data appropriately. These properties fall into several categories:

Definition-Related Properties

Relevance means the data actually serves the intended purpose. Data about customer complaints is relevant for improving service quality, but irrelevant for predicting warehouse inventory needs.

Clarity requires that there's a clear, shared understanding of what the data means.
For example, "customer age" could mean age at registration, current age, or age at last purchase; without clarity, people will misinterpret it.

Consistency ensures that the same type of data from different sources is compatible and comparable. If one system records dates as "MM/DD/YYYY" and another as "YYYY-MM-DD", they're inconsistent.

Content-Related Properties

Timeliness means data is available when you need it. Real-time transaction data has high timeliness; annual reports have low timeliness.

Accuracy measures how closely data reflects reality. A customer's recorded phone number is accurate if it's actually their correct number.

Additional Properties

Completeness describes how much of the required data is actually available. If you're missing customer email addresses for 30% of your records, completeness is low.

Accessibility defines where, how, and to whom the data is available. Some data might exist but be locked in legacy systems or restricted by privacy regulations.

Cost is the practical expense of obtaining and maintaining the data. It's a trade-off: highly accurate, real-time data usually costs more to produce than approximate data.

Data Model Theory: The Three Components

To understand data models deeply, you need to know the theoretical framework they're built on. Every data model has three components:

The Structural Part

The structural part defines the collection of data structures that represent entities. These are the building blocks: the types of objects your model recognizes and the properties they have. In a customer management system, the structural part includes structures for "Customer," "Order," and "Payment," along with the properties each has (customer name, order date, payment amount, etc.).

The Integrity Part

The integrity part defines rules and constraints that maintain data quality and structural correctness.
These rules ensure that:

- Required data is present (you can't have an order without a customer)
- Data values are valid (a discount percentage can't be more than 100)
- Relationships are maintained (you can't delete a customer if they still have active orders)

The Manipulation Part

The manipulation part specifies operators and methods for updating, querying, and transforming data. These operations let you insert new data, update existing data, delete data, and retrieve information through queries.

The Relational Model Example

The most common data model used today is the relational model. Here's how its three components work:

- Structural part: Based on the mathematical concept of a "relation" (which appears as a table with rows and columns)
- Integrity part: Expressed using first-order logic, defining constraints like primary keys and foreign keys
- Manipulation part: Uses relational algebra, tuple calculus, and domain calculus for querying and updating

When you take this theoretical model and apply it to a specific business problem, you create a data model instance. Turning a semantic logical model into an actual physical database is one example of this step.

Entity-Relationship Models: Visualizing Structure

One of the most practical ways to represent a data model is through an entity-relationship model (ER model), usually drawn as an entity-relationship diagram (ERD).

What is an Entity-Relationship Model?

An ER model is a visual representation of a conceptual data model. It shows:

- Entities: Objects or concepts the business cares about (represented as boxes)
- Attributes: Properties of those entities (listed inside the boxes)
- Relationships: Connections between entities (represented as lines connecting boxes)
- Cardinality: Rules about how many instances of one entity can relate to another

Understanding Cardinality

Cardinality is one of the trickier concepts in ER modeling.
It describes the multiplicity of relationships, that is, how many instances of one entity can be associated with instances of another. For example:

- One customer can place multiple orders (one-to-many relationship)
- One order belongs to exactly one customer (cardinality constraint)
- One product can appear on many orders, and one order can contain many products (many-to-many relationship)

Different notations represent cardinality on ER diagrams:

- Arrow heads on the relationship line indicate cardinality; which side they point to depends on the notation in use
- Crow's feet notation (inverted arrow heads) indicates the "many" side of relationships
- Numerical notations explicitly show cardinality (1:1, 1:N, M:N)

The key is understanding that cardinality constraints are business rules. They reflect real-world limitations. For instance, a business rule might state "each order must have at least one product" (minimum cardinality of 1) and "each order can have at most 100 line items" (maximum cardinality of 100).

Semantic Data Models: Capturing Meaning

While an ER model shows structure, a semantic data model goes deeper by capturing the meaning of data and how it relates to the real world.

A semantic data model defines the significance of data within its context. Rather than just saying "customer record has a phone number field," a semantic model explains what that phone number represents: the customer's primary contact number, when it was last verified, what type of number it is, and its relationship to other customer information.

This matters because:

- Real-world accuracy: The model ensures that stored data actually represents real-world entities, ideas, events, and resources. Without semantic clarity, data can become meaningless or misleading.
- Integration across systems: When organizations have multiple databases with similar information, semantic models help identify which data really represents the same thing.
- Constraint specification: Understanding semantics helps define the rules that keep data valid.
A "purchase date" should always be on or before a "delivery date"; this rule only makes sense when you understand the semantic meaning of both fields.

Data Structures: Abstract vs. Concrete

When designing a data model, one critical distinction is choosing between abstract and concrete entity classes.

Abstract entity classes represent general categories that do not depend on a particular role. The class "Person" is abstract in this sense: it doesn't assume any specific role. This approach is robust because:

- It adapts to change: if someone switches from being an employee to a vendor, they're still a person
- It avoids redundancy: you don't duplicate person information across different role-specific classes
- It's reusable: applications can access person data regardless of the role

Concrete entity classes represent specific roles. "Employee," "Vendor," and "Customer" are concrete. While specific, they have limitations:

- If someone becomes both an employee and a vendor, you need complex rules to manage this
- Information about the person might be duplicated across classes
- Changing roles requires moving data between classes

Best practice: Start with abstract entity classes and add concrete specializations when you need role-specific properties or behaviors.

<extrainfo>
Modeling Patterns

Patterns are common data modeling structures that recur across many different data models. Think of them as templates for solving typical modeling problems. For example, there are established patterns for modeling hierarchies (like organizational charts), time-varying data (like price history), and many-to-many relationships. Understanding patterns helps you model faster and more consistently, as you can apply proven solutions rather than inventing new approaches each time.
</extrainfo>

Key Types of Data Models

Beyond the ER model and semantic model, several other model types serve specific purposes:

Database Models

A database model is a general specification for how a database is structured and used.
Different database management systems implement different models: relational databases implement the relational model, document databases implement document models, graph databases implement graph models, and so on.

Data Structure Diagrams

A data structure diagram is similar to an ER diagram but may include additional detail about how entities and relationships are constrained. It uses boxes for entities and arrows for relationships, with annotations describing constraints.

Generic Data Models

A generic data model takes conventional data models and generalizes them by defining standardized relationship types that can connect different types of entities. These are useful when you're trying to integrate multiple different data models of the same domain: they provide a common abstraction level that bridges different modeling approaches.

Bringing It Together: The Data Modeling Workflow

With all these concepts in hand, you can see how they fit together in practice:

1. Start with data architecture to understand organizational needs and data flows
2. Create a conceptual data model capturing entities, attributes, and relationships without technical concerns
3. Build an entity-relationship model to visualize and validate the structure
4. Consider semantic properties to ensure the model captures real-world meaning
5. Design the logical model, adapting the conceptual model to the chosen database technology (for example, relational)
6. Implement the physical model in an actual database system

At each level, you apply data properties (relevance, clarity, completeness, accuracy, timeliness) as quality checks. And you use data model theory (structural, integrity, and manipulation components) as the framework ensuring your model is complete and consistent. This structured approach transforms business requirements into a working system that reliably stores and retrieves information.
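As one possible illustration of the final workflow step, the sketch below builds a small physical model with Python's built-in `sqlite3` module and marks where the structural, integrity, and manipulation parts of data model theory show up. The table and column names (`customer`, `customer_order`, `order_line`, `discount_pct`), the helper `rejected`, and the sample rows are assumptions made for this example; the integrity rules mirror the ones used earlier in the summary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Structural part: relations (tables) and their attributes (columns).
# Integrity part: NOT NULL, CHECK, primary-key and foreign-key constraints.
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE product (
    product_id INTEGER PRIMARY KEY,
    title      TEXT NOT NULL
);
CREATE TABLE customer_order (
    order_id     INTEGER PRIMARY KEY,
    customer_id  INTEGER NOT NULL REFERENCES customer(customer_id),
    discount_pct REAL NOT NULL DEFAULT 0
                 CHECK (discount_pct BETWEEN 0 AND 100)
);
-- Junction table: resolves the many-to-many link between orders and products.
CREATE TABLE order_line (
    order_id   INTEGER NOT NULL REFERENCES customer_order(order_id),
    product_id INTEGER NOT NULL REFERENCES product(product_id),
    quantity   INTEGER NOT NULL CHECK (quantity > 0),
    PRIMARY KEY (order_id, product_id)
);
""")

# Manipulation part: inserting and querying data.
conn.execute("INSERT INTO customer VALUES (1, 'Ada')")
conn.executemany("INSERT INTO product VALUES (?, ?)",
                 [(10, "Keyboard"), (11, "Mouse")])
conn.execute("INSERT INTO customer_order VALUES (100, 1, 15.0)")
conn.executemany("INSERT INTO order_line VALUES (?, ?, ?)",
                 [(100, 10, 1), (100, 11, 2)])
conn.commit()


def rejected(sql: str) -> bool:
    """Return True if the integrity constraints reject the statement."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# The three example rules from the integrity part:
order_without_customer = rejected(
    "INSERT INTO customer_order VALUES (101, 999, 0)")
discount_over_100 = rejected(
    "INSERT INTO customer_order VALUES (102, 1, 150)")
delete_customer_with_orders = rejected(
    "DELETE FROM customer WHERE customer_id = 1")

rows = conn.execute("""
    SELECT c.name, p.title, ol.quantity
    FROM customer c
    JOIN customer_order o ON o.customer_id = c.customer_id
    JOIN order_line ol    ON ol.order_id = o.order_id
    JOIN product p        ON p.product_id = ol.product_id
    ORDER BY p.title
""").fetchall()
print(rows)
```

Note that this sketch blocks deleting a customer who has any order at all; distinguishing "active" orders, as the summary's rule does, would need an extra status column and is left out here.
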
Flashcards
Which pillar of enterprise or solution architecture describes the data structures used by a business and its applications?
Data architecture.
Which data properties are considered content-related?
Timeliness (availability when required) and accuracy (closeness to the truth).
Besides definition and content, what are the additional properties used to evaluate data?
Completeness, accessibility, and cost.
In the three-schema architecture, which model describes how data is arranged using a DBMS (e.g., tables and columns)?
Logical model.
In the three-schema architecture, what does the physical model describe?
Storage media (such as cylinders, tracks, and tablespaces).
In a robust data model, what kind of entity classes should be identified instead of concrete role-specific classes?
Abstract entity classes (e.g., "Person").
What are the three components of data model theory?
Structural part (data structures representing entities), integrity part (rules governing constraints), and manipulation part (operators for updating and querying).
In the relational model, how are the three components of data model theory expressed?
Structural: mathematical relation. Integrity: first-order logic. Manipulation: relational algebra and calculus.
What is created when a data model theory is applied to solve a specific business requirement?
A data model instance.
What term describes reusable data modeling structures that solve common problems across many models?
Patterns.
Where are attributes specified within an Entity-Relationship Diagram (ERD)?
Inside entity boxes.
What are common notations used to represent cardinality in an ERD?
Arrow heads, crow's feet (inverted arrow heads), and numerical representations.
How do generic data models address the difficulty of integrating different conventional models of the same domain?
By providing a common abstraction level and standardized relation types.
Which type of conceptual model defines data meaning based on interrelationships to ensure it truly represents real-world entities?
Semantic data model.

Key Concepts
Data Modeling Concepts
- Data Modeling Process
- Data Model Theory
- Modeling Patterns
- Entity‑Relationship Model
- Generic Data Model
- Semantic Data Model

Data Organization and Structure
- Data Architecture
- Data Organization
- Data Structure
- Database Model
- Data Structure Diagram

Data Quality Attributes
- Data Properties