Fundamentals and Landscape of Databases

Understand the core concepts, major database models, and historical evolution of database systems.

Summary

Overview of Databases

What is a Database?

A database is an organized collection of data that is stored and accessed through a specialized software system. To understand this fully, you need to know about two key components:

The database itself: the actual organized data

The database management system (DBMS): the software that enables users and applications to capture, store, retrieve, and analyze that data

When you combine the database, the DBMS, and the applications that use them, you get a database system. Think of it this way: a database is like a filing cabinet, the DBMS is the organizational system that tells you how to find files, and the database system is the entire filing operation, including the people using it.

Core Responsibilities of Database Systems

Database management systems handle several functions that keep data safe, accessible, and reliable:

Data security: Controlling who can access what data

Data integrity: Ensuring data remains accurate and consistent

Concurrency control: Managing simultaneous access by multiple users without conflicts

Performance monitoring: Tracking how efficiently the system operates

Recovery from failures: Protecting data when unexpected problems occur

Alongside these responsibilities, any DBMS must support three main functional operations: data definition (creating and modifying how data is organized), update (inserting, modifying, and deleting actual data), and retrieval (selecting and providing data based on criteria).

Major Database Models

Different database models organize and store data in fundamentally different ways. Understanding these models matters because each has distinct advantages depending on your use case.

Relational Databases organize data as rows and columns in tables, similar to spreadsheets. They use Structured Query Language (SQL) for both defining data structure and querying. For example, you might have a table of films with columns for title, release year, and length. The key insight with relational databases is that you can connect information across tables using primary keys, which uniquely identify each row.

NoSQL Databases (meaning "Not Only SQL") take a different approach. Rather than rigid table structures, they use flexible schemas and various query languages. This category includes several important subtypes:

Document-oriented databases store semi-structured documents (such as JSON) in which different records can have different structures

Key-value stores provide fast lookups using simple key-value pairs

Graph databases represent data as nodes (entities), edges (relationships), and properties, which suits social networks or recommendation systems

Object-Oriented Databases emerged to solve a real problem programmers face: the object-relational impedance mismatch. When you write code using objects that have both data and behavior, converting those objects into rows and columns for a relational database is awkward and adds overhead. Object-oriented databases store data as objects directly, preserving both the state (data) and behavior (methods) of objects.

Distributed and Cloud Databases extend these models across multiple computers. In a distributed database, both the data and the DBMS are spread across multiple machines, which can improve performance and reliability. A cloud database takes this further by hosting the database and most of the DBMS remotely in a cloud environment.

Data Warehouses serve a different purpose entirely: they aggregate and transform data from multiple operational databases and external sources for analytical processing and decision-making, rather than supporting day-to-day transactions.
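To make the models above more concrete, the following sketch shows the same kind of film data in each shape. It is illustrative only and rests on several assumptions: Python's built-in sqlite3 module stands in for a relational DBMS, plain dictionaries and lists stand in for document, key-value, and graph stores, and every table name, key, and film title is hypothetical.

```python
import sqlite3

# --- Relational: rows and columns, linked through primary keys ---
conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    """CREATE TABLE films (
           film_id        INTEGER PRIMARY KEY,  -- uniquely identifies each row
           title          TEXT NOT NULL,
           release_year   INTEGER,
           length_minutes INTEGER
       )"""
)
conn.execute(
    """CREATE TABLE screenings (
           screening_id INTEGER PRIMARY KEY,
           -- Declares the link to films; SQLite enforces it only when
           -- PRAGMA foreign_keys = ON is set.
           film_id      INTEGER NOT NULL REFERENCES films(film_id),
           city         TEXT NOT NULL
       )"""
)
conn.executemany(
    "INSERT INTO films VALUES (?, ?, ?, ?)",
    [(1, "Example Film A", 1999, 120), (2, "Example Film B", 2005, 95)],
)
conn.executemany(
    "INSERT INTO screenings VALUES (?, ?, ?)",
    [(10, 1, "Lisbon"), (11, 1, "Oslo"), (12, 2, "Lisbon")],
)

# Connect the two tables through the films primary key.
rows = conn.execute(
    """SELECT s.city, f.title
       FROM screenings AS s
       JOIN films AS f ON f.film_id = s.film_id
       WHERE f.release_year < 2000
       ORDER BY s.city"""
).fetchall()
print(rows)  # [('Lisbon', 'Example Film A'), ('Oslo', 'Example Film A')]

# A second row with an existing primary key value is rejected.
try:
    conn.execute("INSERT INTO films VALUES (1, 'Duplicate', 2001, 90)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# --- Document: self-describing records whose fields may differ ---
film_documents = [
    {"_id": 1, "title": "Example Film A", "release_year": 1999, "cast": ["Actor X"]},
    {"_id": 2, "title": "Example Film B", "length_minutes": 95},
]

# --- Key-value: values looked up by a simple key ---
key_value_store = {"film:1": "Example Film A", "film:2": "Example Film B"}

# --- Graph: nodes and edges, each carrying properties ---
nodes = {
    "film:1": {"label": "Film", "title": "Example Film A"},
    "person:7": {"label": "Person", "name": "Actor X"},
}
edges = [("person:7", "ACTED_IN", "film:1", {"role": "lead"})]

# Follow edges to answer "who acted in film:1?"
cast_of_film_1 = [
    nodes[source]["name"]
    for source, relation, target, _properties in edges
    if relation == "ACTED_IN" and target == "film:1"
]
```

The relational part relies on a declared schema and a join through the primary key, while the other three shapes trade that structure for flexibility (documents), lookup simplicity (key-value), or explicit relationships (graph).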
How Databases Are Physically Organized

Beyond the logical models above, database systems can be organized physically in different architectural patterns, which matter particularly for distributed systems:

Shared-memory architecture allows multiple processors to access the same main memory directly. This is simple, but the shared memory becomes a point of contention as you add more processors.

Shared-disk architecture gives each processor its own memory but allows all processors to share common storage devices. This balances isolation with data sharing.

Shared-nothing architecture provides each processor with its own memory and storage, completely separate from the others. This eliminates contention for shared resources but requires careful coordination between processors.

Parallel database architecture improves performance by breaking tasks into parallel operations. Instead of loading data one record at a time, the system might load multiple sections simultaneously. Similarly, index building and query evaluation can happen in parallel across the system.

Historical Development of Databases

Understanding how databases evolved helps you appreciate why different models exist today and when each is most appropriate.

The Relational Revolution (1970s)

Arguably the most important development in database history was Edgar Frank Codd's proposal of the relational model in 1970. Codd's key insight was representing data as tables with rows and columns, a simple but powerful idea that became the foundation for modern databases.

The relational model introduced several crucial concepts:

Primary keys uniquely identify each row in a table. They are the foundation for connecting data across tables: you reference one table's primary key in another table to establish relationships.

Normalization is the process of organizing data to eliminate redundancy. Rather than storing the same fact in multiple places, you split data into separate tables so each fact appears only once. This greatly simplifies updates: if customer information changes, you update it once and all queries automatically see the new information.

Query optimization allows the DBMS to find efficient access paths for queries. When you write a query in SQL, you describe what data you want, not how to get it. The DBMS's optimizer compares different possible approaches and chooses the one it estimates to be cheapest. This is powerful because programmers can write simple, declarative queries without needing to understand the internal storage details.

Views are virtual tables: they present data in alternative ways without requiring duplicate storage. For example, you might have a view that filters a customer table to show only active customers, or one that combines information from multiple tables. Views are often treated as read-only. Many systems allow updates only through simple views that map directly onto a single underlying table, while changes to the underlying tables automatically appear in every view.

Before Relational: Hierarchical Databases (1960s)

To appreciate what the relational model improved, it helps to understand the hierarchical model that preceded it. Hierarchical databases organized data in a tree-like structure where each record has a single parent. This worked well for naturally hierarchical data but became cumbersome for complex relationships.

The Object-Oriented Era (1980s-1990s)

As object-oriented programming became dominant, programmers faced a mismatch between objects in their code and relational tables in databases. Object databases were created to store data as objects with both state and behavior intact, avoiding the need for translation.

However, fully replacing SQL proved impractical. Instead, the industry developed hybrid object-relational databases that combine object concepts with relational tables, and object-relational mapping (ORM) libraries that automatically convert between objects in code and relational tables in the database.
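The translation that an ORM performs can be sketched by hand. The example below is a simplified illustration, not how any particular ORM library is implemented: the `Customer` class, the `customers` table, and the `save` and `load` helpers are all hypothetical names, and Python's built-in sqlite3 module stands in for the relational database.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class Customer:
    """An in-code object: state (fields) plus behavior (methods)."""
    customer_id: int
    name: str
    active: bool

    def display_name(self) -> str:
        # Behavior lives in code; only the state is stored in the table.
        return self.name.upper() if self.active else f"{self.name} (inactive)"


def save(conn: sqlite3.Connection, customer: Customer) -> None:
    """Object -> row: flatten the object's state into table columns."""
    conn.execute(
        "INSERT OR REPLACE INTO customers (customer_id, name, active) "
        "VALUES (?, ?, ?)",
        (customer.customer_id, customer.name, int(customer.active)),
    )


def load(conn: sqlite3.Connection, customer_id: int) -> Optional[Customer]:
    """Row -> object: rebuild an object from the stored columns."""
    row = conn.execute(
        "SELECT customer_id, name, active FROM customers WHERE customer_id = ?",
        (customer_id,),
    ).fetchone()
    if row is None:
        return None
    return Customer(customer_id=row[0], name=row[1], active=bool(row[2]))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL, active INTEGER NOT NULL)"
)

save(conn, Customer(customer_id=1, name="Ada", active=True))
loaded = load(conn, 1)
print(loaded.display_name())  # ADA
```

Every class needs a pair of functions like `save` and `load`, which is the repetitive work that the impedance mismatch creates and that mapping libraries aim to generate automatically.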
Many modern applications use an ORM: frameworks such as Hibernate or SQLAlchemy handle this translation automatically.

Also from this period: the entity-relationship model emerged in the mid-1970s (gaining prominence in the 1980s) as a tool to aid database design, using intuitive diagrams that show entities and their relationships. This helped designers think through database structure before implementation.

The NoSQL and NewSQL Era (2000s-Present)

As internet applications grew massive, relational databases began showing limitations at extreme scale. NoSQL databases emerged to provide fast key-value stores and document-oriented storage without requiring fixed schemas (you can store different structures in the same collection).

The trade-off came down to consistency. Traditional relational databases provide the ACID properties (Atomicity, Consistency, Isolation, Durability), strong guarantees that transactions leave the data in a correct state. NoSQL systems often relaxed these in favor of eventual consistency, meaning data might not be identical everywhere immediately but will converge over time. This sacrifice enabled better availability and partition tolerance when networks fail.

This trade-off is formalized in the CAP theorem: a distributed system can provide at most two of these three guarantees simultaneously:

Consistency: Every node sees the same data at the same time

Availability: Every request receives a response, even when some nodes have failed

Partition tolerance: The system keeps operating when network failures cut nodes off from one another

Many NoSQL systems chose availability and partition tolerance, accepting eventual consistency instead of strong consistency.

More recently, NewSQL databases aim to combine both sides: they retain the relational model and ACID guarantees while delivering scalable performance comparable to NoSQL systems. They represent an ongoing evolution of the field.

Using Databases in Practice

Operational Databases

Operational databases support the day-to-day business of organizations. They store detailed transaction data such as customer contact information, employee records, product component details, and financial data. These databases prioritize immediate, accurate updates: when a customer makes a purchase, the system must record it correctly right away.

Analytical Systems

Data warehouses serve a fundamentally different purpose. They aggregate, transform, and load data from multiple operational databases and external sources, then organize it for managerial analysis and decision-making. Where operational databases answer "What happened right now?", data warehouses answer "What patterns do we see over time?"

Real-time databases occupy a middle ground, processing transactions quickly enough for immediate action. Telecommunications switching systems exemplify this: decisions about routing calls must happen in milliseconds, requiring specialized real-time database technology.
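The contrast between operational and analytical work can be sketched in a few lines. This is a minimal illustration under stated assumptions: a single in-memory SQLite database (via Python's built-in sqlite3 module) plays both roles, whereas a real data warehouse would be a separate system fed from operational sources, and the `sales` table, `record_sales` helper, and sample figures are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE sales (
           sale_id      INTEGER PRIMARY KEY,
           customer     TEXT NOT NULL,
           amount_cents INTEGER NOT NULL CHECK (amount_cents > 0),
           sale_month   TEXT NOT NULL
       )"""
)


def record_sales(conn, sales):
    """Operational-style write: all rows are committed together or not at all."""
    try:
        # Used as a context manager, the connection commits on success
        # and rolls back if an exception escapes the block.
        with conn:
            conn.executemany(
                "INSERT INTO sales (customer, amount_cents, sale_month) "
                "VALUES (?, ?, ?)",
                sales,
            )
        return True
    except sqlite3.IntegrityError:
        return False


# A valid batch is recorded immediately.
first_ok = record_sales(
    conn,
    [("alice", 1200, "2024-01"), ("bob", 800, "2024-01"), ("alice", 500, "2024-02")],
)

# A batch containing an invalid amount is rejected as a whole (atomicity):
# the valid first row of this batch is rolled back along with the bad one.
second_ok = record_sales(conn, [("carol", 300, "2024-02"), ("dave", -5, "2024-02")])

# Analytical-style read: summarize many rows to look for patterns over time.
monthly_totals = conn.execute(
    """SELECT sale_month, COUNT(*), SUM(amount_cents)
       FROM sales
       GROUP BY sale_month
       ORDER BY sale_month"""
).fetchall()
print(monthly_totals)  # [('2024-01', 2, 2000), ('2024-02', 1, 500)]
```

The write path cares about one small, correct change at a time, while the read path scans and aggregates history, which is why the two workloads are often served by differently organized systems.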

Flashcards

What is the definition of a database?

An organized collection of data stored and accessed through a database management system.

What processes are involved in the data definition capability of a DBMS?

Creation, modification, and removal of definitions describing data organization.

Which specific actions are categorized under the update function of a database?

Insertion, modification, and deletion of data.

What is the primary goal of the retrieval function in a DBMS?

Selecting data according to specified criteria for the user or further processing.

How does the relational database model organize data?

As rows and columns in tables.

Which language is typically used for data definition and queries in relational databases?

Structured Query Language (SQL).

Who proposed the relational model in 1970?

Edgar Frank Codd.

What is the purpose of a primary key in a relational table?

To uniquely identify rows and establish cross-table relationships.

What is the purpose of normalization in database design?

To split data into separate tables so each fact is stored only once, simplifying updates.

What are views in a relational database?

Virtual tables that present data in alternative ways without duplicate storage; in many cases they cannot be directly updated.

What benefit do NoSQL databases provide regarding schemas?

They do not require fixed schemas.

How is data stored in an object-oriented database model?

As objects that encapsulate both state and behavior.

What type of data is stored in a document-oriented database?

Semi-structured documents.

Which components are used to represent data and relationships in a graph database?

Nodes, edges, and properties.

What is the primary purpose of a data warehouse?

Aggregating and transforming data from multiple sources for analytical processing and decision-making.

What is the relationship between memory and storage in a shared-disk architecture?

Each processing unit has its own main memory, but all share common storage devices.

How does shared-nothing architecture eliminate resource contention?

Each processing unit is provided with its own independent memory and storage.

Which tasks can be parallelized in a parallel database architecture to improve performance?

Data loading, index building, and query evaluation.

How is data organized in a hierarchical database model?

In a tree-like structure where each record has a single parent.

What was the purpose of the entity-relationship model emerging in the mid-1970s?

To aid database design using intuitive diagrams of entities and their relationships.

What is the function of object-relational mapping libraries?

To automatically map objects in code to relational database tables.

Why do NoSQL systems often use eventual consistency?

To achieve availability and partition tolerance by relaxing strict consistency.

What is the primary goal of NewSQL databases?

To retain relational semantics and ACID guarantees while matching NoSQL scalability.

What does the CAP theorem state regarding distributed systems?

A system can provide at most two of consistency, availability, and partition tolerance simultaneously.

What kind of data is stored in an operational database?

Detailed transaction data (e.g., customer contacts, employee records, financial data).

Key Concepts

Database Concepts

Database

Database Management System

Relational Database Model

NoSQL (Not Only Structured Query Language) Database

Object‑Oriented Database

Data Warehouse

Database Design and Theory

Normalization

Entity‑Relationship Model

CAP Theorem

Parallel Database Architecture

Definitions

Database

An organized collection of data managed by software that enables storage, retrieval, and manipulation.

Database Management System

Software that provides tools for defining, updating, retrieving, and securing data in a database.

Relational Database Model

A data model that structures information into tables of rows and columns, accessed primarily via SQL.

NoSQL (Not Only Structured Query Language) Database

A class of databases that use flexible schemas and alternative query languages, often emphasizing scalability and performance.

Object‑Oriented Database

A database that stores data as objects, encapsulating both state and behavior, to align with object‑oriented programming.

Data Warehouse

A centralized repository that aggregates and stores data from multiple sources for analytical processing and reporting.

CAP Theorem

A principle stating that a distributed system can simultaneously provide at most two of three guarantees: consistency, availability, and partition tolerance.

Normalization

The process of organizing database tables to reduce redundancy and improve data integrity.

Entity‑Relationship Model

A diagrammatic approach for modeling data entities and their relationships, commonly used in relational database design.

Parallel Database Architecture

A system design that distributes database operations across multiple processors to improve performance through parallelism.
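As an illustration of the parallel database architecture defined above, the sketch below evaluates one query over several partitions at once and then combines the partial results. It is a toy model under stated assumptions: lists of dictionaries stand in for table partitions, the function names and film data are hypothetical, and Python threads are used only to show the structure (in CPython they do not give true CPU parallelism for this kind of work).

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum_and_count(partition, min_year):
    """Evaluate the filter and a partial aggregate on one partition only."""
    lengths = [
        film["length_minutes"]
        for film in partition
        if film["release_year"] >= min_year
    ]
    return sum(lengths), len(lengths)


def parallel_average_length(partitions, min_year):
    """Run the per-partition work concurrently, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        partials = list(
            pool.map(lambda part: partial_sum_and_count(part, min_year), partitions)
        )
    total = sum(part_sum for part_sum, _ in partials)
    count = sum(part_count for _, part_count in partials)
    return total / count if count else None


# Three partitions, as if each were held by a separate processing unit.
partitions = [
    [
        {"release_year": 1999, "length_minutes": 120},
        {"release_year": 2005, "length_minutes": 95},
    ],
    [
        {"release_year": 2010, "length_minutes": 100},
        {"release_year": 1985, "length_minutes": 90},
    ],
    [{"release_year": 2020, "length_minutes": 105}],
]

average = parallel_average_length(partitions, min_year=2000)
print(average)  # 100.0
```

Each worker touches only its own partition, so the only coordination needed is the final merge of the partial sums and counts.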