Data warehouse Study Guide
Study Guide
📖 Core Concepts
Data Warehouse (DW) – Central, read‑only repository that integrates current & historical data for reporting & analysis (core of Business Intelligence).
ETL vs ELT – ETL: Extract → Transform → Load (transform before loading). ELT: Extract → Load → Transform (transform inside the DW).
Operational (OLTP) vs Analytic (OLAP) Databases – OLTP: row‑oriented, high‑volume short transactions, normalized. OLAP: column‑oriented, low‑volume complex queries, denormalized (star schema).
Data Mart – Sub‑DW focused on a single subject/department; can be dependent (from central DW), independent (direct from sources), or hybrid.
Facts & Dimensions – Facts = measurable metrics; Dimensions = contextual attributes (date, product, etc.) that give meaning to facts.
Design Approaches – Bottom‑up (build marts first, integrate later), Top‑down (build enterprise DW first, derive marts), Hybrid (spoke‑hub with normalized DW + dimensional marts).
Core DW Characteristics – Subject‑oriented, Integrated, Time‑variant, Non‑volatile.
---
📌 Must Remember
DW purpose: unified view, historical analysis, decision support, reduced lock contention.
ETL workflow: Staging → Integration (transform) → Warehouse (store as facts/dimensions) → Access layer.
ELT workflow: Staging → Load raw → Transform inside DW.
OLTP metric: Transactions per second.
OLAP metric: Query response time; supports roll‑up, drill‑down, slice‑and‑dice.
Star schema = central fact table + surrounding dimension tables (fast query, business‑friendly).
Normalized schema = many subject‑area tables, high join complexity.
Bottom‑up → Bus architecture shares common dimensions/facts across marts.
Top‑down → Atomic data stored first; marts are derived later.
---
🔄 Key Processes
ETL Process
Extract raw data from source systems (marketing, sales, etc.).
Stage raw data in a staging layer (no transformation).
Transform in integration layer (cleanse, harmonize, aggregate).
Load into warehouse (arrange into fact & dimension tables).
ELT Process
Extract → Load raw data directly into DW.
Transform inside DW (often using column‑oriented engine).
OLAP Operations
Roll‑up: aggregate data to a higher level (e.g., monthly → yearly).
Drill‑down: move from summary to finer detail.
Slice & dice: filter on one or more dimensions to create sub‑cubes.
---
🔍 Key Comparisons
ETL vs ELT
ETL: Transform before load → lower load‑time storage, good for complex pre‑load cleansing.
ELT: Load raw → Transform inside DW → leverages columnar DB power, faster initial load.
Operational DB vs Analytic DB
Row‑oriented vs Column‑oriented.
Normalized (many tables) vs Denormalized (star schema).
High insert/update vs High select/aggregation.
Dependent vs Independent Data Mart
Dependent = sourced from central DW; ensures consistency.
Independent = built directly from source systems; faster to deliver but may duplicate effort.
---
⚠️ Common Misunderstandings
“Data warehouses are just big databases.” – Not true; they are read‑only, subject‑oriented, and store historical data optimized for analysis.
“ETL is always better than ELT.” – Depends on infrastructure; modern columnar DWs make ELT often more efficient.
“Normalization is always best.” – For analytics, denormalized star schemas give far better query performance and usability.
“Data marts replace a warehouse.” – Data marts are subsets; a full DW provides enterprise‑wide integration.
---
🧠 Mental Models / Intuition
“Warehouse = Library” – Books (facts) are organized on shelves (dimensions) for easy lookup; the catalog (metadata) tells you where everything is.
“ETL = Cooking” – Raw ingredients (source data) are pre‑processed (cleaned) in the kitchen (staging), mixed (transformed), then plated (loaded) for consumption.
“OLAP = LEGO building” – Each dimension is a LEGO block; stacking them with facts creates a structure you can zoom in/out (drill‑down/roll‑up).
---
🚩 Exceptions & Edge Cases
Virtualization – Provides real‑time access without copying data, but requires constant connectivity; not suitable when source systems are unreliable.
Non‑volatility – Generally read‑only, but regulatory/legal mandates may require occasional updates or deletions.
Hybrid Data Mart – May pull some data from DW and some directly from sources; watch for duplicate cleansing steps.
---
📍 When to Use Which
Choose ETL when source data needs heavy cleansing or when target DW cannot efficiently perform transforms.
Choose ELT when using a modern columnar DW (e.g., Snowflake, Redshift) that can handle massive in‑DW transformations.
Pick a star schema for high‑performance reporting and business‑user friendliness.
Pick a normalized schema for highly detailed, transaction‑level analysis or when storage redundancy must be minimized.
Select dependent data marts for consistency across the enterprise; independent when rapid, department‑specific deployment is critical.
---
👀 Patterns to Recognize
“Fact + multiple dimensions = star” – Whenever a problem mentions numeric metrics plus several descriptive attributes, think star schema.
“Roll‑up → higher aggregation level; Drill‑down → lower level” – Look for keywords like “summary”, “total”, “by month” (roll‑up) vs “detail”, “by day”, “individual” (drill‑down).
“ETL steps listed in order” – If “staging” appears before “integration”, it signals an ETL workflow.
“Column‑oriented DB + complex query” → likely an OLAP/analytic workload.
---
🗂️ Exam Traps
Confusing OLTP with OLAP performance metrics – Transaction per second = OLTP; response time = OLAP.
Assuming “data warehouse = data mart” – A mart is a subset; a full warehouse serves the whole enterprise.
Selecting normalized schema for reporting – Normalized designs are poor for ad‑hoc reporting; star schema is the typical answer.
Mixing up “subject‑oriented” vs “application‑oriented” – DW data is organized by business subjects (customer, product), not by the source application’s tables.
Thinking virtualization eliminates all data latency – It still depends on source system response times; not a true “instantaneous” copy.
---
or
Or, immediately create your own study flashcards:
Upload a PDF.
Master Study Materials.
Master Study Materials.
Start learning in seconds
Drop your PDFs here or
or