RemNote Community
Community

Study Guide

📖 Core Concepts Data Warehouse (DW) – Central, read‑only repository that integrates current & historical data for reporting & analysis (core of Business Intelligence). ETL vs ELT – ETL: Extract → Transform → Load (transform before loading). ELT: Extract → Load → Transform (transform inside the DW). Operational (OLTP) vs Analytic (OLAP) Databases – OLTP: row‑oriented, high‑volume short transactions, normalized. OLAP: column‑oriented, low‑volume complex queries, denormalized (star schema). Data Mart – Sub‑DW focused on a single subject/department; can be dependent (from central DW), independent (direct from sources), or hybrid. Facts & Dimensions – Facts = measurable metrics; Dimensions = contextual attributes (date, product, etc.) that give meaning to facts. Design Approaches – Bottom‑up (build marts first, integrate later), Top‑down (build enterprise DW first, derive marts), Hybrid (spoke‑hub with normalized DW + dimensional marts). Core DW Characteristics – Subject‑oriented, Integrated, Time‑variant, Non‑volatile. --- 📌 Must Remember DW purpose: unified view, historical analysis, decision support, reduced lock contention. ETL workflow: Staging → Integration (transform) → Warehouse (store as facts/dimensions) → Access layer. ELT workflow: Staging → Load raw → Transform inside DW. OLTP metric: Transactions per second. OLAP metric: Query response time; supports roll‑up, drill‑down, slice‑and‑dice. Star schema = central fact table + surrounding dimension tables (fast query, business‑friendly). Normalized schema = many subject‑area tables, high join complexity. Bottom‑up → Bus architecture shares common dimensions/facts across marts. Top‑down → Atomic data stored first; marts are derived later. --- 🔄 Key Processes ETL Process Extract raw data from source systems (marketing, sales, etc.). Stage raw data in a staging layer (no transformation). Transform in integration layer (cleanse, harmonize, aggregate). Load into warehouse (arrange into fact & dimension tables). ELT Process Extract → Load raw data directly into DW. Transform inside DW (often using column‑oriented engine). OLAP Operations Roll‑up: aggregate data to a higher level (e.g., monthly → yearly). Drill‑down: move from summary to finer detail. Slice & dice: filter on one or more dimensions to create sub‑cubes. --- 🔍 Key Comparisons ETL vs ELT ETL: Transform before load → lower load‑time storage, good for complex pre‑load cleansing. ELT: Load raw → Transform inside DW → leverages columnar DB power, faster initial load. Operational DB vs Analytic DB Row‑oriented vs Column‑oriented. Normalized (many tables) vs Denormalized (star schema). High insert/update vs High select/aggregation. Dependent vs Independent Data Mart Dependent = sourced from central DW; ensures consistency. Independent = built directly from source systems; faster to deliver but may duplicate effort. --- ⚠️ Common Misunderstandings “Data warehouses are just big databases.” – Not true; they are read‑only, subject‑oriented, and store historical data optimized for analysis. “ETL is always better than ELT.” – Depends on infrastructure; modern columnar DWs make ELT often more efficient. “Normalization is always best.” – For analytics, denormalized star schemas give far better query performance and usability. “Data marts replace a warehouse.” – Data marts are subsets; a full DW provides enterprise‑wide integration. --- 🧠 Mental Models / Intuition “Warehouse = Library” – Books (facts) are organized on shelves (dimensions) for easy lookup; the catalog (metadata) tells you where everything is. “ETL = Cooking” – Raw ingredients (source data) are pre‑processed (cleaned) in the kitchen (staging), mixed (transformed), then plated (loaded) for consumption. “OLAP = LEGO building” – Each dimension is a LEGO block; stacking them with facts creates a structure you can zoom in/out (drill‑down/roll‑up). --- 🚩 Exceptions & Edge Cases Virtualization – Provides real‑time access without copying data, but requires constant connectivity; not suitable when source systems are unreliable. Non‑volatility – Generally read‑only, but regulatory/legal mandates may require occasional updates or deletions. Hybrid Data Mart – May pull some data from DW and some directly from sources; watch for duplicate cleansing steps. --- 📍 When to Use Which Choose ETL when source data needs heavy cleansing or when target DW cannot efficiently perform transforms. Choose ELT when using a modern columnar DW (e.g., Snowflake, Redshift) that can handle massive in‑DW transformations. Pick a star schema for high‑performance reporting and business‑user friendliness. Pick a normalized schema for highly detailed, transaction‑level analysis or when storage redundancy must be minimized. Select dependent data marts for consistency across the enterprise; independent when rapid, department‑specific deployment is critical. --- 👀 Patterns to Recognize “Fact + multiple dimensions = star” – Whenever a problem mentions numeric metrics plus several descriptive attributes, think star schema. “Roll‑up → higher aggregation level; Drill‑down → lower level” – Look for keywords like “summary”, “total”, “by month” (roll‑up) vs “detail”, “by day”, “individual” (drill‑down). “ETL steps listed in order” – If “staging” appears before “integration”, it signals an ETL workflow. “Column‑oriented DB + complex query” → likely an OLAP/analytic workload. --- 🗂️ Exam Traps Confusing OLTP with OLAP performance metrics – Transaction per second = OLTP; response time = OLAP. Assuming “data warehouse = data mart” – A mart is a subset; a full warehouse serves the whole enterprise. Selecting normalized schema for reporting – Normalized designs are poor for ad‑hoc reporting; star schema is the typical answer. Mixing up “subject‑oriented” vs “application‑oriented” – DW data is organized by business subjects (customer, product), not by the source application’s tables. Thinking virtualization eliminates all data latency – It still depends on source system response times; not a true “instantaneous” copy. ---
or

Or, immediately create your own study flashcards:

Upload a PDF.
Master Study Materials.
Start learning in seconds
Drop your PDFs here or
or