Feature Engineering Study Guide
📖 Core Concepts
Feature Engineering – Pre‑processing step that turns raw data into useful model inputs (features).
Feature – An attribute/column representing a measurable property of each observation.
Feature Explosion – When engineered features proliferate so much that the model becomes over‑parameterized.
Feature Store – Centralized repo that version‑controls and serves features for training and real‑time inference.
Dimensionality Reduction – Techniques (PCA, ICA, LDA) that compress many features into fewer, information‑rich components.
Regularization – Adding a penalty (L1 or L2) to the loss to shrink coefficients of irrelevant or redundant features.
Deep Feature Synthesis (DFS) – Automated method that stacks relational operations to create features from relational data.
Non‑Negative Matrix Factorization (NMF) – Decomposes a data matrix X into W·H, with W, H ≥ 0, yielding part‑based clusters.
📌 Must Remember
Feature engineering ≠ feature selection; the former creates or transforms features, the latter picks a subset.
L1 regularization (Lasso): penalty = $\lambda\sum_i |w_i|$ → drives many weights to exactly 0 (feature pruning).
L2 regularization (Ridge): penalty = $\lambda\sum_i w_i^2$ → shrinks weights but rarely zeros them out.
DFS can match or beat human‑crafted features in Kaggle‑style competitions.
Hashing trick maps high‑cardinality categorical variables to a fixed‑size vector via a hash function, avoiding explicit dictionary storage.
NMF only works when data are non‑negative; the resulting factors are interpretable as additive parts.
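The L1‑vs‑L2 sparsity contrast is easy to demonstrate; a minimal sklearn sketch (the data, the number of features, and the α value are all illustrative):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Toy data: 20 features, only the first 3 carry signal (illustrative setup).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = X[:, 0] * 3 + X[:, 1] * 2 - X[:, 2] + rng.normal(scale=0.1, size=200)

lasso = Lasso(alpha=0.1).fit(X, y)   # L1 penalty
ridge = Ridge(alpha=0.1).fit(X, y)   # L2 penalty

n_zero_l1 = int(np.sum(lasso.coef_ == 0))  # L1 zeros out most irrelevant weights
n_zero_l2 = int(np.sum(ridge.coef_ == 0))  # L2 keeps them small but non-zero
```

On data like this, `n_zero_l1` is typically most of the 17 irrelevant features, while `n_zero_l2` stays near zero — the "feature pruning" behavior the bullet above describes.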
🔄 Key Processes
Feature Creation
Identify raw columns → apply domain knowledge → generate new columns (e.g., ratios, interaction terms, time‑lag aggregates).
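The step above can be sketched in pandas; the table and column names are hypothetical, chosen only to show one example of each feature type (ratio, interaction, time lag, rolling aggregate):

```python
import pandas as pd

# Hypothetical sales table; column names are illustrative.
df = pd.DataFrame({
    "revenue": [100.0, 150.0, 120.0, 200.0],
    "cost":    [80.0, 100.0, 90.0, 130.0],
    "units":   [10, 12, 9, 15],
})

df["margin_ratio"]  = (df["revenue"] - df["cost"]) / df["revenue"]  # ratio feature
df["rev_x_units"]   = df["revenue"] * df["units"]                   # interaction term
df["revenue_lag1"]  = df["revenue"].shift(1)                        # time-lag feature
df["revenue_roll2"] = df["revenue"].rolling(2).mean()               # rolling aggregate
```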
Feature Transformation & Imputation
Detect missing/invalid values → impute (mean, median, model‑based).
Apply scaling (standardization, min‑max) or encoding (one‑hot, ordinal, target encoding).
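Imputation, scaling, and encoding are commonly chained in one sklearn pipeline; a minimal sketch (the toy columns are illustrative):

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.DataFrame({
    "age":  [25.0, np.nan, 40.0, 33.0],   # numeric column with a missing value
    "city": ["NY", "LA", "NY", "SF"],     # categorical column
})

numeric = Pipeline([("impute", SimpleImputer(strategy="median")),  # median imputation
                    ("scale", StandardScaler())])                  # standardization
categorical = OneHotEncoder(handle_unknown="ignore")               # one-hot encoding

pre = ColumnTransformer([("num", numeric, ["age"]),
                         ("cat", categorical, ["city"])])
X = pre.fit_transform(df)  # 1 scaled numeric column + 3 one-hot columns
```

Bundling the steps in a `ColumnTransformer` ensures the same imputation statistics and scaling parameters learned on training data are reused at inference time.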
Dimensionality Reduction Workflow
Center & optionally scale data → compute covariance matrix → eigen‑decompose (PCA) → retain top k components explaining desired variance.
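The workflow above can be followed step by step with plain NumPy (the data is synthetic and the 95% variance target is an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 5))  # correlated features

Xc = X - X.mean(axis=0)               # 1. center the data
C = np.cov(Xc, rowvar=False)          # 2. covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)  # 3. eigendecomposition (ascending order)
order = np.argsort(eigvals)[::-1]     #    sort components by explained variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 4. retain the top k components explaining >= 95% of the variance
k = int(np.searchsorted(np.cumsum(eigvals) / eigvals.sum(), 0.95)) + 1
Z = Xc @ eigvecs[:, :k]               # projected (reduced) data
```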
Feature Selection Pipeline
Compute importance scores (e.g., tree‑based Gini, mutual information).
Remove highly correlated (> 0.9) features using correlation matrix.
Apply statistical tests (χ² for categorical, ANOVA for continuous) to keep significant predictors.
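The correlation-pruning and importance-scoring steps can be sketched together; the synthetic columns below (a near-duplicate pair plus a noise column) are illustrative:

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
df = pd.DataFrame({"a": rng.normal(size=300)})
df["b"] = df["a"] * 0.99 + rng.normal(scale=0.01, size=300)  # near-duplicate of "a"
df["c"] = rng.normal(size=300)                               # pure noise
y = df["a"] * 2 + rng.normal(scale=0.1, size=300)

# 1. Drop one feature from each highly correlated (> 0.9) pair.
corr = df.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > 0.9).any()]
kept = df.drop(columns=to_drop)

# 2. Score the remaining features by mutual information with the target.
mi = mutual_info_regression(kept, y, random_state=0)
```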
Regularization to Tame Feature Explosion
Choose L1 when you want automatic sparsity.
Choose L2 when you need smooth shrinkage without hard zeroing.
🔍 Key Comparisons
Feature Engineering vs. Feature Selection
Engineering: creates or modifies features.
Selection: picks a subset of existing features.
L1 vs. L2 Regularization
L1 → sparsity (many zeros).
L2 → small but non‑zero weights.
Manual Features vs. Automated (DFS)
Manual: high domain insight, slower, may miss interactions.
DFS: fast, exhaustive, may generate noisy features → needs downstream pruning.
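DFS is normally run via a library such as Featuretools; purely as intuition for what one depth‑1 pass does, here is a hand-rolled pandas sketch that stacks aggregation primitives over a child table and joins them back to the parent (tables and column names are illustrative):

```python
import pandas as pd

customers = pd.DataFrame({"customer_id": [1, 2]})
orders = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "amount":      [10.0, 30.0, 5.0],
})

# Depth-1 DFS intuition: apply aggregation primitives (mean/sum/count)
# over the related child table, then join the results to the parent entity.
aggs = orders.groupby("customer_id")["amount"].agg(["mean", "sum", "count"])
aggs.columns = [f"orders.amount.{c}" for c in aggs.columns]
features = customers.merge(aggs, on="customer_id", how="left")
```

Deeper DFS stacks such primitives across multiple joins, which is exactly how redundant or noisy features accumulate and why downstream pruning is needed.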
NMF vs. PCA
NMF: non‑negative, parts‑based, additive.
PCA: orthogonal components, can be negative, captures variance direction.
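The sign contrast is directly checkable with sklearn; the random non-negative matrix and component count are illustrative:

```python
import numpy as np
from sklearn.decomposition import NMF, PCA

rng = np.random.default_rng(0)
X = rng.random((50, 10))  # non-negative data (e.g., word counts, pixel intensities)

W = NMF(n_components=3, init="random", random_state=0,
        max_iter=500).fit_transform(X)          # non-negative factors
P = PCA(n_components=3).fit_transform(X)        # centered, signed scores

nmf_nonneg = bool((W >= 0).all())  # NMF: additive, parts-based
pca_signed = bool((P < 0).any())   # PCA: components can be negative
```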
⚠️ Common Misunderstandings
“More features always improve accuracy.” → After a point, extra features cause over‑fitting and computational blow‑up.
“Deep learning eliminates the need for feature engineering.” → For limited data or specific domains, engineered features still boost performance.
“Hashing trick is lossless.” → Collisions can occur; they introduce noise but are acceptable when dimensionality must be bounded.
“Regularization only prevents over‑fitting.” – It also performs implicit feature selection (especially L1).
🧠 Mental Models / Intuition
“Feature pipeline as a factory line.” Raw data → cleaning → transformation → creation → selection → model. Each step must output clean, compact, and informative parts.
“Regularization as a weight‑tax.” Imagine each coefficient pays a tax proportional to its size (L2) or absolute value (L1); high‑tax items shrink or disappear.
“NMF as Lego bricks.” Non‑negative factors are like building blocks that can only be added, never subtracted, yielding intuitive part‑based clusters.
🚩 Exceptions & Edge Cases
NMF fails on datasets with negative values; need to shift/scale to non‑negative domain first.
Hashing trick collisions become problematic when cardinality ≫ hash space; increase hash size or use hybrid encoding.
DFS may produce redundant features when relational depth is high; combine with correlation‑based pruning.
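The hashing trick and its collision trade-off can be sketched with sklearn's `FeatureHasher` (the category strings and hash-space size of 8 are illustrative; real use would pick a much larger `n_features`):

```python
from sklearn.feature_extraction import FeatureHasher

# Map high-cardinality string categories into a fixed 8-dimensional space.
hasher = FeatureHasher(n_features=8, input_type="string")
X = hasher.transform([["user_12345"], ["user_67890"], ["user_12345"]]).toarray()

# Identical categories always hash to the same bucket; distinct categories
# may collide when cardinality >> n_features, introducing some noise.
same_category_same_bucket = bool((X[0] == X[2]).all())
```

No category dictionary is stored, so memory stays bounded regardless of how many distinct categories appear at inference time.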
📍 When to Use Which
Use DFS when you have relational/tabular data with many joinable tables and limited time for manual engineering.
Pick L1 if you need a sparse model for interpretability or to meet memory constraints.
Pick L2 when all features are believed to have some predictive signal and you want stability.
Apply NMF for clustering or topic modeling on non‑negative data (e.g., image pixel intensities, word counts).
Use PCA when you need orthogonal components for downstream linear models or visualization.
👀 Patterns to Recognize
Repeated high cardinality categorical → hashing trick is a quick fix.
Many zero or near‑zero variance columns → drop early (they add noise, no information).
Strong linear correlation (> 0.9) between two features → keep one, drop the other.
Model performance plateaus after adding > N engineered features → suspect feature explosion → apply regularization or selection.
🗂️ Exam Traps
Choosing L2 when the question asks for “feature elimination”. L2 shrinks but rarely zeroes weights.
Selecting PCA as a “part‑based” method. PCA yields orthogonal directions, not additive parts—NMF is the correct answer.
Assuming “feature store” only serves training data. It also serves real‑time inference features.
Confusing “feature extraction” (e.g., spectral coefficients) with “feature engineering”. Extraction derives from raw signals; engineering manipulates existing tabular features.
Believing the hashing trick preserves exact categories. Collisions mean different categories may map to the same bucket.
---
Prepared for quick review: focus on definitions, when/why to apply each technique, and the common pitfalls that turn a good answer into a wrong one.