Artificial Intelligence Study Guide
📖 Core Concepts
Artificial Intelligence (AI): Machines that perceive environments and act to maximize goal achievement.
Agent: Entity with sensors (perception) and effectors (action) that selects actions to reach its goals.
Goal / Utility: Numeric value representing desirability of a state; agents try to maximize expected utility.
Knowledge Base & Ontology: Structured storage of facts, concepts, and relations for reasoning.
Learning Paradigms:
Supervised – learns from labeled examples (classification, regression).
Unsupervised – finds patterns without labels.
Reinforcement – learns via rewards/penalties from interaction.
Search: Systematic exploration of a state‑space to find a goal state; heuristics guide search.
Probabilistic Reasoning: Uses probability distributions (e.g., Bayesian networks) to handle uncertainty.
Neural Network Layers: Input → hidden → output; deep networks have ≥2 hidden layers.
Transformer: Architecture based on self‑attention; powers modern large language models (LLMs).
Ethical Pillars: Fairness, transparency/explainability, privacy, accountability.
---
📌 Must Remember
Expected Utility: \(\displaystyle EU(a)=\sum_{s'} P(s'|s,a)\,U(s')\) – weight each outcome’s utility by its probability.
Combinatorial Explosion: Search time grows exponentially with problem size; heuristics are essential.
Markov Decision Process (MDP): Tuple \((S,A,T,R,\gamma)\); optimal policy \(\pi^*\) maximizes discounted return.
Backpropagation: Gradient of loss w.r.t. each weight computed via chain rule; updates via gradient descent.
Universal Approximation Theorem: One hidden‑layer feedforward net can approximate any continuous function on a compact domain.
Bias Sources: Training data bias, sample‑size disparity, deployment context, proxy variables.
Fairness Types:
Distributive – equal outcomes/statistical parity.
Representational – avoid harmful stereotypes.
Procedural – fair decision‑making process.
RLHF (Reinforcement Learning from Human Feedback): Fine‑tunes LLMs to align outputs with human preferences (helpfulness, truthfulness); can reduce, but does not eliminate, hallucinations.
AI “Winter”: Periods of reduced funding after unmet expectations (mid‑1970s, late‑1980s).
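The expected‑utility formula above can be sketched in a few lines; the outcome probabilities and utilities below are made‑up numbers for illustration:

```python
# Expected utility: weight each outcome's utility by its probability.
# The (probability, utility) pairs are hypothetical, not from any real model.

def expected_utility(outcomes):
    """outcomes: list of (probability, utility) pairs for one action."""
    return sum(p * u for p, u in outcomes)

# Two candidate actions with illustrative P(s'|s,a) and U(s') values.
actions = {
    "safe":  [(1.0, 5.0)],                # certain, modest payoff
    "risky": [(0.3, 20.0), (0.7, 0.0)],   # 30% chance of a big payoff
}

best = max(actions, key=lambda a: expected_utility(actions[a]))
print(best, expected_utility(actions[best]))  # risky: EU = 6.0 > 5.0
```

Note that the risky action wins despite a 70% chance of zero payoff: expected utility trades off magnitude against probability.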
---
🔄 Key Processes
Heuristic Search (A\*)
Compute \(f(n)=g(n)+h(n)\) where \(g\) = cost so far, \(h\) = admissible heuristic estimate to goal.
Expand node with lowest \(f\) until goal reached.
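The two steps above can be sketched as a minimal A\* over a toy graph; the graph and heuristic values are assumptions for illustration (the heuristic is chosen to be admissible):

```python
import heapq

def a_star(graph, h, start, goal):
    """A* search: graph maps node -> [(neighbor, cost)]; h is an admissible heuristic."""
    frontier = [(h(start), 0, start, [start])]  # (f = g + h, g, node, path)
    best_g = {start: 0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)  # expand lowest-f node
        if node == goal:
            return path, g
        for nbr, cost in graph.get(node, []):
            g2 = g + cost
            if g2 < best_g.get(nbr, float("inf")):
                best_g[nbr] = g2
                heapq.heappush(frontier, (g2 + h(nbr), g2, nbr, path + [nbr]))
    return None, float("inf")

# Toy graph and heuristic (hypothetical, never overestimating true cost to D).
graph = {"A": [("B", 1), ("C", 4)], "B": [("C", 1), ("D", 5)], "C": [("D", 2)]}
h_vals = {"A": 3, "B": 2, "C": 1, "D": 0}
path, cost = a_star(graph, h_vals.get, "A", "D")
print(path, cost)  # ['A', 'B', 'C', 'D'] 4
```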
Training a Neural Network
Forward pass → compute loss.
Backward pass → calculate gradients via backpropagation.
Update weights: \(w \leftarrow w - \eta \nabla_w L\) (gradient descent).
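For a single linear neuron, the forward pass, gradient, and update rule above fit in a few lines; the training data (\(y = 2x\)) and learning rate are illustrative choices:

```python
# One linear neuron trained by gradient descent on toy data y = 2x.
# The "backward pass" here is the hand-derived gradient of the MSE loss.

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w, eta = 0.0, 0.05  # initial weight and learning rate (assumed values)

for _ in range(200):
    # dL/dw for MSE loss L = mean((w*x - y)^2)
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= eta * grad  # w <- w - eta * grad

print(round(w, 3))  # converges toward the true slope 2.0
```

Backpropagation generalizes this: in a multi‑layer network the chain rule propagates the same kind of gradient from the loss back through each layer.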
MDP Policy Iteration
Policy Evaluation: Compute state values \(V^\pi(s)=\sum_a \pi(a|s)\sum_{s'}P(s'|s,a)[R(s,a,s')+\gamma V^\pi(s')]\).
Policy Improvement: Set \(\pi'(s)=\arg\max_a \sum_{s'}P(s'|s,a)[R(s,a,s')+\gamma V^\pi(s')]\).
Iterate until \(\pi\) stabilizes.
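A minimal sketch of the evaluate–improve loop on a made‑up 2‑state MDP (states, transitions, and rewards are all illustrative assumptions):

```python
# Policy iteration on a tiny 2-state MDP (states 0,1; actions "stay","go").
S, A, gamma = [0, 1], ["stay", "go"], 0.9

# T[s][a] = list of (prob, next_state, reward) -- hypothetical dynamics.
T = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {"stay": [(1.0, 1, 1.0)], "go": [(1.0, 0, 0.0)]},
}

def q(s, a, V):
    """Action value: sum over s' of P(s'|s,a) * (R + gamma * V(s'))."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in T[s][a])

pi = {s: "stay" for s in S}
while True:
    # Policy evaluation: sweep the Bellman expectation update until V stabilizes.
    V = {s: 0.0 for s in S}
    for _ in range(500):
        V = {s: q(s, pi[s], V) for s in S}
    # Policy improvement: act greedily with respect to the evaluated V.
    new_pi = {s: max(A, key=lambda a: q(s, a, V)) for s in S}
    if new_pi == pi:  # policy stable -> optimal
        break
    pi = new_pi

print(pi)  # state 0 should "go" toward the rewarding state 1, which "stay"s
```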
Transformer Self‑Attention
For each token: \( \text{Attention}(Q,K,V)=\text{softmax}\!\left(\frac{QK^\top}{\sqrt{d_k}}\right)V\).
Stack multiple attention heads, add positional encodings, feed into feed‑forward layers.
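The scaled dot‑product formula above can be sketched for a single head in plain Python (two toy tokens with \(d_k = 2\); all matrix values are made up):

```python
import math

def softmax(row):
    m = max(row)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def attention(Q, K, V):
    """Single-head scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(K[0])
    KT = [list(col) for col in zip(*K)]
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(Q, KT)]
    weights = [softmax(row) for row in scores]  # each row sums to 1
    return matmul(weights, V)

# Two tokens, d_k = 2; toy queries, keys, and values.
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
out = attention(Q, K, V)
```

Each output row is a probability‑weighted mix of the value vectors, with more weight on the token whose key matches the query; multi‑head attention runs several such maps in parallel.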
Explainability (SHAP/LIME)
SHAP: Compute Shapley values for each feature → contribution to prediction.
LIME: Fit a simple interpretable model locally around the instance to approximate the complex model’s behavior.
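The LIME idea — sample near the instance, fit a simple local surrogate — can be sketched in one dimension; the black‑box model, instance, and sampling width below are all hypothetical:

```python
import random

# Hypothetical black-box model; the explainer never looks inside it.
def black_box(x):
    return x * x

def lime_1d(f, x0, n=500, width=0.5):
    """LIME-style sketch: fit a local linear surrogate around x0 by sampling nearby points."""
    random.seed(0)  # fixed seed for reproducibility
    xs = [x0 + random.uniform(-width, width) for _ in range(n)]
    ys = [f(x) for x in xs]
    # Ordinary least squares slope of y ~ a + b*x over the local sample.
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return b  # local "importance" of the feature near x0

slope = lime_1d(black_box, 3.0)
print(round(slope, 1))  # close to the true local gradient 2*x0 = 6
```

The surrogate's slope approximates the model's local behavior even though globally \(x^2\) is not linear; real LIME does the same with many features and a distance‑weighted fit.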
---
🔍 Key Comparisons
Supervised vs. Unsupervised Learning
Supervised: Labeled data → predict specific output.
Unsupervised: No labels → discover structure (clustering, dimensionality reduction).
Symbolic AI vs. Connectionist AI
Symbolic: Logic, rules, explicit knowledge representation.
Connectionist: Neural nets, learning from data, implicit knowledge.
Deterministic Planning vs. Stochastic Planning
Deterministic: Action outcomes are certain; classic state‑space search applies.
Stochastic: Outcomes probabilistic; use MDPs or POMDPs.
Heuristic Search vs. Local Optimization
Heuristic Search: Global exploration with informed guidance (A*, greedy best‑first).
Local Optimization: Starts from a guess, iteratively improves (gradient descent, hill‑climbing).
Rule‑Based (Hard) Computing vs. Soft Computing
Hard: Guarantees exact optimality; often intractable.
Soft: Accepts approximation, uncertainty (fuzzy logic, evolutionary algorithms).
---
⚠️ Common Misunderstandings
“AI equals deep learning.” AI encompasses symbolic reasoning, planning, perception, etc.; deep learning is one powerful subfield.
“More data always fixes bias.” Biased data or proxy features can still propagate unfairness even at large scale.
“A high‑performing model is trustworthy.” Accuracy does not guarantee explainability, safety, or alignment.
“Removing sensitive attributes eliminates bias.” Proxy variables can re‑introduce the same information.
“The Turing Test proves general intelligence.” It only measures imitation of human conversation, not broad problem solving.
---
🧠 Mental Models / Intuition
Utility Landscape: Think of an agent climbing a hill where height = utility; heuristics give a rough map, and gradient ascent follows the locally steepest climb.
Bayesian Updating: Treat belief as a jar of colored marbles; new evidence adds or removes marbles proportionally, yielding a revised probability.
Transformer Attention: Imagine each word casting “spotlights” onto every other word; the brighter the spotlight (similarity), the more influence it has.
Bias as a Leaky Faucet: Even a small leak (biased feature) can flood the downstream decision if not caught early.
---
🚩 Exceptions & Edge Cases
Partial Observability: In many real‑world tasks agents cannot see the full state → use POMDPs or belief states.
Non‑Monotonic Reasoning: Default conclusions can be retracted when new information arrives (e.g., “birds typically fly”).
Combinatorial Explosion in Planning: For very large state spaces, exact planning is infeasible; resort to hierarchical or approximate methods.
RLHF Hallucinations: Even after RLHF, LLMs may generate false statements when prompts are out‑of‑distribution.
Fairness vs. Legal Constraints: Using protected attributes to correct bias may violate anti‑discrimination law in some jurisdictions.
---
📍 When to Use Which
Symbolic Reasoning (logic, planning) → problems requiring explicit rules, interpretability, or guaranteed correctness (e.g., theorem proving, safety‑critical planning).
Neural Networks / Deep Learning → high‑dimensional perceptual tasks (vision, speech, language) where feature engineering is infeasible.
Probabilistic Graphical Models → domains with known causal structure and need for uncertainty quantification (diagnostics, fault detection).
Heuristic Search (A\*) → deterministic path‑finding with a reliable admissible heuristic (routing, puzzle solving).
Reinforcement Learning → sequential decision problems where a reward signal is available but model of environment is unknown (games, robotics).
Transfer Learning → when a large pre‑trained model exists and the target task has limited data.
Explainability Techniques (SHAP/LIME) → high‑risk applications (credit scoring, medical diagnosis) where stakeholder trust is required.
---
👀 Patterns to Recognize
“Explosion → Approximation” – Whenever problem size grows exponentially, look for heuristics, sampling, or hierarchical decomposition.
“Data → Feature → Model” pipeline: quality/representativeness of data often dictates model bias more than algorithm choice.
“Reward Shaping → Unintended Behavior” – In RL, overly specific rewards can lead to gaming the reward function.
“Attention Weights ≈ Importance” – In transformers, high attention scores often (but not always) indicate key contextual influence.
“Bias ↔ Proxy Variables” – Identify features that correlate strongly with protected attributes; they are red flags.
---
🗂️ Exam Traps
Distractor: “Neural nets are always more accurate than symbolic methods.” – Wrong; accuracy depends on task and data.
Near‑miss: An admissible heuristic makes A\* optimal in tree search, but graph search without re‑expanding closed nodes additionally requires a consistent heuristic.
Confusion: Equating “deep learning” with “any multi‑layer network.” Deep learning specifically stacks many layers to learn hierarchical features.
Misreading: Assuming “expected utility” ignores probabilities; remember the probability weighting term.
Edge‑case: Selecting “gradient descent” for discrete combinatorial optimization – inefficient; local search or evolutionary algorithms are preferred.
---