Mathematical Optimization Study Guide
📖 Core Concepts
Mathematical Optimization – choosing the best element from a set of alternatives by maximizing or minimizing a real‑valued objective function.
Search space (feasible set) \(A\) – all candidate solutions that satisfy any constraints.
Objective / loss / utility function \(f(x)\) – maps each candidate \(x\in A\) to a real number to be minimized \(\min_{x\in A} f(x)\) or maximized \(\max_{x\in A} f(x)\).
Feasible solution – any \(x\) that belongs to \(A\).
Optimal solution – feasible solution that attains the smallest (or largest) objective value.
Local vs. Global Optimum – a local optimum is best only within a neighbourhood; a global optimum is best over the entire feasible set.
Convexity – if \(f\) is convex and \(A\) is convex, every local minimum is also a global minimum.
Problem families – discrete (integer, combinatorial), continuous (constrained/unconstrained), multi‑objective, and global (multimodal).
Optimality conditions – first‑order (gradient/subgradient = 0, KKT for constraints) and second‑order (Hessian definiteness).
📌 Must Remember
Standard forms: \(\min_{x\in A} f(x)\) or \(\max_{x\in A} f(x)\).
Convex ⇒ every local minimum is global.
KKT conditions are necessary for optimality in problems with equality/inequality constraints (under regularity).
Fermat’s theorem: at an interior local extremum of a differentiable function, \(\nabla f(x)=0\); stationarity is necessary, not sufficient, in unconstrained problems.
Extreme Value Theorem: a continuous function on a compact set attains its min and max.
Simplex algorithm solves linear programs exactly in a finite number of pivots (with an anti‑cycling rule), though its worst‑case pivot count is exponential.
Newton’s method gives local quadratic convergence near a minimum where the Hessian is positive‑definite.
BFGS and other quasi‑Newton methods achieve super‑linear convergence without forming the full Hessian.
Pareto optimality: no other feasible point improves one objective without worsening another.
🔄 Key Processes
Formulating an Optimization Problem
Identify decision variables \(x\).
Write objective \(f(x)\) (min or max).
List constraints (equalities/inequalities) that define \(A\).
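The three steps above can be sketched with `scipy.optimize.minimize`; the objective, constraint, and starting point below are made‑up illustrations, not part of the source material:

```python
# Sketch: minimize f(x) = (x0 - 1)^2 + (x1 - 2)^2
# subject to x0 + x1 <= 2 and x0, x1 >= 0 (illustrative problem).
from scipy.optimize import minimize

def f(x):
    # Step 1-2: decision variables x = (x0, x1), objective to minimize
    return (x[0] - 1.0) ** 2 + (x[1] - 2.0) ** 2

# Step 3: constraints defining the feasible set A (scipy uses g(x) >= 0 form)
cons = [{"type": "ineq", "fun": lambda x: 2.0 - x[0] - x[1]}]
res = minimize(f, x0=[0.0, 0.0], bounds=[(0, None), (0, None)], constraints=cons)
```

The unconstrained minimizer \((1, 2)\) is infeasible, so the solver lands on the constraint boundary at \((0.5, 1.5)\).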
Checking Convexity
Verify \(f\) is convex (second derivative \(\ge 0\) or Hessian PSD).
Verify feasible set \(A\) is convex (linear/convex constraints).
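Numerically, the Hessian‑PSD check reduces to an eigenvalue test; here is a minimal sketch with illustrative matrices:

```python
import numpy as np

def is_psd(H, tol=1e-10):
    # A symmetric matrix is PSD iff all eigenvalues are >= 0
    # (symmetrize first to guard against tiny asymmetries)
    return bool(np.all(np.linalg.eigvalsh((H + H.T) / 2) >= -tol))

H_convex = np.array([[2.0, 0.0], [0.0, 1.0]])   # Hessian of x0^2 + 0.5*x1^2
H_saddle = np.array([[1.0, 0.0], [0.0, -1.0]])  # Hessian of 0.5*(x0^2 - x1^2)
```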
Applying First‑Order Optimality
Compute gradient \(\nabla f(x)\).
Set \(\nabla f(x)=0\) for unconstrained or construct Lagrangian \(L(x,\lambda,\mu)\) for constrained.
Solve KKT system.
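For an equality‑constrained convex quadratic, the KKT system is linear and can be solved directly; the problem data below are an illustrative assumption:

```python
import numpy as np

# Sketch: minimize x0^2 + x1^2 subject to x0 + x1 = 1
Q = np.array([[2.0, 0.0], [0.0, 2.0]])  # Hessian of the objective
c = np.zeros(2)
A = np.array([[1.0, 1.0]])              # equality constraint Ax = b
b = np.array([1.0])

# Stationarity (Qx + A^T lam = -c) plus feasibility (Ax = b)
# stack into one linear KKT system: [Q A^T; A 0] [x; lam] = [-c; b]
K = np.block([[Q, A.T], [A, np.zeros((1, 1))]])
sol = np.linalg.solve(K, np.concatenate([-c, b]))
x_opt, lam = sol[:2], sol[2:]
```

Here the symmetry of the problem gives \(x^\* = (0.5, 0.5)\).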
Second‑Order Test
Form Hessian \(H = \nabla^2 f(x)\).
Positive‑definite ⇒ local min; negative‑definite ⇒ local max; indefinite ⇒ saddle.
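The second‑order test can be sketched as an eigenvalue classification (the helper name and tolerance are illustrative choices):

```python
import numpy as np

def classify_stationary_point(H, tol=1e-10):
    # Classify a stationary point from the Hessian's eigenvalues
    ev = np.linalg.eigvalsh((H + H.T) / 2)       # symmetrize for safety
    if np.all(ev > tol):
        return "local min"                        # positive-definite
    if np.all(ev < -tol):
        return "local max"                        # negative-definite
    if np.any(ev > tol) and np.any(ev < -tol):
        return "saddle"                           # indefinite
    return "inconclusive"  # semidefinite: flat direction, higher-order test needed
```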
Algorithm Selection Workflow
Is the problem linear? → Simplex or interior‑point LP solver.
Convex quadratic? → Interior‑point QP or active‑set methods.
General convex? → Gradient‑based (proximal, ADMM) or interior‑point.
Non‑convex/discrete? → Heuristics (GA, SA, DE) + local refinement.
🔍 Key Comparisons
Linear Programming vs. Integer Programming
LP: variables continuous, feasible region polyhedron, solved exactly.
IP: some/all variables integer → feasible set non‑convex, NP‑hard, often solved with branch‑and‑bound.
Gradient Descent vs. Newton’s Method
GD: uses \(-\nabla f\) only, linear convergence, cheap per iteration.
Newton: uses \(-H^{-1}\nabla f\), quadratic convergence, expensive Hessian inversion.
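The contrast is easiest to see on a quadratic, where Newton's step is exact; the matrix and step size below are illustrative:

```python
import numpy as np

H = np.array([[10.0, 0.0], [0.0, 1.0]])  # ill-conditioned quadratic f(x) = 0.5 x^T H x
grad = lambda x: H @ x                    # gradient; unique minimum at the origin

# Gradient descent: cheap per iteration, linear convergence
x_gd = np.array([1.0, 1.0])
step = 0.18                               # must satisfy step < 2/lambda_max = 0.2
for _ in range(200):
    x_gd = x_gd - step * grad(x_gd)

# Newton: one expensive step (-H^{-1} grad) minimizes a quadratic exactly
x_nt = np.array([1.0, 1.0])
x_nt = x_nt - np.linalg.solve(H, grad(x_nt))
```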
Local Search (e.g., hill‑climbing) vs. Global Heuristics (e.g., Simulated Annealing)
Local: fast, may get trapped in local minima.
Global: stochastic moves allow escape, slower but better for multimodal problems.
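A minimal simulated‑annealing sketch makes the "stochastic escape" idea concrete; the objective, cooling schedule, and parameters are all illustrative assumptions:

```python
import math
import random

def f(x):
    # toy multimodal objective: global minimum at x = 0, local traps near multiples of pi
    return x * x + 10 * math.sin(x) ** 2

random.seed(0)
x = 8.0            # start far from the global basin
best = x           # track the best point ever visited
T = 5.0            # initial temperature
for _ in range(5000):
    cand = x + random.gauss(0, 0.5)      # random neighbour
    # always accept improvements; accept worse moves with Boltzmann probability
    if f(cand) < f(x) or random.random() < math.exp((f(x) - f(cand)) / T):
        x = cand
    if f(x) < f(best):
        best = x
    T *= 0.999     # geometric cooling: uphill moves become rarer over time
```

A pure hill‑climber started at `x = 8.0` would settle in the nearest local basin; the temperature term lets the search jump between basins early on.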
KKT Conditions vs. Lagrange Multipliers
Lagrange multipliers: equality constraints only.
KKT: extends to inequality constraints with complementary slackness.
⚠️ Common Misunderstandings
“Convex ⇒ easy” – convex problems are polynomial‑time solvable, but large‑scale instances still need careful algorithm choice.
“Gradient = 0 guarantees a minimum” – could be a maximum or saddle; second‑order test required.
“Global optimum = best solution found by any heuristic” – heuristics provide no guarantee; only deterministic global methods do.
“Feasibility implies optimality” – a feasible point may be far from optimal; must still satisfy optimality conditions.
🧠 Mental Models / Intuition
“Landscape view” – imagine \(f(x)\) as a terrain; gradients point uphill, negatives point downhill. Convex terrain has a single basin (global minimum).
“Segment test for feasibility” – for convex constraints, any line segment between two feasible points stays entirely feasible; this is the defining picture of a convex feasible region.
“Pareto frontier as a skyline” – each point on the skyline cannot be dominated; moving along the frontier trades one objective for another.
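The skyline picture can be made concrete with a small dominance filter; the sample points are made up for illustration:

```python
def pareto_front(points):
    # Keep points not dominated by any other point (minimizing both coordinates).
    # q dominates p if q is no worse in both objectives and differs from p
    # (ties/duplicates are handled loosely in this sketch).
    front = []
    for p in points:
        dominated = any(q[0] <= p[0] and q[1] <= p[1] and q != p for q in points)
        if not dominated:
            front.append(p)
    return front

pts = [(1, 5), (2, 3), (3, 4), (4, 1), (5, 5)]
```

Here `(3, 4)` is dominated by `(2, 3)` and `(5, 5)` by `(1, 5)`, so the frontier is `[(1, 5), (2, 3), (4, 1)]`.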
🚩 Exceptions & Edge Cases
Non‑convex but locally convex – a local minimum inside a convex sub‑region is optimal only within that region; without extra structure (bounds, enumeration) it cannot be certified global.
Boundary optima – KKT first‑order conditions may involve active constraints; gradient need not be zero in the full space.
Degenerate Hessian – positive semidefinite (not definite) gives a flat direction; additional criteria needed.
Discrete feasible set – concepts of gradient/Hessian lose meaning; rely on combinatorial algorithms or relaxations.
📍 When to Use Which
Linear objective & linear constraints → Simplex or interior‑point LP solver.
Quadratic objective, linear constraints → Quadratic programming (active‑set or interior‑point).
Convex but non‑quadratic → Gradient‑based (projected gradient, proximal) or interior‑point methods.
Non‑convex continuous → Start from multiple random seeds with a local optimizer (BFGS, Newton) + global heuristic (GA, SA) for exploration.
Integer / combinatorial → Branch‑and‑bound, cutting planes, or problem‑specific combinatorial algorithms.
Multi‑objective → Compute Pareto set via weighted‑sum, ε‑constraint, or evolutionary multi‑objective algorithms (NSGA‑II).
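For the first row of this table, `scipy.optimize.linprog` is a direct fit; the LP below is an illustrative assumption:

```python
from scipy.optimize import linprog

# Sketch: maximize 3x + 2y  subject to  x + y <= 4,  x <= 2,  x, y >= 0.
# linprog minimizes, so negate the objective coefficients.
res = linprog(
    c=[-3, -2],
    A_ub=[[1, 1], [1, 0]],
    b_ub=[4, 2],
    bounds=[(0, None), (0, None)],
)
```

The optimum sits at the vertex \((2, 2)\) with value \(10\) (so `res.fun` is \(-10\)), as simplex/interior‑point theory predicts for a linear objective over a polyhedron.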
👀 Patterns to Recognize
“All constraints linear → convex polyhedron” → LP or IP depending on variable type.
“Quadratic term only in objective, constraints linear” → convex QP if Hessian PSD.
“Presence of inequality constraints + stationarity” → check KKT complementary slackness.
“Multiple local minima + rugged landscape” → likely multimodal → need global/metaheuristic.
“Objective separable into sum of terms” → coordinate descent or ADMM may be efficient.
🗂️ Exam Traps
Confusing local with global optimum – exam may ask which statement is always true; only in convex problems are local optima guaranteed global.
Misapplying KKT without checking regularity – complementary slackness fails if constraint qualifications are violated.
Assuming gradient descent always converges quickly – step‑size choice critical; too large → divergence, too small → extremely slow.
Choosing Simplex for a non‑linear problem – Simplex solves linear programs only; a non‑linear convex problem needs interior‑point or other convex solvers.
Treating integer variables as continuous – relaxation gives a bound but not a feasible integer solution; answer choices that ignore integrality are distractors.
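The step‑size trap is easy to demonstrate on a one‑dimensional quadratic (the function and step sizes are illustrative):

```python
def gd_1d(step, iters=50, x=1.0):
    # minimize f(x) = x^2 with fixed-step gradient descent; f'(x) = 2x,
    # so each iteration multiplies x by (1 - 2*step)
    for _ in range(iters):
        x -= step * 2 * x
    return x
```

With `step = 1.1` the multiplier is \(-1.2\), so iterates diverge; `step = 0.4` converges rapidly; `step = 0.001` barely moves in 50 iterations.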
---
Keep this sheet handy. Review each bullet before the exam, and practice identifying which bullet applies to a given problem statement.