RemNote Community

Mathematical optimization - Fundamentals of Optimization

Learn the core concepts of optimization, the necessary and sufficient conditions for optimality, and the main algorithms for solving both local and global problems.


Summary

Introduction to Mathematical Optimization

What is Mathematical Optimization?

Mathematical optimization is the study of finding the best solution from a set of alternatives. More formally, it's about selecting candidate values for some variables that either minimize or maximize an objective function while satisfying constraints. You encounter optimization problems everywhere: minimizing cost in business, maximizing profit in economics, finding the fastest route in navigation, or training neural networks in machine learning. The field applies across computer science, engineering, operations research, economics, and essentially any quantitative discipline that involves decision-making.

The Basic Setup: Optimization Problem Formulation

Every optimization problem has two essential components:

The objective function $f$ is a mathematical function that assigns a real number to each candidate solution. This number represents how "good" that solution is.

The feasible set $A$ (also called the search space or choice set) is the collection of all candidate solutions we're allowed to choose from. These candidates are called feasible solutions.

With these pieces, we write an optimization problem in one of two standard forms:

Minimization: $\displaystyle \min_{x \in A} f(x)$

Maximization: $\displaystyle \max_{x \in A} f(x)$

The value that achieves the minimum (or maximum) objective is called an optimal solution or optimizer.

Terminology Matters

Different fields use different names for the objective function depending on context. You might see it called a:

Loss function (machine learning, statistics)
Cost function (economics, operations research)
Utility function (economics)
Fitness function (evolutionary computation)
Energy function (physics)

They all mean the same thing: the function we're trying to minimize or maximize.

Local vs. Global Optima: A Critical Distinction

This is perhaps the most important concept in optimization, so let's be very careful here.
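Before the formal definitions, the two-component setup above can be made concrete with a minimal sketch. The specific objective and feasible set here are illustrative choices, not from the text: a quadratic $f$ and a small finite $A$, so the optimizer can be found by direct comparison.

```python
# Minimal sketch of min_{x in A} f(x) for a small finite feasible set.
# The objective and feasible set are illustrative examples.

def f(x):
    # Objective function: assigns a real number to each candidate solution.
    return (x - 2) ** 2

A = [0, 1, 2, 3, 4]  # Feasible set: the candidate solutions we may choose from.

# An optimal solution is a feasible point with the smallest objective value.
x_star = min(A, key=f)
print(x_star, f(x_star))  # the optimizer and the optimal value
```

For infinite feasible sets this brute-force comparison is impossible, which is exactly why the conditions and algorithms below exist.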
A local minimum at $x^*$ means $f(x^*) \le f(x)$ for all $x$ near $x^*$ (in some neighborhood around it). In other words, you can't improve the objective value by taking small steps away from $x^*$.

A global minimum at $x^*$ means $f(x^*) \le f(x)$ for every feasible solution $x \in A$. It's the absolute best solution across the entire feasible set.

Why is this distinction so important? Many optimization algorithms only guarantee finding a local minimum, not a global one. For practical problems, being stuck at a local minimum when a better global minimum exists is a serious problem.

A Visual Example

Consider a function with multiple local minima, like the one pictured below:

[Image: a function landscape with several basins; the darker regions are local minima]

The darker regions show local minima—places where the function value is low compared to nearby points. But there's a global minimum somewhere in this landscape that beats all the others. A naive algorithm starting from the wrong location might find a local minimum and incorrectly conclude it's the best solution.

When Can We Guarantee Finding the Global Minimum?

Here's good news: if the objective function is convex, then any local minimum is also a global minimum. This makes convex optimization problems much easier to solve—we don't have to worry about getting trapped at a bad local solution.

However, many real-world problems are non-convex. They may have multiple local minima, and finding the global one becomes much harder. This is why optimization researchers study specialized techniques for global optimization.

Theoretical Foundations: When Do Optima Exist?

The Extreme Value Theorem

Before we even try to find an optimum, we should know whether one exists! Weierstrass's Extreme Value Theorem tells us: if the objective function $f$ is continuous and the feasible set $A$ is compact (closed and bounded in Euclidean space), then $f$ attains both a maximum and a minimum on $A$.
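The local-versus-global trap can be seen numerically. This is a hedged sketch, not a prescribed method: plain gradient descent on $f(x) = x^4 - 3x^2 + x$, which has a local minimum near $x \approx 1.13$ and a better, global one near $x \approx -1.30$. The step size and starting points are arbitrary choices for illustration.

```python
# f(x) = x**4 - 3*x**2 + x has two basins: a local minimum near x ≈ 1.13
# and the global minimum near x ≈ -1.30. Step size chosen by hand.

def f(x):
    return x**4 - 3 * x**2 + x

def gradient_descent(x, lr=0.01, steps=5000):
    for _ in range(steps):
        x -= lr * (4 * x**3 - 6 * x + 1)  # derivative of f
    return x

x_left = gradient_descent(-2.0)   # lands in the global minimum's basin
x_right = gradient_descent(2.0)   # stuck at the worse local minimum
print(f(x_left) < f(x_right))     # same algorithm, different answers
```

With a convex objective this could not happen: every descent run would end at the same, global, minimum.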
This is reassuring—it tells us that for well-behaved problems on bounded domains, we're guaranteed that an optimal solution exists somewhere.

Necessary Conditions for Optimality: Finding Candidates

Once we know an optimum exists, how do we find it? We start by identifying candidates—places where an optimum might occur.

Fermat's Theorem and First-Order Conditions

Fermat's Theorem states: if $x^*$ is a local minimum of an unconstrained problem and $f$ is differentiable at $x^*$, then the gradient must be zero:

$$\nabla f(x^*) = 0$$

Points where the gradient is zero are called stationary points. They're the candidates for optima in the interior of the feasible set.

Critical points occur in three situations:

Where the gradient equals zero
Where the gradient is undefined
On the boundary of the feasible set

You cannot ignore boundaries—an optimum can occur there even if the gradient is nonzero!

Constrained Optimization: Lagrange Multipliers

In constrained problems, we can't simply set the gradient to zero. Instead, we use Lagrange multipliers. For a problem of the form:

$$\min_{x} f(x) \quad \text{subject to} \quad g(x) = 0$$

we form the Lagrangian:

$$L(x, \lambda) = f(x) + \lambda g(x)$$

where $\lambda$ is the Lagrange multiplier. At an optimum:

$$\nabla_x L(x^*, \lambda^*) = 0$$

The multiplier $\lambda^*$ has a beautiful interpretation: it tells you how much the optimal objective value would change if the constraint were relaxed slightly.

Inequality Constraints: Karush–Kuhn–Tucker Conditions

Real problems often have inequality constraints like $h(x) \le 0$.
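Here is a tiny worked instance of the equality-constrained recipe above; the specific objective and constraint are illustrative. Minimize $x^2 + y^2$ subject to $x + y = 1$: stationarity of the Lagrangian gives $2x + \lambda = 0$ and $2y + \lambda = 0$, so $x = y$, and the constraint then forces $x = y = 1/2$ with $\lambda = -1$.

```python
# Verify the Lagrangian stationarity conditions for
#   min x^2 + y^2  subject to  g(x, y) = x + y - 1 = 0,
# where L(x, y, lam) = x**2 + y**2 + lam * (x + y - 1).

x, y, lam = 0.5, 0.5, -1.0   # candidate from solving the conditions by hand

dL_dx = 2 * x + lam          # ∂L/∂x, must vanish at the optimum
dL_dy = 2 * y + lam          # ∂L/∂y, must vanish at the optimum
g = x + y - 1                # constraint must hold exactly

print(dL_dx, dL_dy, g)       # all zero: (0.5, 0.5) is a stationary point
```

The multiplier value $\lambda = -1$ also illustrates the sensitivity interpretation: relaxing the constraint to $x + y = 1 + \epsilon$ changes the optimal value by roughly $-\lambda\,\epsilon$ to first order.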
The Karush–Kuhn–Tucker (KKT) conditions generalize Lagrange multipliers to handle these. For a problem with equality constraints $g(x) = 0$ and inequality constraints $h(x) \le 0$: at an optimum $x^*$, there exist multipliers $\lambda^*, \mu^*$ satisfying:

Stationarity: $\nabla f(x^*) + \lambda^* \nabla g(x^*) + \mu^* \nabla h(x^*) = 0$
Dual feasibility: $\mu^* \ge 0$
Complementary slackness: $\mu^* h(x^*) = 0$ (if a constraint is inactive at the optimum, its multiplier is zero)

These conditions are necessary for optimality: under suitable regularity conditions (constraint qualifications), every local minimum satisfies them.

Sufficient Conditions for Optimality: Confirming You Have an Optimum

Finding a stationary point isn't enough—you need to verify it's actually an optimum. That's where second-order conditions come in.

The Hessian Matrix

The Hessian matrix $H$ contains all second partial derivatives:

$$H_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}$$

The Hessian's properties tell you the nature of a critical point:

Positive-definite Hessian: you have a local minimum
Negative-definite Hessian: you have a local maximum
Indefinite Hessian: you have a saddle point (neither min nor max)

For a critical point to be a confirmed optimum, it must satisfy both first-order conditions (gradient is zero) and appropriate second-order conditions (Hessian with the right definiteness).

Constrained Problems

For constrained problems, you don't examine the Hessian of $f$ directly. Instead, you examine the bordered Hessian, which incorporates information about the constraints. This is more involved, but the principle is the same: second-order conditions confirm optimality.

Convexity: The Game-Changer

One critical insight appears repeatedly in optimization theory: if the objective function is convex, then any local minimum is guaranteed to be a global minimum.

Why? Because a convex function has no "valleys within valleys." Once you find a point where the gradient is zero, you know you've found the absolute best solution.
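Stepping back to the second-order test: for two-variable functions it can be automated with a short sketch. The closed-form eigenvalues of a symmetric $2 \times 2$ matrix decide definiteness; the example functions are illustrative.

```python
import math

# Classify a critical point from its symmetric 2x2 Hessian [[a, b], [b, c]].
# The eigenvalues of such a matrix are (a + c)/2 ± sqrt(((a - c)/2)**2 + b**2).

def classify(a, b, c):
    mean = (a + c) / 2
    spread = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    low, high = mean - spread, mean + spread
    if low > 0:
        return "local minimum"   # positive-definite: both eigenvalues > 0
    if high < 0:
        return "local maximum"   # negative-definite: both eigenvalues < 0
    if low < 0 < high:
        return "saddle point"    # indefinite: eigenvalues of mixed sign
    return "inconclusive"        # a zero eigenvalue: the test is silent

# f(x, y) = x**2 + y**2 has Hessian [[2, 0], [0, 2]] at its critical point (0, 0):
print(classify(2, 0, 2))    # local minimum
# f(x, y) = x**2 - y**2 has Hessian [[2, 0], [0, -2]] at (0, 0):
print(classify(2, 0, -2))   # saddle point
```

The "inconclusive" branch matters: a semidefinite Hessian (some zero eigenvalue) means the second-order test alone cannot decide, and higher-order information is needed.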
This is why convex optimization is considered solved (in a theoretical sense), while non-convex optimization remains an active research area.

<extrainfo> Sensitivity Analysis and the Envelope Theorem

The envelope theorem describes how the optimal objective value changes when the underlying parameters of a problem change. This kind of analysis is called comparative statics in economics. For example, if a constraint becomes slightly less restrictive, how much does the optimal value improve? The Lagrange multiplier tells you exactly this. This is valuable for understanding how robust your solution is to changes in the problem setup. </extrainfo>

Moving from Theory to Practice: Numerical Algorithms

In practice, we rarely solve optimization problems by hand using calculus. Instead, we use algorithms that iteratively improve candidate solutions.

Line-Search Methods

These methods pick a promising direction and then optimize along that line. At each iteration:

Choose a direction $d$ (for minimization, often the negative gradient direction)
Find the best step size $\alpha$ along that direction
Move to the new point $x_{\text{new}} = x + \alpha d$
Repeat

This is simple and intuitive, but step 1 (choosing the direction) requires care.

Trust-Region Methods

Instead of picking a direction and optimizing along it, trust-region methods take a different approach:

Build a simple model of the objective function (usually quadratic)
Define a "trust region"—a nearby area where the model is believed accurate
Optimize the model within this region
Move to the best point found
Update the trust region based on how accurate the model was

If the model was accurate, expand the trust region; if not, shrink it. This adaptive approach can be more robust than line search.

<extrainfo> Advanced Algorithms: BFGS and Beyond

The Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm is a sophisticated local optimizer that uses approximate Hessian information to choose smart directions.
It's much faster than simple gradient descent but requires more computation per step. For very large-scale problems, stochastic gradient descent (using only noisy gradient estimates) often works better than exact methods. A common practical strategy: if you're unsure whether you've found the global optimum, run a local optimizer from multiple random starting points and compare results. The best solution found is likely to be close to the global optimum. </extrainfo>
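The multistart strategy described above can be sketched in a few lines. This is a hedged illustration: plain gradient descent stands in for BFGS, and the test function $f(x) = x^4 - 3x^2 + x$, step size, and start range are arbitrary choices.

```python
import random

# Multistart: run a local optimizer from several random starting points and
# keep the best result. Plain gradient descent stands in for BFGS here.

def f(x):
    return x**4 - 3 * x**2 + x   # local min near 1.13, global min near -1.30

def local_minimize(x, lr=0.01, steps=5000):
    for _ in range(steps):
        x -= lr * (4 * x**3 - 6 * x + 1)   # follow the negative gradient
    return x

random.seed(0)  # fixed seed so the sketch is reproducible
starts = [random.uniform(-2, 2) for _ in range(10)]
best = min((local_minimize(x0) for x0 in starts), key=f)
print(best)  # close to the global minimizer x ≈ -1.30
```

Each restart costs one full local run, so in practice the number of starts is a budget decision; the more local minima the landscape has, the more starts it takes for one of them to land in the global basin.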
Flashcards
Into which two main branches is the field of optimization divided?
Discrete optimization and continuous optimization.
In mathematical terms, how is the goal of an optimization problem defined?
To maximize or minimize a real-valued function by choosing input values from an allowed set.
What is the role of an objective function $f$ in an optimization problem?
It maps each candidate solution to a real number.
What does the set $A$ represent in the formulation of an optimization problem?
The search space (or choice set) containing all candidate solutions.
How is an optimal solution defined in the context of the objective value?
A feasible solution that achieves the smallest (for minimization) or largest (for maximization) objective value.
What condition defines a local minimum $x^*$ in relation to its neighborhood?
$f(x^*) \le f(x)$ for all $x$ in a neighborhood of $x^*$.
What condition defines a global minimum $x^*$ across the entire search space?
$f(x^*) \le f(x)$ for every feasible $x \in A$.
Under what condition is any interior local minimum guaranteed to be a global minimum?
If the objective function is convex.
According to Karl Weierstrass, what conditions guarantee that a function attains its maximum and minimum?
A continuous real-valued function on a compact set.
According to Fermat's theorem, where do unconstrained optima occur?
At stationary points where the gradient of the objective function is zero.
Where can critical points occur in an optimization problem?
Where the gradient is zero; where the gradient is undefined; on the boundary of the feasible set.
Which method is used to solve problems with equality constraints?
The method of Lagrange multipliers.
What conditions are satisfied by problems involving both equality and inequality constraints?
The Karush–Kuhn–Tucker (KKT) conditions.
What technique transforms constrained problems into easier forms to provide approximate solutions?
Lagrangian relaxation.
What mathematical structure is examined to distinguish between minima, maxima, and saddle points?
The Hessian matrix of second derivatives.
What specific matrix is used to examine second-order conditions in constrained problems?
The bordered Hessian.
How does the Hessian matrix indicate a local minimum?
The Hessian is positive-definite.
How does the Hessian matrix indicate a local maximum?
The Hessian is negative-definite.
What does an indefinite Hessian matrix signify at a stationary point?
A saddle point.
What does the envelope theorem describe regarding optimal values?
How an optimal value changes when underlying parameters vary.
What is the study of changes in optimal values due to parameter variation called?
Comparative statics.
How do line-search methods operate at each iteration?
They optimize the objective along a single direction.
How do trust-region methods restrict the optimization step?
They restrict each step to a region where a model of the objective is trusted.
What is a common strategy for achieving global optimization using local methods like BFGS?
Running a local optimizer from multiple starting points.

Key Concepts
Optimization Concepts
Mathematical optimization
Convex optimization
Global optimization
Optimality Conditions
Lagrange multipliers
Karush–Kuhn–Tucker conditions
Extreme value theorem
Optimization Techniques
Hessian matrix
BFGS algorithm
Trust-region method
Envelope theorem