Mathematical optimization - Fundamentals of Optimization
Learn the core concepts of optimization, the necessary and sufficient conditions for optimality, and the main algorithms for solving both local and global problems.
Summary
Introduction to Mathematical Optimization
What is Mathematical Optimization?
Mathematical optimization is the study of finding the best solution from a set of alternatives. More formally, it's about selecting candidate values for some variables that either minimize or maximize an objective function while satisfying constraints.
You encounter optimization problems everywhere: minimizing cost in business, maximizing profit in economics, finding the fastest route in navigation, or training neural networks in machine learning. The field applies across computer science, engineering, operations research, economics, and essentially any quantitative discipline that involves decision-making.
The Basic Setup: Optimization Problem Formulation
Every optimization problem has two essential components:
The objective function $f$ is a mathematical function that assigns a real number to each candidate solution. This number represents how "good" that solution is.
The feasible set $A$ (also called the search space or choice set) is the collection of all candidate solutions we're allowed to choose from. These candidates are called feasible solutions.
With these pieces, we write an optimization problem in one of two standard forms:
Minimization: $\displaystyle \min_{x \in A} f(x)$
Maximization: $\displaystyle \max_{x \in A} f(x)$
The value that achieves the minimum (or maximum) objective is called an optimal solution or optimizer.
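As a minimal illustration of this setup, here is a sketch in Python with a made-up problem: the objective $f(x) = (x-3)^2$ and a small finite feasible set, so every candidate can be checked directly (both the function and the set are assumptions for this example):

```python
# Minimal sketch: an optimization problem as (objective, feasible set).
# Here A is a small finite set, so we can evaluate every candidate.

def objective(x):
    """Hypothetical objective f(x) = (x - 3)^2, smallest near x = 3."""
    return (x - 3) ** 2

A = [0, 1, 2, 5]               # feasible set (note: 3 itself is not feasible)
x_opt = min(A, key=objective)  # optimal solution over A
print(x_opt)                   # -> 2, since f(2) = 1 beats f(0), f(1), f(5)
```

For infinite feasible sets this brute-force check is impossible, which is why the calculus-based conditions and iterative algorithms below exist.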
Terminology Matters
Different fields use different names for the objective function depending on context. You might see it called a:
Loss function (machine learning, statistics)
Cost function (economics, operations research)
Utility function (economics)
Fitness function (evolutionary computation)
Energy function (physics)
They all mean the same thing: the function we're trying to minimize or maximize.
Local vs. Global Optima: A Critical Distinction
This is perhaps the most important concept in optimization, so let's be very careful here.
A local minimum at $x^*$ means $f(x^*) \le f(x)$ for all $x$ near $x^*$ (in some neighborhood around it). In other words, you can't improve the objective value by taking small steps away from $x^*$.
A global minimum at $x^*$ means $f(x^*) \le f(x)$ for every feasible solution $x \in A$. It's the absolute best solution across the entire feasible set.
Why is this distinction so important? Many algorithms for optimization only guarantee finding a local minimum, not a global one. For practical problems, being stuck at a local minimum when a better global minimum exists is a serious problem.
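To make the distinction concrete, here is a small Python sketch using the hypothetical function $f(x) = (x^2 - 1)^2 + 0.3x$, which has two valleys; a grid scan finds both local minima, but only one of them is global:

```python
# Sketch: contrast local and global minima of f(x) = (x^2 - 1)^2 + 0.3x.
# The (x^2 - 1)^2 term creates valleys near x = -1 and x = +1; the 0.3x
# tilt makes the left valley slightly deeper, so it holds the global minimum.

def f(x):
    return (x * x - 1) ** 2 + 0.3 * x

xs = [i / 1000 for i in range(-2000, 2001)]   # grid on [-2, 2]
vals = [f(x) for x in xs]

# A grid point is a (discrete) local minimum if it beats both neighbors.
local_minima = [xs[i] for i in range(1, len(xs) - 1)
                if vals[i] < vals[i - 1] and vals[i] < vals[i + 1]]

global_min = min(xs, key=f)
print(local_minima)   # two points, near -1.04 and +0.96
print(global_min)     # the left one, near -1.04
```

An algorithm started near $x = 1$ would happily converge to the right valley and never see the better solution on the left.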
A Visual Example
Consider a function whose landscape contains multiple local minima.
Several valleys in such a landscape have low function values compared to nearby points, but one global minimum beats all the others. A naive algorithm starting from the wrong location might find a local minimum and incorrectly conclude it's the best solution.
When Can We Guarantee Finding the Global Minimum?
Here's good news: if the objective function is convex, then any local minimum is also a global minimum. This makes convex optimization problems much easier to solve—we don't have to worry about getting trapped at a bad local solution.
However, many real-world problems are non-convex. They may have multiple local minima, and finding the global one becomes much harder. This is why optimization researchers study specialized techniques for global optimization.
Theoretical Foundations: When Do Optima Exist?
The Extreme Value Theorem
Before we even try to find an optimum, we should know whether one exists!
Weierstrass's Extreme Value Theorem tells us: if the objective function $f$ is continuous and the feasible set $A$ is compact (closed and bounded in Euclidean space), then $f$ attains both a maximum and a minimum on $A$.
This is reassuring—it tells us that for well-behaved problems on bounded domains, we're guaranteed that an optimal solution exists somewhere.
Necessary Conditions for Optimality: Finding Candidates
Once we know an optimum exists, how do we find it? We start by identifying candidates—places where an optimum might occur.
Fermat's Theorem and First-Order Conditions
Fermat's Theorem states: if $x^*$ is a local minimum of an unconstrained problem and $f$ is differentiable at $x^*$, then the gradient must be zero:
$$\nabla f(x^*) = 0$$
Points where the gradient is zero are called stationary points. They're necessary candidates for optima in the interior of the feasible set.
Critical points occur in three situations:
Where the gradient equals zero
Where the gradient is undefined
On the boundary of the feasible set
You cannot ignore boundaries—an optimum can occur there even if the gradient is nonzero!
Constrained Optimization: Lagrange Multipliers
In constrained problems, we can't simply set the gradient to zero. Instead, we use Lagrange multipliers.
For a problem of the form: $$\min_{x} f(x) \quad \text{subject to} \quad g(x) = 0$$
We form the Lagrangian: $$L(x, \lambda) = f(x) + \lambda g(x)$$
where $\lambda$ is the Lagrange multiplier. At an optimum: $$\nabla_x L(x^*, \lambda^*) = 0$$
The multiplier $\lambda^*$ has a beautiful interpretation: it tells you how much the optimal objective value would change if the constraint were relaxed slightly.
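As a worked sketch (a standard textbook example, not from the text): minimizing $x^2 + y^2$ subject to $x + y = 1$ turns $\nabla L = 0$ into a small linear system, which we can solve by hand and verify in Python:

```python
# Sketch: min x^2 + y^2 subject to x + y = 1, via the Lagrangian
#   L(x, y, lam) = x^2 + y^2 + lam * (x + y - 1).
# Setting the gradient of L to zero gives three equations:
#   2x + lam = 0,  2y + lam = 0,  x + y = 1.
# The first two give x = y = -lam/2; the constraint then forces lam = -1.

lam = -1.0
x = y = -lam / 2                  # x = y = 0.5

assert abs(x + y - 1) < 1e-12     # constraint holds
assert abs(2 * x + lam) < 1e-12   # stationarity in x
assert abs(2 * y + lam) < 1e-12   # stationarity in y
print(x, y, lam)                  # 0.5 0.5 -1.0
```

Geometrically, the solution $(0.5, 0.5)$ is the point of the line $x + y = 1$ closest to the origin, which is exactly what minimizing $x^2 + y^2$ asks for.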
Inequality Constraints: Karush–Kuhn–Tucker Conditions
Real problems often have inequality constraints. The Karush–Kuhn–Tucker (KKT) conditions generalize Lagrange multipliers to problems with equality constraints $g(x) = 0$ and inequality constraints $h(x) \le 0$:
At an optimum $x^*$, there exist multipliers $\lambda^*, \mu^*$ satisfying:
Stationarity: $\nabla f(x^*) + \lambda^* \nabla g(x^*) + \mu^* \nabla h(x^*) = 0$
Primal feasibility: $g(x^*) = 0$ and $h(x^*) \le 0$
Dual feasibility: $\mu^* \ge 0$
Complementary slackness: $\mu^* h(x^*) = 0$ (if a constraint is inactive at the optimum, its multiplier is zero)
These conditions are necessary for optimality: under suitable regularity conditions (constraint qualifications), every local minimum satisfies them.
Sufficient Conditions for Optimality: Confirming You Have an Optimum
Finding a stationary point isn't enough—you need to verify it's actually an optimum. That's where second-order conditions come in.
The Hessian Matrix
The Hessian matrix $H$ contains all second partial derivatives:
$$H_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}$$
The Hessian's properties tell you the nature of a critical point:
Positive-definite Hessian: You have a local minimum
Negative-definite Hessian: You have a local maximum
Indefinite Hessian: You have a saddle point (neither min nor max)
For a critical point to be a confirmed optimum, it must satisfy both first-order conditions (the gradient is zero) and appropriate second-order conditions (the Hessian is definite with the correct sign).
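The classification rule above can be sketched in Python by checking the signs of the Hessian's eigenvalues (the example Hessians below correspond to the made-up functions noted in the comments):

```python
# Sketch: classify a critical point from its Hessian's eigenvalue signs.
import numpy as np

def classify(H):
    """Classify a critical point given a symmetric Hessian H."""
    eig = np.linalg.eigvalsh(H)
    if np.all(eig > 0):
        return "local minimum"    # positive-definite
    if np.all(eig < 0):
        return "local maximum"    # negative-definite
    if np.any(eig > 0) and np.any(eig < 0):
        return "saddle point"     # indefinite
    return "inconclusive"         # semidefinite: need higher-order information

# Hessian of f(x, y) = x^2 + y^2 at the origin: positive-definite.
print(classify(np.array([[2.0, 0.0], [0.0, 2.0]])))    # local minimum
# Hessian of f(x, y) = x^2 - y^2 at the origin: indefinite.
print(classify(np.array([[2.0, 0.0], [0.0, -2.0]])))   # saddle point
```

Note the "inconclusive" branch: if some eigenvalue is exactly zero (a semidefinite Hessian), the second-order test alone cannot settle the question.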
Constrained Problems
For constrained problems, you don't examine the Hessian of $f$ directly. Instead, you examine the bordered Hessian, which incorporates information about the constraints. This is more involved, but the principle is the same: second-order conditions confirm optimality.
Convexity: The Game-Changer
One critical insight appears repeatedly in optimization theory: if the objective function is convex, then any local minimum is guaranteed to be a global minimum.
Why? Because a convex function has no "valleys within valleys." Once you find a point where the gradient is zero, you know you've found the absolute best solution.
This is why convex optimization is considered solved (in a theoretical sense), while non-convex optimization remains an active research area.
<extrainfo>
Sensitivity Analysis and the Envelope Theorem
The envelope theorem describes how the optimal objective value changes when the underlying parameters of a problem change. In economics, the study of such changes is called comparative statics.
For example, if a constraint becomes slightly less restrictive, how much does the optimal solution improve? The Lagrange multiplier tells you exactly this. This is valuable for understanding how robust your solution is to changes in the problem setup.
</extrainfo>
Moving from Theory to Practice: Numerical Algorithms
In practice, we rarely solve optimization problems by hand using calculus. Instead, we use algorithms that iteratively improve candidate solutions.
Line-Search Methods
These methods pick a promising direction and then optimize along that line. At each iteration:
Choose a direction $d$ (for minimization, often the negative gradient direction)
Find the best step size $\alpha$ along that direction
Move to the new point $x_{\text{new}} = x + \alpha d$
Repeat
This is simple and intuitive, but step 1 (choosing the direction) requires care.
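The steps above can be sketched as steepest descent with a simple backtracking (Armijo) rule for the step size; the test function $f(x) = (x-2)^2$ is assumed for illustration:

```python
# Sketch of a line-search iteration: steepest descent with backtracking.
# Test function f(x) = (x - 2)^2, minimized at x = 2.

def f(x):
    return (x - 2) ** 2

def grad(x):
    return 2 * (x - 2)

x = 10.0
for _ in range(50):
    d = -grad(x)                      # descent direction: negative gradient
    alpha = 1.0
    # Backtrack until the Armijo sufficient-decrease condition holds:
    # f(x + alpha*d) <= f(x) + c * alpha * grad(x) * d, with c = 1e-4.
    while f(x + alpha * d) > f(x) + 1e-4 * alpha * grad(x) * d:
        alpha *= 0.5
    x = x + alpha * d
print(x)                              # converges to the minimizer x = 2
```

The Armijo condition prevents the classic failure mode of a fixed step size: a step so large that it overshoots the valley and increases the objective.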
Trust-Region Methods
Instead of picking a direction and optimizing along it, trust-region methods take a different approach:
Build a simple model of the objective function (usually quadratic)
Define a "trust region"—a nearby area where the model is believed accurate
Optimize the model within this region
Move to the best point found
Update the trust region based on how accurate the model was
If the model was accurate, expand the trust region; if not, shrink it. This adaptive approach can be more robust than line search.
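A minimal 1D sketch of this loop, assuming exact first and second derivatives and the made-up test function $f(x) = x^4 - 3x^2 + x$:

```python
# Sketch of a trust-region iteration in 1D: build a quadratic model of f,
# minimize it within |p| <= delta, then grow or shrink delta based on how
# well the model's predicted decrease matched the actual decrease.

def f(x): return x**4 - 3*x**2 + x
def df(x): return 4*x**3 - 6*x + 1
def d2f(x): return 12*x**2 - 6

x, delta = 2.0, 1.0
for _ in range(30):
    g, h = df(x), d2f(x)
    # Minimize the model m(p) = f(x) + g*p + 0.5*h*p^2 over |p| <= delta.
    if h > 0 and abs(-g / h) <= delta:
        p = -g / h                       # unconstrained model minimizer fits
    else:
        p = -delta if g > 0 else delta   # otherwise step to the boundary
    predicted = -(g * p + 0.5 * h * p * p)   # model's predicted decrease
    actual = f(x) - f(x + p)                 # true decrease in f
    rho = actual / predicted if predicted > 0 else 0.0
    if rho > 0.1:
        x = x + p                        # accept the step
    delta = 2 * delta if rho > 0.75 else (delta / 2 if rho < 0.25 else delta)
print(x)   # a stationary point, near x = 1.13
```

The ratio `rho` of actual to predicted decrease is the heart of the method: near 1 means the quadratic model is trustworthy, so the region grows; near 0 (or negative) means it isn't, so the region shrinks and the step is rejected.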
<extrainfo>
Advanced Algorithms: BFGS and Beyond
The Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm is a sophisticated local optimizer that uses approximate Hessian information to choose smart directions. It's much faster than simple gradient descent but requires more computation per step.
For very large-scale problems, stochastic gradient descent (using only noisy gradient estimates) often works better than exact methods.
A common practical strategy: if you're unsure whether you've found the global optimum, run a local optimizer from multiple random starting points and compare results. The best solution found is likely to be close to the global optimum.
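A sketch of this multistart strategy, with plain gradient descent standing in for BFGS and the made-up multimodal objective $f(x) = (x^2 - 1)^2 + 0.3x$:

```python
# Sketch: multistart global optimization. Run a local optimizer (here a
# simple gradient descent, standing in for BFGS) from several random
# starting points and keep the best result. f has two local minima; only
# the left one (near x = -1.04) is global.
import random

def f(x): return (x * x - 1) ** 2 + 0.3 * x
def df(x): return 4 * x * (x * x - 1) + 0.3

def local_minimize(x, steps=2000, lr=0.01):
    """Fixed-step gradient descent: converges to a nearby local minimum."""
    for _ in range(steps):
        x -= lr * df(x)
    return x

random.seed(0)
starts = [random.uniform(-2, 2) for _ in range(10)]
candidates = [local_minimize(x0) for x0 in starts]
best = min(candidates, key=f)
print(best)    # near -1.04, the global minimum
```

Each run only finds the minimum of whichever valley it starts in; the final `min` over all runs is what recovers the global solution (with high probability, given enough starts).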
</extrainfo>
Flashcards
Into which two main branches is the field of optimization divided?
Discrete optimization
Continuous optimization
In mathematical terms, how is the goal of an optimization problem defined?
To maximize or minimize a real-valued function by choosing input values from an allowed set.
What is the role of an objective function $f$ in an optimization problem?
It maps each candidate solution to a real number.
What does the set $A$ represent in the formulation of an optimization problem?
The search space (or choice set) containing all candidate solutions.
How is an optimal solution defined in the context of the objective value?
A feasible solution that achieves the smallest (for minimization) or largest (for maximization) objective value.
What condition defines a local minimum $x^*$ in relation to its neighborhood?
$f(x^*) \le f(x)$ for all $x$ in a neighborhood of $x^*$.
What condition defines a global minimum $x^*$ across the entire search space?
$f(x^*) \le f(x)$ for every feasible $x \in A$.
Under what condition is any interior local minimum guaranteed to be a global minimum?
If the objective function is convex.
According to Karl Weierstrass, what conditions guarantee that a function attains its maximum and minimum?
A continuous real-valued function on a compact set.
According to Fermat's theorem, where do unconstrained optima occur?
At stationary points where the gradient of the objective function is zero.
Where can critical points occur in an optimization problem?
Where the gradient is zero
Where the gradient is undefined
On the boundary of the feasible set
Which method is used to solve problems with equality constraints?
The method of Lagrange multipliers.
What conditions are satisfied by problems involving both equality and inequality constraints?
The Karush–Kuhn–Tucker (KKT) conditions.
What technique transforms constrained problems into easier forms to provide approximate solutions?
Lagrangian relaxation.
What mathematical structure is examined to distinguish between minima, maxima, and saddle points?
The Hessian matrix of second derivatives.
What specific matrix is used to examine second-order conditions in constrained problems?
The bordered Hessian.
How does the Hessian matrix indicate a local minimum?
The Hessian is positive-definite.
How does the Hessian matrix indicate a local maximum?
The Hessian is negative-definite.
What does an indefinite Hessian matrix signify at a stationary point?
A saddle point.
What does the envelope theorem describe regarding optimal values?
How an optimal value changes when underlying parameters vary.
What is the study of changes in optimal values due to parameter variation called?
Comparative statics.
How do line-search methods operate at each iteration?
They optimize the objective along a single direction.
How do trust-region methods restrict the optimization step?
They restrict each step to a region where a model of the objective is trusted.
What is a common strategy for achieving global optimization using local methods like BFGS?
Running a local optimizer from multiple starting points.
Quiz
Mathematical optimization - Fundamentals of Optimization Quiz Question 1: Into which two main categories is the field of mathematical optimization divided?
- Discrete optimization and continuous optimization (correct)
- Linear and nonlinear optimization
- Stochastic and deterministic optimization
- Convex and non‑convex optimization
Question 2: What does the objective function $f$ do in an optimization problem?
- Maps each candidate solution to a real number (correct)
- Defines the feasible region of the problem
- Specifies the algorithm used to solve the problem
- Identifies the constraints of the problem
Question 3: What term is used for elements of the set $A$ in an optimization problem?
- Feasible solutions (correct)
- Objective values
- Constraint multipliers
- Search directions
Question 4: How is a minimization problem typically expressed?
- $\displaystyle \min_{x\in A} f(x)$ (correct)
- $\displaystyle \max_{x\in A} f(x)$
- $\displaystyle \int_{A} f(x)\,dx$
- $\displaystyle \sum_{x\in A} f(x)$
Question 5: What is the term for a feasible solution that attains the best objective value?
- Optimal solution (correct)
- Suboptimal solution
- Constraint violation
- Search direction
Question 6: For a convex objective function, what can be said about any interior local minimum?
- It is also a global minimum (correct)
- It may not be a global minimum
- It is always a saddle point
- It violates the first‑order conditions
Question 7: Which theorem guarantees that a continuous real‑valued function on a compact set attains its maximum and minimum?
- The extreme value theorem (Weierstrass) (correct)
- The mean value theorem
- The intermediate value theorem
- The fundamental theorem of calculus
Question 8: What do first‑order conditions require for interior optimality?
- The gradient (or subgradient) must be zero (correct)
- The Hessian must be positive definite
- The objective value must be minimal among all feasible points
- The constraints must be linear
Question 9: What matrix is examined in second‑order conditions to differentiate minima, maxima, and saddle points?
- The Hessian matrix (correct)
- The Jacobian matrix
- The covariance matrix
- The incidence matrix
Question 10: What theorem describes how an optimal value changes when underlying parameters vary?
- The envelope theorem (correct)
- The fixed‑point theorem
- The divergence theorem
- The central limit theorem
Question 11: What certifies a local minimum for convex functions in terms of subgradients?
- A zero subgradient (correct)
- A positive subgradient
- An undefined subgradient
- A non‑zero gradient
Question 12: What does an indefinite Hessian indicate about a critical point?
- It is a saddle point (correct)
- It is a strict local minimum
- It is a strict local maximum
- It violates feasibility
Question 13: What technique transforms constrained problems into easier forms and can provide approximate solutions?
- Lagrangian relaxation (correct)
- Branch and bound
- Genetic algorithms
- Simulated annealing
Question 14: What common strategy is used to improve global optimization performance involving local methods?
- Run a local optimizer from multiple starting points (correct)
- Use only a single deterministic global optimizer
- Ignore local minima and search only for global maxima
- Apply random perturbations without local refinement
Key Concepts
Optimization Concepts
Mathematical optimization
Convex optimization
Global optimization
Optimality Conditions
Lagrange multipliers
Karush–Kuhn–Tucker conditions
Extreme value theorem
Optimization Techniques
Hessian matrix
BFGS algorithm
Trust‑region method
Envelope theorem
Definitions
Mathematical optimization
The discipline of selecting the best element from a set of alternatives according to a specified objective function.
Convex optimization
A subclass of optimization problems where the objective function is convex and any local minimum is also a global minimum.
Global optimization
The study of algorithms that guarantee convergence to the absolute best solution of a possibly non‑convex problem.
Lagrange multipliers
A method for finding extrema of a function subject to equality constraints by introducing auxiliary variables.
Karush–Kuhn–Tucker conditions
Necessary (and under convexity, sufficient) conditions for optimality in problems with equality and inequality constraints.
Extreme value theorem
A result stating that a continuous real‑valued function on a compact set attains its maximum and minimum values.
Hessian matrix
The square matrix of second‑order partial derivatives of a scalar function, used to assess curvature and optimality.
BFGS algorithm
A quasi‑Newton iterative method for unconstrained optimization that approximates the Hessian to achieve rapid convergence.
Trust‑region method
An optimization technique that restricts each iteration to a region where a model of the objective function is trusted to be accurate.
Envelope theorem
A principle describing how the optimal value of an objective function changes with variations in underlying parameters.