Introduction to Mathematical Optimization
Understand the basics of mathematical optimization, the key problem classes (linear, integer, nonlinear), and how to select appropriate solution algorithms.
Summary
Fundamentals of Mathematical Optimization
What is Optimization?
Mathematical optimization is the study of how to make the best possible choice from available alternatives. In practical terms, this means selecting a decision that minimizes (or maximizes) some objective—like reducing cost, increasing profit, or minimizing distance traveled—while respecting whatever constraints apply to the problem.
Every optimization problem has three essential components: what you can control (decision variables), what you want to achieve (objective function), and what limits your choices (constraints). Understanding these three pieces is the key to formulating and solving any optimization problem.
Decision Variables and the Decision Vector
The foundation of any optimization problem lies in identifying the decision variables—the quantities you can control and adjust to achieve your goal.
For example, in a manufacturing problem, decision variables might be:
The number of units to produce of each product
How much of each resource to allocate
Once you've identified all the decision variables in a problem, you collect them into a single mathematical object called the decision vector, denoted as $x$. If there are $n$ decision variables, the decision vector is:
$$x = (x_1, x_2, \ldots, x_n)$$
The decision vector serves as a compact way to represent all the choices you can make at once. Each specific choice of values for $x_1, x_2, \ldots, x_n$ corresponds to one particular decision.
The Objective Function
Once you've defined your decision variables, you need to express what you're trying to achieve mathematically. This is where the objective function comes in.
The objective function $f(x)$ is a mathematical formula that assigns a single numerical value to each decision vector $x$. This value might represent cost, profit, time, distance, or any other quantity relevant to your problem.
The fundamental goal of optimization is then to find the decision vector that produces either the minimum or maximum value of this objective function:
Minimization: $\min f(x)$ — used when you want to reduce cost, time, risk, or error
Maximization: $\max f(x)$ — used when you want to increase profit, efficiency, or satisfaction
For example, if you're routing a delivery vehicle, the objective function might be the total distance traveled, and you'd want to minimize it. If you're producing goods for sale, the objective function might be total profit, and you'd want to maximize it.
Constraints: Limiting Your Choices
In the real world, you rarely have complete freedom in your decisions. There are always constraints—limits imposed by resources, laws, physical reality, or business rules—that restrict which decisions are actually feasible.
Constraints are expressed mathematically in two forms:
Inequality constraints express upper or lower bounds: $$g_i(x) \le 0 \quad \text{for } i = 1, \ldots, m$$
These might represent statements like "you cannot use more than 100 units of raw material" or "the product quality must exceed a minimum threshold."
Equality constraints require certain conditions to be met exactly: $$h_j(x) = 0 \quad \text{for } j = 1, \ldots, p$$
These might represent statements like "supply must equal demand" or "the total weight must balance on both sides."
In practice, constraints arise from:
Resource limitations (budget, material, time available)
Logical requirements (you can't produce negative quantities)
Physical laws (heat equations, flow conservation)
Business rules (minimum service levels)
The Feasible Set
Now that you have constraints, you can define which decisions are actually allowed. The feasible set $\mathcal{X}$ is the collection of all decision vectors $x$ that satisfy every constraint in the problem.
Mathematically: $$\mathcal{X} = \{x : g_i(x) \le 0 \text{ for all } i, \text{ and } h_j(x) = 0 \text{ for all } j\}$$
The feasible set might be:
The entire $n$-dimensional space $\mathbb{R}^n$ (if there are no constraints)
A specific geometric region bounded by the constraints (the typical case)
Empty (if constraints contradict each other—meaning there's no valid solution)
Optimization is the process of finding the decision vector in $\mathcal{X}$ that gives the best value of $f(x)$.
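The definition of the feasible set translates directly into code. The sketch below is illustrative (the specific constraint functions are made-up examples, not from the text): it checks whether a candidate decision vector satisfies every $g_i(x) \le 0$ and $h_j(x) = 0$.

```python
# A minimal sketch: membership test for the feasible set
# X = {x : g_i(x) <= 0 for all i, h_j(x) = 0 for all j}.

def is_feasible(x, inequality_constraints, equality_constraints, tol=1e-9):
    """Return True if x satisfies every g_i(x) <= 0 and h_j(x) = 0."""
    if any(g(x) > tol for g in inequality_constraints):
        return False
    if any(abs(h(x)) > tol for h in equality_constraints):
        return False
    return True

# Illustrative constraints: x1 + x2 <= 10, x1 >= 0 (written as -x1 <= 0),
# and the equality x1 - x2 = 0.
ineqs = [lambda x: x[0] + x[1] - 10, lambda x: -x[0]]
eqs = [lambda x: x[0] - x[1]]

print(is_feasible((3, 3), ineqs, eqs))   # True: every constraint holds
print(is_feasible((7, 7), ineqs, eqs))   # False: violates x1 + x2 <= 10
```

Note the tolerance parameter: with floating-point arithmetic, equality constraints are checked to within a small tolerance rather than exactly.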
The Main Classes of Optimization Problems
Different problem structures require different solution strategies. The three most important classes are linear programming, integer programming, and nonlinear programming. Recognizing which class a problem belongs to immediately suggests which algorithms will work best.
Linear Programming (LP)
A linear programming problem has both a linear objective function and linear constraints. "Linear" means the functions involve only sums of the variables multiplied by constants—no products, squares, or other nonlinear operations.
The standard form of an LP problem is:
$$\text{minimize } c_1 x_1 + c_2 x_2 + \cdots + c_n x_n$$ $$\text{subject to } \sum_{j=1}^{n} a_{ij} x_j \le b_i \quad \text{for } i = 1, \ldots, m \quad \text{(and possibly some equality constraints)}$$
Why are LP problems special? Linear programs have remarkable properties:
The feasible region is a polyhedron (a geometric shape with flat faces in $n$ dimensions)
The optimal solution always occurs at a vertex (corner) of this polyhedron
When an LP has an optimal solution, it always has a global optimum—meaning there's no better solution anywhere else
These properties make LP problems convex, which guarantees that any local optimum is also the global optimum. This is why LP can be solved very efficiently.
Solving LPs: Two main algorithmic approaches dominate:
The Simplex Method moves from vertex to vertex of the feasible polyhedron, always improving the objective value, until reaching the optimal vertex
Interior-Point Methods move through the interior of the feasible region (rather than along edges), reaching the optimum in fewer iterations for large problems
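A small LP can be solved in a few lines with SciPy's `linprog` (this assumes SciPy is installed; its default HiGHS backend implements both simplex and interior-point variants). The problem below is an illustrative example, not one from the text:

```python
# A sketch of solving a small LP with scipy.optimize.linprog.
from scipy.optimize import linprog

# minimize  -x1 - 2*x2   (equivalently, maximize x1 + 2*x2)
# subject to x1 + x2 <= 4,  x1 <= 3,  x1 >= 0,  x2 >= 0
c = [-1, -2]
A_ub = [[1, 1], [1, 0]]
b_ub = [4, 3]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x, res.fun)   # optimum (0, 4) with value -8, at a vertex
```

Consistent with the vertex property above, the solver returns the corner point $(0, 4)$ of the feasible polygon.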
Integer Programming (IP)
An integer programming problem is like a linear program, but with an additional requirement: some or all decision variables must take integer values (whole numbers like 1, 5, 100—not decimals).
Why add integrality constraints? In real problems, some quantities naturally must be integers:
The number of trucks to purchase (you can't buy 2.3 trucks)
Whether to open a facility (yes/no decision = 0 or 1)
The number of workers to hire
However, requiring integrality fundamentally changes the problem's difficulty. While LP problems can be solved in polynomial time, integer programs are generally NP-hard—meaning there's no known efficient algorithm that works for all cases.
Why is IP harder? The feasible region is no longer a simple polyhedron. Instead, it consists of isolated integer points scattered throughout space. The optimal integer solution need not lie at a vertex of the LP relaxation's feasible region, and the problem may have many local optima—solutions that are better than nearby alternatives but not globally best.
Solving IPs: Because of their difficulty, IP algorithms typically:
Use branch-and-bound: systematically divide the problem into smaller sub-problems, solving LP relaxations to bound how good solutions can be
Apply cutting-plane methods: add extra linear constraints to tighten the relaxation and guide the search
Often resort to heuristics that find good (but not necessarily optimal) solutions quickly for very large problems
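The branch-and-bound strategy described above is what SciPy's `milp` runs under the hood (this sketch assumes SciPy 1.9 or later; the problem data is an illustrative example). Note how the integer optimum differs from the LP relaxation's fractional optimum:

```python
# A sketch of a small integer program solved by branch-and-bound via
# scipy.optimize.milp.
import numpy as np
from scipy.optimize import milp, LinearConstraint, Bounds

# maximize x1 + x2  (milp minimizes, so negate the objective)
# subject to 2*x1 + x2 <= 5,  x1 + 2*x2 <= 5,  x1, x2 integer and >= 0
c = np.array([-1, -1])
constraints = LinearConstraint(A=[[2, 1], [1, 2]], ub=[5, 5])
res = milp(c, constraints=constraints, integrality=[1, 1],
           bounds=Bounds(lb=[0, 0], ub=[np.inf, np.inf]))

# The LP relaxation peaks at the fractional point x1 = x2 = 5/3 with
# value 10/3; the integer optimum is a different point with value 3.
print(res.x, -res.fun)
```

Rounding the relaxation's answer $(5/3, 5/3)$ would not have produced the integer optimum, which is why the systematic search is needed.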
Nonlinear Programming (NLP)
A nonlinear programming problem contains at least one nonlinear function in either the objective or the constraints. Nonlinear functions include squares, products, exponentials, and other complex relationships.
The critical challenge with NLP is that the structure is much more complex. Unlike LP, the feasible region may not have vertices, and unlike the convexity guarantee in LP, an NLP might have many local optima—points where the function is better than nearby neighbors but far from the global best.
Convex vs. Non-Convex NLPs:
A convex NLP has a special structure where any local optimum is guaranteed to be the global optimum. These can be solved reliably.
A non-convex NLP may have many local optima with different objective values. Finding the true global best is much harder.
Solving NLPs: Because the structure varies widely, solution methods focus on local optimization using gradient information:
Gradient descent follows the steepest downward direction to reduce the objective value
Newton's method uses second-order information (the Hessian matrix, which describes the curvature) to take more intelligent steps that account for how the function curves
Sequential quadratic programming repeatedly solves quadratic approximations of the problem, refining the solution iteratively
These methods are effective for finding local optima but don't guarantee finding the global best in non-convex problems.
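Basic gradient descent can be sketched in a few lines. The objective below is an illustrative convex quadratic (not from the text), and the fixed step size is a simplification; practical implementations choose step lengths by line search:

```python
# A minimal gradient-descent sketch on the convex quadratic
# f(x, y) = (x - 1)^2 + 2*(y + 2)^2, whose minimizer is (1, -2).

def grad(v):
    """Gradient of f at v = (x, y)."""
    x, y = v
    return (2 * (x - 1), 4 * (y + 2))

v = (0.0, 0.0)          # starting point
step = 0.1              # fixed step size (a line search is used in practice)
for _ in range(200):
    g = grad(v)
    v = (v[0] - step * g[0], v[1] - step * g[1])   # move downhill

print(v)  # converges toward the minimizer (1, -2)
```

Because this objective is convex, the iterates approach the global minimum; on a non-convex function the same loop would only find a local optimum near the starting point.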
Comparing the Problem Classes
Understanding the characteristics of each class helps you select the right algorithm:
Linear Programming is the easiest: the feasible region is geometrically simple (a polyhedron), and any local optimum is global. Problems with thousands or millions of variables can be solved reliably in seconds.
Integer Programming is harder: the integrality requirement breaks the nice geometric structure, and problems are NP-hard. Exact algorithms may require examining many sub-problems, though modern solvers use sophisticated techniques to prune the search tree effectively.
Nonlinear Programming varies widely: convex NLPs are tractable and nearly as reliable as LPs, but general non-convex NLPs can be extremely difficult. Local optimization methods work well from good starting points, but there's no guarantee of finding the global optimum.
Solution Techniques and Algorithms
Newton's Method for Nonlinear Problems
For smooth nonlinear problems, Newton's method is a classical approach that uses second-order information about the objective function.
Newton's method works by examining the Hessian matrix $H(x)$, which measures how the function curves in each direction. Rather than just following the steepest downward direction (as in basic gradient descent), Newton's method takes curvature-adjusted steps that account for the landscape's shape:
$$x_{\text{new}} = x - H(x)^{-1} \nabla f(x)$$
This often leads to faster convergence than gradient descent, especially near a solution, because it's correcting for the function's curvature.
However, Newton's method requires computing and inverting the Hessian, which is computationally expensive for large problems.
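The update formula above can be demonstrated on an illustrative quadratic $f(x) = \tfrac{1}{2} x^\top A x - b^\top x$, where the Hessian is the constant matrix $A$ and a single Newton step lands exactly on the minimizer (the matrices below are made-up example data):

```python
# A sketch of one Newton step x_new = x - H(x)^{-1} grad f(x) on a
# quadratic objective, where it reaches the exact minimizer in one step.
import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])   # symmetric positive definite -> convex f
b = np.array([1.0, 2.0])

x = np.array([10.0, -10.0])        # arbitrary starting point
gradient = A @ x - b               # grad f(x) for this quadratic
hessian = A                        # H(x) is constant here
x_new = x - np.linalg.solve(hessian, gradient)   # solve, don't invert

print(x_new)   # satisfies A @ x_new = b, i.e. x_new = A^{-1} b
```

Note the use of `np.linalg.solve` rather than explicitly forming $H^{-1}$: solving the linear system is cheaper and numerically more stable, which matters for the large-problem cost mentioned above.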
Sequential Quadratic Programming (SQP) for Nonlinear Problems
Sequential quadratic programming is a practical approach that divides a difficult nonlinear problem into a series of smaller, simpler problems.
At each iteration, SQP:
Builds a quadratic approximation of the objective function around the current solution
Solves this quadratic subproblem (which is easier than the original)
Takes a step in the direction suggested by the solution
Repeats at the new point
By solving a sequence of quadratic problems instead of the original nonlinear one, SQP often converges faster and more reliably than simpler gradient-based methods, especially when constraints are present.
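SciPy's `SLSQP` method is an SQP-type solver and shows the iteration loop above in practice (this assumes SciPy is installed; the problem is an illustrative example with a known analytic answer):

```python
# A sketch of a constrained NLP solved with an SQP-type method:
# minimize x^2 + y^2 subject to x + y = 1; the optimum is (0.5, 0.5).
import numpy as np
from scipy.optimize import minimize

objective = lambda v: v[0] ** 2 + v[1] ** 2
constraint = {"type": "eq", "fun": lambda v: v[0] + v[1] - 1}

res = minimize(objective, x0=[2.0, -1.0], method="SLSQP",
               constraints=[constraint])
print(res.x)   # close to [0.5, 0.5]
```

Each SLSQP iteration solves a quadratic subproblem with linearized constraints, which is why it handles the equality constraint here directly rather than through penalties.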
<extrainfo>
Real-World Applications
Fitting Statistical Models to Data
One important application of optimization is statistical model fitting. When you have data and want to estimate a model's parameters, you typically formulate this as an optimization problem:
Decision variables: the model's parameters (like slope and intercept in linear regression)
Objective function: a measure of prediction error, such as the sum of squared residuals (the sum of squared differences between predicted and actual values)
Optimization goal: minimize the prediction error
This transforms the fitting problem into: find the parameter values that minimize the model's error on the training data. Most statistical estimation—from simple linear regression to complex neural networks—is solved using optimization algorithms in the background.
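The three ingredients above map directly into code. The sketch below uses synthetic, noise-free data for illustration: the decision variables are the slope and intercept, and the objective is the sum of squared residuals.

```python
# A sketch of model fitting as optimization: recover the slope and
# intercept of a line by minimizing the sum of squared residuals.
import numpy as np
from scipy.optimize import minimize

xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
ys = 2.0 * xs + 1.0                      # synthetic data on y = 2x + 1

def sse(params):
    """Objective: sum of squared residuals for the line fit."""
    slope, intercept = params
    residuals = ys - (slope * xs + intercept)
    return np.sum(residuals ** 2)

res = minimize(sse, x0=[0.0, 0.0])       # decision vector: (slope, intercept)
print(res.x)                              # recovers roughly [2.0, 1.0]
```

With noisy data the recovered parameters would only approximate the true ones, but the formulation—decision variables, error objective, minimization—stays the same.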
</extrainfo>
Key Principles for Solving Optimization Problems
Matching Algorithm to Problem Class
The first step in solving any optimization problem should be identifying its class:
Is it linear? If yes, use LP algorithms (simplex or interior-point methods)
Does it require integer variables? If yes, use IP algorithms (branch-and-bound, cutting planes)
Is it nonlinear? If yes, use gradient-based methods appropriate for NLP
The problem's structure directly determines which algorithms will be effective. An algorithm designed for linear problems will fail on nonlinear ones, and methods suited for non-convex problems may be overkill (and inefficient) for convex ones.
Why Problem Class Matters
Recognizing the problem class is crucial because:
It predicts solution difficulty: Linear problems are easy, integer problems are hard, nonlinear depends on convexity
It suggests the right algorithm: Each class has specialized algorithms that exploit its structure
It guides expectations: LP gives global optima efficiently; IP may be NP-hard; general NLP may only find local optima
It informs problem reformulation: Sometimes reformulating as a different problem class makes it easier to solve
As you encounter optimization problems, always start by asking: "What class is this problem? What are its structure and properties? Which algorithm family is designed for this?"
Flashcards
What are decision variables in the context of an optimization model?
The quantities that can be controlled within the model.
What does the decision vector $x$ represent?
A single vector that collects all decision variables.
What is the role of the objective function $f(x)$?
It assigns a numerical value (such as cost, profit, or distance) to each decision vector to be minimized or maximized.
How are inequality constraints typically expressed mathematically?
$g_i(x) \le 0$ for $i = 1, \dots, m$.
How are equality constraints typically expressed mathematically?
$h_j(x) = 0$ for $j = 1, \dots, p$.
What do constraints represent in an optimization problem?
Limits such as resource capacities, logical requirements, or physical laws.
What is the feasible set $\mathcal{X}$?
The set containing all decision vectors that satisfy the given constraints.
What defines a Linear Programming (LP) problem?
It has a linear objective function and linear constraints.
Which common algorithms are used to solve Linear Programming problems efficiently?
Simplex method
Interior-point algorithms
How does the simplex method locate the optimal solution?
By traversing the vertices of the feasible polyhedron.
How do interior-point methods reach the optimum differently than the simplex method?
They move through the interior of the feasible region rather than following the vertices.
What is the defining requirement of Integer Programming (IP)?
Some or all decision variables must take integer values.
What is the typical computational complexity and landscape of Integer Programming problems?
They are generally NP-hard and may have many local optima.
What is the purpose of cutting-plane methods in Integer Programming?
To add linear constraints that tighten the IP relaxation.
What characterizes a Nonlinear Programming (NLP) problem?
It contains at least one nonlinear function in either the objective or the constraints.
What is the difference in solvability between convex and non-convex NLPs?
Convex NLPs have tractable global solutions, while non-convex NLPs may require heuristic approaches.
How does Newton’s method use the Hessian matrix in optimization?
To take curvature-adjusted steps toward a stationary point.
How does Sequential Quadratic Programming (SQP) approximate an NLP problem?
By solving a series of quadratic subproblems around the current iterate.
Quiz
Introduction to Mathematical Optimization Quiz Question 1: Which description correctly characterizes a linear programming (LP) problem?
- Both its objective function and all constraints are linear (correct)
- Only the objective function is linear; constraints may be nonlinear
- Only the constraints are linear; the objective may be nonlinear
- Neither the objective function nor the constraints need to be linear
Introduction to Mathematical Optimization Quiz Question 2: In Newton’s method for nonlinear programming, which mathematical object is used to adjust the search steps?
- The Hessian matrix (correct)
- The gradient vector only
- The simplex tableau
- Cutting‑plane constraints
Introduction to Mathematical Optimization Quiz Question 3: When fitting a statistical model to data, which error measure is commonly minimized?
- Sum of squared residuals (correct)
- Maximum likelihood
- Mean absolute error
- Coefficient of determination (R²)
Introduction to Mathematical Optimization Quiz Question 4: What is a defining characteristic of integer programming (IP) problems?
- Some or all decision variables must take integer values (correct)
- All constraints must be linear equalities
- The objective function must be quadratic
- Decision variables are restricted to binary values only
Introduction to Mathematical Optimization Quiz Question 5: In mathematical optimization, what are decision variables?
- Quantities that can be controlled in a model (correct)
- Parameters that must remain fixed
- Outputs produced by the model
- Constants defining the feasible region
Introduction to Mathematical Optimization Quiz Question 6: Which method systematically explores sub‑problems to solve integer programming problems to optimality?
- Branch‑and‑bound (correct)
- Simplex method
- Gradient descent
- Cutting‑plane method
Introduction to Mathematical Optimization Quiz Question 7: How are inequality constraints typically expressed in a mathematical optimization model?
- $g_i(x) \le 0\;$ for $i = 1,\dots,m$ (correct)
- $g_i(x) = 0\;$ for $i = 1,\dots,m$
- $h_j(x) \ge 0\;$ for $j = 1,\dots,p$
- $f(x) \ge 0\;$
Introduction to Mathematical Optimization Quiz Question 8: Which statement best describes a typical computational property of integer programming (IP) problems?
- They are generally NP‑hard and may have many local optima due to integrality restrictions. (correct)
- They always possess a unique global optimum that can be found in polynomial time.
- They can be solved efficiently by the simplex method because they are always convex.
- The feasible region is always a polyhedron, guaranteeing tractability.
Introduction to Mathematical Optimization Quiz Question 9: Identifying whether an optimization problem is linear, integer, or nonlinear primarily helps a practitioner to:
- Select the most appropriate and efficient solution technique. (correct)
- Determine the exact optimal value without performing any computation.
- Eliminate the need for any constraints in the model.
- Guarantee that the feasible region is convex.
Introduction to Mathematical Optimization Quiz Question 10: Which statement correctly identifies a nonlinear programming (NLP) problem?
- It includes at least one nonlinear function in the objective or constraints (correct)
- All decision variables must be integer‑valued
- The feasible region is a polyhedron defined by linear constraints
- It can be solved solely by the simplex method
Introduction to Mathematical Optimization Quiz Question 11: Which property of an optimization problem would most likely discourage the use of a basic gradient‑descent method?
- Non‑convex objective function (correct)
- Linear objective with linear constraints
- All variables are continuous
- Convex feasible region
Introduction to Mathematical Optimization Quiz Question 12: When formulating an optimization problem, what are the two possible goals regarding the objective function $f(x)$?
- Minimize $f(x)$ or maximize $f(x)$ (correct)
- Convert $f(x)$ to a linear function
- Find the gradient of $f(x)$
- Ensure $f(x)$ is always positive
Introduction to Mathematical Optimization Quiz Question 13: In Sequential Quadratic Programming, each iteration solves which type of subproblem to approximate the original nonlinear problem?
- A quadratic programming subproblem (correct)
- A linear programming subproblem
- A mixed‑integer programming subproblem
- A nonlinear programming subproblem
Key Concepts
Optimization Fundamentals
Mathematical optimization
Decision variable
Objective function
Constraint (optimization)
Feasible set
Optimization Techniques
Linear programming
Integer programming
Nonlinear programming
Simplex method
Branch and bound
Definitions
Mathematical optimization
The field that studies how to select the best decision from a set of feasible alternatives by optimizing a numerical objective.
Decision variable
A controllable quantity in an optimization model, whose values constitute the decision vector.
Objective function
A mathematical expression that assigns a numerical value (e.g., cost, profit) to each decision vector, to be minimized or maximized.
Constraint (optimization)
A condition, expressed as an equality or inequality, that limits the allowable values of decision variables.
Feasible set
The collection of all decision vectors that satisfy every constraint of an optimization problem.
Linear programming
An optimization class where both the objective function and all constraints are linear, solvable efficiently by methods such as the simplex algorithm.
Integer programming
An optimization class that requires some or all decision variables to take integer values, often NP‑hard.
Nonlinear programming
An optimization class containing at least one nonlinear objective or constraint, addressed with gradient‑based or heuristic methods.
Simplex method
An algorithm that moves along vertices of the feasible polyhedron to locate the optimal solution of a linear program.
Branch and bound
A systematic tree‑search technique that partitions an integer programming problem into subproblems to find the global optimum.