RemNote Community

Introduction to Mathematical Optimization

Understand the basics of mathematical optimization, the key problem classes (linear, integer, nonlinear), and how to select appropriate solution algorithms.

Summary

Fundamentals of Mathematical Optimization

What is Optimization?

Mathematical optimization is the study of how to make the best possible choice from available alternatives. In practical terms, this means selecting a decision that minimizes (or maximizes) some objective, like reducing cost, increasing profit, or minimizing distance traveled, while respecting whatever constraints apply to the problem. Every optimization problem has three essential components: what you can control (decision variables), what you want to achieve (objective function), and what limits your choices (constraints). Understanding these three pieces is the key to formulating and solving any optimization problem.

Decision Variables and the Decision Vector

The foundation of any optimization problem lies in identifying the decision variables: the quantities you can control and adjust to achieve your goal. For example, in a manufacturing problem, decision variables might be:

The number of units to produce of each product
How much of each resource to allocate

Once you've identified all the decision variables in a problem, you collect them into a single mathematical object called the decision vector, denoted as $x$. If there are $n$ decision variables, the decision vector is:

$$x = (x_1, x_2, \ldots, x_n)$$

The decision vector serves as a compact way to represent all the choices you can make at once. Each specific choice of values for $x_1, x_2, \ldots, x_n$ corresponds to one particular decision.

The Objective Function

Once you've defined your decision variables, you need to express what you're trying to achieve mathematically. This is where the objective function comes in. The objective function $f(x)$ is a mathematical formula that assigns a single numerical value to each decision vector $x$. This value might represent cost, profit, time, distance, or any other quantity relevant to your problem.
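To make these pieces concrete, here is a minimal Python sketch of a decision vector and an objective function for a hypothetical two-product manufacturing problem. The unit profits (3 and 5) are invented for illustration:

```python
# Hypothetical two-product example: the decision vector x = (x1, x2)
# holds the production quantities, and the objective function assigns
# a single profit value to each candidate decision.

def profit(x):
    """Objective f(x): total profit for x = (x1, x2),
    assuming (invented) unit profits of 3 and 5."""
    x1, x2 = x
    return 3 * x1 + 5 * x2

# Each choice of values for the decision vector is one candidate decision.
print(profit((10, 4)))  # 3*10 + 5*4 = 50
```

Optimization then means searching over all allowed decision vectors for the one with the best objective value.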
The fundamental goal of optimization is then to find the decision vector that produces either the minimum or maximum value of this objective function:

Minimization: $\min f(x)$, used when you want to reduce cost, time, risk, or error
Maximization: $\max f(x)$, used when you want to increase profit, efficiency, or satisfaction

For example, if you're routing a delivery vehicle, the objective function might be the total distance traveled, and you'd want to minimize it. If you're producing goods for sale, the objective function might be total profit, and you'd want to maximize it.

Constraints: Limiting Your Choices

In the real world, you rarely have complete freedom in your decisions. There are always constraints: limits imposed by resources, laws, physical reality, or business rules that restrict which decisions are actually feasible. Constraints are expressed mathematically in two forms:

Inequality constraints express upper or lower bounds:

$$g_i(x) \le 0 \quad \text{for } i = 1, \ldots, m$$

These might represent statements like "you cannot use more than 100 units of raw material" or "the product quality must exceed a minimum threshold."

Equality constraints require certain conditions to be met exactly:

$$h_j(x) = 0 \quad \text{for } j = 1, \ldots, p$$

These might represent statements like "supply must equal demand" or "the total weight must balance on both sides."

In practice, constraints arise from:

Resource limitations (budget, material, time available)
Logical requirements (you can't produce negative quantities)
Physical laws (heat equations, flow conservation)
Business rules (minimum service levels)

The Feasible Set

Now that you have constraints, you can define which decisions are actually allowed. The feasible set $\mathcal{X}$ is the collection of all decision vectors $x$ that satisfy every constraint in the problem.
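A minimal sketch of what "satisfying every constraint" means computationally: a point is feasible when every $g_i(x) \le 0$ holds and every $h_j(x) = 0$ holds. The specific constraint functions below are invented for illustration:

```python
# Feasibility check: x is feasible when all inequality constraints
# g(x) <= 0 and all equality constraints h(x) = 0 are satisfied.

def is_feasible(x, ineqs, eqs, tol=1e-9):
    """Return True if x satisfies g(x) <= 0 for all g in ineqs
    and |h(x)| <= tol for all h in eqs (tol absorbs rounding)."""
    return (all(g(x) <= tol for g in ineqs)
            and all(abs(h(x)) <= tol for h in eqs))

# Invented example: use at most 100 units of material (inequality),
# and supply must equal a demand of 40 units (equality).
ineqs = [lambda x: 2 * x[0] + 3 * x[1] - 100]   # 2*x1 + 3*x2 <= 100
eqs   = [lambda x: x[0] + x[1] - 40]            # x1 + x2 == 40

print(is_feasible((20, 20), ineqs, eqs))  # 100 <= 100 and 40 == 40 -> True
print(is_feasible((30, 20), ineqs, eqs))  # equality violated      -> False
```

The feasible set is exactly the collection of points for which such a check returns True.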
Mathematically:

$$\mathcal{X} = \{x : g_i(x) \le 0 \text{ for all } i, \text{ and } h_j(x) = 0 \text{ for all } j\}$$

The feasible set might be:

The entire $n$-dimensional space $\mathbb{R}^n$ (if there are no constraints)
A specific geometric region bounded by the constraints (the typical case)
Empty (if constraints contradict each other, meaning there's no valid solution)

Optimization is the process of finding the decision vector in $\mathcal{X}$ that gives the best value of $f(x)$.

The Main Classes of Optimization Problems

Different problem structures require different solution strategies. The three most important classes are linear programming, integer programming, and nonlinear programming. Recognizing which class a problem belongs to immediately suggests which algorithms will work best.

Linear Programming (LP)

A linear programming problem has both a linear objective function and linear constraints. "Linear" means the functions involve only sums of the variables multiplied by constants: no products, squares, or other nonlinear operations. The standard form of an LP problem is:

$$\text{minimize } c_1 x_1 + c_2 x_2 + \cdots + c_n x_n$$

$$\text{subject to } \sum_{j=1}^{n} a_{ij} x_j \le b_i \quad \text{(and possibly some equality constraints)}$$

Why are LP problems special? Linear programs have remarkable properties:

The feasible region is a polyhedron (a geometric shape with flat faces in $n$ dimensions)
The optimal solution always occurs at a vertex (corner) of this polyhedron
When an LP has an optimal solution, it always has a global optimum, meaning there's no better solution anywhere else

These properties make LP problems convex, which guarantees that any local optimum is also the global optimum. This is why LP can be solved very efficiently.
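The vertex property can be demonstrated by brute force on a tiny two-variable LP: enumerate every intersection of constraint boundary lines, keep the feasible intersections (these are the polyhedron's vertices), and compare objective values. This is a teaching sketch with invented numbers, not the simplex method itself:

```python
from itertools import combinations

# Tiny LP: maximize 2x + 3y subject to the constraints below.
# Each constraint a1*x + a2*y <= b is stored as (a1, a2, b);
# the last two encode x >= 0 and y >= 0.
constraints = [
    (1, 2, 8),    # x + 2y <= 8
    (3, 1, 9),    # 3x + y <= 9
    (-1, 0, 0),   # -x <= 0  (x >= 0)
    (0, -1, 0),   # -y <= 0  (y >= 0)
]

def intersect(c1, c2):
    """Intersection point of two constraint boundary lines, or None if parallel."""
    (a1, b1, d1), (a2, b2, d2) = c1, c2
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        return None
    return ((d1 * b2 - d2 * b1) / det, (a1 * d2 - a2 * d1) / det)

def feasible(p):
    return all(a * p[0] + b * p[1] <= d + 1e-9 for a, b, d in constraints)

# The polyhedron's vertices: feasible intersections of boundary pairs.
vertices = [p for c1, c2 in combinations(constraints, 2)
            if (p := intersect(c1, c2)) and feasible(p)]

# The LP optimum is found among the vertices alone.
best = max(vertices, key=lambda p: 2 * p[0] + 3 * p[1])
print(best)  # -> (2.0, 3.0), with objective value 13
```

Real solvers never enumerate all vertices (there can be exponentially many); the simplex method walks between adjacent vertices, improving the objective at each step.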
Solving LPs: Two main algorithmic approaches dominate:

The Simplex Method moves from vertex to vertex of the feasible polyhedron, always improving the objective value, until reaching the optimal vertex
Interior-Point Methods move through the interior of the feasible region (rather than along edges), reaching the optimum in fewer iterations for large problems

Integer Programming (IP)

An integer programming problem is like a linear program, but with an additional requirement: some or all decision variables must take integer values (whole numbers like 1, 5, 100, not decimals).

Why add integrality constraints? In real problems, some quantities naturally must be integers:

The number of trucks to purchase (you can't buy 2.3 trucks)
Whether to open a facility (yes/no decision = 0 or 1)
The number of workers to hire

However, requiring integrality fundamentally changes the problem's difficulty. While LP problems can be solved in polynomial time, integer programs are generally NP-hard, meaning there's no known efficient algorithm that works for all cases.

Why is IP harder? The feasible region is no longer a simple polyhedron. Instead, it consists of isolated integer points scattered throughout space. The integer optimum need not lie near the optimum of the LP relaxation, and the problem may have many local optima: solutions that are better than nearby alternatives but not globally best.

Solving IPs: Because of their difficulty, IP algorithms typically:

Use branch-and-bound: systematically divide the problem into smaller sub-problems, solving LP relaxations to bound how good solutions can be
Apply cutting-plane methods: add extra linear constraints to tighten the relaxation and guide the search
Often resort to heuristics that find good (but not necessarily optimal) solutions quickly for very large problems

Nonlinear Programming (NLP)

A nonlinear programming problem contains at least one nonlinear function in either the objective or the constraints.
Nonlinear functions include squares, products, exponentials, and other complex relationships. The critical challenge with NLP is that the structure is much more complex. Unlike LP, the feasible region may not have vertices, and without LP's convexity guarantee, an NLP might have many local optima: points where the function is better than nearby neighbors but far from the global best.

Convex vs. Non-Convex NLPs:

A convex NLP has a special structure where any local optimum is guaranteed to be the global optimum. These can be solved reliably.
A non-convex NLP may have many local optima with different objective values. Finding the true global best is much harder.

Solving NLPs: Because the structure varies widely, solution methods focus on local optimization using gradient information:

Gradient descent follows the steepest downward direction to reduce the objective value
Newton's method uses second-order information (the Hessian matrix, which describes the curvature) to take more intelligent steps that account for how the function curves
Sequential quadratic programming repeatedly solves quadratic approximations of the problem, refining the solution iteratively

These methods are effective for finding local optima but don't guarantee finding the global best in non-convex problems.

Comparing the Problem Classes

Understanding the characteristics of each class helps you select the right algorithm:

Linear Programming is the easiest: the feasible region is geometrically simple (a polyhedron), and any local optimum is global. Problems with thousands or millions of variables can be solved reliably in seconds.

Integer Programming is harder: the integrality requirement breaks the nice geometric structure, and problems are NP-hard. Exact algorithms may require examining many sub-problems, though modern solvers use sophisticated techniques to prune the search tree effectively.
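The gap between an LP relaxation and the true integer optimum can be seen on a toy problem. Here plain enumeration stands in for branch-and-bound (it only works because the problem is tiny), the relaxation optimum is worked out by hand from the constraint geometry, and all numbers are invented:

```python
# Toy IP: maximize 5x + 4y  subject to  6x + 4y <= 24,  x + 2y <= 6,
# with x, y required to be nonnegative integers.

def obj(x, y):
    return 5 * x + 4 * y

def within_constraints(x, y):
    return 6 * x + 4 * y <= 24 and x + 2 * y <= 6 and x >= 0 and y >= 0

# The LP relaxation optimum (computed by hand from where the two
# constraint lines cross) is fractional, and naive rounding is weaker
# than the true integer optimum:
lp_opt = (3.0, 1.5)   # relaxation optimum, objective 21.0, not integral
rounded = (3, 1)      # nearest feasible rounding, objective only 19

# True integer optimum by brute force over the (tiny) feasible box.
int_opt = max(((x, y) for x in range(5) for y in range(4)
               if within_constraints(x, y)),
              key=lambda p: obj(*p))
print(int_opt, obj(*int_opt))  # -> (4, 0) 20, which beats the rounded point
```

This is why IP solvers branch rather than round: the relaxation only provides a bound (here 21) that helps prune the search, not the answer itself.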
Nonlinear Programming varies widely: convex NLPs are tractable and nearly as reliable as LPs, but general non-convex NLPs can be extremely difficult. Local optimization methods work well from good starting points, but there's no guarantee of finding the global optimum.

Solution Techniques and Algorithms

Newton's Method for Nonlinear Problems

For smooth nonlinear problems, Newton's method is a classical approach that uses second-order information about the objective function. Newton's method works by examining the Hessian matrix $H(x)$, which measures how the function curves in each direction. Rather than just following the steepest downward direction (as in basic gradient descent), Newton's method takes curvature-adjusted steps that account for the landscape's shape:

$$x_{\text{new}} = x - H(x)^{-1} \nabla f(x)$$

This often leads to faster convergence than gradient descent, especially near a solution, because it's correcting for the function's curvature. However, Newton's method requires computing and inverting the Hessian, which is computationally expensive for large problems.

Sequential Quadratic Programming (SQP) for Nonlinear Problems

Sequential quadratic programming is a practical approach that divides a difficult nonlinear problem into a series of smaller, simpler problems. At each iteration, SQP:

Builds a quadratic approximation of the objective function around the current solution
Solves this quadratic subproblem (which is easier than the original)
Takes a step in the direction suggested by the solution
Repeats at the new point

By solving a sequence of quadratic problems instead of the original nonlinear one, SQP often converges faster and more reliably than simpler gradient-based methods, especially when constraints are present.

Real-World Applications

Fitting Statistical Models to Data

One important application of optimization is statistical model fitting. When you have data and want to estimate a model's parameters, you typically formulate this as an optimization problem:

Decision variables: the model's parameters (like slope and intercept in linear regression)
Objective function: a measure of prediction error, such as the sum of squared residuals (the sum of squared differences between predicted and actual values)
Optimization goal: minimize the prediction error

This transforms the fitting problem into: find the parameter values that minimize the model's error on the training data. Most statistical estimation, from simple linear regression to complex neural networks, is solved using optimization algorithms in the background.

Key Principles for Solving Optimization Problems

Matching Algorithm to Problem Class

The first step in solving any optimization problem should be identifying its class:

Is it linear? If yes, use LP algorithms (simplex or interior-point methods)
Does it require integer variables? If yes, use IP algorithms (branch-and-bound, cutting planes)
Is it nonlinear? If yes, use gradient-based methods appropriate for NLP

The problem's structure directly determines which algorithms will be effective. An algorithm designed for linear problems will fail on nonlinear ones, and methods suited for non-convex problems may be overkill (and inefficient) for convex ones.

Why Problem Class Matters

Recognizing the problem class is crucial because:

It predicts solution difficulty: linear problems are easy, integer problems are hard, nonlinear depends on convexity
It suggests the right algorithm: each class has specialized algorithms that exploit its structure
It guides expectations: LP gives global optima efficiently; IP may be NP-hard; general NLP may only find local optima
It informs problem reformulation: sometimes reformulating as a different problem class makes it easier to solve

As you encounter optimization problems, always start by asking: "What class is this problem? What are its structure and properties? Which algorithm family is designed for this?"
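The Newton update and the model-fitting application fit together in one short sketch: minimizing the sum of squared residuals for a line $y = ax + b$. Because that objective is quadratic, its Hessian is constant and a single Newton step from any starting point lands on the exact optimum. The data points are invented (they lie exactly on $y = 2x + 1$):

```python
# Newton's method on a least-squares fitting problem:
# minimize f(a, b) = sum((a*x + b - y)**2) over the data below.
# The update x_new = x - H^{-1} grad f(x) is exact in one step here
# because f is quadratic, so H is constant.

data = [(0, 1), (1, 3), (2, 5), (3, 7)]  # invented points on y = 2x + 1

def gradient(a, b):
    """Gradient of the sum-of-squared-residuals objective."""
    residuals = [(a * x + b - y, x) for x, y in data]
    return (sum(2 * r * x for r, x in residuals),
            sum(2 * r for r, _ in residuals))

def newton_step(a, b):
    # Constant Hessian of the quadratic objective:
    # [[2*sum(x^2), 2*sum(x)], [2*sum(x), 2*n]]
    n = len(data)
    sx = sum(x for x, _ in data)
    sxx = sum(x * x for x, _ in data)
    h11, h12, h22 = 2 * sxx, 2 * sx, 2 * n
    det = h11 * h22 - h12 * h12
    ga, gb = gradient(a, b)
    # Apply the inverse of the 2x2 Hessian to the gradient.
    da = (h22 * ga - h12 * gb) / det
    db = (h11 * gb - h12 * ga) / det
    return a - da, b - db

print(newton_step(0.0, 0.0))  # -> (2.0, 1.0): the exact least-squares fit
```

On a genuinely nonlinear objective the Hessian changes from point to point, so the same step is repeated until convergence, and it only finds a local optimum unless the problem is convex.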
Flashcards
What are decision variables in the context of an optimization model?
The quantities that can be controlled within the model.
What does the decision vector $x$ represent?
A single vector that collects all decision variables.
What is the role of the objective function $f(x)$?
It assigns a numerical value (such as cost, profit, or distance) to each decision vector to be minimized or maximized.
How are inequality constraints typically expressed mathematically?
$g_i(x) \le 0$ for $i = 1, \dots, m$.
How are equality constraints typically expressed mathematically?
$h_j(x) = 0$ for $j = 1, \dots, p$.
What do constraints represent in an optimization problem?
Limits such as resource capacities, logical requirements, or physical laws.
What is the feasible set $\mathcal{X}$?
The set containing all decision vectors that satisfy the given constraints.
What defines a Linear Programming (LP) problem?
It has a linear objective function and linear constraints.
Which common algorithms are used to solve Linear Programming problems efficiently?
The simplex method and interior-point methods.
How does the simplex method locate the optimal solution?
By traversing the vertices of the feasible polyhedron.
How do interior-point methods reach the optimum differently than the simplex method?
They move through the interior of the feasible region rather than following the vertices.
What is the defining requirement of Integer Programming (IP)?
Some or all decision variables must take integer values.
What is the typical computational complexity and landscape of Integer Programming problems?
They are generally NP-hard and may have many local optima.
What is the purpose of cutting-plane methods in Integer Programming?
To add linear constraints that tighten the LP relaxation of the integer program.
What characterizes a Nonlinear Programming (NLP) problem?
It contains at least one nonlinear function in either the objective or the constraints.
What is the difference in solvability between convex and non-convex NLPs?
Convex NLPs have tractable global solutions, while non-convex NLPs may require heuristic approaches.
How does Newton’s method use the Hessian matrix in optimization?
To take curvature-adjusted steps toward a stationary point.
How does Sequential Quadratic Programming (SQP) approximate an NLP problem?
By solving a series of quadratic subproblems around the current iterate.

Key Concepts
Optimization Fundamentals
Mathematical optimization
Decision variable
Objective function
Constraint (optimization)
Feasible set
Optimization Techniques
Linear programming
Integer programming
Nonlinear programming
Simplex method
Branch and bound