Penalty method

Last updated October 10, 2024. From Wikipedia, the free encyclopedia.

Penalty methods are a class of algorithms for solving constrained optimization problems.

A penalty method replaces a constrained optimization problem by a series of unconstrained problems whose solutions ideally converge to the solution of the original constrained problem. The unconstrained problems are formed by adding a term, called a penalty function, to the objective function. The penalty function consists of a penalty parameter multiplied by a measure of violation of the constraints. The measure of violation is nonzero when the constraints are violated and zero in the region where they are not violated.

Description

Let us say we are solving the following constrained problem:

\min _{\mathbf {x} }f(\mathbf {x} )

subject to

c_{i}(\mathbf {x} )\leq 0~\forall i\in I.

This problem can be solved as a series of unconstrained minimization problems

\min f_{p}(\mathbf {x} ):=f(\mathbf {x} )+p~\sum _{i\in I}~g(c_{i}(\mathbf {x} ))

where

g(c_{i}(\mathbf {x} ))=\max(0,c_{i}(\mathbf {x} ))^{2}.

In the above equations, $g(c_{i}(\mathbf {x} ))$ is the exterior penalty function, while $p$ is the penalty coefficient. When the penalty coefficient is 0, $f_{p}=f$. In each iteration of the method, we increase the penalty coefficient $p$ (e.g. by a factor of 10), solve the unconstrained problem, and use the solution as the initial guess for the next iteration. Under suitable conditions (see Convergence), the solutions of the successive unconstrained problems converge to the solution of the original constrained problem as $p$ grows.
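The iteration described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the names `penalty_method`, `golden_section_minimize` and `solve_near` are hypothetical, the test problem (minimize x² subject to 1 − x ≤ 0) is chosen only because its penalized minimizer is easy to check by hand, and a one-dimensional golden-section search stands in for whatever unconstrained solver one would actually use.

```python
def golden_section_minimize(func, lo, hi, tol=1e-10):
    """Minimize a unimodal 1-D function on [lo, hi] (hypothetical helper)."""
    inv_phi = (5 ** 0.5 - 1) / 2  # reciprocal of the golden ratio
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if func(c) < func(d):
            b = d  # the minimizer lies in [a, d]
        else:
            a = c  # the minimizer lies in [c, b]
    return (a + b) / 2


def penalty_method(f, constraints, x0, solve, p0=1.0, growth=10.0, iterations=8):
    """Quadratic exterior penalty loop; `solve(func, x_start)` is any
    unconstrained minimizer (an assumption of this sketch)."""
    x, p = x0, p0
    history = []
    for _ in range(iterations):
        def f_p(z, p=p):
            # f_p(z) = f(z) + p * sum_i max(0, c_i(z))^2
            violation = sum(max(0.0, c(z)) ** 2 for c in constraints)
            return f(z) + p * violation

        x = solve(f_p, x)       # warm start from the previous solution
        history.append((p, x))
        p *= growth             # e.g. multiply the coefficient by 10
    return x, history


def solve_near(func, x_start, radius=2.0):
    """Search a bracket centred on the previous solution (illustrative choice)."""
    return golden_section_minimize(func, x_start - radius, x_start + radius)


# Example: minimize x^2 subject to 1 - x <= 0; the constrained solution is x = 1.
x_final, history = penalty_method(
    f=lambda x: x ** 2,
    constraints=[lambda x: 1.0 - x],
    x0=0.0,
    solve=solve_near,
)
for p, x in history:
    print(f"p = {p:g}, x = {x:.8f}")
```

For this problem the penalized minimizer is p/(1+p), so the iterates approach the constrained solution x = 1 from the infeasible side, which is typical of exterior penalty functions.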

Common penalty functions in constrained optimization are the quadratic penalty function and the deadzone-linear penalty function.^[1]
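As a rough illustration, the two penalty functions named above can be written as scalar functions of a residual or violation u. The definitions follow the standard forms (u² for the quadratic penalty; zero inside a dead zone of half-width a and linear growth outside it for the deadzone-linear penalty). The function names and the default a = 1 are assumptions made for this sketch.

```python
def quadratic_penalty(u):
    """Quadratic penalty: grows with the square of the residual u."""
    return u * u


def deadzone_linear_penalty(u, a=1.0):
    """Deadzone-linear penalty: zero for |u| <= a, then |u| - a beyond it.

    The half-width `a` (default 1.0) is an illustrative choice.
    """
    return max(0.0, abs(u) - a)


# Compare how the two penalties treat small and large residuals.
for u in (0.5, 1.0, 3.0):
    print(u, quadratic_penalty(u), deadzone_linear_penalty(u))
```

The quadratic penalty charges every nonzero residual and grows quickly for large ones, whereas the deadzone-linear penalty ignores residuals smaller than a and grows only linearly beyond that.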

Convergence

We first consider the set of global optimizers of the original problem, X*.^[2]^{: Thm.9.2.1} Assume that the objective f has bounded level sets, and that the original problem is feasible. Then:

For every penalty coefficient p, the set of global optimizers of the penalized problem, X_p*, is non-empty.
For every ε>0, there exists a penalty coefficient p such that the set X_p* is contained in an ε-neighborhood of the set X*.

This theorem is helpful mostly when f_p is convex, since in this case, we can find the global optimizers of f_p.

A second theorem considers local optimizers.^[2]^{: Thm.9.2.2} Let x* be a non-degenerate local optimizer of the original problem ("non-degenerate" means that the gradients of the active constraints are linearly independent and the second-order sufficient optimality condition is satisfied). Then there exists a neighborhood V* of x*, and some p₀>0, such that for all p>p₀, the penalized objective f_p has exactly one critical point in V* (denoted by x*(p)), and x*(p) approaches x* as p→∞. Also, the objective value f(x*(p)) is weakly increasing with p.
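A small worked example may make the second theorem concrete. The one-dimensional problem below is an assumption chosen for illustration (it does not come from the cited source); it uses the quadratic penalty from the Description section.

```latex
\begin{aligned}
&\min_{x}\; f(x) = x^{2} \quad \text{subject to} \quad c(x) = 1 - x \le 0, \qquad x^{*} = 1,\\
&f_{p}(x) = x^{2} + p\,\max(0,\,1-x)^{2},\\
&f_{p}'(x) = 2x - 2p\,(1-x) = 0 \ \text{ for } x < 1 \;\Longrightarrow\; x^{*}(p) = \frac{p}{1+p},\\
&f_{p}'(x) = 2x > 0 \ \text{ for } x \ge 1 \quad \text{(no critical point in the feasible region)},\\
&x^{*}(p) \to 1 = x^{*} \ \text{ as } p \to \infty, \qquad
f(x^{*}(p)) = \frac{p^{2}}{(1+p)^{2}}, \qquad
\frac{d}{dp}\, f(x^{*}(p)) = \frac{2p}{(1+p)^{3}} > 0 .
\end{aligned}
```

Here x* = 1 is non-degenerate (the active constraint has gradient −1 and f″ = 2 > 0), the penalized objective has exactly one critical point for each p > 0, that point approaches x* from the infeasible side, and f(x*(p)) increases with p, as the theorem states.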

Practical applications

Image compression optimization algorithms can make use of penalty functions to select how best to compress zones of colour to single representative values.^[3]^[4] The penalty method is often used in computational mechanics, especially in the finite element method, to enforce conditions such as contact.

The advantage of the penalty method is that, once we have a penalized objective with no constraints, we can use any unconstrained optimization method to solve it. The disadvantage is that, as the penalty coefficient p grows, the unconstrained problem becomes ill-conditioned: the coefficients become very large, which may cause numerical errors and slow convergence of the unconstrained minimization.^[2]^{: Sub.9.2}
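The ill-conditioning can be seen on a small example. The sketch below assumes a hypothetical two-variable problem, f(x) = (x₁ − 2)² + (x₂ − 1)² with the single constraint x₁ + x₂ − 2 ≤ 0 and the quadratic penalty. Where the constraint is violated, the Hessian of f_p is 2I + 2p·aaᵀ with a = (1, 1), and its condition number grows linearly with p. The function names are illustrative.

```python
import math


def penalized_hessian(p):
    """Hessian of f_p in the infeasible region: 2*I + 2*p*a*a^T with a = (1, 1)."""
    return [[2.0 + 2.0 * p, 2.0 * p],
            [2.0 * p, 2.0 + 2.0 * p]]


def condition_number(h):
    """Ratio of largest to smallest eigenvalue of a symmetric 2x2 matrix."""
    a, b, d = h[0][0], h[0][1], h[1][1]
    mean = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    return (mean + radius) / (mean - radius)


# The condition number equals 1 + 2p for this example.
for p in (0.0, 1.0, 10.0, 1e3, 1e6):
    print(f"p = {p:g}, condition number = {condition_number(penalized_hessian(p)):g}")
```

Gradient-based solvers tend to slow down as this ratio grows, which is one reason the coefficient is usually increased gradually and each solve is warm-started from the previous solution.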

References

[1] Boyd, Stephen; Vandenberghe, Lieven (2004). "6.1". Convex Optimization. Cambridge University Press. p. 309. ISBN 978-0521833783.
[2] Nemirovsky and Ben-Tal (2023). "Optimization III: Convex Optimization" (PDF).
[3] Galar, M.; Jurio, A.; Lopez-Molina, C.; Paternain, D.; Sanz, J.; Bustince, H. (2013). "Aggregation functions to combine RGB color channels in stereo matching". Optics Express. 21 (1): 1247–1257. doi:10.1364/oe.21.001247. hdl:2454/21074. PMID 23389018.
[4] "Researchers restore image using version containing between 1 and 10 percent of information". Phys.org (Omicron Technology Limited). Retrieved 26 October 2013.

Smith, Alice E.; Coit, David W. (1996). "Penalty functions". Handbook of Evolutionary Computation, Section C 5.2. Oxford University Press and Institute of Physics Publishing.

Coello, A.C. "Theoretical and Numerical Constraint-Handling Techniques Used with Evolutionary Algorithms: A Survey of the State of the Art". Comput. Methods Appl. Mech. Engrg. 191 (11-12): 1245–1287.

Courant, R. (1943). "Variational methods for the solution of problems of equilibrium and vibrations". Bull. Amer. Math. Soc. 49: 1–23.

Wotao, Y. (2015). Optimization Algorithms for constrained optimization. Department of Mathematics, UCLA.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
