Revised simplex method

Last updated October 31, 2022

In mathematical optimization, the revised simplex method is a variant of George Dantzig's simplex method for linear programming.

The revised simplex method is mathematically equivalent to the standard simplex method but differs in implementation. Instead of maintaining a tableau which explicitly represents the constraints adjusted to a set of basic variables, it maintains a representation of a basis of the matrix representing the constraints. The matrix-oriented approach allows for greater computational efficiency by enabling sparse matrix operations.^[1]

Problem formulation

For the rest of the discussion, it is assumed that a linear programming problem has been converted into the following standard form:

{\begin{array}{rl}{\text{minimize}}&{\boldsymbol {c}}^{\mathrm {T} }{\boldsymbol {x}}\\{\text{subject to}}&{\boldsymbol {Ax}}={\boldsymbol {b}},{\boldsymbol {x}}\geq {\boldsymbol {0}}\end{array}}

where $A \in \mathbb{R}^{m \times n}$. Without loss of generality, it is assumed that the constraint matrix $A$ has full row rank and that the problem is feasible, i.e., there is at least one $x \geq 0$ such that $Ax = b$. If $A$ is rank-deficient, either there are redundant constraints, or the problem is infeasible. Both situations can be handled by a presolve step.
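As an illustration of the conversion, the following minimal sketch turns a problem with inequality constraints $A_{ub} x \leq b_{ub}$, $x \geq 0$ into the standard form by appending one slack variable per row. The function name `to_standard_form` and its argument names are assumptions made for this example and are not part of the article; other constraint types (equalities, free variables, lower bounds) would need extra handling.

```python
import numpy as np

def to_standard_form(c, A_ub, b_ub):
    """Convert  min c^T x  s.t.  A_ub x <= b_ub, x >= 0  into
    min c_std^T x  s.t.  A x = b, x >= 0  by adding slack variables.
    (Hypothetical helper, for illustration only.)"""
    A_ub = np.asarray(A_ub, dtype=float)
    m, _ = A_ub.shape
    # One slack column per inequality: [A_ub | I]
    A = np.hstack([A_ub, np.eye(m)])
    # Slack variables do not contribute to the objective
    c_std = np.concatenate([np.asarray(c, dtype=float), np.zeros(m)])
    b = np.asarray(b_ub, dtype=float)
    return c_std, A, b

c_std, A, b = to_standard_form([-2, -3, -4], [[3, 2, 1], [2, 5, 3]], [10, 15])
```

Because the slack columns form an identity block, the resulting $A$ automatically has full row rank.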

Algorithmic description

Optimality conditions

For linear programming, the Karush–Kuhn–Tucker (KKT) conditions are both necessary and sufficient for optimality. The KKT conditions of a linear programming problem in the standard form are

{\begin{aligned}{\boldsymbol {Ax}}&={\boldsymbol {b}},\\{\boldsymbol {A}}^{\mathrm {T} }{\boldsymbol {\lambda }}+{\boldsymbol {s}}&={\boldsymbol {c}},\\{\boldsymbol {x}}&\geq {\boldsymbol {0}},\\{\boldsymbol {s}}&\geq {\boldsymbol {0}},\\{\boldsymbol {s}}^{\mathrm {T} }{\boldsymbol {x}}&=0\end{aligned}}

where $\lambda$ and $s$ are the Lagrange multipliers associated with the constraints $Ax = b$ and $x \geq 0$, respectively.^[2] The last condition, which is equivalent to $s_i x_i = 0$ for all $1 \leq i \leq n$, is called the complementary slackness condition.

By what is sometimes known as the fundamental theorem of linear programming, a vertex $x$ of the feasible polytope can be identified with a basis $B$ of $A$ chosen from the latter's columns.^[a] Since $A$ has full row rank, $B$ is nonsingular. Without loss of generality, assume that $A = [B \; N]$. Then $x$ is given by

{\boldsymbol {x}}={\begin{bmatrix}{\boldsymbol {x_{B}}}\\{\boldsymbol {x_{N}}}\end{bmatrix}}={\begin{bmatrix}{\boldsymbol {B}}^{-1}{\boldsymbol {b}}\\{\boldsymbol {0}}\end{bmatrix}}

where $x_B \geq 0$. Partition $c$ and $s$ accordingly into

{\begin{aligned}{\boldsymbol {c}}&={\begin{bmatrix}{\boldsymbol {c_{B}}}\\{\boldsymbol {c_{N}}}\end{bmatrix}},\\{\boldsymbol {s}}&={\begin{bmatrix}{\boldsymbol {s_{B}}}\\{\boldsymbol {s_{N}}}\end{bmatrix}}.\end{aligned}}

To satisfy the complementary slackness condition, let $s_B = 0$. It follows that

{\begin{aligned}{\boldsymbol {B}}^{\mathrm {T} }{\boldsymbol {\lambda }}&={\boldsymbol {c_{B}}},\\{\boldsymbol {N}}^{\mathrm {T} }{\boldsymbol {\lambda }}+{\boldsymbol {s_{N}}}&={\boldsymbol {c_{N}}},\end{aligned}}

which implies that

{\begin{aligned}{\boldsymbol {\lambda }}&=({\boldsymbol {B}}^{\mathrm {T} })^{-1}{\boldsymbol {c_{B}}},\\{\boldsymbol {s_{N}}}&={\boldsymbol {c_{N}}}-{\boldsymbol {N}}^{\mathrm {T} }{\boldsymbol {\lambda }}.\end{aligned}}

If $s_N \geq 0$ at this point, the KKT conditions are satisfied, and thus $x$ is optimal.
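The optimality test can be sketched in a few lines of NumPy. The function name `multipliers_and_reduced_costs`, the 0-based index convention, and the small test problem are assumptions made for this illustration; a practical solver would reuse a factorization of $B$ rather than call a dense solver each time.

```python
import numpy as np

def multipliers_and_reduced_costs(A, c, basis):
    """Return (lambda, s_N, nonbasis) for a given list of basic column indices.
    (Hypothetical helper, for illustration only; indices are 0-based.)"""
    A = np.asarray(A, dtype=float)
    c = np.asarray(c, dtype=float)
    nonbasis = [j for j in range(A.shape[1]) if j not in basis]
    B, N = A[:, basis], A[:, nonbasis]
    lam = np.linalg.solve(B.T, c[basis])   # B^T lambda = c_B
    s_N = c[nonbasis] - N.T @ lam          # s_N = c_N - N^T lambda
    return lam, s_N, nonbasis

# Hypothetical problem: min -x1 - 2 x2  s.t.  x1 + x2 + x3 = 4,  x1 + 3 x2 + x4 = 6
A = [[1, 1, 1, 0], [1, 3, 0, 1]]
c = [-1, -2, 0, 0]
lam, s_N, nonbasis = multipliers_and_reduced_costs(A, c, basis=[0, 1])
is_optimal = bool(np.all(s_N >= 0))
```

For this basis the multipliers are $\lambda = [-1/2 \; -1/2]^{\mathrm{T}}$ and $s_N = [1/2 \; 1/2]^{\mathrm{T}}$, so the vertex $x = [3 \; 1 \; 0 \; 0]^{\mathrm{T}}$ passes the test.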

Pivot operation

If the KKT conditions are violated, a pivot operation consisting of introducing a column of $N$ into the basis at the expense of an existing column in $B$ is performed. In the absence of degeneracy, a pivot operation always results in a strict decrease in $c^{\mathrm{T}} x$. Therefore, if the problem is bounded, the revised simplex method must terminate at an optimal vertex after repeated pivot operations because there are only a finite number of vertices.^[4]

Select an index $m < q \leq n$ such that $s_q < 0$ as the entering index. The corresponding column of $A$, $A_q$, will be moved into the basis, and $x_q$ will be allowed to increase from zero. It can be shown that

{\frac {\partial ({\boldsymbol {c}}^{\mathrm {T} }{\boldsymbol {x}})}{\partial x_{q}}}=s_{q},

i.e., every unit increase in $x_q$ results in a decrease by $-s_q$ in $c^{\mathrm{T}} x$.^[5] Since

{\boldsymbol {Bx_{B}}}+{\boldsymbol {A}}_{q}x_{q}={\boldsymbol {b}},

$x_B$ must be correspondingly decreased by $\Delta x_B = B^{-1} A_q x_q$ subject to $x_B - \Delta x_B \geq 0$. Let $d = B^{-1} A_q$. If $d \leq 0$, no matter how much $x_q$ is increased, $x_B - \Delta x_B$ will stay nonnegative. Hence, $c^{\mathrm{T}} x$ can be arbitrarily decreased, and thus the problem is unbounded. Otherwise, select an index $p = \operatorname{argmin}_{1 \leq i \leq m} \{ x_i / d_i \mid d_i > 0 \}$ as the leaving index. This choice effectively increases $x_q$ from zero until $x_p$ is reduced to zero while maintaining feasibility. The pivot operation concludes with replacing $A_p$ with $A_q$ in the basis.
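The choice of the leaving index (often called the ratio test) can be sketched as follows. The function name `ratio_test`, the tolerance `tol`, and the convention of returning a 0-based position within the basis are assumptions made for this example.

```python
import numpy as np

def ratio_test(x_B, d, tol=1e-12):
    """Return (p, step): the position in the basis that leaves and the value
    x_q reaches, or (None, inf) if no component of d is positive (unbounded).
    (Hypothetical helper, for illustration only.)"""
    x_B = np.asarray(x_B, dtype=float)
    d = np.asarray(d, dtype=float)
    positive = d > tol
    if not positive.any():
        return None, np.inf               # x_q can grow without limit
    ratios = np.full(d.shape, np.inf)     # rows with d_i <= 0 never block
    ratios[positive] = x_B[positive] / d[positive]
    p = int(np.argmin(ratios))
    return p, float(ratios[p])

p, step = ratio_test(x_B=[4.0, 6.0], d=[1.0, 3.0])          # ratios are 4 and 2
p_unbounded, step_unbounded = ratio_test(x_B=[4.0, 6.0], d=[-1.0, 0.0])
```

In the first call the second basic variable reaches zero first, so it leaves the basis after $x_q$ is increased to 2; in the second call no basic variable decreases, which signals an unbounded problem.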

Numerical example

Consider a linear program where

{\begin{aligned}{\boldsymbol {c}}&={\begin{bmatrix}-2&-3&-4&0&0\end{bmatrix}}^{\mathrm {T} },\\{\boldsymbol {A}}&={\begin{bmatrix}3&2&1&1&0\\2&5&3&0&1\end{bmatrix}},\\{\boldsymbol {b}}&={\begin{bmatrix}10\\15\end{bmatrix}}.\end{aligned}}

Let

{\begin{aligned}{\boldsymbol {B}}&={\begin{bmatrix}{\boldsymbol {A}}_{4}&{\boldsymbol {A}}_{5}\end{bmatrix}},\\{\boldsymbol {N}}&={\begin{bmatrix}{\boldsymbol {A}}_{1}&{\boldsymbol {A}}_{2}&{\boldsymbol {A}}_{3}\end{bmatrix}}\end{aligned}}

initially, which corresponds to a feasible vertex $x = \begin{bmatrix}0 & 0 & 0 & 10 & 15\end{bmatrix}^{\mathrm{T}}$. At this moment,

{\begin{aligned}{\boldsymbol {\lambda }}&={\begin{bmatrix}0&0\end{bmatrix}}^{\mathrm {T} },\\{\boldsymbol {s_{N}}}&={\begin{bmatrix}-2&-3&-4\end{bmatrix}}^{\mathrm {T} }.\end{aligned}}

Choose $q = 3$ as the entering index. Then $d = \begin{bmatrix}1 & 3\end{bmatrix}^{\mathrm{T}}$, which means a unit increase in $x_3$ results in $x_4$ and $x_5$ being decreased by $1$ and $3$, respectively. Therefore, $x_3$ is increased to $5$, at which point $x_5$ is reduced to zero, and $p = 5$ becomes the leaving index.

After the pivot operation,

{\begin{aligned}{\boldsymbol {B}}&={\begin{bmatrix}{\boldsymbol {A}}_{3}&{\boldsymbol {A}}_{4}\end{bmatrix}},\\{\boldsymbol {N}}&={\begin{bmatrix}{\boldsymbol {A}}_{1}&{\boldsymbol {A}}_{2}&{\boldsymbol {A}}_{5}\end{bmatrix}}.\end{aligned}}

Correspondingly,

{\begin{aligned}{\boldsymbol {x}}&={\begin{bmatrix}0&0&5&5&0\end{bmatrix}}^{\mathrm {T} },\\{\boldsymbol {\lambda }}&={\begin{bmatrix}0&-4/3\end{bmatrix}}^{\mathrm {T} },\\{\boldsymbol {s_{N}}}&={\begin{bmatrix}2/3&11/3&4/3\end{bmatrix}}^{\mathrm {T} }.\end{aligned}}

A positive $s_N$ indicates that $x$ is now optimal.
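The example can be reproduced with a short dense-matrix sketch of the whole iteration. This is an illustration under stated assumptions, not a production solver: the function name `revised_simplex`, the 0-based indices, the tolerance, and the rule of picking the most negative reduced cost as the entering index are choices made here, and the linear systems are solved from scratch in every iteration instead of updating a factorization.

```python
import numpy as np

def revised_simplex(c, A, b, basis, max_iter=100, tol=1e-9):
    """Minimize c^T x subject to A x = b, x >= 0 from a feasible starting basis.
    (Hypothetical helper, for illustration only; indices are 0-based.)"""
    c, A, b = (np.asarray(v, dtype=float) for v in (c, A, b))
    basis = list(basis)
    n = A.shape[1]
    for _ in range(max_iter):
        B = A[:, basis]
        x_B = np.linalg.solve(B, b)                  # B x_B = b
        lam = np.linalg.solve(B.T, c[basis])         # B^T lambda = c_B
        nonbasis = [j for j in range(n) if j not in basis]
        s_N = c[nonbasis] - A[:, nonbasis].T @ lam   # reduced costs
        if np.all(s_N >= -tol):                      # KKT conditions hold
            x = np.zeros(n)
            x[basis] = x_B
            return x, lam, basis
        q = nonbasis[int(np.argmin(s_N))]            # entering index
        d = np.linalg.solve(B, A[:, q])              # d = B^{-1} A_q
        if np.all(d <= tol):
            raise ValueError("problem is unbounded")
        ratios = np.full(len(basis), np.inf)
        pos = d > tol
        ratios[pos] = x_B[pos] / d[pos]
        p = int(np.argmin(ratios))                   # leaving position
        basis[p] = q                                 # swap A_p for A_q
    raise RuntimeError("iteration limit reached")

c = [-2, -3, -4, 0, 0]
A = [[3, 2, 1, 1, 0], [2, 5, 3, 0, 1]]
b = [10, 15]
x, lam, basis = revised_simplex(c, A, b, basis=[3, 4])  # start from x_4, x_5
```

Starting from the slack basis, the sketch performs the single pivot described above and stops at $x = [0 \; 0 \; 5 \; 5 \; 0]^{\mathrm{T}}$ with $\lambda = [0 \; -4/3]^{\mathrm{T}}$.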

Practical issues

Degeneracy

Because the revised simplex method is mathematically equivalent to the simplex method, it also suffers from degeneracy, where a pivot operation does not result in a decrease in $c^{\mathrm{T}} x$, and a chain of pivot operations causes the basis to cycle. A perturbation or lexicographic strategy can be used to prevent cycling and guarantee termination.^[6]

Basis representation

Two types of linear systems involving $B$ are present in the revised simplex method:

{\begin{aligned}{\boldsymbol {Bz}}&={\boldsymbol {y}},\\{\boldsymbol {B}}^{\mathrm {T} }{\boldsymbol {z}}&={\boldsymbol {y}}.\end{aligned}}

Instead of refactorizing $B$, usually an LU factorization is directly updated after each pivot operation, for which purpose there exist several strategies such as the Forrest–Tomlin and Bartels–Golub methods. However, the amount of data representing the updates as well as numerical errors builds up over time and makes periodic refactorization necessary.^[1]^[7]
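To show why an update can be cheaper than refactorization, the sketch below uses the product-form update of an explicit inverse, which is a simpler technique than the LU updates named above and is swapped in here purely for illustration. When column $p$ of $B$ is replaced by $A_q$, the new inverse equals $E B^{-1}$, where $E$ is the identity matrix with column $p$ replaced by a vector built from $d = B^{-1} A_q$. The function name `update_inverse` is an assumption, and explicit inverses are generally avoided in practice for sparsity and stability reasons.

```python
import numpy as np

def update_inverse(B_inv, d, p):
    """Return the inverse of the basis obtained by replacing column p of B
    with A_q, given B_inv and d = B_inv @ A_q.
    (Hypothetical helper, for illustration only; p is 0-based.)"""
    d = np.asarray(d, dtype=float)
    E = np.eye(len(d))
    E[:, p] = -d / d[p]      # off-diagonal entries of the eta column
    E[p, p] = 1.0 / d[p]     # diagonal entry of the eta column
    return E @ B_inv

B = np.eye(2)                # current basis
A_q = np.array([1.0, 3.0])   # entering column
p = 1                        # position of the leaving column
d = np.linalg.solve(B, A_q)
B_new_inv = update_inverse(np.linalg.inv(B), d, p)

B_new = B.copy()
B_new[:, p] = A_q            # basis after the pivot
```

Both system types can then be solved with the updated inverse, as `B_new_inv @ y` and `B_new_inv.T @ y`; storing each $E$ instead of multiplying it out is what makes the update data accumulate and eventually calls for refactorization.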

Notes and references

Notes

[a] The same theorem also states that the feasible polytope has at least one vertex and that there is at least one vertex which is optimal.^[3]

References

[1] Morgan 1997, §2.
[2] Nocedal & Wright 2006, p. 358, Eq. 13.4.
[3] Nocedal & Wright 2006, p. 363, Theorem 13.2.
[4] Nocedal & Wright 2006, p. 370, Theorem 13.4.
[5] Nocedal & Wright 2006, p. 369, Eq. 13.24.
[6] Nocedal & Wright 2006, p. 381, §13.5.
[7] Nocedal & Wright 2006, p. 372, §13.4.

Bibliography

Morgan, S. S. (1997). A Comparison of Simplex Method Algorithms (MSc thesis). University of Florida. Archived from the original on 7 August 2011.
Nocedal, J.; Wright, S. J. (2006). Mikosch, T. V.; Resnick, S. I.; Robinson, S. M. (eds.). Numerical Optimization. Springer Series in Operations Research and Financial Engineering (2nd ed.). New York, NY, USA: Springer. ISBN 978-0-387-30303-1.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
