Sum-of-squares optimization


A sum-of-squares optimization program is an optimization problem with a linear cost function and a particular type of constraint on the decision variables. The constraints require that, when the decision variables are used as coefficients in certain polynomials, those polynomials have the polynomial SOS property. When fixing the maximum degree of the polynomials involved, sum-of-squares optimization is also known as the Lasserre hierarchy of relaxations in semidefinite programming.


Sum-of-squares optimization techniques have been applied across a variety of areas, including control theory (in particular, for searching for polynomial Lyapunov functions for dynamical systems described by polynomial vector fields), statistics, finance and machine learning. [1] [2] [3] [4]

Optimization problem

The problem can be expressed as

    max_{u ∈ R^n}  c^T u

subject to

    a_{k,0}(x) + a_{k,1}(x) u_1 + ⋯ + a_{k,n}(x) u_n ∈ SOS,   k = 1, …, N_s.

Here "SOS" represents the class of sum-of-squares (SOS) polynomials. The vector c ∈ R^n and the polynomials a_{k,j} are given as part of the data for the optimization problem. The quantities u_1, …, u_n are the decision variables. SOS programs can be converted to semidefinite programs (SDPs) using the duality of the SOS polynomial program and a relaxation for constrained polynomial optimization using positive-semidefinite matrices; see the following section.
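To make the shape of such a program concrete, here is a minimal hand-worked sketch in Python (the helper names are illustrative, and no SDP solver is needed for this special case): for a univariate quadratic, membership in SOS coincides with global nonnegativity, so the best SOS lower bound γ on p(x) = x² + bx + c has a closed form.

```python
# A minimal univariate sketch of sum-of-squares optimization.
# We lower-bound p(x) = x^2 + b*x + c by the largest gamma such that
# p(x) - gamma is SOS.  For a univariate quadratic, "SOS" coincides with
# "nonnegative", which holds exactly when the discriminant is <= 0:
#   b^2 - 4*(c - gamma) <= 0   =>   gamma <= c - b^2 / 4.

def sos_lower_bound(b, c):
    """Largest gamma with x^2 + b*x + c - gamma a sum of squares."""
    return c - b * b / 4.0

def p(x, b, c):
    return x * x + b * x + c

b, c = -2.0, 1.5            # p(x) = (x - 1)^2 + 0.5
gamma = sos_lower_bound(b, c)

# For this small case the SOS bound is tight: it equals the true minimum.
xs = [i / 100.0 for i in range(-500, 500)]
numeric_min = min(p(x, b, c) for x in xs)
print(gamma, numeric_min)
```

In higher degrees and dimensions the same question ("is p − γ a sum of squares?") is no longer a discriminant check but a semidefinite feasibility problem, which is the subject of the following sections.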

Dual problem: constrained polynomial optimization

Suppose we have an n-variate polynomial p(x), and suppose that we would like to minimize this polynomial over a subset A ⊆ R^n. Suppose furthermore that the constraints on the subset A can be encoded using m polynomial equalities of degree at most 2d, each of the form a_i(x) = 0 where a_i is a polynomial of degree at most 2d. A natural, though generally non-convex program for this optimization problem is the following:

    min_{x ∈ R^n}  ⟨C, x^{≤d} (x^{≤d})^T⟩

subject to:

    x_∅ = 1,

    ⟨A_i, x^{≤d} (x^{≤d})^T⟩ = 0,   i = 1, …, m,    (1)

where x^{≤d} is the n^{O(d)}-dimensional vector with one entry for every monomial in x of degree at most d, so that for each multiset S ⊆ [n] with |S| ≤ d, x_S = ∏_{i ∈ S} x_i; C is a matrix of coefficients of the polynomial p(x) that we want to minimize, and A_i is a matrix of coefficients of the polynomial a_i(x) encoding the i-th constraint on the subset A. The additional, fixed constant index in our search space, x_∅ = 1, is added for the convenience of writing the polynomials p(x) and a_i(x) in a matrix representation.
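The indexing by multisets can be sketched concretely. The short Python script below (an illustration, with names of my own choosing) builds the monomial vector x^{≤d} for a small n and d and checks that each entry of the rank-one matrix x^{≤d}(x^{≤d})^T depends only on the union of its row and column multisets:

```python
# Sketch: build the monomial vector x^{<= d} at a small point and check
# that the rank-one matrix (x^{<= d})(x^{<= d})^T is indexed by multisets:
# the (S, T) entry is the monomial for the multiset union S + T.
from itertools import combinations_with_replacement
from math import prod

n, d = 2, 2
x = [3, 5]                           # an arbitrary point in R^n

# Multisets of indices of size at most d; () is the constant index (entry 1).
multisets = [S for k in range(d + 1)
             for S in combinations_with_replacement(range(n), k)]
vec = [prod(x[i] for i in S) for S in multisets]   # x_S = prod_{i in S} x_i

# Entry (S, T) of the rank-one matrix equals the monomial x_{S + T}.
for a, S in enumerate(multisets):
    for b, T in enumerate(multisets):
        union = tuple(sorted(S + T))
        assert vec[a] * vec[b] == prod(x[i] for i in union)
```

This redundancy (many (S, T) pairs naming the same monomial) is exactly what the symmetry constraints in the relaxation below enforce.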

This program is generally non-convex, because the constraints ( 1 ) are not convex. One possible convex relaxation for this minimization problem uses semidefinite programming to replace the rank-one matrix of variables x^{≤d} (x^{≤d})^T with a positive-semidefinite matrix X: we index each monomial of size at most 2d by a multiset S of at most 2d indices, S ⊆ [n], |S| ≤ 2d. For each such monomial, we create a variable X_S in the program, and we arrange the variables X_S to form the matrix X ∈ R^{[n]^{≤d} × [n]^{≤d}}, where R^{[n]^{≤d} × [n]^{≤d}} is the set of real matrices whose rows and columns are identified with multisets of elements from [n] of size at most d. We then write the following semidefinite program in the variables X_S:

    min_X  ⟨C, X⟩

subject to:

    X_∅ = 1,

    ⟨A_i, X⟩ = 0,   i = 1, …, m,

    X_{S ∪ T} = X_{U ∪ V}   for all S, T, U, V ⊆ [n] with |S|, |T|, |U|, |V| ≤ d and S ∪ T = U ∪ V,

    X ⪰ 0,

where again C is the matrix of coefficients of the polynomial p(x) that we want to minimize, and A_i is the matrix of coefficients of the polynomial a_i(x) encoding the i-th constraint on the subset A.

The third constraint ensures that the value of a monomial that appears several times within the matrix is equal throughout the matrix, and is added to make X respect the symmetries present in the rank-one matrix x^{≤d} (x^{≤d})^T, whose (S, T) entry depends only on the union S ∪ T.

Duality

One can take the dual of the above semidefinite program and obtain the following program:

    max  y_∅

subject to:

    C − y_∅ e_∅ − ∑_{i ∈ [m]} y_i A_i − ∑_{S ∪ T = U ∪ V} y_{(S,T),(U,V)} (e_{S,T} − e_{U,V}) ⪰ 0.

We have a variable y_∅ corresponding to the constraint ⟨e_∅, X⟩ = 1 (where e_∅ is the matrix with all entries zero save for the entry indexed by (∅, ∅)), a real variable y_i for each polynomial constraint ⟨A_i, X⟩ = 0, and for each group of multisets S, T, U, V with S ∪ T = U ∪ V, we have a dual variable y_{(S,T),(U,V)} for the symmetry constraint ⟨e_{S,T} − e_{U,V}, X⟩ = 0. The positive-semidefiniteness constraint ensures that p(x) − y_∅ is a sum of squares of polynomials over A: by a characterization of positive-semidefinite matrices, for any positive-semidefinite matrix Q, we can write Q = ∑_i f_i f_i^T for vectors f_i. Thus for any x ∈ A,

    p(x) − y_∅ = (x^{≤d})^T Q x^{≤d} = ∑_i ⟨f_i, x^{≤d}⟩² ≥ 0,

where Q = C − y_∅ e_∅ − ∑_i y_i A_i − ∑ y_{(S,T),(U,V)} (e_{S,T} − e_{U,V}) is the positive-semidefinite matrix from the constraint above (the terms involving the A_i and the symmetry matrices vanish when evaluated at x ∈ A), and we have identified the vectors f_i with the coefficients of a polynomial of degree at most d. This gives a sum-of-squares proof that p(x) ≥ y_∅ over A.
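The step from a positive-semidefinite matrix to an explicit sum of squares can be sketched numerically. In the script below (an illustrative sketch; the factorization routine is hand-rolled, not a library call), a small PSD matrix Q is factored as Q = L L^T, so the columns f_i of L give Q = ∑_i f_i f_i^T:

```python
# Sketch: extract an explicit SOS certificate from a PSD Gram matrix Q via
# a Cholesky factorization Q = L L^T.  The columns f_i of L then satisfy
# Q = sum_i f_i f_i^T, so z^T Q z = sum_i <f_i, z>^2 >= 0 for any z.
import math

def cholesky(Q):
    """Plain Cholesky factorization of a positive-definite matrix."""
    n = len(Q)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = Q[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

Q = [[2.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 2.0]]           # a positive-definite example

L = cholesky(Q)
fs = [[L[i][j] for i in range(3)] for j in range(3)]   # columns of L

# Check z^T Q z == sum_i <f_i, z>^2 (and hence >= 0) at a sample point.
z = [1.0, -2.0, 0.5]
quad = sum(Q[i][j] * z[i] * z[j] for i in range(3) for j in range(3))
sos = sum(sum(f[i] * z[i] for i in range(3)) ** 2 for f in fs)
assert abs(quad - sos) < 1e-9 and quad >= 0
```

Reading z as a monomial vector x^{≤d}, each inner product ⟨f_i, z⟩ is a polynomial of degree at most d, and the identity above is precisely the sum-of-squares certificate used in the duality argument.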

The above can also be extended to regions defined by polynomial inequalities.

Sum-of-squares hierarchy

The sum-of-squares hierarchy (SOS hierarchy), also known as the Lasserre hierarchy, is a hierarchy of convex relaxations of increasing power and increasing computational cost. For each natural number d ∈ N, the corresponding convex relaxation is known as the d-th level or d-th round of the SOS hierarchy. The 1st round, when d = 1, corresponds to a basic semidefinite program, or to sum-of-squares optimization over polynomials of degree at most 2. To augment the basic convex program at the 1st level of the hierarchy to the d-th level, additional variables and constraints are added to the program to have the program consider polynomials of degree at most 2d.

The SOS hierarchy derives its name from the fact that the value of the objective function at the d-th level is bounded with a sum-of-squares proof using polynomials of degree at most 2d via the dual (see "Duality" above). Consequently, any sum-of-squares proof that uses polynomials of degree at most 2d can be used to bound the objective value, allowing one to prove guarantees on the tightness of the d-th level relaxation.

In conjunction with a theorem of Berg, this further implies that given sufficiently many rounds, the relaxation becomes arbitrarily tight on any fixed interval. Berg's result [5] [6] states that every non-negative real polynomial within a bounded interval can be approximated within any accuracy ε > 0 on that interval with a sum of squares of real polynomials of sufficiently high degree. Thus if OBJ(x) is the polynomial objective value as a function of the point x, and if the inequality c + OBJ(x) ≥ 0 holds for all x in the region of interest, then there must be a sum-of-squares proof of this fact. Choosing c to be the negative of the minimum of the objective function over the feasible region, we have the result.

Computational cost

When optimizing over a function in n variables, the d-th level of the hierarchy can be written as a semidefinite program over n^{O(d)} variables, and can be solved in time n^{O(d)} using the ellipsoid method.
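For intuition about the n^{O(d)} growth, one can count the variables directly: the level-d SDP has roughly one variable per monomial of degree at most 2d in n variables, of which there are C(n + 2d, 2d). A quick illustrative computation:

```python
# Count the monomials of degree at most max_deg in n variables:
# there are C(n + max_deg, max_deg) of them, which is n^{O(d)} for
# max_deg = 2d and fixed d.
from math import comb

def num_monomials(n, max_deg):
    """Number of monomials in n variables of degree at most max_deg."""
    return comb(n + max_deg, max_deg)

for d in (1, 2, 3):
    print(d, num_monomials(10, 2 * d))
```

For n = 10 the counts at levels 1, 2, 3 are 66, 1001, and 8008, illustrating why only the first few rounds of the hierarchy are typically solved in practice.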

Sum-of-squares background

A polynomial p is a sum of squares (SOS) if there exist polynomials f_1, …, f_m such that p = ∑_{i=1}^m f_i². For example,

    p(x, y) = x² − 4xy + 7y²

is a sum of squares since

    p(x, y) = f_1(x, y)² + f_2(x, y)²,

where

    f_1(x, y) = x − 2y   and   f_2(x, y) = √3 · y.

Note that if p is a sum of squares then p(x) ≥ 0 for all x ∈ R^n. Detailed descriptions of polynomial SOS are available. [7] [8] [9]
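A decomposition of this kind is easy to check numerically. The snippet below (an illustrative check, with names chosen here) compares the polynomial x² − 4xy + 7y² against the decomposition (x − 2y)² + 3y² at random points:

```python
# Numeric check of an SOS decomposition (illustrative example):
# x^2 - 4xy + 7y^2 = (x - 2y)^2 + 3y^2, hence nonnegative everywhere.
import random

def p(x, y):
    return x * x - 4 * x * y + 7 * y * y

def sos_form(x, y):
    return (x - 2 * y) ** 2 + 3 * y ** 2

random.seed(0)
for _ in range(1000):
    x, y = random.uniform(-10, 10), random.uniform(-10, 10)
    assert abs(p(x, y) - sos_form(x, y)) < 1e-8   # same polynomial
    assert p(x, y) > -1e-8                        # hence nonnegative
```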

Quadratic forms can be expressed as q(x) = x^T Q x, where Q is a symmetric matrix. Similarly, polynomials of degree at most 2d can be expressed as

    p(x) = z(x)^T Q z(x),

where the vector z(x) contains all monomials of degree at most d. This is known as the Gram matrix form. An important fact is that p is SOS if and only if there exists a symmetric and positive-semidefinite matrix Q such that p(x) = z(x)^T Q z(x). This provides a connection between SOS polynomials and positive-semidefinite matrices.
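The Gram matrix form can also be illustrated concretely. In the sketch below (the names are illustrative), p(x) = x⁴ + 2x² + 1 is written as z(x)^T Q z(x) with z(x) = (1, x, x²); the chosen Q equals v v^T for v = (1, 0, 1), so it is PSD and yields the decomposition p(x) = (x² + 1)²:

```python
# Sketch of the Gram matrix form: p(x) = z(x)^T Q z(x) with z = (1, x, x^2).
# For p(x) = x^4 + 2x^2 + 1, one symmetric choice is Q = v v^T with
# v = (1, 0, 1); Q is therefore PSD and p = (z . v)^2 = (x^2 + 1)^2 is SOS.
Q = [[1, 0, 1],
     [0, 0, 0],
     [1, 0, 1]]

def p(x):
    return x ** 4 + 2 * x ** 2 + 1

def gram_form(x):
    z = [1, x, x * x]
    return sum(Q[i][j] * z[i] * z[j] for i in range(3) for j in range(3))

for x in [-2.0, -0.5, 0.0, 1.0, 3.0]:
    assert abs(p(x) - gram_form(x)) < 1e-9
    assert abs(p(x) - (x * x + 1) ** 2) < 1e-9
```

In general the Gram matrix representing a given polynomial is not unique, and finding a PSD representative among all valid choices of Q is exactly the semidefinite feasibility problem that SOS solvers address.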

Software tools

Several toolboxes convert sum-of-squares programs into semidefinite programs, including SOSTOOLS and YALMIP for MATLAB and SumOfSquares.jl for Julia.


References

  1. Parrilo, Pablo A.; Thomas, Rekha R., eds. (2020). Sum of Squares: Theory and Applications. AMS Short Course, January 14–15, 2019, Baltimore, Maryland. Providence, Rhode Island: American Mathematical Society. ISBN 978-1-4704-5025-0. OCLC 1157604983.
  2. Tan, W., Packard, A., 2004. "Searching for control Lyapunov functions using sums of squares programming". In: Allerton Conf. on Comm., Control and Computing. pp. 210–219.
  3. Tan, W., Topcu, U., Seiler, P., Balas, G., Packard, A., 2008. Simulation-aided reachability and local gain analysis for nonlinear dynamical systems. In: Proc. of the IEEE Conference on Decision and Control. pp. 4097–4102.
  4. A. Chakraborty, P. Seiler, and G. Balas, "Susceptibility of F/A-18 Flight Controllers to the Falling-Leaf Mode: Nonlinear Analysis," AIAA Journal of Guidance, Control, and Dynamics, vol. 34 no. 1 (2011), pp. 73–85.
  5. Berg, Christian (1987). Landau, Henry J. (ed.). "The multidimensional moment problem and semigroups". Proceedings of Symposia in Applied Mathematics. 37: 110–124. doi:10.1090/psapm/037/921086. ISBN 9780821801147.
  6. Lasserre, J. (2007-01-01). "A Sum of Squares Approximation of Nonnegative Polynomials". SIAM Review. 49 (4): 651–669. arXiv:math/0412398. doi:10.1137/070693709. ISSN 0036-1445.
  7. Parrilo, P., (2000) Structured semidefinite programs and semialgebraic geometry methods in robustness and optimization . Ph.D. thesis, California Institute of Technology.
  8. Parrilo, P. (2003) "Semidefinite programming relaxations for semialgebraic problems". Mathematical Programming Ser. B 96 (2), 293–320.
  9. Lasserre, J. (2001) "Global optimization with polynomials and the problem of moments". SIAM Journal on Optimization, 11 (3), 796–817.