Non-negative least squares

In mathematical optimization, the problem of non-negative least squares (NNLS) is a type of constrained least squares problem where the coefficients are not allowed to become negative. That is, given a matrix A and a (column) vector of response variables y, the goal is to find [1]

    arg min over x of ‖Ax − y‖2,

subject to x ≥ 0.

Here x ≥ 0 means that each component of the vector x should be non-negative, and ‖·‖2 denotes the Euclidean norm.

Non-negative least squares problems turn up as subproblems in matrix decomposition, e.g. in algorithms for PARAFAC [2] and non-negative matrix/tensor factorization. [3] [4] The latter can be considered a generalization of NNLS. [1]

Another generalization of NNLS is bounded-variable least squares (BVLS), with simultaneous upper and lower bounds αi ≤ xi ≤ βi. [5] :291 [6]

Quadratic programming version

The NNLS problem is equivalent to the quadratic programming problem

    arg min over x ≥ 0 of (1/2)xᵀQx + cᵀx,

where Q = AᵀA and c = −Aᵀy. This problem is convex, as Q is positive semidefinite and the non-negativity constraints form a convex feasible set. [7]
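
As a sanity check of this equivalence, the short sketch below builds Q and c from arbitrary random data and verifies that the least-squares objective and the quadratic objective differ only by the constant (1/2)‖y‖2 (so they share the same minimizer over x ≥ 0) and have identical gradients; the sizes and data are placeholder choices for the sketch.

import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(20, 5))      # arbitrary small example
y = rng.normal(size=20)

Q = A.T @ A                        # positive semidefinite
c = -A.T @ y

x = rng.uniform(size=5)            # any feasible point with x >= 0

ls_obj = 0.5 * np.sum((A @ x - y) ** 2)          # (1/2)*||Ax - y||^2
qp_obj = 0.5 * x @ Q @ x + c @ x                 # (1/2)*x^T Q x + c^T x

assert np.isclose(ls_obj, qp_obj + 0.5 * y @ y)  # objectives differ by a constant
assert np.allclose(A.T @ (A @ x - y), Q @ x + c) # and have the same gradient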

Algorithms

The first widely used algorithm for solving this problem is an active-set method published by Lawson and Hanson in their 1974 book Solving Least Squares Problems. [5] :291 In pseudocode, this algorithm looks as follows: [1] [2]

  • Inputs:
    • a real-valued matrix A of dimension m × n,
    • a real-valued vector y of dimension m,
    • a real value ε, the tolerance for the stopping criterion.
  • Initialize:
    • Set P = ∅.
    • Set R = {1, ..., n}.
    • Set x to an all-zero vector of dimension n.
    • Set w = Aᵀ(y − Ax).
    • Let wR denote the sub-vector with indexes from R.
  • Main loop: while R ≠ ∅ and max(wR) > ε:
    • Let j in R be the index of max(wR) in w.
    • Add j to P.
    • Remove j from R.
    • Let AP be A restricted to the variables included in P.
    • Let s be a vector of the same length as x. Let sP denote the sub-vector with indexes from P, and let sR denote the sub-vector with indexes from R.
    • Set sP = ((AP)ᵀAP)⁻¹ (AP)ᵀy.
    • Set sR to zero.
    • While min(sP) ≤ 0:
      • Let α = min(xi / (xi − si)) for i in P where si ≤ 0.
      • Set x to x + α(s − x).
      • Move to R all indices j in P such that xj ≤ 0.
      • Set sP = ((AP)ᵀAP)⁻¹ (AP)ᵀy.
      • Set sR to zero.
    • Set x to s.
    • Set w to Aᵀ(y − Ax).
  • Output: x
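
The following is a straightforward Python transcription of the pseudocode above, intended only as an illustration: the function name, the outer-iteration safety cap, and the use of numpy.linalg.lstsq in place of the explicit pseudoinverse are choices of this sketch rather than part of the published algorithm, and library routines (see below) should be preferred in practice.

import numpy as np

def nnls_lawson_hanson(A, y, eps=1e-10, max_outer=None):
    # Sketch of the active-set pseudocode above. numpy.linalg.lstsq solves the
    # restricted least-squares subproblem instead of forming ((AP)^T AP)^-1 (AP)^T y.
    m, n = A.shape
    if max_outer is None:
        max_outer = 3 * n                 # safety cap, not part of the pseudocode
    P = np.zeros(n, dtype=bool)           # passive set; R is its complement ~P
    x = np.zeros(n)
    w = A.T @ (y - A @ x)
    for _ in range(max_outer):
        if P.all() or w[~P].max() <= eps:
            break
        j = np.argmax(np.where(~P, w, -np.inf))   # index of max(wR)
        P[j] = True
        s = np.zeros(n)
        s[P] = np.linalg.lstsq(A[:, P], y, rcond=None)[0]
        while s[P].min() <= 0:
            mask = P & (s <= 0)
            alpha = np.min(x[mask] / (x[mask] - s[mask]))
            x = x + alpha * (s - x)
            P[x <= eps] = False           # move (numerically) non-positive indices to R
            s = np.zeros(n)
            s[P] = np.linalg.lstsq(A[:, P], y, rcond=None)[0]
        x = s
        w = A.T @ (y - A @ x)
    return x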

This algorithm takes a finite number of steps to reach a solution and smoothly improves its candidate solution as it goes (so it can find good approximate solutions when cut off at a reasonable number of iterations), but is very slow in practice, owing largely to the computation of the pseudoinverse ((AP)ᵀAP)⁻¹. [1] Variants of this algorithm are available in MATLAB as the routine lsqnonneg [8] [1] and in SciPy as optimize.nnls. [9]
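
For example, the SciPy routine can be called as follows; the small matrix and vector here are placeholder data.

import numpy as np
from scipy.optimize import nnls

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0]])
y = np.array([2.0, 1.0, 1.0])

x, rnorm = nnls(A, y)   # x is componentwise non-negative; rnorm is ||Ax - y||_2
print(x, rnorm)         # in MATLAB, the analogous call is x = lsqnonneg(A, y)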

Many improved algorithms have been suggested since 1974. [1] Fast NNLS (FNNLS) is an optimized version of the Lawson–Hanson algorithm. [2] Other algorithms include variants of Landweber's gradient descent method [10] and coordinate-wise optimization based on the quadratic programming problem above. [7]
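
As an illustration of the coordinate-wise idea, a minimal sketch of cyclic projected coordinate updates on the quadratic program above is shown below; the fixed sweep count, sweep order, and function name are assumptions of this sketch and are not taken from the cited paper [7].

import numpy as np

def nnls_coordinate_sketch(A, y, n_sweeps=200):
    # Cyclically minimize (1/2) x^T Q x + c^T x over x >= 0, one coordinate at
    # a time; each update is the exact minimizer in that coordinate, clipped at zero.
    Q = A.T @ A
    c = -A.T @ y
    x = np.zeros(A.shape[1])
    for _ in range(n_sweeps):
        for j in range(x.size):
            if Q[j, j] > 0:
                x[j] = max(0.0, x[j] - (Q[j] @ x + c[j]) / Q[j, j])
    return x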

See also

References

  1. Chen, Donghui; Plemmons, Robert J. (2009). Nonnegativity constraints in numerical analysis. Symposium on the Birth of Numerical Analysis. CiteSeerX 10.1.1.157.9203.
  2. Bro, Rasmus; De Jong, Sijmen (1997). "A fast non-negativity-constrained least squares algorithm". Journal of Chemometrics. 11 (5): 393. doi:10.1002/(SICI)1099-128X(199709/10)11:5<393::AID-CEM483>3.0.CO;2-L.
  3. Lin, Chih-Jen (2007). "Projected Gradient Methods for Nonnegative Matrix Factorization" (PDF). Neural Computation. 19 (10): 2756–2779. CiteSeerX 10.1.1.308.9135. doi:10.1162/neco.2007.19.10.2756. PMID 17716011.
  4. Boutsidis, Christos; Drineas, Petros (2009). "Random projections for the nonnegative least-squares problem". Linear Algebra and Its Applications. 431 (5–7): 760–771. arXiv:0812.4547. doi:10.1016/j.laa.2009.03.026.
  5. Lawson, Charles L.; Hanson, Richard J. (1995). "23. Linear Least Squares with Linear Inequality Constraints". Solving Least Squares Problems. SIAM. p. 161.
  6. Stark, Philip B.; Parker, Robert L. (1995). "Bounded-variable least-squares: an algorithm and applications" (PDF). Computational Statistics. 10: 129.
  7. Franc, Vojtěch; Hlaváč, Václav; Navara, Mirko (2005). Sequential Coordinate-Wise Algorithm for the Non-negative Least Squares Problem. Computer Analysis of Images and Patterns. Lecture Notes in Computer Science. Vol. 3691. pp. 407–414. doi:10.1007/11556121_50. ISBN 978-3-540-28969-2.
  8. "lsqnonneg". MATLAB Documentation. Retrieved October 28, 2022.
  9. "scipy.optimize.nnls". SciPy v0.13.0 Reference Guide. Retrieved 25 January 2014.
  10. Johansson, B. R.; Elfving, T.; Kozlov, V.; Censor, Y.; Forssén, P. E.; Granlund, G. S. (2006). "The application of an oblique-projected Landweber method to a model of supervised learning". Mathematical and Computer Modelling. 43 (7–8): 892. doi:10.1016/j.mcm.2005.12.010.