Low-rank approximation

Last updated August 08, 2024

In mathematics, low-rank approximation refers to the process of approximating a given matrix by a matrix of lower rank. More precisely, it is a minimization problem, in which the cost function measures the fit between a given matrix (the data) and an approximating matrix (the optimization variable), subject to a constraint that the approximating matrix has reduced rank. The problem is used for mathematical modeling and data compression. The rank constraint is related to a constraint on the complexity of a model that fits the data. In applications, often there are other constraints on the approximating matrix apart from the rank constraint, e.g., non-negativity and Hankel structure.

Definition
Applications
Basic low-rank approximation problem
Proof of Eckart–Young–Mirsky theorem (for spectral norm)
Proof of Eckart–Young–Mirsky theorem (for Frobenius norm)
Weighted low-rank approximation problems
Entry-wise Lp low-rank approximation problems
Distance low-rank approximation problem
Distributed/Streaming low-rank approximation problem
Image and kernel representations of the rank constraints
Alternating projections algorithm
Variable projections algorithm
A Variant: convex-restricted low rank approximation
See also
References
External links

Low-rank approximation is closely related to numerous other techniques, including principal component analysis, factor analysis, total least squares, latent semantic analysis, orthogonal regression, and dynamic mode decomposition.

Definition

Given

structure specification ${\mathcal {S}}:\mathbb {R} ^{n_{p}}\to \mathbb {R} ^{m\times n}$ ,
vector of structure parameters $p\in \mathbb {R} ^{n_{p}}$ ,
norm $\|\cdot \|$ , and
desired rank $r$ ,

{\text{minimize}}\quad {\text{over }}{\widehat {p}}\quad \|p-{\widehat {p}}\|\quad {\text{subject to}}\quad \operatorname {rank} {\big (}{\mathcal {S}}({\widehat {p}}){\big )}\leq r.
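As a small illustration of a structure specification, the sketch below takes ${\mathcal {S}}$ to be a Hankel structure, with entry $(i,j)$ of ${\mathcal {S}}(p)$ equal to $p_{i+j-1}$. The parameter vectors and the $3\times 3$ size are assumptions chosen for the example; they are not part of the definition.

```matlab
% Illustrative Hankel structure: S(p)(i,j) = p(i+j-1)  (assumed example)
p = 2 .^ (0:4);                    % structure parameters p = [1 2 4 8 16]
S = hankel(p(1:3), p(3:5));        % 3-by-3 Hankel matrix built from p
disp(S)
disp(rank(S))                      % an exponential sequence gives rank 1

p2 = p;  p2(3) = 5;                % perturb one parameter
S2 = hankel(p2(1:3), p2(3:5));
disp(rank(S2))                     % the perturbed matrix has full rank
```

In this setting the structured low-rank approximation problem asks for the parameter vector ${\widehat {p}}$ closest to the perturbed one such that the Hankel matrix ${\mathcal {S}}({\widehat {p}})$ has rank at most $r$.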

Applications

Linear system identification, in which case the approximating matrix is Hankel structured.
Machine learning, in which case the approximating matrix is nonlinearly structured.
Recommender systems, in which case the data matrix has missing values and the approximation is categorical.
Distance matrix completion, in which case there is a positive definiteness constraint.
Natural language processing, in which case the approximation is nonnegative.
Computer algebra, in which case the approximation is Sylvester structured.

Basic low-rank approximation problem

The unstructured problem with fit measured by the Frobenius norm, i.e.,

{\text{minimize}}\quad {\text{over }}{\widehat {D}}\quad \|D-{\widehat {D}}\|_{\text{F}}\quad {\text{subject to}}\quad \operatorname {rank} {\big (}{\widehat {D}}{\big )}\leq r

has an analytic solution in terms of the singular value decomposition of the data matrix. The result is referred to as the matrix approximation lemma or Eckart–Young–Mirsky theorem. This problem was originally solved by Erhard Schmidt^[1] in the infinite dimensional context of integral operators (although his methods easily generalize to arbitrary compact operators on Hilbert spaces) and later rediscovered by C. Eckart and G. Young.^[2] L. Mirsky generalized the result to arbitrary unitarily invariant norms.^[3] Let

D=U\Sigma V^{\top }\in \mathbb {R} ^{m\times n},\quad m\leq n

be the singular value decomposition of $D$, where $\Sigma =:\operatorname {diag} (\sigma _{1},\ldots ,\sigma _{m})$ is the $m\times n$ rectangular diagonal matrix with the singular values $\sigma _{1}\geq \ldots \geq \sigma _{m}$. For a given $r\in \{1,\dots ,m-1\}$, partition $U$, $\Sigma$, and $V$ as follows:

U=:{\begin{bmatrix}U_{1}&U_{2}\end{bmatrix}},\quad \Sigma =:{\begin{bmatrix}\Sigma _{1}&0\\0&\Sigma _{2}\end{bmatrix}},\quad {\text{and}}\quad V=:{\begin{bmatrix}V_{1}&V_{2}\end{bmatrix}},

where $U_{1}$ is $m\times r$, $\Sigma _{1}$ is $r\times r$, and $V_{1}$ is $n\times r$. Then the rank-$r$ matrix obtained from the truncated singular value decomposition,

{\widehat {D}}^{*}=U_{1}\Sigma _{1}V_{1}^{\top },

is such that

\|D-{\widehat {D}}^{*}\|_{\text{F}}=\min _{\operatorname {rank} ({\widehat {D}})\leq r}\|D-{\widehat {D}}\|_{\text{F}}={\sqrt {\sigma _{r+1}^{2}+\cdots +\sigma _{m}^{2}}}.

The minimizer ${\widehat {D}}^{*}$ is unique if and only if $\sigma _{r+1}\neq \sigma _{r}$ .
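The statement can be checked numerically. The following is a minimal sketch, assuming an arbitrary small data matrix `D` and rank `r` chosen only for illustration; it forms the truncated singular value decomposition and compares the Frobenius error with the square root of the sum of the squared discarded singular values.

```matlab
% Truncated SVD as the best rank-r approximation (illustrative data)
D = [4 0 1 0; 0 3 0 1; 1 0 2 1];
r = 2;
[U, S, V] = svd(D);
s = svd(D);                                   % singular values, descending
Dh = U(:, 1:r) * S(1:r, 1:r) * V(:, 1:r)';    % U1 * Sigma1 * V1'
err = norm(D - Dh, 'fro');                    % approximation error
tail = sqrt(sum(s(r+1:end) .^ 2));            % sqrt of discarded sigma_i^2
fprintf('rank = %d, error = %.6f, tail = %.6f\n', rank(Dh), err, tail);
```

The two printed error values should agree up to rounding.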

Proof of Eckart–Young–Mirsky theorem (for spectral norm)

Let $A\in \mathbb {R} ^{m\times n}$ be a real (possibly rectangular) matrix with $m\leq n$ . Suppose that

A=U\Sigma V^{\top }

is the singular value decomposition of $A$ . Recall that $U$ and $V$ are orthogonal matrices, and $\Sigma$ is an $m\times n$ diagonal matrix with entries $(\sigma _{1},\sigma _{2},\cdots ,\sigma _{m})$ such that $\sigma _{1}\geq \sigma _{2}\geq \cdots \geq \sigma _{m}\geq 0$ .

We claim that the best rank- $k$ approximation to $A$ in the spectral norm, denoted by $\|\cdot \|_{2}$ , is given by

A_{k}:=\sum _{i=1}^{k}\sigma _{i}u_{i}v_{i}^{\top }

where $u_{i}$ and $v_{i}$ denote the $i$ th column of $U$ and $V$ , respectively.

First, note that we have

\|A-A_{k}\|_{2}=\left\|\sum _{i=1}^{m}\sigma _{i}u_{i}v_{i}^{\top }-\sum _{i=1}^{k}\sigma _{i}u_{i}v_{i}^{\top }\right\|_{2}=\left\|\sum _{i=k+1}^{m}\sigma _{i}u_{i}v_{i}^{\top }\right\|_{2}=\sigma _{k+1}

Therefore, we need to show that if $B_{k}=XY^{\top }$ where $X$ and $Y$ have $k$ columns then $\|A-A_{k}\|_{2}=\sigma _{k+1}\leq \|A-B_{k}\|_{2}$ .

Since $Y$ has $k$ columns, the condition $Y^{\top }w=0$ imposes only $k$ linear equations on the $k+1$ coefficients below, so there must be a nontrivial linear combination of the first $k+1$ columns of $V$, i.e.,

w=\gamma _{1}v_{1}+\cdots +\gamma _{k+1}v_{k+1},

such that $Y^{\top }w=0$ . Without loss of generality, we can scale $w$ so that $\|w\|_{2}=1$ or (equivalently) $\gamma _{1}^{2}+\cdots +\gamma _{k+1}^{2}=1$ . Therefore,

\|A-B_{k}\|_{2}^{2}\geq \|(A-B_{k})w\|_{2}^{2}=\|Aw\|_{2}^{2}=\gamma _{1}^{2}\sigma _{1}^{2}+\cdots +\gamma _{k+1}^{2}\sigma _{k+1}^{2}\geq \sigma _{k+1}^{2}.

The result follows by taking the square root of both sides of the above inequality.
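The argument can be traced on a concrete matrix. The sketch below is only an illustration: the matrix `A`, the rank `k`, and the competitor `Bk = X * Y'` are arbitrary assumptions. It builds the vector $w$ used in the proof from a null-space computation and evaluates the quantities in the chain of inequalities.

```matlab
% Numerical trace of the spectral-norm argument (illustrative data)
A = [4 0 1 0; 0 3 0 1; 1 0 2 1];
k = 1;
[U, S, V] = svd(A);
s = svd(A);
Ak = U(:, 1:k) * S(1:k, 1:k) * V(:, 1:k)';    % truncated SVD
X = [1; 0; 1];  Y = [1; 1; 0; 0];             % arbitrary rank-k competitor
Bk = X * Y';
Vk1 = V(:, 1:k+1);                            % first k+1 right singular vectors
g = null(Y' * Vk1);  g = g(:, 1);             % unit coefficient vector gamma
w = Vk1 * g;                                  % unit vector with Y' * w = 0
fprintf('||A-Ak||_2 = %.6f, sigma_(k+1) = %.6f\n', norm(A - Ak, 2), s(k+1));
fprintf('||A*w||   = %.6f, ||A-Bk||_2 = %.6f\n', norm(A * w), norm(A - Bk, 2));
```

The printed values should satisfy $\sigma _{k+1}\leq \|Aw\|_{2}\leq \|A-B_{k}\|_{2}$, matching the inequality in the proof.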

Proof of Eckart–Young–Mirsky theorem (for Frobenius norm)

Let $A\in \mathbb {R} ^{m\times n}$ be a real (possibly rectangular) matrix with $m\leq n$ . Suppose that

A=U\Sigma V^{\top }

is the singular value decomposition of $A$ .

We claim that the best rank $k$ approximation to $A$ in the Frobenius norm, denoted by $\|\cdot \|_{F}$ , is given by

A_{k}=\sum _{i=1}^{k}\sigma _{i}u_{i}v_{i}^{\top }

where $u_{i}$ and $v_{i}$ denote the $i$ th column of $U$ and $V$ , respectively.

First, note that we have

\|A-A_{k}\|_{F}^{2}=\left\|\sum _{i=k+1}^{m}\sigma _{i}u_{i}v_{i}^{\top }\right\|_{F}^{2}=\sum _{i=k+1}^{m}\sigma _{i}^{2}

Therefore, we need to show that if $B_{k}=XY^{\top }$ where $X$ and $Y$ have $k$ columns then

\|A-A_{k}\|_{F}^{2}=\sum _{i=k+1}^{m}\sigma _{i}^{2}\leq \|A-B_{k}\|_{F}^{2}.

By the triangle inequality with the spectral norm, if $A=A'+A''$ then $\sigma _{1}(A)\leq \sigma _{1}(A')+\sigma _{1}(A'')$. Suppose $A'_{k}$ and $A''_{k}$ respectively denote the rank-$k$ approximations to $A'$ and $A''$ obtained by the SVD method described above. Then, for any $i,j\geq 1$

{\begin{aligned}\sigma _{i}(A')+\sigma _{j}(A'')&=\sigma _{1}(A'-A'_{i-1})+\sigma _{1}(A''-A''_{j-1})\\&\geq \sigma _{1}(A-A'_{i-1}-A''_{j-1})\\&\geq \sigma _{1}(A-A_{i+j-2})\qquad ({\text{since }}\operatorname {rank} (A'_{i-1}+A''_{j-1})\leq i+j-2)\\&=\sigma _{i+j-1}(A).\end{aligned}}

Since $\sigma _{k+1}(B_{k})=0$ , when $A'=A-B_{k}$ and $A''=B_{k}$ we conclude that for $i\geq 1,j=k+1$

\sigma _{i}(A-B_{k})\geq \sigma _{k+i}(A).

Therefore,

\|A-B_{k}\|_{F}^{2}=\sum _{i=1}^{m}\sigma _{i}(A-B_{k})^{2}\geq \sum _{i=1}^{m-k}\sigma _{i}(A-B_{k})^{2}\geq \sum _{i=k+1}^{m}\sigma _{i}(A)^{2}=\|A-A_{k}\|_{F}^{2},

as required.
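The key inequality $\sigma _{i}(A-B_{k})\geq \sigma _{k+i}(A)$ can be observed numerically. The sketch below uses an arbitrary matrix `A` and an arbitrary rank-$k$ matrix `Bk`, both assumptions made for illustration, and compares the singular values entry by entry.

```matlab
% Check sigma_i(A - Bk) >= sigma_(k+i)(A) for one rank-k matrix Bk
A = [4 0 1 0; 0 3 0 1; 1 0 2 1];
k = 1;
X = [1; 2; 0];  Y = [1; 0; 1; 1];
Bk = X * Y';                                  % arbitrary rank-k matrix
sA = svd(A);
sE = svd(A - Bk);
q = numel(sA);                                % number of singular values
ok = all(sE(1:q-k) >= sA(k+1:q) - 1e-12);     % inequality for i = 1..q-k
lhs = norm(A - Bk, 'fro')^2;                  % squared error of Bk
rhs = sum(sA(k+1:q) .^ 2);                    % squared error of truncated SVD
fprintf('inequality holds: %d, %.6f >= %.6f\n', ok, lhs, rhs);
```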

Weighted low-rank approximation problems

The Frobenius norm weights all elements of the approximation error $D-{\widehat {D}}$ uniformly. Prior knowledge about the distribution of the errors can be taken into account by considering the weighted low-rank approximation problem

{\text{minimize}}\quad {\text{over }}{\widehat {D}}\quad \operatorname {vec} (D-{\widehat {D}})^{\top }W\operatorname {vec} (D-{\widehat {D}})\quad {\text{subject to}}\quad \operatorname {rank} ({\widehat {D}})\leq r,

where ${\text{vec}}(A)$ vectorizes the matrix $A$ column wise and $W$ is a given positive (semi)definite weight matrix.

The general weighted low-rank approximation problem does not admit an analytic solution in terms of the singular value decomposition and is solved by local optimization methods, which provide no guarantee that a globally optimal solution is found.

In the case of uncorrelated weights, the weighted low-rank approximation problem can also be formulated as follows:^[4]^[5] for a non-negative matrix $W$ and a matrix $A$, minimize $\sum _{i,j}(W_{i,j}(A_{i,j}-B_{i,j}))^{2}$ over matrices $B$ of rank at most $r$.
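The element-wise formulation is the special case of the general one in which the weight matrix is diagonal, with diagonal entries equal to the squared element-wise weights arranged column-wise. The sketch below evaluates both costs for the same candidate; the matrices `A`, `B` and the weights `Wel` are illustrative assumptions.

```matlab
% Element-wise weighted cost versus the vec-form cost with diagonal W
A = [1 2; 3 4; 5 6];
B = [1 2; 3 5; 4 6];                          % arbitrary candidate matrix
Wel = [1 2; 0 1; 3 1];                        % non-negative element-wise weights
cost_elementwise = sum(sum((Wel .* (A - B)) .^ 2));
W = diag(Wel(:) .^ 2);                        % equivalent diagonal weight matrix
e = A(:) - B(:);                              % vec(A - B), column-wise
cost_vec = e' * W * e;
fprintf('element-wise: %g, vec form: %g\n', cost_elementwise, cost_vec);
```

A zero weight, as in entry (2,1) of `Wel`, removes the corresponding element from the cost, which is how missing values are usually modeled.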

Entry-wise L_p low-rank approximation problems

Let $\|A\|_{p}=\left(\sum _{i,j}|A_{i,j}|^{p}\right)^{1/p}$. For $p=2$, the fastest algorithm runs in $nnz(A)+n\cdot poly(k/\epsilon )$ time.^[6]^[7] One of the important ideas used is the oblivious subspace embedding (OSE), first proposed by Sarlos.^[8]

For $p=1$ , it is known that this entry-wise L1 norm is more robust than the Frobenius norm in the presence of outliers and is indicated in models where Gaussian assumptions on the noise may not apply. It is natural to seek to minimize $\|B-A\|_{1}$ .^[9] For $p=0$ and $p\geq 1$ , there are some algorithms with provable guarantees.^[10]^[11]
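For reference, the entry-wise norm defined above is easy to evaluate directly. The sketch below is an illustration with an arbitrary matrix; the helper name `entry_norm` is hypothetical. Note that MATLAB's `norm(A, 1)` computes the induced 1-norm (maximum absolute column sum), not the entry-wise one.

```matlab
% Entry-wise p-norm of a matrix (helper name entry_norm is hypothetical)
entry_norm = @(M, p) sum(abs(M(:)) .^ p) ^ (1 / p);
A = [3 -4; 0 12];
n2 = entry_norm(A, 2);                        % coincides with the Frobenius norm
n1 = entry_norm(A, 1);                        % sum of absolute values
fprintf('p=2: %g (fro: %g), p=1: %g, induced 1-norm: %g\n', ...
        n2, norm(A, 'fro'), n1, norm(A, 1));
```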

Distance low-rank approximation problem

Let $P=\{p_{1},\ldots ,p_{m}\}$ and $Q=\{q_{1},\ldots ,q_{n}\}$ be two point sets in an arbitrary metric space. Let $A$ represent the $m\times n$ matrix where $A_{i,j}=dist(p_{i},q_{j})$. Such distance matrices are commonly computed in software packages and have applications to learning image manifolds, handwriting recognition, and multi-dimensional unfolding. In an attempt to reduce their description size,^[12]^[13] one can study low-rank approximation of such matrices.
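A minimal sketch of the construction follows, assuming points in the plane with the Euclidean distance as the metric; the point coordinates and the rank `k` are illustrative. It uses a plain truncated SVD, not the sublinear-time algorithms of the cited works.

```matlab
% Distance matrix between two point sets and a rank-k approximation
P = [0 0; 1 0; 0 1];                          % m = 3 points, one per row
Q = [1 1; 2 0; 0 2; 3 3];                     % n = 4 points, one per row
m = size(P, 1);  n = size(Q, 1);
A = zeros(m, n);
for i = 1:m
    for j = 1:n
        A(i, j) = norm(P(i, :) - Q(j, :));    % A(i,j) = dist(p_i, q_j)
    end
end
k = 2;
[U, S, V] = svd(A);
Ak = U(:, 1:k) * S(1:k, 1:k) * V(:, 1:k)';    % rank-k approximation
rel_err = norm(A - Ak, 'fro') / norm(A, 'fro');
fprintf('relative Frobenius error of rank-%d fit: %.4f\n', k, rel_err);
```

Storing the factors takes $k(m+n)$ numbers instead of $mn$, which reduces the description size only when $k$ is much smaller than $m$ and $n$.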

Distributed/Streaming low-rank approximation problem

Low-rank approximation problems in the distributed and streaming settings have been considered by Boutsidis, Woodruff and Zhong.^[14]

Image and kernel representations of the rank constraints

Using the equivalences

\operatorname {rank} ({\widehat {D}})\leq r\quad \iff \quad {\text{there are }}P\in \mathbb {R} ^{m\times r}{\text{ and }}L\in \mathbb {R} ^{r\times n}{\text{ such that }}{\widehat {D}}=PL

and

\operatorname {rank} ({\widehat {D}})\leq r\quad \iff \quad {\text{there is full row rank }}R\in \mathbb {R} ^{(m-r)\times m}{\text{ such that }}R{\widehat {D}}=0

the weighted low-rank approximation problem becomes equivalent to the parameter optimization problems

{\text{minimize}}\quad {\text{over }}{\widehat {D}},P{\text{ and }}L\quad \operatorname {vec} ^{\top }(D-{\widehat {D}})W\operatorname {vec} (D-{\widehat {D}})\quad {\text{subject to}}\quad {\widehat {D}}=PL

and

{\text{minimize}}\quad {\text{over }}{\widehat {D}}{\text{ and }}R\quad \operatorname {vec} ^{\top }(D-{\widehat {D}})W\operatorname {vec} (D-{\widehat {D}})\quad {\text{subject to}}\quad R{\widehat {D}}=0\quad {\text{and}}\quad RR^{\top }=I_{m-r},

where $I_{m-r}$ is the identity matrix of size $m-r$ (the number of rows of $R$).
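One way to obtain factors for both representations, sketched below under the assumption that ${\widehat {D}}$ comes from a truncated singular value decomposition, is to read them off the singular vectors. The data matrix and rank are illustrative.

```matlab
% Image (Dh = P*L) and kernel (R*Dh = 0) representations from an SVD
D = [4 0 1 0; 0 3 0 1; 1 0 2 1; 2 1 0 3];
[m, n] = size(D);
r = 2;
[U, S, V] = svd(D);
P = U(:, 1:r) * S(1:r, 1:r);                  % m-by-r image factor
L = V(:, 1:r)';                               % r-by-n image factor
Dh = P * L;                                   % rank-r matrix in image form
R = U(:, r+1:end)';                           % (m-r)-by-m kernel parameter
fprintf('||R*Dh|| = %.2e, ||R*R'' - I|| = %.2e\n', ...
        norm(R * Dh), norm(R * R' - eye(m - r)));
```

Here `R` has orthonormal rows that span the left null space of `Dh`, so both printed quantities should be at the level of rounding error.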

Alternating projections algorithm

The image representation of the rank constraint suggests a parameter optimization method in which the cost function is minimized alternately over one of the variables ($P$ or $L$) with the other one fixed. Although simultaneous minimization over both $P$ and $L$ is a difficult biconvex optimization problem, minimization over one of the variables alone is a linear least squares problem and can be solved globally and efficiently.

The resulting optimization algorithm (called alternating projections) is globally convergent with a linear convergence rate to a locally optimal solution of the weighted low-rank approximation problem. A starting value for the $P$ (or $L$) parameter must be supplied. The iteration is stopped when a user-defined convergence condition is satisfied.

Matlab implementation of the alternating projections algorithm for weighted low-rank approximation:

function [dh, f] = wlra_ap(d, w, p, tol, maxiter)
[m, n] = size(d); r = size(p, 2); f = inf;
for i = 2:maxiter
    % minimization over L
    bp = kron(eye(n), p);
    vl = (bp' * w * bp) \ bp' * w * d(:);
    l = reshape(vl, r, n);
    % minimization over P
    bl = kron(l', eye(m));
    vp = (bl' * w * bl) \ bl' * w * d(:);
    p = reshape(vp, m, r);
    % check exit condition
    dh = p * l; dd = d - dh;
    f(i) = dd(:)' * w * dd(:);
    if abs(f(i - 1) - f(i)) < tol, break, end
end
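To see the iteration at work without the Kronecker products, consider the unweighted special case $W=I$, in which each least squares step reduces to a single backslash or slash solve. The sketch below is a self-contained illustration of that special case only; the data matrix, starting value, tolerance and iteration limit are assumptions. It compares the final cost with the optimum known from the Eckart–Young–Mirsky theorem.

```matlab
% Alternating least squares for the unweighted case W = I (illustrative)
D = [4 0 1 0; 0 3 0 1; 1 0 2 1];
r = 1;
P = [1; 1; 1];                                % assumed starting value for P
f_old = inf;
for it = 1:500
    L = P \ D;                                % minimize over L with P fixed
    P = D / L;                                % minimize over P with L fixed
    f = norm(D - P * L, 'fro')^2;             % current cost
    if abs(f_old - f) < 1e-12, break, end     % exit condition
    f_old = f;
end
s = svd(D);
f_opt = sum(s(r+1:end) .^ 2);                 % optimal cost from the SVD
fprintf('iterations: %d, cost: %.8f, optimum: %.8f\n', it, f, f_opt);
```

In this example the iteration reaches the global optimum, but for general weights only convergence to a locally optimal solution is expected.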

Variable projections algorithm

The alternating projections algorithm exploits the fact that the low-rank approximation problem, parameterized in the image form, is bilinear in the variables $P$ and $L$. The bilinear nature of the problem is also exploited in an alternative approach, called variable projections.^[15]

Consider again the weighted low rank approximation problem, parameterized in the image form. Minimization with respect to the $L$ variable (a linear least squares problem) leads to the closed form expression of the approximation error as a function of $P$

f(P)={\sqrt {\operatorname {vec} ^{\top }(D){\Big (}W-W(I_{n}\otimes P){\big (}(I_{n}\otimes P)^{\top }W(I_{n}\otimes P){\big )}^{-1}(I_{n}\otimes P)^{\top }W{\Big )}\operatorname {vec} (D)}}.

The original problem is therefore equivalent to the nonlinear least squares problem of minimizing $f(P)$ with respect to $P$. For this purpose, standard optimization methods, e.g., the Levenberg–Marquardt algorithm, can be used.
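The closed-form cost can be evaluated directly from the formula. The sketch below assumes identity weights and an illustrative data matrix; the helper names `proj_cost` and `f` are hypothetical. With $W=I$, choosing $P$ as the leading left singular vectors should reproduce the optimal error of the truncated singular value decomposition, and any other $P$ should give a cost at least as large.

```matlab
% Evaluate the variable projection cost f(P) for W = I (illustrative)
D = [4 0 1 0; 0 3 0 1; 1 0 2 1];
[m, n] = size(D);
r = 1;
W = eye(m * n);                               % assumed identity weights
vecD = D(:);
% cost after eliminating L, written in terms of B = kron(I_n, P)
proj_cost = @(B) sqrt(vecD' * (W - W * B * ((B' * W * B) \ (B' * W))) * vecD);
f = @(P) proj_cost(kron(eye(n), P));
[U, S, V] = svd(D);
s = svd(D);
f_svd = f(U(:, 1:r));                         % cost at the SVD solution
f_other = f([1; 1; 1]);                       % cost at an arbitrary P
fprintf('f(U1) = %.6f, optimum = %.6f, f(other) = %.6f\n', ...
        f_svd, sqrt(sum(s(r+1:end) .^ 2)), f_other);
```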

Matlab implementation of the variable projections algorithm for weighted low-rank approximation:

function [dh, f] = wlra_varpro(d, w, p, tol, maxiter)
prob = optimset(); prob.solver = 'lsqnonlin';
prob.options = optimset('MaxIter', maxiter, 'TolFun', tol);
prob.x0 = p; prob.objective = @(p) cost_fun(p, d, w);
[p, f] = lsqnonlin(prob);
[f, vl] = cost_fun(p, d, w);
dh = p * reshape(vl, size(p, 2), size(d, 2));

function [f, vl] = cost_fun(p, d, w)
bp = kron(eye(size(d, 2)), p);
vl = (bp' * w * bp) \ bp' * w * d(:);
f = d(:)' * w * (d(:) - bp * vl);

The variable projections approach can be applied also to low rank approximation problems parameterized in the kernel form. The method is effective when the number of eliminated variables is much larger than the number of optimization variables left at the stage of the nonlinear least squares minimization. Such problems occur in system identification, parameterized in the kernel form, where the eliminated variables are the approximating trajectory and the remaining variables are the model parameters. In the context of linear time-invariant systems, the elimination step is equivalent to Kalman smoothing.

A Variant: convex-restricted low rank approximation

In many applications the solution is required not only to have low rank but also to satisfy additional convex constraints. The problem of interest is then

{\text{minimize}}\quad {\text{over }}{\widehat {p}}\quad \|p-{\widehat {p}}\|\quad {\text{subject to}}\quad \operatorname {rank} {\big (}{\mathcal {S}}({\widehat {p}}){\big )}\leq r{\text{ and }}g({\widehat {p}})\leq 0

This problem has many real-world applications, including the recovery of a good solution from an inexact (semidefinite programming) relaxation. If the additional constraint $g({\widehat {p}})\leq 0$ is linear, for example when all elements are required to be nonnegative, the problem is called structured low-rank approximation.^[16] The more general form is called convex-restricted low-rank approximation.

The problem is challenging because it combines convex constraints with the nonconvex low-rank constraint. Different techniques have been developed for different realizations of $g({\widehat {p}})\leq 0$. The alternating direction method of multipliers (ADMM) can be applied to nonconvex problems with a convex objective function, rank constraints and other convex constraints,^[17] and is thus suitable for the problem above. Moreover, unlike for general nonconvex problems, ADMM is guaranteed to converge to a feasible solution as long as its dual variable converges during the iterations.
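To give a feel for how the two kinds of constraints interact, the sketch below alternates between projecting onto the set of matrices of rank at most $r$ (truncated SVD) and onto the set of nonnegative matrices. This is a simple alternating projections heuristic chosen for illustration, not the ADMM scheme of the cited work, and it carries no convergence guarantee; the data matrix, rank and iteration count are assumptions.

```matlab
% Heuristic: alternate projections onto rank <= r and nonnegativity
D = [4 0 1 0; 0 3 0 1; 1 0 2 1];
r = 2;
X = D;
for it = 1:200
    [U, S, V] = svd(X);
    X = U(:, 1:r) * S(1:r, 1:r) * V(:, 1:r)'; % project onto rank <= r
    X = max(X, 0);                            % project onto nonnegative matrices
end
s = svd(X);
viol = s(r+1);                                % remaining rank-constraint violation
fprintf('min entry: %g, sigma_(r+1): %.2e, error: %.6f\n', ...
        min(X(:)), viol, norm(D - X, 'fro'));
```

The final iterate is nonnegative by construction; the printed singular value indicates how far it still is from satisfying the rank constraint.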

References

[1] E. Schmidt, Zur Theorie der linearen und nichtlinearen Integralgleichungen, Math. Annalen 63 (1907), 433-476. doi: 10.1007/BF01449770
[2] C. Eckart, G. Young, The approximation of one matrix by another of lower rank. Psychometrika, Volume 1, 1936, Pages 211–8. doi: 10.1007/BF02288367
[3] L. Mirsky, Symmetric gauge functions and unitarily invariant norms, Q.J. Math. 11 (1960), 50-59. doi: 10.1093/qmath/11.1.50
[4] Srebro, Nathan; Jaakkola, Tommi (2003). Weighted Low-Rank Approximations (PDF). ICML'03.
[5] Razenshteyn, Ilya; Song, Zhao; Woodruff, David P. (2016). Weighted Low Rank Approximations with Provable Guarantees. STOC '16 Proceedings of the forty-eighth annual ACM symposium on Theory of Computing.
[6] Clarkson, Kenneth L.; Woodruff, David P. (2013). Low Rank Approximation and Regression in Input Sparsity Time. STOC '13 Proceedings of the forty-fifth annual ACM symposium on Theory of Computing. arXiv: 1207.6365.
[7] Nelson, Jelani; Nguyen, Huy L. (2013). OSNAP: Faster numerical linear algebra algorithms via sparser subspace embeddings. FOCS '13. arXiv: 1211.1002.
[8] Sarlos, Tamas (2006). Improved approximation algorithms for large matrices via random projections. FOCS'06.
[9] Song, Zhao; Woodruff, David P.; Zhong, Peilin (2017). Low Rank Approximation with Entrywise L1-Norm Error. STOC '17 Proceedings of the forty-ninth annual ACM symposium on Theory of Computing. arXiv: 1611.00898.
[10] Bringmann, Karl; Kolev, Pavel; Woodruff, David P. (2017). Approximation Algorithms for L0-Low Rank Approximation. NIPS'17. arXiv: 1710.11253.
[11] Chierichetti, Flavio; Gollapudi, Sreenivas; Kumar, Ravi; Lattanzi, Silvio; Panigrahy, Rina; Woodruff, David P. (2017). Algorithms for Lp Low-Rank Approximation. ICML'17. arXiv: 1705.06730.
[12] Bakshi, Ainesh L.; Woodruff, David P. (2018). Sublinear Time Low-Rank Approximation of Distance Matrices. NeurIPS. arXiv: 1809.06986.
[13] Indyk, Piotr; Vakilian, Ali; Wagner, Tal; Woodruff, David P. (2019). Sample-Optimal Low-Rank Approximation of Distance Matrices. COLT.
[14] Boutsidis, Christos; Woodruff, David P.; Zhong, Peilin (2016). Optimal Principal Component Analysis in Distributed and Streaming Models. STOC. arXiv: 1504.06729.
[15] G. Golub and V. Pereyra, Separable nonlinear least squares: the variable projection method and its applications, Institute of Physics, Inverse Problems, Volume 19, 2003, Pages 1-26.
[16] Chu, Moody T.; Funderlic, Robert E.; Plemmons, Robert J. (2003). "Structured low-rank approximation". Linear Algebra and Its Applications. 366: 157–172. doi: 10.1016/S0024-3795(02)00505-0.
[17] "A General System for Heuristic Solution of Convex Problems over Nonconvex Sets" (PDF).

M. T. Chu, R. E. Funderlic, R. J. Plemmons, Structured low-rank approximation, Linear Algebra and its Applications, Volume 366, 1 June 2003, Pages 157–172 doi : 10.1016/S0024-3795(02)00505-0

External links

C++ package for structured-low rank approximation

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
