CUR matrix approximation

Last updated February 26, 2024

A CUR matrix approximation is a set of three matrices that, when multiplied together, closely approximate a given matrix.^[1]^[2]^[3] A CUR approximation can be used in the same way as the low-rank approximation of the singular value decomposition (SVD). CUR approximations are less accurate than the SVD, but they offer two key advantages, both stemming from the fact that the rows and columns come from the original matrix (rather than left and right singular vectors):

Formally, a CUR matrix approximation of a matrix A is three matrices C, U, and R such that C is made from columns of A, R is made from rows of A, and that the product CUR closely approximates A. Usually the CUR is selected to be a rank-k approximation, which means that C contains k columns of A, R contains k rows of A, and U is a k-by-k matrix. There are many possible CUR matrix approximations, and many CUR matrix approximations for a given rank.

The CUR matrix approximation is often ^{[ citation needed ]} used in place of the low-rank approximation of the SVD in principal component analysis. The CUR is less accurate, but the columns of the matrix C are taken from A and the rows of R are taken from A. In PCA, each column of A contains a data sample; thus, the matrix C is made of a subset of data samples. This is much easier to interpret than the SVD's left singular vectors, which represent the data in a rotated space. Similarly, the matrix R is made of a subset of variables measured for each data sample. This is easier to comprehend than the SVD's right singular vectors, which are another rotations of the data in space.

Mathematical definition

Hamm and Huang^[4] gives the following theorem describing the basics of a CUR decomposition of a matrix $L$ with rank $r$ :

Theorem: Consider row and column indices $I,J\subseteq [n]$ with $|I|,|J|\geq r$ . Denote submatrices $C=L_{:,J},$ $U=L_{I,J}$ and $R=L_{I,:}$ . If rank( $U$ ) = rank( $L$ ), then $L=CU^{+}R$ , where $(\cdot )^{+}$ denotes the Moore–Penrose pseudoinverse.

In other words, if $L$ has low rank, we can take a sub-matrix $U=L_{I,J}$ of the same rank, together with some rows $R$ and columns $C$ of $L$ and use them to reconstruct $L$ .

Algorithms

The CUR matrix approximation is not unique and there are multiple algorithms for computing one. One is ALGORITHMCUR.^[1]

The "Linear Time CUR" algorithm ^[5] simply picks J by sampling columns randomly (with replacement) with probability proportional to the squared column norms, $\|L_{:,j}\|_{2}^{2}$ ; and similarly sampling I proportional to the squared row norms, $\|L_{i}\|_{2}^{2}$ . The authors show that taking $|J|\approx k/\varepsilon ^{4}$ and $|I|\approx k/\varepsilon ^{2}$ where $0\leq \varepsilon$ , the algorithm achieves Frobenius error bound $\|A-CUR\|_{F}\leq \|A-A_{k}\|_{F}+\varepsilon \|A\|_{F}$ , where $A_{k}$ is the optimal rank k approximation.

Tensor

Tensor-CURT decomposition^[6] is a generalization of matrix-CUR decomposition. Formally, a CURT tensor approximation of a tensor A is three matrices and a (core-)tensor C, R, T and U such that C is made from columns of A, R is made from rows of A, T is made from tubes of A and that the product U(C,R,T) (where the $i,j,l$ -th entry of it is $\sum _{i',j',l'}U_{i',j',l'}C_{i,i'}R_{j,j'}T_{l,l'}$ ) closely approximates A. Usually the CURT is selected to be a rank-k approximation, which means that C contains k columns of A, R contains k rows of A, T contains tubes of A and U is a k-by-k-by-k (core-)tensor.

Related Research Articles

In mathematics, the determinant is a scalar value that is a function of the entries of a square matrix. The determinant of a matrix $A$ is commonly denoted $det(A)$ , $det A$ , or $| A |$ . Its value characterizes some properties of the matrix and the linear map represented by the matrix. In particular, the determinant is nonzero if and only if the matrix is invertible and the linear map represented by the matrix is an isomorphism. The determinant of a product of matrices is the product of their determinants.

In linear algebra, the rank of a matrix $A$ is the dimension of the vector space generated by its columns. This corresponds to the maximal number of linearly independent columns of $A$ . This, in turn, is identical to the dimension of the vector space spanned by its rows. Rank is thus a measure of the "nondegenerateness" of the system of linear equations and linear transformation encoded by $A$ . There are multiple equivalent definitions of rank. A matrix's rank is one of its most fundamental characteristics.

<span class="mw-page-title-main">Principal component analysis</span> Method of data analysis

Principal component analysis (PCA) is a linear dimensionality reduction technique with applications in exploratory data analysis, visualization and data preprocessing.

In linear algebra, the outer product of two coordinate vectors is the matrix whose entries are all products of an element in the first vector with an element in the second vector. If the two coordinate vectors have dimensions n and m, then their outer product is an n × m matrix. More generally, given two tensors, their outer product is a tensor. The outer product of tensors is also referred to as their tensor product, and can be used to define the tensor algebra.

In linear algebra, the Cholesky decomposition or Cholesky factorization is a decomposition of a Hermitian, positive-definite matrix into the product of a lower triangular matrix and its conjugate transpose, which is useful for efficient numerical solutions, e.g., Monte Carlo simulations. It was discovered by André-Louis Cholesky for real matrices, and posthumously published in 1924. When it is applicable, the Cholesky decomposition is roughly twice as efficient as the LU decomposition for solving systems of linear equations.

<span class="mw-page-title-main">Singular value decomposition</span> Matrix decomposition

In linear algebra, the singular value decomposition (SVD) is a factorization of a real or complex matrix into a rotation, followed by a rescaling followed by another rotation. It generalizes the eigendecomposition of a square normal matrix with an orthonormal eigenbasis to any $matrix. It is related to the polar decomposition.$

In linear algebra, an $n$ -by- $n$ square matrix $A$ is called invertible if there exists an $n$ -by- $n$ square matrix $B$ such that

In the mathematical discipline of linear algebra, a matrix decomposition or matrix factorization is a factorization of a matrix into a product of matrices. There are many different matrix decompositions; each finds use among a particular class of problems.

Latent semantic analysis (LSA) is a technique in natural language processing, in particular distributional semantics, of analyzing relationships between a set of documents and the terms they contain by producing a set of concepts related to the documents and terms. LSA assumes that words that are close in meaning will occur in similar pieces of text. A matrix containing word counts per document is constructed from a large piece of text and a mathematical technique called singular value decomposition (SVD) is used to reduce the number of rows while preserving the similarity structure among columns. Documents are then compared by cosine similarity between any two columns. Values close to 1 represent very similar documents while values close to 0 represent very dissimilar documents.

In mathematics, the polar decomposition of a square real or complex matrix $is a factorization of the form, where is a unitary matrix and is a positive semi-definite Hermitian matrix, both square and of the same size.$

In numerical analysis and linear algebra, lower–upper (LU) decomposition or factorization factors a matrix as the product of a lower triangular matrix and an upper triangular matrix. The product sometimes includes a permutation matrix as well. LU decomposition can be viewed as the matrix form of Gaussian elimination. Computers usually solve square systems of linear equations using LU decomposition, and it is also a key step when inverting a matrix or computing the determinant of a matrix. The LU decomposition was introduced by the Polish astronomer Tadeusz Banachiewicz in 1938. To quote: "It appears that Gauss and Doolittle applied the method [of elimination] only to symmetric equations. More recent authors, for example, Aitken, Banachiewicz, Dwyer, and Crout … have emphasized the use of the method, or variations of it, in connection with non-symmetric problems … Banachiewicz … saw the point … that the basic problem is really one of matrix factorization, or “decomposition” as he called it." It is also sometimes referred to as LR decomposition.

In linear algebra, two-dimensional singular-value decomposition (2DSVD) computes the low-rank approximation of a set of matrices such as 2D images or weather maps in a manner almost identical to SVD which computes the low-rank approximation of a single matrix.

In mathematics, the Johnson–Lindenstrauss lemma is a result named after William B. Johnson and Joram Lindenstrauss concerning low-distortion embeddings of points from high-dimensional into low-dimensional Euclidean space. The lemma states that a set of points in a high-dimensional space can be embedded into a space of much lower dimension in such a way that distances between the points are nearly preserved. In the classical proof of the lemma, the embedding is a random orthogonal projection.

In multilinear algebra, the tensor rank decomposition or the $decomposition of a tensor is the decomposition of a tensor in terms of a sum of minimum tensors. This is an open problem.$

In multilinear algebra, the higher-order singular value decomposition (HOSVD) of a tensor is a specific orthogonal Tucker decomposition. It may be regarded as one type of generalization of the matrix singular value decomposition. It has applications in computer vision, computer graphics, machine learning, scientific computing, and signal processing. Some aspects can be traced as far back as F. L. Hitchcock in 1928, but it was L. R. Tucker who developed for third-order tensors the general Tucker decomposition in the 1960s, further advocated by L. De Lathauwer et al. in their Multilinear SVD work that employs the power method, or advocated by Vasilescu and Terzopoulos that developed M-mode SVD a parallel algorithm that employs the matrix SVD.

In mathematics, low-rank approximation is a minimization problem, in which the cost function measures the fit between a given matrix and an approximating matrix, subject to a constraint that the approximating matrix has reduced rank. The problem is used for mathematical modeling and data compression. The rank constraint is related to a constraint on the complexity of a model that fits the data. In applications, often there are other constraints on the approximating matrix apart from the rank constraint, e.g., non-negativity and Hankel structure.

<span class="mw-page-title-main">Matrix completion</span>

Matrix completion is the task of filling in the missing entries of a partially observed matrix, which is equivalent to performing data imputation in statistics. A wide range of datasets are naturally organized in matrix form. One example is the movie-ratings matrix, as appears in the Netflix problem: Given a ratings matrix in which each entry $represents the rating of movie by customer, if customer has watched movie and is otherwise missing, we would like to predict the remaining entries in order to make good recommendations to customers on what to watch next. Another example is the document-term matrix: The frequencies of words used in a collection of documents can be represented as a matrix, where each entry corresponds to the number of times the associated term appears in the indicated document.$

In applied mathematics, k-SVD is a dictionary learning algorithm for creating a dictionary for sparse representations, via a singular value decomposition approach. k-SVD is a generalization of the k-means clustering method, and it works by iteratively alternating between sparse coding the input data based on the current dictionary, and updating the atoms in the dictionary to better fit the data. It is structurally related to the expectation maximization (EM) algorithm. k-SVD can be found widely in use in applications such as image processing, audio processing, biology, and document analysis.

Robust Principal Component Analysis (RPCA) is a modification of the widely used statistical procedure of principal component analysis (PCA) which works well with respect to grossly corrupted observations. A number of different approaches exist for Robust PCA, including an idealized version of Robust PCA, which aims to recover a low-rank matrix L₀ from highly corrupted measurements M = L₀ +S₀. This decomposition in low-rank and sparse matrices can be achieved by techniques such as Principal Component Pursuit method (PCP), Stable PCP, Quantized PCP, Block based PCP, and Local PCP. Then, optimization methods are used such as the Augmented Lagrange Multiplier Method (ALM), Alternating Direction Method (ADM), Fast Alternating Minimization (FAM), Iteratively Reweighted Least Squares (IRLS ) or alternating projections (AP).

In mathematics, the Hamiltonian cycle polynomial of an n×n-matrix is a polynomial in its entries, defined as

References

1 2 Michael W. Mahoney; Petros Drineas. "CUR matrix decompositions for improved data analysis" . Retrieved 26 June 2012.
↑ Boutsidis, Christos; Woodruff, David P. (2014). Optimal CUR matrix decompositions. STOC '14 Proceedings of the forty-sixth annual ACM symposium on Theory of Computing.
↑ Song, Zhao; Woodruff, David P.; Zhong, Peilin (2017). Low Rank Approximation with Entrywise L1-Norm Error. STOC '17 Proceedings of the forty-ninth annual ACM symposium on Theory of Computing. arXiv: 1611.00898 .
↑ Keaton Hamm and Longxiu Huang. Perspectives on CUR decompositions. Applied and Computational Harmonic Analysis, 48(3):1088–1099, 2020.
↑ Drineas, Petros; Kannan, Ravi; Mahoney, Michael W. (2006-01-01). "Fast Monte Carlo Algorithms for Matrices I: Approximating Matrix Multiplication". SIAM Journal on Computing. 36 (1): 132–157. doi:10.1137/S0097539704442684. ISSN 0097-5397.
↑ Song, Zhao; Woodruff, David P.; Zhong, Peilin (2017). "Relative Error Tensor Low Rank Approximation". arXiv: 1704.08246 [cs.DS].

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[mahoney-1] 1 2 Michael W. Mahoney; Petros Drineas. "CUR matrix decompositions for improved data analysis" . Retrieved 26 June 2012.

[2] Boutsidis, Christos; Woodruff, David P. (2014). Optimal CUR matrix decompositions. STOC '14 Proceedings of the forty-sixth annual ACM symposium on Theory of Computing.

[3] Song, Zhao; Woodruff, David P.; Zhong, Peilin (2017). Low Rank Approximation with Entrywise L1-Norm Error. STOC '17 Proceedings of the forty-ninth annual ACM symposium on Theory of Computing. arXiv: 1611.00898 .

[4] Keaton Hamm and Longxiu Huang. Perspectives on CUR decompositions. Applied and Computational Harmonic Analysis, 48(3):1088–1099, 2020.

[5] Drineas, Petros; Kannan, Ravi; Mahoney, Michael W. (2006-01-01). "Fast Monte Carlo Algorithms for Matrices I: Approximating Matrix Multiplication". SIAM Journal on Computing. 36 (1): 132–157. doi:10.1137/S0097539704442684. ISSN 0097-5397.

[6] Song, Zhao; Woodruff, David P.; Zhong, Peilin (2017). "Relative Error Tensor Low Rank Approximation". arXiv: 1704.08246 [cs.DS].

[1]

[2]

[3]

[4]

[5]

[6]