Robust principal component analysis

Last updated

Robust Principal Component Analysis (RPCA) is a modification of the widely used statistical procedure of principal component analysis (PCA) which works well with respect to grossly corrupted observations. A number of different approaches exist for Robust PCA, including an idealized version of Robust PCA, which aims to recover a low-rank matrix L0 from highly corrupted measurements M = L0 +S0. [1] This decomposition in low-rank and sparse matrices can be achieved by techniques such as Principal Component Pursuit method (PCP), [1] Stable PCP, [2] Quantized PCP, [3] Block based PCP, [4] and Local PCP. [5] Then, optimization methods are used such as the Augmented Lagrange Multiplier Method (ALM [6] ), Alternating Direction Method (ADM [7] ), Fast Alternating Minimization (FAM [8] ), Iteratively Reweighted Least Squares (IRLS [9] [10] [11] ) or alternating projections (AP [12] [13] [14] ).

Contents

Algorithms

Non-convex method

The 2014 guaranteed algorithm for the robust PCA problem (with the input matrix being ) is an alternating minimization type algorithm. [12] The computational complexity is where the input is the superposition of a low-rank (of rank ) and a sparse matrix of dimension and is the desired accuracy of the recovered solution, i.e., where is the true low-rank component and is the estimated or recovered low-rank component. Intuitively, this algorithm performs projections of the residual onto the set of low-rank matrices (via the SVD operation) and sparse matrices (via entry-wise hard thresholding) in an alternating manner - that is, low-rank projection of the difference the input matrix and the sparse matrix obtained at a given iteration followed by sparse projection of the difference of the input matrix and the low-rank matrix obtained in the previous step, and iterating the two steps until convergence.

This alternating projections algorithm is later improved by an accelerated version, coined AccAltProj. [13] The acceleration is achieved by applying a tangent space projection before project the residue onto the set of low-rank matrices. This trick improves the computational complexity to with a much smaller constant in front while it maintains the theoretically guaranteed linear convergence.

Another fast version of accelerated alternating projections algorithm is IRCUR. [14] It uses the structure of CUR decomposition in alternating projections framework to dramatically reduces the computational complexity of RPCA to

Convex relaxation

This method consists of relaxing the rank constraint in the optimization problem to the nuclear norm and the sparsity constraint to -norm . The resulting program can be solved using methods such as the method of Augmented Lagrange Multipliers.

Deep-learning augmented method

Some recent works propose RPCA algorithms with learnable/training parameters. [15] Such a learnable/trainable algorithm can be unfolded as a deep neural network whose parameters can be learned via machine learning techniques from a given dataset or problem distribution. The learned algorithm will have superior performance on the corresponding problem distribution.

Applications

RPCA has many real life important applications particularly when the data under study can naturally be modeled as a low-rank plus a sparse contribution. Following examples are inspired by contemporary challenges in computer science, and depending on the applications, either the low-rank component or the sparse component could be the object of interest:

Video surveillance

Given a sequence of surveillance video frames, it is often required to identify the activities that stand out from the background. If we stack the video frames as columns of a matrix M, then the low-rank component L0 naturally corresponds to the stationary background and the sparse component S0 captures the moving objects in the foreground. [1] [16]

Face recognition

Images of a convex, Lambertian surface under varying illuminations span a low-dimensional subspace. [17] This is one of the reasons for effectiveness of low-dimensional models for imagery data. In particular, it is easy to approximate images of a human's face by a low-dimensional subspace. To be able to correctly retrieve this subspace is crucial in many applications such as face recognition and alignment. It turns out that RPCA can be applied successfully to this problem to exactly recover the face. [1]

See also

Surveys

Books, journals and workshops

Books

Journals

Workshops

Sessions

Resources and libraries

Websites

Libraries

The LRS Library (developed by Andrews Sobral) provides a collection of low-rank and sparse decomposition algorithms in MATLAB. The library was designed for moving object detection in videos, but it can be also used for other computer vision / machine learning tasks. Currently the LRSLibrary offers more than 100 algorithms based on matrix and tensor methods.

Related Research Articles

<span class="mw-page-title-main">Principal component analysis</span> Method of data analysis

Principal component analysis (PCA) is a linear dimensionality reduction technique with applications in exploratory data analysis, visualization and data preprocessing.

<span class="mw-page-title-main">Nonlinear dimensionality reduction</span> Summary of algorithms for nonlinear dimensionality reduction

Nonlinear dimensionality reduction, also known as manifold learning, refers to various related techniques that aim to project high-dimensional data onto lower-dimensional latent manifolds, with the goal of either visualizing the data in the low-dimensional space, or learning the mapping itself. The techniques described below can be understood as generalizations of linear decomposition methods used for dimensionality reduction, such as singular value decomposition and principal component analysis.

<span class="mw-page-title-main">Jordan normal form</span> Form of a matrix indicating its eigenvalues and their algebraic multiplicities

In linear algebra, a Jordan normal form, also known as a Jordan canonical form (JCF), is an upper triangular matrix of a particular form called a Jordan matrix representing a linear operator on a finite-dimensional vector space with respect to some basis. Such a matrix has each non-zero off-diagonal entry equal to 1, immediately above the main diagonal, and with identical diagonal entries to the left and below them.

Latent semantic analysis (LSA) is a technique in natural language processing, in particular distributional semantics, of analyzing relationships between a set of documents and the terms they contain by producing a set of concepts related to the documents and terms. LSA assumes that words that are close in meaning will occur in similar pieces of text. A matrix containing word counts per document is constructed from a large piece of text and a mathematical technique called singular value decomposition (SVD) is used to reduce the number of rows while preserving the similarity structure among columns. Documents are then compared by cosine similarity between any two columns. Values close to 1 represent very similar documents while values close to 0 represent very dissimilar documents.

Partial least squares regression is a statistical method that bears some relation to principal components regression; instead of finding hyperplanes of maximum variance between the response and independent variables, it finds a linear regression model by projecting the predicted variables and the observable variables to a new space. Because both the X and Y data are projected to new spaces, the PLS family of methods are known as bilinear factor models. Partial least squares discriminant analysis (PLS-DA) is a variant used when the Y is categorical.

Non-negative matrix factorization, also non-negative matrix approximation is a group of algorithms in multivariate analysis and linear algebra where a matrix V is factorized into (usually) two matrices W and H, with the property that all three matrices have no negative elements. This non-negativity makes the resulting matrices easier to inspect. Also, in applications such as processing of audio spectrograms or muscular activity, non-negativity is inherent to the data being considered. Since the problem is not exactly solvable in general, it is commonly approximated numerically.

Numerical linear algebra, sometimes called applied linear algebra, is the study of how matrix operations can be used to create computer algorithms which efficiently and accurately provide approximate answers to questions in continuous mathematics. It is a subfield of numerical analysis, and a type of linear algebra. Computers use floating-point arithmetic and cannot exactly represent irrational data, so when a computer algorithm is applied to a matrix of data, it can sometimes increase the difference between a number stored in the computer and the true number that it is an approximation of. Numerical linear algebra uses properties of vectors and matrices to develop computer algorithms that minimize the error introduced by the computer, and is also concerned with ensuring that the algorithm is as efficient as possible.

<span class="mw-page-title-main">Singular spectrum analysis</span> Nonparametric spectral estimation method

In time series analysis, singular spectrum analysis (SSA) is a nonparametric spectral estimation method. It combines elements of classical time series analysis, multivariate statistics, multivariate geometry, dynamical systems and signal processing. Its roots lie in the classical Karhunen (1946)–Loève spectral decomposition of time series and random fields and in the Mañé (1981)–Takens (1981) embedding theorem. SSA can be an aid in the decomposition of time series into a sum of components, each having a meaningful interpretation. The name "singular spectrum analysis" relates to the spectrum of eigenvalues in a singular value decomposition of a covariance matrix, and not directly to a frequency domain decomposition.

Sparse principal component analysis is a technique used in statistical analysis and, in particular, in the analysis of multivariate data sets. It extends the classic method of principal component analysis (PCA) for the reduction of dimensionality of data by introducing sparsity structures to the input variables.

In graph theory, the cycle rank of a directed graph is a digraph connectivity measure proposed first by Eggan and Büchi. Intuitively, this concept measures how close a digraph is to a directed acyclic graph (DAG), in the sense that a DAG has cycle rank zero, while a complete digraph of order n with a self-loop at each vertex has cycle rank n. The cycle rank of a directed graph is closely related to the tree-depth of an undirected graph and to the star height of a regular language. It has also found use in sparse matrix computations and logic (Rossman 2008).

<span class="mw-page-title-main">Multilinear subspace learning</span> Approach to dimensionality reduction

Multilinear subspace learning is an approach for disentangling the causal factor of data formation and performing dimensionality reduction. The Dimensionality reduction can be performed on a data tensor that contains a collection of observations have been vectorized, or observations that are treated as matrices and concatenated into a data tensor. Here are some examples of data tensors whose observations are vectorized or whose observations are matrices concatenated into data tensor images (2D/3D), video sequences (3D/4D), and hyperspectral cubes (3D/4D).

Multilinear principal component analysis (MPCA) is a multilinear extension of principal component analysis (PCA). MPCA is employed in the analysis of M-way arrays, i.e. a cube or hyper-cube of numbers, also informally referred to as a "data tensor". M-way arrays may be modeled by

In numerical mathematics, hierarchical matrices (H-matrices) are used as data-sparse approximations of non-sparse matrices. While a sparse matrix of dimension can be represented efficiently in units of storage by storing only its non-zero entries, a non-sparse matrix would require units of storage, and using this type of matrices for large problems would therefore be prohibitively expensive in terms of storage and computing time. Hierarchical matrices provide an approximation requiring only units of storage, where is a parameter controlling the accuracy of the approximation. In typical applications, e.g., when discretizing integral equations, preconditioning the resulting systems of linear equations, or solving elliptic partial differential equations, a rank proportional to with a small constant is sufficient to ensure an accuracy of . Compared to many other data-sparse representations of non-sparse matrices, hierarchical matrices offer a major advantage: the results of matrix arithmetic operations like matrix multiplication, factorization or inversion can be approximated in operations, where

In mathematics, low-rank approximation is a minimization problem, in which the cost function measures the fit between a given matrix and an approximating matrix, subject to a constraint that the approximating matrix has reduced rank. The problem is used for mathematical modeling and data compression. The rank constraint is related to a constraint on the complexity of a model that fits the data. In applications, often there are other constraints on the approximating matrix apart from the rank constraint, e.g., non-negativity and Hankel structure.

<span class="mw-page-title-main">Matrix completion</span>

Matrix completion is the task of filling in the missing entries of a partially observed matrix, which is equivalent to performing data imputation in statistics. A wide range of datasets are naturally organized in matrix form. One example is the movie-ratings matrix, as appears in the Netflix problem: Given a ratings matrix in which each entry represents the rating of movie by customer , if customer has watched movie and is otherwise missing, we would like to predict the remaining entries in order to make good recommendations to customers on what to watch next. Another example is the document-term matrix: The frequencies of words used in a collection of documents can be represented as a matrix, where each entry corresponds to the number of times the associated term appears in the indicated document.

A CUR matrix approximation is a set of three matrices that, when multiplied together, closely approximate a given matrix. A CUR approximation can be used in the same way as the low-rank approximation of the singular value decomposition (SVD). CUR approximations are less accurate than the SVD, but they offer two key advantages, both stemming from the fact that the rows and columns come from the original matrix :

<span class="mw-page-title-main">Foreground detection</span>

Foreground detection is one of the major tasks in the field of computer vision and image processing whose aim is to detect changes in image sequences. Background subtraction is any technique which allows an image's foreground to be extracted for further processing.

<span class="mw-page-title-main">L1-norm principal component analysis</span>

L1-norm principal component analysis (L1-PCA) is a general method for multivariate data analysis. L1-PCA is often preferred over standard L2-norm principal component analysis (PCA) when the analyzed data may contain outliers.

In numerical linear algebra, the Bartels–Stewart algorithm is used to numerically solve the Sylvester matrix equation . Developed by R.H. Bartels and G.W. Stewart in 1971, it was the first numerically stable method that could be systematically applied to solve such equations. The algorithm works by using the real Schur decompositions of and to transform into a triangular system that can then be solved using forward or backward substitution. In 1979, G. Golub, C. Van Loan and S. Nash introduced an improved version of the algorithm, known as the Hessenberg–Schur algorithm. It remains a standard approach for solving Sylvester equations when is of small to moderate size.

Namrata Vaswani is an Indian-American electrical engineer known for her research in compressed sensing, robust principal component analysis, signal processing, statistical learning theory, and computer vision. She is a Joseph and Elizabeth Anderlik Professor in Electrical and Computer Engineering at Iowa State University, and a professor of mathematics at Iowa State.

References

  1. 1 2 3 4 Emmanuel J. Candes; Xiaodong Li; Yi Ma; John Wright (2009). "Robust Principal Component Analysis?". Journal of the ACM. 58 (3): 1–37. doi:10.1145/1970392.1970395. S2CID   7128002.
  2. J. Wright; Y. Peng; Y. Ma; A. Ganesh; S. Rao (2009). "Robust Principal Component Analysis: Exact Recovery of Corrupted Low-Rank Matrices by Convex Optimization". Neural Information Processing Systems, NIPS 2009.
  3. S. Becker; E. Candes, M. Grant (2011). "TFOCS: Flexible First-order Methods for Rank Minimization". Low-rank Matrix Optimization Symposium, SIAM Conference on Optimization.
  4. G. Tang; A. Nehorai (2011). "Robust principal component analysis based on low-rank and block-sparse matrix decomposition". 2011 45th Annual Conference on Information Sciences and Systems. pp. 1–5. doi:10.1109/CISS.2011.5766144. ISBN   978-1-4244-9846-8. S2CID   17079459.
  5. B. Wohlberg; R. Chartrand; J. Theiler (2012). "Local principal component pursuit for nonlinear datasets". 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). pp. 3925–3928. doi:10.1109/ICASSP.2012.6288776. ISBN   978-1-4673-0046-9. S2CID   2747520.
  6. Z. Lin; M. Chen; L. Wu; Y. Ma (2013). "The Augmented Lagrange Multiplier Method for Exact Recovery of Corrupted Low-Rank Matrices". Journal of Structural Biology. 181 (2): 116–27. arXiv: 1009.5055 . doi:10.1016/j.jsb.2012.10.010. PMC   3565063 . PMID   23110852.
  7. X. Yuan; J. Yang (2009). "Sparse and Low-Rank Matrix Decomposition via Alternating Direction Methods". Optimization Online.
  8. P. Rodríguez; B. Wohlberg (2013). "Fast principal component pursuit via alternating minimization". 2013 IEEE International Conference on Image Processing. pp. 69–73. doi:10.1109/ICIP.2013.6738015. ISBN   978-1-4799-2341-0. S2CID   5726914.
  9. C. Guyon; T. Bouwmans; E. Zahzah (2012). "Foreground Detection via Robust Low Rank Matrix Decomposition including Spatio-Temporal Constraint". International Workshop on Background Model Challenges, ACCV 2012.
  10. C. Guyon; T. Bouwmans; E. Zahzah (2012). "Foreground Detection via Robust Low Rank Matrix Factorization including Spatial Constraint with Iterative Reweighted Regression". International Conference on Pattern Recognition, ICPR 2012.
  11. C. Guyon; T. Bouwmans; E. Zahzah (2012). "Moving Object Detection via Robust Low Rank Matrix Decomposition with IRLS scheme". International Symposium on Visual Computing, ISVC 2012.
  12. 1 2 P., Netrapalli; U., Niranjan; S., Sanghavi; A., Anandkumar; P., Jain (2014). "Non-convex robust PCA". Advances in Neural Information Processing Systems. 27: 1107–1115. arXiv: 1410.7660 . Bibcode:2014arXiv1410.7660N.
  13. 1 2 Cai, H.; Cai, J.-F.; Wei, K. (2019). "Accelerated alternating projections for robust principal component analysis". The Journal of Machine Learning Research. 20 (1): 685–717. arXiv: 1711.05519 . Bibcode:2017arXiv171105519C.
  14. 1 2 Cai, H.; Hamm, K.; Huang, L.; Li, J.; Wang, T. (2021). "Rapid Robust Principal Component Analysis: CUR Accelerated Inexact Low Rank Estimation". IEEE Signal Processing Letters. 28: 116–120. arXiv: 2010.07422 . Bibcode:2021ISPL...28..116C. doi:10.1109/LSP.2020.3044130. S2CID   222378834.
  15. Cai, H.; Liu, J.; Yin, W. (2021). "Learned Robust PCA: A Scalable Deep Unfolding Approach for High-Dimensional Outlier Detection". Advances in Neural Information Processing Systems. 34: 16977–16989. arXiv: 2110.05649 . Bibcode:2021arXiv211005649C.
  16. 1 2 T. Bouwmans; E. Zahzah (2014). "Robust PCA via Principal Component Pursuit: A Review for a Comparative Evaluation in Video Surveillance". Computer Vision and Image Understanding. 122: 22–34. doi:10.1016/j.cviu.2013.11.009.
  17. R. Basri; D. Jacobs. "Lambertian reflectance and linear subspaces".{{cite journal}}: Cite journal requires |journal= (help)
  18. N. Vaswani; T. Bouwmans; S. Javed; P. Narayanamurthy (2017). "Robust PCA and Robust Subspace Tracking". Preprint. 35 (4): 32–55. arXiv: 1711.09492 . Bibcode:2018ISPM...35d..32V. doi:10.1109/MSP.2018.2826566. S2CID   3691367.
  19. T. Bouwmans; A. Sobral; S. Javed; S. Jung; E. Zahzahg (2015). "Decomposition into Low-rank plus Additive Matrices for Background/Foreground Separation: A Review for a Comparative Evaluation with a Large-Scale Dataset". Computer Science Review. 23: 1–71. arXiv: 1511.01245 . Bibcode:2015arXiv151101245B. doi:10.1016/j.cosrev.2016.11.001. S2CID   10420698.
  20. Z. Lin (2016). "A Review on Low-Rank Models in Data Analysis". Big Data and Information Analytics. 1 (2): 139–161. doi: 10.3934/bdia.2016001 .