Robust principal component analysis

Robust Principal Component Analysis (RPCA) is a modification of the widely used statistical procedure of principal component analysis (PCA) which works well with respect to grossly corrupted observations. A number of different approaches exist for Robust PCA, including an idealized version, which aims to recover a low-rank matrix L0 from highly corrupted measurements M = L0 + S0. [1] This decomposition into low-rank and sparse matrices can be achieved by techniques such as the Principal Component Pursuit method (PCP), [1] Stable PCP, [2] Quantized PCP, [3] Block based PCP, [4] and Local PCP. [5] Optimization methods are then used, such as the Augmented Lagrange Multiplier Method (ALM [6] ), the Alternating Direction Method (ADM [7] ), Fast Alternating Minimization (FAM [8] ), Iteratively Reweighted Least Squares (IRLS [9] [10] [11] ) or alternating projections (AP [12] [13] [14] ).
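
As a concrete illustration of the model M = L0 + S0, the following sketch (in Python with NumPy; the matrix size, rank and corruption level are arbitrary choices for illustration) builds a synthetic observation matrix from a random low-rank component plus sparse gross corruptions:

import numpy as np

rng = np.random.default_rng(0)
m, n, r = 200, 150, 5                      # dimensions and rank (illustrative values)

# Low-rank component L0 = A B^T with A of size m x r and B of size n x r
A = rng.standard_normal((m, r))
B = rng.standard_normal((n, r))
L0 = A @ B.T

# Sparse component S0: roughly 5% of the entries carry gross corruptions
S0 = np.zeros((m, n))
mask = rng.random((m, n)) < 0.05
S0[mask] = rng.uniform(-10.0, 10.0, size=mask.sum())

M = L0 + S0                                # observed, grossly corrupted measurements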

Algorithms

Non-convex method

The 2014 guaranteed algorithm for the robust PCA problem (with the input matrix being M = L + S) is an alternating minimization type algorithm. [12] The computational complexity is O(mnr^2 log(1/ε)), where the input is the superposition of a low-rank matrix (of rank r) and a sparse matrix of dimension m × n, and ε is the desired accuracy of the recovered solution, i.e., ||L̂ − L||_F ≤ ε, where L is the true low-rank component and L̂ is the estimated or recovered low-rank component. Intuitively, this algorithm performs projections of the residual onto the set of low-rank matrices (via the SVD operation) and sparse matrices (via entry-wise hard thresholding) in an alternating manner: a low-rank projection of the difference between the input matrix and the sparse matrix obtained at a given iteration is followed by a sparse projection of the difference between the input matrix and the low-rank matrix obtained in the previous step, and the two steps are iterated until convergence.
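
The following is a minimal sketch of this alternating-projections idea in Python/NumPy. It is not the exact algorithm of [12], which grows the target rank in stages and uses carefully calibrated thresholds; the fixed rank r, the thresholding rule and the iteration count here are simplifying assumptions:

import numpy as np

def rpca_alt_proj(M, r, n_iter=50, beta=None):
    """Alternate a rank-r projection (truncated SVD) and a sparse projection
    (entry-wise hard thresholding) of the residual."""
    m, n = M.shape
    if beta is None:
        beta = 1.0 / np.sqrt(max(m, n))    # heuristic thresholding level (assumption)
    S = np.zeros_like(M)
    for _ in range(n_iter):
        # Low-rank projection of M - S onto the set of rank-r matrices via truncated SVD
        U, sig, Vt = np.linalg.svd(M - S, full_matrices=False)
        L = (U[:, :r] * sig[:r]) @ Vt[:r, :]
        # Sparse projection of M - L: keep only entries above the threshold
        R = M - L
        thresh = beta * sig[0]
        S = np.where(np.abs(R) > thresh, R, 0.0)
    return L, S

L_hat, S_hat = rpca_alt_proj(M, r=5)       # M as constructed in the synthetic example above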

This alternating projections algorithm was later improved by an accelerated version, coined AccAltProj. [13] The acceleration is achieved by applying a tangent space projection before projecting the residual onto the set of low-rank matrices. This trick improves the computational complexity to O(mnr log(1/ε)) with a much smaller constant in front, while it maintains the theoretically guaranteed linear convergence.
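
The tangent-space projection itself is easy to state: if the current rank-r estimate has thin singular factors U (m x r) and V (n x r) with orthonormal columns, the projection of a matrix Z onto the tangent space of the rank-r manifold at that estimate is U U^T Z + Z V V^T − U U^T Z V V^T. A minimal sketch of just this operator follows; the rest of the AccAltProj machinery, such as the subsequent small SVD of the projected residual, is omitted:

import numpy as np

def tangent_space_projection(Z, U, V):
    """Project Z onto the tangent space of the fixed-rank manifold at L = U @ Sigma @ V.T,
    where U (m x r) and V (n x r) have orthonormal columns."""
    UtZ = U.T @ Z                          # r x n
    ZV = Z @ V                             # m x r
    return U @ UtZ + ZV @ V.T - U @ (UtZ @ V) @ V.T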

Another fast version of the accelerated alternating projections algorithm is IRCUR. [14] It uses the structure of the CUR decomposition within the alternating projections framework to dramatically reduce the computational complexity of RPCA to O(max(m, n) r^2 log(1/ε)).
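
For context, a CUR approximation represents a matrix through a subset of its actual columns C and rows R, glued together by a small core matrix U, so that only small submatrices ever need to be factorized. The sketch below shows plain uniform sampling with a pseudoinverse core; IRCUR itself resamples the row and column indices at every iteration, and the sampling sizes here are illustrative assumptions:

import numpy as np

def cur_approximation(M, n_cols, n_rows, rng=None):
    """Approximate M ~ C @ U @ R from uniformly sampled columns and rows of M."""
    if rng is None:
        rng = np.random.default_rng(0)
    m, n = M.shape
    J = rng.choice(n, size=n_cols, replace=False)   # sampled column indices
    I = rng.choice(m, size=n_rows, replace=False)   # sampled row indices
    C = M[:, J]                                     # m x n_cols
    R = M[I, :]                                     # n_rows x n
    U = np.linalg.pinv(M[np.ix_(I, J)])             # core matrix from the intersection
    return C, U, R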

Convex relaxation

This method consists of relaxing the rank constraint in the optimization problem to the nuclear norm ||L||_* and the sparsity constraint to the ℓ1-norm ||S||_1. The resulting convex program can be solved using methods such as the method of Augmented Lagrange Multipliers.
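
A compact sketch of this relaxation, min ||L||_* + λ||S||_1 subject to L + S = M, solved with an inexact augmented Lagrange multiplier iteration, is given below in Python/NumPy; the choices of λ and μ and the stopping rule follow common defaults from the literature and should be read as assumptions rather than a reference implementation:

import numpy as np

def soft_threshold(X, tau):
    """Entry-wise soft thresholding, the proximal operator of the l1-norm."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def pcp_alm(M, lam=None, mu=None, tol=1e-7, max_iter=500):
    """Principal Component Pursuit: min ||L||_* + lam*||S||_1  s.t.  L + S = M."""
    m, n = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))              # standard weight from the RPCA literature
    if mu is None:
        mu = 0.25 * m * n / np.abs(M).sum()         # common heuristic for the penalty parameter
    S = np.zeros_like(M)
    Y = np.zeros_like(M)
    norm_M = np.linalg.norm(M, 'fro')
    for _ in range(max_iter):
        # L-update: singular value thresholding of M - S + Y/mu
        U, sig, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = (U * np.maximum(sig - 1.0 / mu, 0.0)) @ Vt
        # S-update: soft thresholding of M - L + Y/mu
        S = soft_threshold(M - L + Y / mu, lam / mu)
        # Dual update on the constraint residual M - L - S
        Z = M - L - S
        Y = Y + mu * Z
        if np.linalg.norm(Z, 'fro') <= tol * norm_M:
            break
    return L, S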

Deep-learning augmented method

Some recent works propose RPCA algorithms with learnable/trainable parameters. [15] Such a learnable/trainable algorithm can be unfolded as a deep neural network whose parameters can be learned via machine learning techniques from a given dataset or problem distribution. The learned algorithm will then have superior performance on the corresponding problem distribution.

Applications

RPCA has many important real-life applications, particularly when the data under study can naturally be modeled as a low-rank component plus a sparse contribution. The following examples are inspired by contemporary challenges in computer science; depending on the application, either the low-rank component or the sparse component may be the object of interest:

Video surveillance

Given a sequence of surveillance video frames, it is often required to identify the activities that stand out from the background. If we stack the video frames as columns of a matrix M, then the low-rank component L0 naturally corresponds to the stationary background and the sparse component S0 captures the moving objects in the foreground. [1] [16]
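
In code, this amounts to flattening each frame into a column before the decomposition; the sketch below uses the hypothetical pcp_alm routine from the convex-relaxation sketch above, and the random array stands in for a real grayscale video clip:

import numpy as np

# Placeholder for a real clip: num_frames x height x width grayscale frames
frames = np.random.default_rng(0).random((100, 120, 160))
num_frames, height, width = frames.shape

# Stack the vectorized frames as columns of M
M = frames.reshape(num_frames, height * width).T

# Low-rank background L plus sparse moving foreground S
L, S = pcp_alm(M)

background = L.T.reshape(num_frames, height, width)
foreground = S.T.reshape(num_frames, height, width)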

Face recognition

Images of a convex, Lambertian surface under varying illuminations span a low-dimensional subspace. [17] This is one of the reasons for the effectiveness of low-dimensional models for imagery data. In particular, images of a human face can be well approximated by a low-dimensional subspace. Being able to correctly retrieve this subspace is crucial in many applications such as face recognition and alignment. It turns out that RPCA can be applied successfully to this problem to exactly recover the face. [1]

Resources and libraries

Libraries

The LRSLibrary (developed by Andrews Sobral) provides a collection of low-rank and sparse decomposition algorithms in MATLAB. The library was designed for moving object detection in videos, but it can also be used for other computer vision and machine learning tasks. Currently the LRSLibrary offers more than 100 algorithms based on matrix and tensor methods.

Related Research Articles

Principal component analysis

Principal component analysis (PCA) is a linear dimensionality reduction technique with applications in exploratory data analysis, visualization and data preprocessing.

Factor analysis is a statistical method used to describe variability among observed, correlated variables in terms of a potentially lower number of unobserved variables called factors. For example, it is possible that variations in six observed variables mainly reflect the variations in two unobserved (underlying) variables. Factor analysis searches for such joint variations in response to unobserved latent variables. The observed variables are modelled as linear combinations of the potential factors plus "error" terms, hence factor analysis can be thought of as a special case of errors-in-variables models.

Nonlinear dimensionality reduction

Nonlinear dimensionality reduction, also known as manifold learning, is any of various related techniques that aim to project high-dimensional data, potentially existing across non-linear manifolds which cannot be adequately captured by linear decomposition methods, onto lower-dimensional latent manifolds, with the goal of either visualizing the data in the low-dimensional space, or learning the mapping itself. The techniques described below can be understood as generalizations of linear decomposition methods used for dimensionality reduction, such as singular value decomposition and principal component analysis.

Dimensionality reduction, or dimension reduction, is the transformation of data from a high-dimensional space into a low-dimensional space so that the low-dimensional representation retains some meaningful properties of the original data, ideally close to its intrinsic dimension. Working in high-dimensional spaces can be undesirable for many reasons; raw data are often sparse as a consequence of the curse of dimensionality, and analyzing the data is usually computationally intractable. Dimensionality reduction is common in fields that deal with large numbers of observations and/or large numbers of variables, such as signal processing, speech recognition, neuroinformatics, and bioinformatics.

Non-negative matrix factorization, also non-negative matrix approximation is a group of algorithms in multivariate analysis and linear algebra where a matrix V is factorized into (usually) two matrices W and H, with the property that all three matrices have no negative elements. This non-negativity makes the resulting matrices easier to inspect. Also, in applications such as processing of audio spectrograms or muscular activity, non-negativity is inherent to the data being considered. Since the problem is not exactly solvable in general, it is commonly approximated numerically.

Matching pursuit

Matching pursuit (MP) is a sparse approximation algorithm which finds the "best matching" projections of multidimensional data onto the span of an over-complete dictionary D. The basic idea is to approximately represent a signal f from Hilbert space H as a weighted sum of finitely many functions (atoms) taken from D. An approximation with N atoms has the form f(t) ≈ Σ_{n=1}^{N} a_n g_{γ_n}(t), where the g_{γ_n} are atoms from D and the a_n are scalar coefficients.

Numerical linear algebra, sometimes called applied linear algebra, is the study of how matrix operations can be used to create computer algorithms which efficiently and accurately provide approximate answers to questions in continuous mathematics. It is a subfield of numerical analysis, and a type of linear algebra. Computers use floating-point arithmetic and cannot exactly represent irrational data, so when a computer algorithm is applied to a matrix of data, it can sometimes increase the difference between a number stored in the computer and the true number that it is an approximation of. Numerical linear algebra uses properties of vectors and matrices to develop computer algorithms that minimize the error introduced by the computer, and is also concerned with ensuring that the algorithm is as efficient as possible.

Singular spectrum analysis

In time series analysis, singular spectrum analysis (SSA) is a nonparametric spectral estimation method. It combines elements of classical time series analysis, multivariate statistics, multivariate geometry, dynamical systems and signal processing. Its roots lie in the classical Karhunen (1946)–Loève spectral decomposition of time series and random fields and in the Mañé (1981)–Takens (1981) embedding theorem. SSA can be an aid in the decomposition of time series into a sum of components, each having a meaningful interpretation. The name "singular spectrum analysis" relates to the spectrum of eigenvalues in a singular value decomposition of a covariance matrix, and not directly to a frequency domain decomposition.

Sparse principal component analysis is a technique used in statistical analysis and, in particular, in the analysis of multivariate data sets. It extends the classic method of principal component analysis (PCA) for the reduction of dimensionality of data by introducing sparsity structures to the input variables.

Multilinear subspace learning

Multilinear subspace learning is an approach for disentangling the causal factors of data formation and performing dimensionality reduction. The dimensionality reduction can be performed on a data tensor that contains a collection of observations that have been vectorized, or on observations that are treated as matrices and concatenated into a data tensor. Examples of such data tensors include images (2D/3D), video sequences (3D/4D), and hyperspectral cubes (3D/4D).

Multilinear principal component analysis (MPCA) is a multilinear extension of principal component analysis (PCA) that is used to analyze M-way arrays, also informally referred to as "data tensors". M-way arrays may be modeled by linear tensor models, such as CANDECOMP/Parafac, or by multilinear tensor models, such as multilinear principal component analysis (MPCA) or multilinear independent component analysis (MICA). The origin of MPCA can be traced back to the tensor rank decomposition introduced by Frank Lauren Hitchcock in 1927; to the Tucker decomposition; and to Peter Kroonenberg's "3-mode PCA" work. In 2000, De Lathauwer et al. restated Tucker and Kroonenberg's work in clear and concise numerical computational terms in their SIAM paper entitled "Multilinear Singular Value Decomposition", (HOSVD) and in their paper "On the Best Rank-1 and Rank-(R1, R2, ..., RN ) Approximation of Higher-order Tensors".

In mathematics, low-rank approximation refers to the process of approximating a given matrix by a matrix of lower rank. More precisely, it is a minimization problem, in which the cost function measures the fit between a given matrix and an approximating matrix, subject to a constraint that the approximating matrix has reduced rank. The problem is used for mathematical modeling and data compression. The rank constraint is related to a constraint on the complexity of a model that fits the data. In applications, often there are other constraints on the approximating matrix apart from the rank constraint, e.g., non-negativity and Hankel structure.

Matrix completion

Matrix completion is the task of filling in the missing entries of a partially observed matrix, which is equivalent to performing data imputation in statistics. A wide range of datasets are naturally organized in matrix form. One example is the movie-ratings matrix, as appears in the Netflix problem: given a ratings matrix in which each entry (i, j) represents the rating of movie j by customer i if customer i has watched movie j and is otherwise missing, we would like to predict the remaining entries in order to make good recommendations to customers on what to watch next. Another example is the document-term matrix: the frequencies of words used in a collection of documents can be represented as a matrix, where each entry corresponds to the number of times the associated term appears in the indicated document.

A CUR matrix approximation is a set of three matrices that, when multiplied together, closely approximate a given matrix. A CUR approximation can be used in the same way as the low-rank approximation of the singular value decomposition (SVD). CUR approximations are less accurate than the SVD, but they offer two key advantages, both stemming from the fact that the rows and columns come from the original matrix itself.

Foreground detection is one of the major tasks in the field of computer vision and image processing whose aim is to detect changes in image sequences. Background subtraction is any technique which allows an image's foreground to be extracted for further processing.

In computer vision, rigid motion segmentation is the process of separating regions, features, or trajectories from a video sequence into coherent subsets of space and time. These subsets correspond to independent rigidly moving objects in the scene. The goal of this segmentation is to differentiate and extract the meaningful rigid motion from the background and analyze it. Image segmentation techniques label pixels as belonging to groups of pixels with certain characteristics at a particular time. Here, the pixels are segmented depending on their relative movement over a period of time, i.e., over the duration of the video sequence.

Sparse dictionary learning is a representation learning method which aims to find a sparse representation of the input data in the form of a linear combination of basic elements as well as those basic elements themselves. These elements are called atoms, and they compose a dictionary. Atoms in the dictionary are not required to be orthogonal, and they may be an over-complete spanning set. This problem setup also allows the dimensionality of the signals being represented to be higher than any one of the signals being observed. These two properties lead to having seemingly redundant atoms that allow multiple representations of the same signal, but also provide an improvement in sparsity and flexibility of the representation.

L1-norm principal component analysis

L1-norm principal component analysis (L1-PCA) is a general method for multivariate data analysis. L1-PCA is often preferred over standard L2-norm principal component analysis (PCA) when the analyzed data may contain outliers, as it is believed to be robust.

In numerical linear algebra, the Bartels–Stewart algorithm is used to numerically solve the Sylvester matrix equation AX − XB = C. Developed by R.H. Bartels and G.W. Stewart in 1971, it was the first numerically stable method that could be systematically applied to solve such equations. The algorithm works by using the real Schur decompositions of A and B to transform AX − XB = C into a triangular system that can then be solved using forward or backward substitution. In 1979, G. Golub, C. Van Loan and S. Nash introduced an improved version of the algorithm, known as the Hessenberg–Schur algorithm. It remains a standard approach for solving Sylvester equations when X is of small to moderate size.

Namrata Vaswani is an Indian-American electrical engineer known for her research in compressed sensing, robust principal component analysis, signal processing, statistical learning theory, and computer vision. She is a Joseph and Elizabeth Anderlik Professor in Electrical and Computer Engineering at Iowa State University, and a professor of mathematics at Iowa State.

References

  1. Emmanuel J. Candes; Xiaodong Li; Yi Ma; John Wright (2009). "Robust Principal Component Analysis?". Journal of the ACM. 58 (3): 1–37. doi:10.1145/1970392.1970395. S2CID 7128002.
  2. J. Wright; Y. Peng; Y. Ma; A. Ganesh; S. Rao (2009). "Robust Principal Component Analysis: Exact Recovery of Corrupted Low-Rank Matrices by Convex Optimization". Neural Information Processing Systems, NIPS 2009.
  3. S. Becker; E. Candes; M. Grant (2011). "TFOCS: Flexible First-order Methods for Rank Minimization". Low-rank Matrix Optimization Symposium, SIAM Conference on Optimization.
  4. G. Tang; A. Nehorai (2011). "Robust principal component analysis based on low-rank and block-sparse matrix decomposition". 2011 45th Annual Conference on Information Sciences and Systems. pp. 1–5. doi:10.1109/CISS.2011.5766144. ISBN 978-1-4244-9846-8. S2CID 17079459.
  5. B. Wohlberg; R. Chartrand; J. Theiler (2012). "Local principal component pursuit for nonlinear datasets". 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). pp. 3925–3928. doi:10.1109/ICASSP.2012.6288776. ISBN 978-1-4673-0046-9. S2CID 2747520.
  6. Z. Lin; M. Chen; L. Wu; Y. Ma (2013). "The Augmented Lagrange Multiplier Method for Exact Recovery of Corrupted Low-Rank Matrices". Journal of Structural Biology. 181 (2): 116–27. arXiv:1009.5055. doi:10.1016/j.jsb.2012.10.010. PMC 3565063. PMID 23110852.
  7. X. Yuan; J. Yang (2009). "Sparse and Low-Rank Matrix Decomposition via Alternating Direction Methods". Optimization Online.
  8. P. Rodríguez; B. Wohlberg (2013). "Fast principal component pursuit via alternating minimization". 2013 IEEE International Conference on Image Processing. pp. 69–73. doi:10.1109/ICIP.2013.6738015. ISBN 978-1-4799-2341-0. S2CID 5726914.
  9. C. Guyon; T. Bouwmans; E. Zahzah (2012). "Foreground Detection via Robust Low Rank Matrix Decomposition including Spatio-Temporal Constraint". International Workshop on Background Model Challenges, ACCV 2012.
  10. C. Guyon; T. Bouwmans; E. Zahzah (2012). "Foreground Detection via Robust Low Rank Matrix Factorization including Spatial Constraint with Iterative Reweighted Regression". International Conference on Pattern Recognition, ICPR 2012.
  11. C. Guyon; T. Bouwmans; E. Zahzah (2012). "Moving Object Detection via Robust Low Rank Matrix Decomposition with IRLS scheme". International Symposium on Visual Computing, ISVC 2012.
  12. P. Netrapalli; U. Niranjan; S. Sanghavi; A. Anandkumar; P. Jain (2014). "Non-convex robust PCA". Advances in Neural Information Processing Systems. 27: 1107–1115. arXiv:1410.7660. Bibcode:2014arXiv1410.7660N.
  13. Cai, H.; Cai, J.-F.; Wei, K. (2019). "Accelerated alternating projections for robust principal component analysis". The Journal of Machine Learning Research. 20 (1): 685–717. arXiv:1711.05519. Bibcode:2017arXiv171105519C.
  14. Cai, H.; Hamm, K.; Huang, L.; Li, J.; Wang, T. (2021). "Rapid Robust Principal Component Analysis: CUR Accelerated Inexact Low Rank Estimation". IEEE Signal Processing Letters. 28: 116–120. arXiv:2010.07422. Bibcode:2021ISPL...28..116C. doi:10.1109/LSP.2020.3044130. S2CID 222378834.
  15. Cai, H.; Liu, J.; Yin, W. (2021). "Learned Robust PCA: A Scalable Deep Unfolding Approach for High-Dimensional Outlier Detection". Advances in Neural Information Processing Systems. 34: 16977–16989. arXiv:2110.05649. Bibcode:2021arXiv211005649C.
  16. T. Bouwmans; E. Zahzah (2014). "Robust PCA via Principal Component Pursuit: A Review for a Comparative Evaluation in Video Surveillance". Computer Vision and Image Understanding. 122: 22–34. doi:10.1016/j.cviu.2013.11.009.
  17. Basri, Ronen; Jacobs, David W. (2003). "Lambertian Reflectance and Linear Subspaces". IEEE Transactions on Pattern Analysis and Machine Intelligence. 25 (2): 218–233. doi:10.1109/TPAMI.2003.1177153.
  18. N. Vaswani; T. Bouwmans; S. Javed; P. Narayanamurthy (2017). "Robust PCA and Robust Subspace Tracking". IEEE Signal Processing Magazine. 35 (4): 32–55. arXiv:1711.09492. Bibcode:2018ISPM...35d..32V. doi:10.1109/MSP.2018.2826566. S2CID 3691367.
  19. T. Bouwmans; A. Sobral; S. Javed; S. Jung; E. Zahzah (2015). "Decomposition into Low-rank plus Additive Matrices for Background/Foreground Separation: A Review for a Comparative Evaluation with a Large-Scale Dataset". Computer Science Review. 23: 1–71. arXiv:1511.01245. Bibcode:2015arXiv151101245B. doi:10.1016/j.cosrev.2016.11.001. S2CID 10420698.
  20. Z. Lin (2016). "A Review on Low-Rank Models in Data Analysis". Big Data and Information Analytics. 1 (2): 139–161. doi:10.3934/bdia.2016001.