Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) is a matrix-free method for finding the largest (or smallest) eigenvalues and the corresponding eigenvectors of a symmetric generalized eigenvalue problem
for a given pair of complex Hermitian or real symmetric matrices, where the matrix is also assumed positive-definite.
Kantorovich in 1948 proposed calculating the smallest eigenvalue of a symmetric matrix by steepest descent using a direction of a scaled gradient of a Rayleigh quotient in a scalar product , with the step size computed by minimizing the Rayleigh quotient in the linear span of the vectors and , i.e. in a locally optimal manner. Samokish [1] proposed applying a preconditioner to the residual vector to generate the preconditioned direction and derived asymptotic, as approaches the eigenvector, convergence rate bounds. D'yakonov suggested [2] spectrally equivalent preconditioning and derived non-asymptotic convergence rate bounds. Block locally optimal multi-step steepest descent for eigenvalue problems was described in. [3] Local minimization of the Rayleigh quotient on the subspace spanned by the current approximation, the current residual and the previous approximation, as well as its block version, appeared in. [4] The preconditioned version was analyzed in [5] and. [6]
The method performs an iterative maximization (or minimization) of the generalized Rayleigh quotient
which results in finding largest (or smallest) eigenpairs of
The direction of the steepest ascent, which is the gradient, of the generalized Rayleigh quotient is positively proportional to the vector
called the eigenvector residual. If a preconditioner is available, it is applied to the residual and gives the vector
called the preconditioned residual. Without preconditioning, we set and so . An iterative method
or, in short,
is known as preconditioned steepest ascent (or descent), where the scalar is called the step size. The optimal step size can be determined by maximizing the Rayleigh quotient, i.e.,
(or in case of minimizing), in which case the method is called locally optimal.
To dramatically accelerate the convergence of the locally optimal preconditioned steepest ascent (or descent), one extra vector can be added to the two-term recurrence relation to make it three-term:
(use in case of minimizing). The maximization/minimization of the Rayleigh quotient in a 3-dimensional subspace can be performed numerically by the Rayleigh–Ritz method. Adding more vectors, see, e.g., Richardson extrapolation, does not result in significant acceleration [8] but increases computation costs, so is not generally recommended.
As the iterations converge, the vectors and become nearly linearly dependent, resulting in a precision loss and making the Rayleigh–Ritz method numerically unstable in the presence of round-off errors. The loss of precision may be avoided by substituting the vector with a vector , that may be further away from , in the basis of the three-dimensional subspace , while keeping the subspace unchanged and avoiding orthogonalization or any other extra operations. [8] Furthermore, orthogonalizing the basis of the three-dimensional subspace may be needed for ill-conditioned eigenvalue problems to improve stability and attainable accuracy.
This is a single-vector version of the LOBPCG method—one of possible generalization of the preconditioned conjugate gradient linear solvers to the case of symmetric eigenvalue problems. [8] Even in the trivial case and the resulting approximation with will be different from that obtained by the Lanczos algorithm, although both approximations will belong to the same Krylov subspace.
Extreme simplicity and high efficiency of the single-vector version of LOBPCG make it attractive for eigenvalue-related applications under severe hardware limitations, ranging from spectral clustering based real-time anomaly detection via graph partitioning on embedded ASIC or FPGA to modelling physical phenomena of record computing complexity on exascale TOP500 supercomputers.
Subsequent eigenpairs can be computed one-by-one via single-vector LOBPCG supplemented with an orthogonal deflation or simultaneously as a block. In the former approach, imprecisions in already computed approximate eigenvectors additively affect the accuracy of the subsequently computed eigenvectors, thus increasing the error with every new computation. Iterating several approximate eigenvectors together in a block in a locally optimal fashion in the block version of the LOBPCG. [8] allows fast, accurate, and robust computation of eigenvectors, including those corresponding to nearly-multiple eigenvalues where the single-vector LOBPCG suffers from slow convergence. The block size can be tuned to balance numerical stability vs. convergence speed vs. computer costs of orthogonalizations and the Rayleigh-Ritz method on every iteration.
The block approach in LOBPCG replaces single-vectors and with block-vectors, i.e. matrices and , where, e.g., every column of approximates one of the eigenvectors. All columns are iterated simultaneously, and the next matrix of approximate eigenvectors is determined by the Rayleigh–Ritz method on the subspace spanned by all columns of matrices and . Each column of is computed simply as the preconditioned residual for every column of The matrix is determined such that the subspaces spanned by the columns of and of are the same.
The outcome of the Rayleigh–Ritz method is determined by the subspace spanned by all columns of matrices and , where a basis of the subspace can theoretically be arbitrary. However, in inexact computer arithmetic the Rayleigh–Ritz method becomes numerically unstable if some of the basis vectors are approximately linearly dependent. Numerical instabilities typically occur, e.g., if some of the eigenvectors in the iterative block already reach attainable accuracy for a given computer precision and are especially prominent in low precision, e.g., single precision.
The art of multiple different implementation of LOBPCG is to ensure numerical stability of the Rayleigh–Ritz method at minimal computing costs by choosing a good basis of the subspace. The arguably most stable approach of making the basis vectors orthogonal, e.g., by the Gram–Schmidt process, is also the most computational expensive. For example, LOBPCG implementations, [9] [10] utilize unstable but efficient Cholesky decomposition of the normal matrix, which is performed only on individual matrices and , rather than on the whole subspace. The constantly increasing amount of computer memory allows typical block sizes nowadays in the range, where the percentage of compute time spend on orthogonalizations and the Rayleigh-Ritz method starts dominating.
Block methods for eigenvalue problems that iterate subspaces commonly have some of the iterative eigenvectors converged faster than others that motivates locking the already converged eigenvectors, i.e., removing them from the iterative loop, in order to eliminate unnecessary computations and improve numerical stability. A simple removal of an eigenvector may likely result in forming its duplicate in still iterating vectors. The fact that the eigenvectors of symmetric eigenvalue problems are pair-wise orthogonal suggest keeping all iterative vectors orthogonal to the locked vectors.
Locking can be implemented differently maintaining numerical accuracy and stability while minimizing the compute costs. For example, LOBPCG implementations, [9] [10] follow, [8] [11] separating hard locking, i.e. a deflation by restriction, where the locked eigenvectors serve as a code input and do not change, from soft locking, where the locked vectors do not participate in the typically most expensive iterative step of computing the residuals, however, fully participate in the Rayleigh—Ritz method and thus are allowed to be changed by the Rayleigh—Ritz method.
LOBPCG includes all columns of matrices and into the Rayleigh–Ritz method resulting in an up to -by- eigenvalue problem needed to solve and up to dot products to compute at every iteration, where denotes the block size — the number of columns. For large block sizes this starts dominating compute and I/O costs and limiting parallelization, where multiple compute devices are running simultaneously.
The original LOBPCG paper [8] describes a modification, called LOBPCG II, to address such a problem running the single-vector version of the LOBPCG method for each desired eigenpair with the Rayleigh-Ritz procedure solving of 3-by-3 projected eigenvalue problems. The global Rayleigh-Ritz procedure for all eigenpairs is on every iteration but only on the columns of the matrix , thus reducing the number of the necessary dot products to from and the size of the global projected eigenvalue problem to -by- from -by- on every iteration. Reference [12] goes further applying the LOBPCG algorithm to each approximate eigenvector separately, i.e., running the unblocked version of the LOBPCG method for each desired eigenpair for a fixed number of iterations. The Rayleigh-Ritz procedures in these runs only need to solve a set of 3 × 3 projected eigenvalue problems. The global Rayleigh-Ritz procedure for all desired eigenpairs is only applied periodically at the end of a fixed number of unblocked LOBPCG iterations.
Such modifications may be less robust compared to the original LOBPCG. Individually running branches of the single-vector LOBPCG may not follow continuous iterative paths flipping instead and creating duplicated approximations to the same eigenvector. The single-vector LOBPCG may be unsuitable for clustered eigenvalues, but separate small-block LOBPCG runs require determining their block sizes automatically during the process of iterations since the number of the clusters of eigenvalues and their sizes may be unknown a priori.
LOBPCG by construction is guaranteed [8] to minimize the Rayleigh quotient not slower than the block steepest gradient descent, which has a comprehensive convergence theory. Every eigenvector is a stationary point of the Rayleigh quotient, where the gradient vanishes. Thus, the gradient descent may slow down in a vicinity of any eigenvector, however, it is guaranteed to either converge to the eigenvector with a linear convergence rate or, if this eigenvector is a saddle point, the iterative Rayleigh quotient is more likely to drop down below the corresponding eigenvalue and start converging linearly to the next eigenvalue below. The worst value of the linear convergence rate has been determined [8] and depends on the relative gap between the eigenvalue and the rest of the matrix spectrum and the quality of the preconditioner, if present.
For a general matrix, there is evidently no way to predict the eigenvectors and thus generate the initial approximations that always work well. The iterative solution by LOBPCG may be sensitive to the initial eigenvectors approximations, e.g., taking longer to converge slowing down as passing intermediate eigenpairs. Moreover, in theory, one cannot guarantee convergence necessarily to the smallest eigenpair, although the probability of the miss is zero. A good quality random Gaussian function with the zero mean is commonly the default in LOBPCG to generate the initial approximations. To fix the initial approximations, one can select a fixed seed for the random number generator.
In contrast to the Lanczos method, LOBPCG rarely exhibits asymptotic superlinear convergence in practice.
LOBPCG can be trivially adapted for computing several largest singular values and the corresponding singular vectors (partial SVD), e.g., for iterative computation of PCA, for a data matrix D with zero mean, without explicitly computing the covariance matrix DTD, i.e. in matrix-free fashion. The main calculation is evaluation of a function of the product DT(D X) of the covariance matrix DTD and the block-vector X that iteratively approximates the desired singular vectors. PCA needs the largest eigenvalues of the covariance matrix, while LOBPCG is typically implemented to calculate the smallest ones. A simple work-around is to negate the function, substituting -DT(D X) for DT(D X) and thus reversing the order of the eigenvalues, since LOBPCG does not care if the matrix of the eigenvalue problem is positive definite or not. [9]
LOBPCG for PCA and SVD is implemented in SciPy since revision 1.4.0 [13]
LOBPCG's inventor, Andrew Knyazev, published a reference implementation called Block Locally Optimal Preconditioned Eigenvalue Xolvers (BLOPEX) [14] [15] with interfaces to PETSc, hypre, and Parallel Hierarchical Adaptive MultiLevel method (PHAML). [16] Other implementations are available in, e.g., GNU Octave, [17] MATLAB (including for distributed or tiling arrays), [9] Java, [18] Anasazi (Trilinos), [19] SLEPc, [20] [21] SciPy , [10] Julia, [22] MAGMA, [23] Pytorch, [24] Rust, [25] OpenMP and OpenACC, [26] CuPy (A NumPy-compatible array library accelerated by CUDA), [27] Google JAX, [28] and NVIDIA AMGX. [29] LOBPCG is implemented, [30] but not included, in TensorFlow.
Software packages scikit-learn and Megaman [31] use LOBPCG to scale spectral clustering [32] and manifold learning [33] via Laplacian eigenmaps to large data sets. NVIDIA has implemented [34] LOBPCG in its nvGRAPH library introduced in CUDA 8. Sphynx, [35] a hybrid distributed- and shared-memory-enabled parallel graph partitioner - the first graph partitioning tool that works on GPUs on distributed-memory settings - uses spectral clustering for graph partitioning, computing eigenvectors on the Laplacian matrix of the graph using LOBPCG from the Anasazi package.
LOBPCG is implemented in ABINIT [36] (including CUDA version) and Octopus. [37] It has been used for multi-billion size matrices by Gordon Bell Prize finalists, on the Earth Simulator supercomputer in Japan. [38] [39] Hubbard model for strongly-correlated electron systems to understand the mechanism behind the superconductivity uses LOBPCG to calculate the ground state of the Hamiltonian on the K computer [40] and multi-GPU systems. [41]
There are MATLAB [42] and Julia [43] [44] versions of LOBPCG for Kohn-Sham equations and density functional theory (DFT) using the plane wave basis. Recent implementations include TTPY, [45] Platypus‐QM, [46] MFDn, [47] ACE-Molecule, [48] LACONIC. [49]
LOBPCG from BLOPEX is used for preconditioner setup in Multilevel Balancing Domain Decomposition by Constraints (BDDC) solver library BDDCML, which is incorporated into OpenFTL (Open Finite element Template Library) and Flow123d simulator of underground water flow, solute and heat transport in fractured porous media. LOBPCG has been implemented [50] in LS-DYNA and indirectly in ANSYS. [51]
LOBPCG is one of core eigenvalue solvers in PYFEMax and high performance multiphysics finite element software Netgen/NGSolve. LOBPCG from hypre is incorporated into open source lightweight scalable C++ library for finite element methods MFEM, which is used in many projects, including BLAST, XBraid, VisIt, xSDK, the FASTMath institute in SciDAC, and the co-design Center for Efficient Exascale Discretizations (CEED) in the Exascale computing Project.
Iterative LOBPCG-based approximate low-pass filter can be used for denoising; see, [52] e.g., to accelerate total variation denoising.
Image segmentation via spectral clustering performs a low-dimension embedding using an affinity matrix between pixels, followed by clustering of the components of the eigenvectors in the low dimensional space, e.g., using the graph Laplacian for the bilateral filter. Image segmentation via spectral graph partitioning by LOBPCG with multigrid preconditioning has been first proposed in [53] and actually tested in [54] and. [55] The latter approach has been later implemented in Python scikit-learn [56] that uses LOBPCG from SciPy with algebraic multigrid preconditioning for solving the eigenvalue problem for the graph Laplacian.
Principal component analysis (PCA) is a linear dimensionality reduction technique with applications in exploratory data analysis, visualization and data preprocessing.
In mathematics, a Hermitian matrix is a complex square matrix that is equal to its own conjugate transpose—that is, the element in the i-th row and j-th column is equal to the complex conjugate of the element in the j-th row and i-th column, for all indices i and j:
Rayleigh quotient iteration is an eigenvalue algorithm which extends the idea of the inverse iteration by using the Rayleigh quotient to obtain increasingly accurate eigenvalue estimates.
In numerical analysis, inverse iteration is an iterative eigenvalue algorithm. It allows one to find an approximate eigenvector when an approximation to a corresponding eigenvalue is already known. The method is conceptually similar to the power method. It appears to have originally been developed to compute resonance frequencies in the field of structural mechanics.
In numerical linear algebra, the QR algorithm or QR iteration is an eigenvalue algorithm: that is, a procedure to calculate the eigenvalues and eigenvectors of a matrix. The QR algorithm was developed in the late 1950s by John G. F. Francis and by Vera N. Kublanovskaya, working independently. The basic idea is to perform a QR decomposition, writing the matrix as a product of an orthogonal matrix and an upper triangular matrix, multiply the factors in the reverse order, and iterate.
In numerical linear algebra, the Arnoldi iteration is an eigenvalue algorithm and an important example of an iterative method. Arnoldi finds an approximation to the eigenvalues and eigenvectors of general matrices by constructing an orthonormal basis of the Krylov subspace, which makes it particularly useful when dealing with large sparse matrices.
In mathematics, the conjugate gradient method is an algorithm for the numerical solution of particular systems of linear equations, namely those whose matrix is positive-semidefinite. The conjugate gradient method is often implemented as an iterative algorithm, applicable to sparse systems that are too large to be handled by a direct implementation or other direct methods such as the Cholesky decomposition. Large sparse systems often arise when numerically solving partial differential equations or optimization problems.
In matrix theory, the Perron–Frobenius theorem, proved by Oskar Perron and Georg Frobenius, asserts that a real square matrix with positive entries has a unique eigenvalue of largest magnitude and that eigenvalue is real. The corresponding eigenvector can be chosen to have strictly positive components, and also asserts a similar statement for certain classes of nonnegative matrices. This theorem has important applications to probability theory ; to the theory of dynamical systems ; to economics ; to demography ; to social networks ; to Internet search engines (PageRank); and even to ranking of American football teams. The first to discuss the ordering of players within tournaments using Perron–Frobenius eigenvectors is Edmund Landau.
In numerical analysis, a multigrid method is an algorithm for solving differential equations using a hierarchy of discretizations. They are an example of a class of techniques called multiresolution methods, very useful in problems exhibiting multiple scales of behavior. For example, many basic relaxation methods exhibit different rates of convergence for short- and long-wavelength components, suggesting these different scales be treated differently, as in a Fourier analysis approach to multigrid. MG methods can be used as solvers as well as preconditioners.
The Rayleigh–Ritz method is a direct numerical method of approximating eigenvalues, originated in the context of solving physical boundary value problems and named after Lord Rayleigh and Walther Ritz.
In linear algebra, it is often important to know which vectors have their directions unchanged by a given linear transformation. An eigenvector or characteristic vector is such a vector. More precisely, an eigenvector of a linear transformation is scaled by a constant factor when the linear transformation is applied to it: . The corresponding eigenvalue, characteristic value, or characteristic root is the multiplying factor .
The Lanczos algorithm is an iterative method devised by Cornelius Lanczos that is an adaptation of power methods to find the "most useful" eigenvalues and eigenvectors of an Hermitian matrix, where is often but not necessarily much smaller than . Although computationally efficient in principle, the method as initially formulated was not useful, due to its numerical instability.
In mathematics, preconditioning is the application of a transformation, called the preconditioner, that conditions a given problem into a form that is more suitable for numerical solving methods. Preconditioning is typically related to reducing a condition number of the problem. The preconditioned problem is then usually solved by an iterative method.
In mathematics, power iteration is an eigenvalue algorithm: given a diagonalizable matrix , the algorithm will produce a number , which is the greatest eigenvalue of , and a nonzero vector , which is a corresponding eigenvector of , that is, . The algorithm is also known as the Von Mises iteration.
In mathematics, a graph partition is the reduction of a graph to a smaller graph by partitioning its set of nodes into mutually exclusive groups. Edges of the original graph that cross between the groups will produce edges in the partitioned graph. If the number of resulting edges is small compared to the original graph, then the partitioned graph may be better suited for analysis and problem-solving than the original. Finding a partition that simplifies graph analysis is a hard problem, but one that has applications to scientific computing, VLSI circuit design, and task scheduling in multiprocessor computers, among others. Recently, the graph partition problem has gained importance due to its application for clustering and detection of cliques in social, pathological and biological networks. For a survey on recent trends in computational methods and applications see Buluc et al. (2013). Two common examples of graph partitioning are minimum cut and maximum cut problems.
In linear algebra, eigendecomposition is the factorization of a matrix into a canonical form, whereby the matrix is represented in terms of its eigenvalues and eigenvectors. Only diagonalizable matrices can be factorized in this way. When the matrix being factorized is a normal or real symmetric matrix, the decomposition is called "spectral decomposition", derived from the spectral theorem.
In multivariate statistics, spectral clustering techniques make use of the spectrum (eigenvalues) of the similarity matrix of the data to perform dimensionality reduction before clustering in fewer dimensions. The similarity matrix is provided as an input and consists of a quantitative assessment of the relative similarity of each pair of points in the dataset.
The image segmentation problem is concerned with partitioning an image into multiple regions according to some homogeneity criterion. This article is primarily concerned with graph theoretic approaches to image segmentation applying graph partitioning via minimum cut or maximum cut. Segmentation-based object categorization can be viewed as a specific case of spectral clustering applied to image segmentation.
Andrew Knyazev is an American mathematician. He graduated from the Faculty of Computational Mathematics and Cybernetics of Moscow State University under the supervision of Evgenii Georgievich D'yakonov in 1981 and obtained his PhD in Numerical Mathematics at the Russian Academy of Sciences under the supervision of Vyacheslav Ivanovich Lebedev in 1985. He worked at the Kurchatov Institute between 1981–1983, and then to 1992 at the Marchuk Institute of Numerical Mathematics of the Russian Academy of Sciences, headed by Gury Marchuk.
SLEPc is a software library for the parallel computation of eigenvalues and eigenvectors of large, sparse matrices. It can be seen as a module of PETSc that provides solvers for different types of eigenproblems, including linear and nonlinear, as well as the SVD. Recent versions also include support for matrix functions. It uses the MPI standard for parallelization. Both real and complex arithmetic are supported, with single, double and quadruple precision.