Non-negative matrix factorization

Illustration of approximate non-negative matrix factorization: the matrix V is represented by the two smaller matrices W and H, which, when multiplied, approximately reconstruct V.

Non-negative matrix factorization (NMF or NNMF), also non-negative matrix approximation, [1] [2] is a group of algorithms in multivariate analysis and linear algebra where a matrix V is factorized into (usually) two matrices W and H, with the property that all three matrices have no negative elements. This non-negativity makes the resulting matrices easier to inspect. Also, in applications such as processing of audio spectrograms or muscular activity, non-negativity is inherent to the data being considered. Since the problem is not exactly solvable in general, it is commonly approximated numerically.


NMF finds applications in such fields as astronomy, [3] [4] computer vision, document clustering, [1] missing data imputation, [5] chemometrics, audio signal processing, recommender systems, [6] [7] and bioinformatics. [8]

History

In chemometrics, non-negative matrix factorization has a long history under the name "self modeling curve resolution". [9] In this framework the vectors in the right matrix are continuous curves rather than discrete vectors. Early work on non-negative matrix factorizations was also performed by a Finnish group of researchers in the 1990s under the name positive matrix factorization. [10] [11] [12] It became more widely known as non-negative matrix factorization after Lee and Seung investigated the properties of the algorithm and published some simple and useful algorithms for two types of factorizations. [13] [14]

Background

Let matrix V be the product of the matrices W and H,

$$V = WH.$$

Matrix multiplication can be implemented as computing the column vectors of V as linear combinations of the column vectors in W using coefficients supplied by columns of H. That is, each column of V can be computed as follows:

$$v_i = W h_i,$$

where $v_i$ is the i-th column vector of the product matrix V and $h_i$ is the i-th column vector of the matrix H.

When multiplying matrices, the dimensions of the factor matrices may be significantly lower than those of the product matrix, and it is this property that forms the basis of NMF. NMF generates factors with significantly reduced dimensions compared to the original matrix. For example, if V is an m × n matrix, W is an m × p matrix, and H is a p × n matrix, then p can be significantly less than both m and n.
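The following minimal NumPy sketch checks both facts above. It uses arbitrary random non-negative factors of hypothetical sizes m = 6, n = 5, p = 2. Each column of V equals W times the matching column of H, and the two factors together hold fewer numbers than V.

```python
import numpy as np

# Hypothetical sizes: V is m x n, W is m x p, H is p x n, with p smaller than m and n.
m, n, p = 6, 5, 2
rng = np.random.default_rng(0)
W = rng.random((m, p))  # non-negative entries in [0, 1)
H = rng.random((p, n))
V = W @ H

# Column i of V equals W applied to column i of H: a combination of W's columns.
i = 3
v_i = W @ H[:, i]
print(np.allclose(V[:, i], v_i))  # True

# The factors store p * (m + n) numbers instead of m * n.
print(W.size + H.size, V.size)    # 22 30
```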

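Here is an example based on a text-mining application. Let the input matrix V have words in its rows and documents in its columns, so that each column vector of V represents a document. If the algorithm is asked for a small number of features, it produces a features matrix W with one column per feature and a coefficients matrix H with one column per document. Their product WH has the same shape as V and, if the factorization worked, is a reasonable approximation to it. From the treatment of matrix multiplication above, it follows that each column in the product matrix WH is a linear combination of the column vectors in the features matrix W, with coefficients supplied by the coefficients matrix H.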

This last point is the basis of NMF because we can consider each original document in our example as being built from a small set of hidden features. NMF generates these features.

It is useful to think of each feature (column vector) in the features matrix W as a document archetype comprising a set of words, where each word's cell value defines the word's rank in the feature: the higher a word's cell value, the higher the word's rank in the feature. A column in the coefficients matrix H represents an original document, with a cell value defining the document's rank for a feature. We can now reconstruct a document (column vector) from our input matrix by a linear combination of our features (column vectors in W), where each feature is weighted by the feature's cell value from the document's column in H.
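A toy features matrix and coefficients matrix make this interpretation concrete. In the sketch below, the vocabulary, W and H are invented for illustration and were not learned by NMF. The code ranks the words within each feature by cell value, then rebuilds one document column as a weighted sum of feature columns.

```python
import numpy as np

# Hypothetical vocabulary and hand-made factors (not learned from data).
vocab = ["goal", "match", "team", "vote", "election", "party"]
# Features matrix W: rows are words, columns are features (document archetypes).
W = np.array([
    [0.9, 0.0],  # goal
    [0.7, 0.1],  # match
    [0.8, 0.2],  # team
    [0.0, 0.9],  # vote
    [0.1, 0.8],  # election
    [0.2, 0.6],  # party
])
# Coefficients matrix H: rows are features, columns are documents.
H = np.array([
    [2.0, 0.0, 1.0],
    [0.0, 3.0, 1.0],
])

# Rank the words in each feature by their cell value, highest first.
top_words = {k: [vocab[j] for j in np.argsort(W[:, k])[::-1][:3]]
             for k in range(W.shape[1])}
print(top_words)  # {0: ['goal', 'team', 'match'], 1: ['vote', 'election', 'party']}

# Reconstruct document 2: each feature column weighted by that document's H entry.
doc = 2
v_hat = sum(H[k, doc] * W[:, k] for k in range(W.shape[1]))
print(np.allclose(v_hat, (W @ H)[:, doc]))  # True
```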

Clustering property

NMF has an inherent clustering property, [15] i.e., it automatically clusters the columns of input data $V = (v_1, \dots, v_n)$.

More specifically, the approximation of V by $V \simeq WH$ is achieved by finding W and H that minimize the error function (using the Frobenius norm)

$$\left\| V - WH \right\|_F,$$

subject to $W \geq 0, H \geq 0$.

If we furthermore impose an orthogonality constraint on H, i.e. $HH^T = I$, then the above minimization is mathematically equivalent to the minimization of K-means clustering. [15]

Furthermore, the computed H gives the cluster membership, i.e., if $H_{kj} > H_{ij}$ for all $i \neq k$, this suggests that the input data $v_j$ belongs to the k-th cluster. The computed W gives the cluster centroids, i.e., the k-th column gives the cluster centroid of the k-th cluster. This centroid's representation can be significantly enhanced by convex NMF.

When the orthogonality constraint is not explicitly imposed, the orthogonality holds to a large extent, and the clustering property holds too.
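The next sketch reads clusters off an NMF. It assumes plain Lee–Seung multiplicative updates (described under Algorithms) as the solver, and a hypothetical data set whose columns fall into two groups with disjoint supports. Each column j is assigned to the cluster k with the largest $H_{kj}$, and the columns of W serve as centroids up to scaling. No orthogonality constraint is imposed, which matches the unconstrained case just described.

```python
import numpy as np

def nmf_mu(V, k, iters=1000, seed=0, eps=1e-12):
    """Lee-Seung multiplicative updates for ||V - WH||_F^2 (illustrative helper)."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], k)) + 0.1   # strictly positive start
    H = rng.random((k, V.shape[1])) + 0.1
    for _ in range(iters):
        H = H * (W.T @ V) / (W.T @ W @ H + eps)
        W = W * (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical data: five columns built from pattern a, five from pattern b.
a = np.array([1.0, 1.0, 0.0, 0.0])
b = np.array([0.0, 0.0, 1.0, 1.0])
scales = [1.0, 2.0, 3.0, 1.5, 2.5]
V = np.column_stack([s * a for s in scales] + [s * b for s in scales])

W, H = nmf_mu(V, k=2)
labels = H.argmax(axis=0)   # column j -> cluster k with the largest H[k, j]
centroids = W               # column k of W ~ centroid of cluster k (up to scaling)
print(labels)
```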

When the error function to be used is Kullback–Leibler divergence, NMF is identical to the probabilistic latent semantic analysis (PLSA), a popular document clustering method. [16]

Types

Approximate non-negative matrix factorization

Usually the number of columns of W and the number of rows of H in NMF are selected so the product WH will become an approximation to V. The full decomposition of V then amounts to the two non-negative matrices W and H as well as a residual U, such that: V = WH + U. The elements of the residual matrix can either be negative or positive.

When W and H are smaller than V, they become easier to store and manipulate. Another reason for factorizing V into smaller matrices W and H is that if one's goal is to approximately represent the elements of V by significantly less data, then one has to infer some latent structure in the data.
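The residual U can be illustrated with the rank-1 case. By the Perron–Frobenius theorem, the leading singular vectors of a strictly positive V can be chosen non-negative, so the best rank-1 approximation is itself a non-negative factorization. The 2 × 2 matrix below is an arbitrary example. The resulting residual has entries of both signs.

```python
import numpy as np

V = np.array([[1.0, 2.0],
              [3.0, 1.0]])   # arbitrary strictly positive example

# Leading singular vectors of a positive matrix share one sign, so taking
# absolute values gives non-negative factors with the same product.
U_svd, s, Vt = np.linalg.svd(V)
W = np.sqrt(s[0]) * np.abs(U_svd[:, :1])   # m x 1, non-negative
H = np.sqrt(s[0]) * np.abs(Vt[:1, :])      # 1 x n, non-negative

U = V - W @ H   # residual in V = WH + U
print(U)        # mixes negative and positive entries
```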

Convex non-negative matrix factorization

In standard NMF, the matrix factor $W \in \mathbb{R}_+^{m \times k}$, i.e., W can be anything in that space. Convex NMF [17] restricts the columns of W to convex combinations of the input data vectors $(v_1, \dots, v_n)$. This greatly improves the quality of data representation of W. Furthermore, the resulting matrix factor H becomes more sparse and orthogonal.

Nonnegative rank factorization

If the nonnegative rank of V is equal to its actual rank, V = WH is called a nonnegative rank factorization (NRF). [18] [19] [20] The problem of finding the NRF of V, if it exists, is known to be NP-hard. [21]

Different cost functions and regularizations

There are different types of non-negative matrix factorizations. The different types arise from using different cost functions for measuring the divergence between V and WH and possibly by regularization of the W and/or H matrices. [1]

Two simple divergence functions studied by Lee and Seung are the squared error (or Frobenius norm) and an extension of the Kullback–Leibler divergence to positive matrices (the original Kullback–Leibler divergence is defined on probability distributions). Each divergence leads to a different NMF algorithm, usually minimizing the divergence using iterative update rules.

The factorization problem in the squared error version of NMF may be stated as: given a matrix V, find nonnegative matrices W and H that minimize the function

$$F(W, H) = \left\| V - WH \right\|_F^2.$$
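Both divergences studied by Lee and Seung take only a few lines of code. The sketch below assumes the squared Frobenius form for the first divergence. For the second, it assumes the generalized Kullback–Leibler form $\sum_{ij} \left(V_{ij} \log \frac{V_{ij}}{(WH)_{ij}} - V_{ij} + (WH)_{ij}\right)$. The function names and the small eps guard are illustrative choices.

```python
import numpy as np

def frobenius_cost(V, W, H):
    """Squared Frobenius norm ||V - WH||_F^2."""
    R = np.asarray(V, dtype=float) - W @ H
    return float(np.sum(R * R))

def generalized_kl(V, W, H, eps=1e-12):
    """Generalized KL divergence: sum(V * log(V / WH) - V + WH).

    Entries with V_ij = 0 contribute only (WH)_ij, using 0 * log 0 = 0.
    """
    V = np.asarray(V, dtype=float)
    WH = W @ H
    mask = V > 0
    log_term = np.zeros_like(V)
    log_term[mask] = V[mask] * np.log(V[mask] / (WH[mask] + eps))
    return float(np.sum(log_term - V + WH))

V = np.array([[1.0]])
W = np.array([[2.0]])
H = np.array([[1.0]])
print(frobenius_cost(V, W, H))   # 1.0
print(generalized_kl(V, W, H))   # 1 - ln 2, about 0.3069
```

Both functions return zero when V equals WH exactly, up to the eps guard.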

Another type of NMF for images is based on the total variation norm. [22]

When L1 regularization (akin to Lasso) is added to NMF with the mean squared error cost function, the resulting problem may be called non-negative sparse coding due to the similarity to the sparse coding problem, [23] [24] although it may also still be referred to as NMF. [25]
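Consider the L1-penalized squared-error objective $\tfrac{1}{2}\|V - WH\|_F^2 + \lambda \sum_{ij} H_{ij}$ with W held fixed. As in Hoyer's non-negative sparse coding, a multiplicative H update can decrease it by adding λ to the denominator. The sketch below shows only that H step, on random hypothetical data. It leaves out the W update, which in that method also involves a normalization step.

```python
import numpy as np

def sparse_h_update(V, W, H, lam, iters=200):
    """Multiplicative H updates for 0.5*||V - WH||_F^2 + lam*sum(H), with W fixed.

    Returns the final H and the objective value after every update.
    """
    costs = []
    for _ in range(iters):
        H = H * (W.T @ V) / (W.T @ W @ H + lam)   # lam enters the denominator
        R = V - W @ H
        costs.append(0.5 * float(np.sum(R * R)) + lam * float(np.sum(H)))
    return H, costs

rng = np.random.default_rng(1)
W = rng.random((8, 4))    # hypothetical fixed dictionary
V = rng.random((8, 6))    # hypothetical data
H0 = rng.random((4, 6))   # non-negative starting point
H_plain, _ = sparse_h_update(V, W, H0, lam=0.0)
H_sparse, costs = sparse_h_update(V, W, H0, lam=1.0)
print(H_plain.sum(), H_sparse.sum())
```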

Online NMF

Many standard NMF algorithms analyze all the data together; i.e., the whole matrix is available from the start. This may be unsatisfactory in applications where there are too many data to fit into memory or where the data are provided in streaming fashion. One such use is for collaborative filtering in recommendation systems, where there may be many users and many items to recommend, and it would be inefficient to recalculate everything when one user or one item is added to the system. The cost function for optimization in these cases may or may not be the same as for standard NMF, but the algorithms need to be rather different. [26] [27]
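One simple ingredient of online schemes is "folding in" a newly arrived column while the learned features stay fixed. The sketch below is a simplification, not any of the cited algorithms. It solves the non-negative least-squares problem for the new column's coefficients by projected gradient descent. The helper name fold_in and the data are hypothetical, and a full online method would also update W over time.

```python
import numpy as np

def fold_in(W, v, iters=20000):
    """Hypothetical helper: min 0.5*||v - W h||^2 over h >= 0, with W fixed.

    Projected gradient descent with step 1/L, L = largest eigenvalue of W^T W.
    """
    G = W.T @ W
    b = W.T @ v
    step = 1.0 / np.linalg.norm(G, 2)
    h = np.zeros(W.shape[1])
    for _ in range(iters):
        h = np.maximum(h - step * (G @ h - b), 0.0)   # gradient step, then project
    return h

rng = np.random.default_rng(2)
W = rng.random((6, 3))              # features learned earlier, kept fixed
h_true = np.array([1.0, 2.0, 0.5])
v_new = W @ h_true                  # a newly arrived column (e.g. a new user)
h_new = fold_in(W, v_new)
print(np.round(h_new, 3))
```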

Convolutional NMF

If the columns of V represent data sampled over spatial or temporal dimensions, e.g. time signals, images, or video, features that are equivariant w.r.t. shifts along these dimensions can be learned by Convolutional NMF. In this case, W is sparse with columns having local non-zero weight windows that are shared across shifts along the spatio-temporal dimensions of V, representing convolution kernels. By spatio-temporal pooling of H and repeatedly using the resulting representation as input to convolutional NMF, deep feature hierarchies can be learned. [28]

Algorithms

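There are several ways in which W and H may be found: Lee and Seung's multiplicative update rule [14] has been a popular method due to the simplicity of implementation. This algorithm is:

initialize: W and H non-negative.
Then update the values in W and H by computing the following, with n as an index of the iteration:
$$H^{n+1}_{[i,j]} \leftarrow H^{n}_{[i,j]} \frac{\left((W^{n})^T V\right)_{[i,j]}}{\left((W^{n})^T W^{n} H^{n}\right)_{[i,j]}}$$
and
$$W^{n+1}_{[i,j]} \leftarrow W^{n}_{[i,j]} \frac{\left(V (H^{n+1})^T\right)_{[i,j]}}{\left(W^{n} H^{n+1} (H^{n+1})^T\right)_{[i,j]}}$$
until W and H are stable.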

Note that the updates are done on an element-by-element basis, not by matrix multiplication.

We note that the multiplicative factors for W and H, i.e. the $\frac{W^T V}{W^T W H}$ and $\frac{V H^T}{W H H^T}$ terms, are matrices of ones when $V = WH$.
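The update rules above translate directly into NumPy. This sketch starts from random non-negative factors and runs a fixed number of iterations instead of testing for stability. It also adds a tiny eps to each denominator to avoid division by zero. These choices belong to the sketch, not to Lee and Seung's rule. The matrix products are ordinary products; the multiply and divide that apply them to W and H are element-wise.

```python
import numpy as np

def nmf_multiplicative(V, k, iters=500, seed=0, eps=1e-12):
    """Lee-Seung multiplicative updates for min ||V - WH||_F^2 with W, H >= 0."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k))   # initialize: W and H non-negative
    H = rng.random((k, n))
    costs = []
    for _ in range(iters):
        H = H * (W.T @ V) / (W.T @ W @ H + eps)   # H^{n+1}, element-wise
        W = W * (V @ H.T) / (W @ H @ H.T + eps)   # W^{n+1}, uses H^{n+1}
        costs.append(float(np.linalg.norm(V - W @ H, "fro") ** 2))
    return W, H, costs

rng = np.random.default_rng(42)
V = rng.random((20, 3)) @ rng.random((3, 15))   # hypothetical non-negative data
W, H, costs = nmf_multiplicative(V, k=3)
print(costs[0], costs[-1])
```

Lee and Seung showed that the squared error is non-increasing under these updates. The recorded costs should therefore decrease monotonically, apart from rounding.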

More recently other algorithms have been developed. Some approaches are based on alternating non-negative least squares: in each step of such an algorithm, first H is fixed and W found by a non-negative least squares solver, then W is fixed and H is found analogously. The procedures used to solve for W and H may be the same [29] or different, as some NMF variants regularize one of W and H. [23] Specific approaches include the projected gradient descent methods, [29] [30] the active set method, [6] [31] the optimal gradient method, [32] and the block principal pivoting method [33] among several others. [34]
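The alternating structure can be sketched as follows. Each convex subproblem (W with H fixed, then H with W fixed) is only approximately solved, by a few projected-gradient steps with a fixed step size 1/L. Here L is the Lipschitz constant of the subproblem gradient. This is a simplified stand-in, not the line-search projected gradient, active-set, or block principal pivoting solvers cited above.

```python
import numpy as np

def nmf_alternating_pg(V, k, outer=200, inner=10, seed=0):
    """Alternate between the W- and H-subproblems of 0.5*||V - WH||_F^2, W, H >= 0.

    Each subproblem gets `inner` projected-gradient steps with fixed step 1/L.
    """
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], k))
    H = rng.random((k, V.shape[1]))
    costs = []
    for _ in range(outer):
        # Fix H, improve W: gradient (WH - V) H^T, Lipschitz constant ||H H^T||_2.
        HHt, VHt = H @ H.T, V @ H.T
        L = max(np.linalg.norm(HHt, 2), 1e-12)
        for _ in range(inner):
            W = np.maximum(W - (W @ HHt - VHt) / L, 0.0)
        # Fix W, improve H: gradient W^T (WH - V), Lipschitz constant ||W^T W||_2.
        WtW, WtV = W.T @ W, W.T @ V
        L = max(np.linalg.norm(WtW, 2), 1e-12)
        for _ in range(inner):
            H = np.maximum(H - (WtW @ H - WtV) / L, 0.0)
        costs.append(0.5 * float(np.linalg.norm(V - W @ H, "fro") ** 2))
    return W, H, costs

rng = np.random.default_rng(42)
V = rng.random((20, 3)) @ rng.random((3, 15))   # hypothetical non-negative data
W, H, costs = nmf_alternating_pg(V, k=3)
print(costs[0], costs[-1])
```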

Current algorithms are sub-optimal in that they only guarantee finding a local minimum, rather than a global minimum of the cost function. A provably optimal algorithm is unlikely in the near future, as the problem has been shown to generalize the k-means clustering problem, which is known to be NP-complete. [35] However, as in many other data mining applications, a local minimum may still prove to be useful.

Fractional residual variance (FRV) plots for PCA and sequential NMF; for PCA, the theoretical values are the contribution from the residual eigenvalues. In comparison, the FRV curves for PCA reach a flat plateau where no signal is captured effectively, while the NMF FRV curves decline continuously, indicating a better ability to capture signal. The FRV curves for NMF also converge to higher levels than PCA, indicating the less-overfitting property of NMF.

Sequential NMF

The sequential construction of NMF components (W and H) was first used to relate NMF with Principal Component Analysis (PCA) in astronomy. [36] The contributions from the PCA components are ranked by the magnitude of their corresponding eigenvalues. NMF components can instead be ranked empirically when they are constructed one by one (sequentially), i.e., the (n+1)-th component is learned with the first n components already constructed.

The contribution of the sequential NMF components can be compared with the Karhunen–Loève theorem, an application of PCA, using the plot of eigenvalues. With PCA, the number of components is typically chosen at the "elbow" point. Beyond it, a flat plateau indicates that PCA is not capturing the data efficiently. Finally, a sudden drop reflects the capture of random noise, where the model falls into the regime of overfitting. [37] [38] For sequential NMF, the plot of eigenvalues is approximated by the plot of the fractional residual variance curves. These curves decrease continuously and converge to a higher level than PCA, [4] which indicates less over-fitting of sequential NMF.
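The following sketch of sequential construction rests on several stated assumptions. Component n + 1 starts from the n components already found, plus one random new column of W and one random new row of H. All components are then refined by multiplicative updates. The fractional residual is taken as $\|V - WH\|_F^2 / \|V\|_F^2$. The warm start, the updating of all components, and this residual definition belong to the sketch, not necessarily to the exact procedure of the cited astronomy work.

```python
import numpy as np

def mu_updates(V, W, H, iters, eps=1e-12):
    """Lee-Seung multiplicative updates for ||V - WH||_F^2, starting from W, H."""
    for _ in range(iters):
        H = H * (W.T @ V) / (W.T @ W @ H + eps)
        W = W * (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def sequential_nmf(V, n_components, iters=300, seed=0):
    """Add one component at a time, warm-starting from the components found so far.

    Returns W, H and the fractional residual ||V - WH||_F^2 / ||V||_F^2 after
    each new component (the residual definition is an assumption of this sketch).
    """
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W, H = np.empty((m, 0)), np.empty((0, n))
    frv = []
    for _ in range(n_components):
        W = np.hstack([W, rng.random((m, 1))])   # new random column of W
        H = np.vstack([H, rng.random((1, n))])   # new random row of H
        W, H = mu_updates(V, W, H, iters)        # refine all components
        frv.append(float(np.sum((V - W @ H) ** 2) / np.sum(V ** 2)))
    return W, H, frv

rng = np.random.default_rng(7)
V = rng.random((30, 4)) @ rng.random((4, 25))    # hypothetical rank-4 data
W, H, frv = sequential_nmf(V, n_components=4)
print([round(f, 4) for f in frv])
```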

Exact NMF

Exact solutions for the variants of NMF can be expected (in polynomial time) when additional constraints hold for matrix V. In 1981, Campbell and Poole gave a polynomial-time algorithm for nonnegative rank factorization when V contains a monomial submatrix of rank equal to its rank. [39] Kalofolias and Gallopoulos (2012) [40] solved the symmetric counterpart of this problem, where V is symmetric and contains a diagonal principal submatrix of rank r. Their algorithm runs in $O(rm^2)$ time in the dense case. Arora, Ge, Halpern, Mimno, Moitra, Sontag, Wu, & Zhu (2013) give a polynomial-time algorithm for exact NMF that works for the case where one of the factors W satisfies a separability condition. [41]

Relation to other techniques

In Learning the parts of objects by non-negative matrix factorization Lee and Seung [42] proposed NMF mainly for parts-based decomposition of images. It compares NMF to vector quantization and principal component analysis, and shows that although the three techniques may be written as factorizations, they implement different constraints and therefore produce different results.

NMF as a probabilistic graphical model: visible units (V) are connected to hidden units (H) through weights W, so that V is generated from a probability distribution with mean $\sum_a W_{ia} h_a$.

It was later shown that some types of NMF are an instance of a more general probabilistic model called "multinomial PCA". [43] When NMF is obtained by minimizing the Kullback–Leibler divergence, it is in fact equivalent to another instance of multinomial PCA, probabilistic latent semantic analysis, [44] trained by maximum likelihood estimation. That method is commonly used for analyzing and clustering textual data and is also related to the latent class model.

NMF with the least-squares objective is equivalent to a relaxed form of K-means clustering: the matrix factor W contains cluster centroids and H contains cluster membership indicators. [15] [45] This provides a theoretical foundation for using NMF for data clustering. However, k-means does not enforce non-negativity on its centroids, so the closest analogy is in fact with "semi-NMF". [17]

NMF can be seen as a two-layer directed graphical model with one layer of observed random variables and one layer of hidden random variables. [46]

NMF extends beyond matrices to tensors of arbitrary order. [47] [48] [49] This extension may be viewed as a non-negative counterpart to, e.g., the PARAFAC model.

Other extensions of NMF include joint factorization of several data matrices and tensors where some factors are shared. Such models are useful for sensor fusion and relational learning. [50]

NMF is an instance of nonnegative quadratic programming (NQP), just like the support vector machine (SVM). However, SVM and NMF are related at a more intimate level than that of NQP, which allows direct application of the solution algorithms developed for either of the two methods to problems in both domains. [51]

Uniqueness

The factorization is not unique: a matrix and its inverse can be used to transform the two factorization matrices by, e.g., [52]

$$WH = WBB^{-1}H.$$

If the two new matrices $\tilde{W} = WB$ and $\tilde{H} = B^{-1}H$ are non-negative, they form another parametrization of the factorization.

The non-negativity of $\tilde{W}$ and $\tilde{H}$ applies at least if B is a non-negative monomial matrix. In this simple case it will just correspond to a scaling and a permutation.
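The monomial case is easy to check numerically. In this sketch, B is a hypothetical 2 × 2 permutation combined with a positive diagonal scaling. Its inverse is therefore also non-negative, and the transformed factors give the same product.

```python
import numpy as np

rng = np.random.default_rng(3)
W = rng.random((4, 2))
H = rng.random((2, 5))

# A non-negative monomial matrix: a permutation times a positive diagonal scaling.
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])
D = np.diag([2.0, 0.5])
B = P @ D
W_new = W @ B                   # columns swapped and rescaled
H_new = np.linalg.inv(B) @ H    # rows swapped and inversely rescaled

print(np.allclose(W @ H, W_new @ H_new))   # True: same product
```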

More control over the non-uniqueness of NMF is obtained with sparsity constraints. [53]

Applications

Astronomy

In astronomy, NMF is a promising method for dimension reduction in the sense that astrophysical signals are non-negative. NMF has been applied to spectroscopic observations [54] [3] and direct imaging observations [4] as a method to study the common properties of astronomical objects and to post-process astronomical observations. The advances in spectroscopic observations by Blanton & Roweis (2007) [3] take into account the uncertainties of astronomical observations. Zhu (2016) [36] later improved this approach, also considering missing data and enabling parallel computing. Their method was then adopted by Ren et al. (2018) [4] in the direct imaging field as one of the methods of detecting exoplanets, especially for the direct imaging of circumstellar disks.

Ren et al. (2018) [4] are able to prove the stability of NMF components when they are constructed sequentially (i.e., one by one), which enables the linearity of the NMF modeling process; the linearity property is used to separate the stellar light and the light scattered from the exoplanets and circumstellar disks.

In direct imaging, various statistical methods have been adopted to reveal faint exoplanets and circumstellar disks against the bright surrounding starlight, which has a typical contrast from 10⁵ to 10¹⁰. [55] [56] [37] However, the light from the exoplanets or circumstellar disks is usually over-fitted, so forward modeling has to be adopted to recover the true flux. [57] [38] Forward modeling is currently optimized for point sources [38] but not for extended sources, especially irregularly shaped structures such as circumstellar disks. In this situation, NMF has been an excellent method. Because its modeling coefficients are non-negative and sparse, it is less prone to over-fitting, so forward modeling can be performed with a few scaling factors [4] rather than a computationally intensive data re-reduction on generated models.

Data imputation

To impute missing data in statistics, NMF can take missing data into account while minimizing its cost function, rather than treating these missing data as zeros. [5] This makes it a mathematically proven method for data imputation in statistics. [5] Ren et al. (2020) [5] first proved that the missing data are ignored in the cost function, then proved that the impact from missing data can be as small as a second order effect, and applied the approach to astronomy. Their work focuses on two-dimensional matrices; specifically, it includes mathematical derivation, simulated data imputation, and application to on-sky data.

The data imputation procedure with NMF can be composed of two steps. First, when the NMF components are known, Ren et al. (2020) proved that the impact from missing data during data imputation ("target modeling" in their study) is a second order effect. Second, when the NMF components are unknown, the authors proved that the impact from missing data during component construction is a first-to-second order effect.

Depending on the way that the NMF components are obtained, the former step above can be either independent of or dependent on the latter. In addition, the imputation quality can be increased when more NMF components are used; see Figure 4 of Ren et al. (2020) for their illustration. [5]
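The following sketch illustrates how missing entries can be ignored rather than treated as zeros. A 0/1 mask M multiplies the residual, which leads to weighted multiplicative updates, and the fitted WH then fills the gaps. The data, the roughly 10% missing rate, and the helper name masked_nmf are hypothetical. This is not the cited authors' code.

```python
import numpy as np

def masked_nmf(V, M, k, iters=1000, seed=0, eps=1e-12):
    """Weighted multiplicative updates for sum(M * (V - WH)**2), W, H >= 0.

    M is 1 where V is observed and 0 where it is missing, so missing entries
    never enter the cost (their placeholder values in V are irrelevant).
    """
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], k))
    H = rng.random((k, V.shape[1]))
    MV = M * V
    for _ in range(iters):
        H = H * (W.T @ MV) / (W.T @ (M * (W @ H)) + eps)
        W = W * (MV @ H.T) / ((M * (W @ H)) @ H.T + eps)
    return W, H

rng = np.random.default_rng(4)
V_true = rng.random((12, 2)) @ rng.random((2, 10))   # hypothetical rank-2 data
M = (rng.random(V_true.shape) > 0.1).astype(float)   # about 10% of entries missing
V_obs = np.where(M > 0, V_true, np.nan)               # NaN marks missing data
W, H = masked_nmf(np.nan_to_num(V_obs), M, k=2)
V_imputed = np.where(M > 0, V_obs, W @ H)             # fill gaps from the model
err = np.linalg.norm(M * (V_true - W @ H)) / np.linalg.norm(M * V_true)
print(f"relative error on observed entries: {err:.2e}")
```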

Text mining

NMF can be used for text mining applications. In this process, a document-term matrix is constructed with the weights of various terms (typically weighted word frequency information) from a set of documents. This matrix is factored into a term-feature and a feature-document matrix. The features are derived from the contents of the documents, and the feature-document matrix describes data clusters of related documents.

One specific application used hierarchical NMF on a small subset of scientific abstracts from PubMed. [58] Another research group clustered parts of the Enron email dataset [59] with 65,033 messages and 91,133 terms into 50 clusters. [60] NMF has also been applied to citations data, with one example clustering English Wikipedia articles and scientific journals based on the outbound scientific citations in English Wikipedia. [61]

Arora, Ge, Halpern, Mimno, Moitra, Sontag, Wu, & Zhu (2013) have given polynomial-time algorithms to learn topic models using NMF. The algorithm assumes that the topic matrix satisfies a separability condition that is often found to hold in these settings. [41]

Hassani, Iranmanesh and Mansouri (2019) proposed a feature agglomeration method for term-document matrices which operates using NMF. The algorithm reduces the term-document matrix into a smaller matrix more suitable for text clustering. [62]

Spectral data analysis

NMF is also used to analyze spectral data; one such use is in the classification of space objects and debris. [63]

Scalable Internet distance prediction

NMF is applied in scalable Internet distance (round-trip time) prediction. For a network with N hosts, NMF allows the distances of all N² end-to-end links to be predicted after conducting only O(N) measurements. This kind of method was first introduced in Internet Distance Estimation Service (IDES). [64] Afterwards, the Phoenix network coordinate system [65] was proposed as a fully decentralized approach. It achieves better overall prediction accuracy by introducing the concept of weight.

Non-stationary speech denoising

Speech denoising has been a long-standing problem in audio signal processing. There are many algorithms for denoising if the noise is stationary. For example, the Wiener filter is suitable for additive Gaussian noise. However, if the noise is non-stationary, the classical denoising algorithms usually have poor performance, because the statistical information of the non-stationary noise is difficult to estimate. Schmidt et al. [66] use NMF for speech denoising under non-stationary noise, which is completely different from classical statistical approaches. The key idea is that a clean speech signal can be sparsely represented by a speech dictionary, but non-stationary noise cannot. Similarly, non-stationary noise can be sparsely represented by a noise dictionary, but speech cannot.

The algorithm for NMF denoising goes as follows. Two dictionaries, one for speech and one for noise, need to be trained offline. Once a noisy speech signal is given, we first calculate the magnitude of its short-time Fourier transform. Second, we separate it into two parts via NMF: one part that can be sparsely represented by the speech dictionary, and another that can be sparsely represented by the noise dictionary. Third, the part that is represented by the speech dictionary is taken as the estimated clean speech.
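The separation step can be sketched with two fixed dictionaries. In this toy version, the "pre-trained" dictionaries W_speech and W_noise are hand-made, with non-overlapping frequency bins. Magnitudes are assumed to add exactly, which is only an approximation for real STFT magnitudes. Only the activations H are fitted, by multiplicative updates. The sparsity penalty and the STFT itself are omitted, so this illustrates the decomposition rather than the method of Schmidt et al.

```python
import numpy as np

def fit_activations(X, W, iters=500, eps=1e-12):
    """Multiplicative updates for H >= 0 in X ~ W H, with dictionary W held fixed."""
    H = np.full((W.shape[1], X.shape[1]), 0.1)
    for _ in range(iters):
        H = H * (W.T @ X) / (W.T @ W @ H + eps)
    return H

# Hypothetical pre-trained dictionaries over 8 frequency bins (columns = atoms).
W_speech = np.array([[1, 0], [1, 0], [0, 1], [0, 1],
                     [0, 0], [0, 0], [0, 0], [0, 0]], dtype=float)
W_noise = np.array([[0, 0], [0, 0], [0, 0], [0, 0],
                    [1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)

rng = np.random.default_rng(5)
speech_mag = W_speech @ rng.random((2, 20))   # stand-in for clean-speech magnitudes
noise_mag = W_noise @ rng.random((2, 20))     # stand-in for noise magnitudes
noisy_mag = speech_mag + noise_mag            # toy assumption: magnitudes add

W = np.hstack([W_speech, W_noise])
H = fit_activations(noisy_mag, W)
speech_est = W_speech @ H[: W_speech.shape[1]]   # part explained by speech atoms
print(np.abs(speech_est - speech_mag).max())
```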

Population genetics

Sparse NMF is used in population genetics for estimating individual admixture coefficients, detecting genetic clusters of individuals in a population sample, or evaluating genetic admixture in sampled genomes. In human genetic clustering, NMF algorithms provide estimates similar to those of the computer program STRUCTURE, but the algorithms are more efficient computationally and allow analysis of large population genomic data sets. [67]

Bioinformatics

NMF has been successfully applied in bioinformatics for clustering gene expression and DNA methylation data and finding the genes most representative of the clusters. [24] [68] [69] [70] In the analysis of cancer mutations it has been used to identify common patterns of mutations that occur in many cancers and that probably have distinct causes. [71] NMF techniques can identify sources of variation such as cell types, disease subtypes, population stratification, tissue composition, and tumor clonality. [72]

A particular variant of NMF, namely Non-Negative Matrix Tri-Factorization (NMTF), [73] has been used for drug repurposing tasks in order to predict novel protein targets and therapeutic indications for approved drugs [74] and to infer pairs of synergistic anticancer drugs. [75]

Nuclear imaging

NMF, also referred to in this field as factor analysis, has been used since the 1980s [76] to analyze sequences of images in SPECT and PET dynamic medical imaging. Non-uniqueness of NMF was addressed using sparsity constraints. [77] [78] [79]

Current research

Current research (since 2010) in nonnegative matrix factorization includes, but is not limited to,

  1. Algorithmic: searching for global minima of the factors and factor initialization. [80]
  2. Scalability: how to factorize million-by-billion matrices, which are commonplace in Web-scale data mining, e.g., see Distributed Nonnegative Matrix Factorization (DNMF), [81] Scalable Nonnegative Matrix Factorization (ScalableNMF), [82] Distributed Stochastic Singular Value Decomposition. [83]
  3. Online: how to update the factorization when new data comes in without recomputing from scratch, e.g., see online CNSC [84]
  4. Collective (joint) factorization: factorizing multiple interrelated matrices for multiple-view learning, e.g. multi-view clustering, see CoNMF [85] and MultiNMF [86]
  5. Cohen and Rothblum 1993 problem: whether a rational matrix always has an NMF of minimal inner dimension whose factors are also rational. Recently, this problem has been answered negatively. [87]

See also

Notes

  1. 1 2 3 Suvrit Sra; Inderjit S. Dhillon (2006). Generalized Nonnegative Matrix Approximations with Bregman Divergences (PDF). Advances in Neural Information Processing Systems. ISBN   978-0-262-23253-1. Wikidata   Q77685465.{{cite book}}: |journal= ignored (help)
  2. Tandon, Rashish; Sra, Suvrit (September 13, 2010). Sparse nonnegative matrix approximation: new formulations and algorithms (PDF) (Report). Max Planck Institute for Biological Cybernetics. Technical Report No. 193.
  3. 1 2 3 Blanton, Michael R.; Roweis, Sam (2007). "K-corrections and filter transformations in the ultraviolet, optical, and near infrared". The Astronomical Journal. 133 (2): 734–754. arXiv: astro-ph/0606170 . Bibcode:2007AJ....133..734B. doi:10.1086/510127. S2CID   18561804.
  4. 1 2 3 4 5 6 7 Ren, Bin; Pueyo, Laurent; Zhu, Guangtun B.; Duchêne, Gaspard (2018). "Non-negative Matrix Factorization: Robust Extraction of Extended Structures". The Astrophysical Journal. 852 (2): 104. arXiv: 1712.10317 . Bibcode:2018ApJ...852..104R. doi: 10.3847/1538-4357/aaa1f2 . S2CID   3966513.
  5. 1 2 3 4 5 Ren, Bin; Pueyo, Laurent; Chen, Christine; Choquet, Elodie; Debes, John H; Duechene, Gaspard; Menard, Francois; Perrin, Marshall D. (2020). "Using Data Imputation for Signal Separation in High Contrast Imaging". The Astrophysical Journal. 892 (2): 74. arXiv: 2001.00563 . Bibcode:2020ApJ...892...74R. doi: 10.3847/1538-4357/ab7024 . S2CID   209531731.
  6. 1 2 Rainer Gemulla; Erik Nijkamp; Peter J. Haas; Yannis Sismanis (2011). Large-scale matrix factorization with distributed stochastic gradient descent. Proc. ACM SIGKDD Int'l Conf. on Knowledge discovery and data mining. pp. 69–77.
  7. Yang Bao; et al. (2014). TopicMF: Simultaneously Exploiting Ratings and Reviews for Recommendation. AAAI.
  8. Ben Murrell; et al. (2011). "Non-Negative Matrix Factorization for Learning Alignment-Specific Models of Protein Evolution". PLOS ONE. 6 (12): e28898. Bibcode:2011PLoSO...628898M. doi: 10.1371/journal.pone.0028898 . PMC   3245233 . PMID   22216138.
  9. William H. Lawton; Edward A. Sylvestre (1971). "Self modeling curve resolution". Technometrics . 13 (3): 617–633. doi:10.2307/1267173. JSTOR   1267173.
  10. Pentti Paatero; Unto Tapper; Pasi Aalto; Markku Kulmala (1991). "Matrix factorization methods for analysing diffusion battery data". Journal of Aerosol Science . 22: S273–S276. doi:10.1016/S0021-8502(05)80089-8. ISSN   0021-8502. Wikidata   Q58065673.
  11. Pentti Paatero; Unto Tapper (June 1994). "Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values". Environmetrics. 5 (2): 111–126. doi:10.1002/ENV.3170050203. ISSN   1180-4009. Wikidata   Q29308406.
  12. Pia Anttila; Pentti Paatero; Unto Tapper; Olli Järvinen (1995). "Source identification of bulk wet deposition in Finland by positive matrix factorization". Atmospheric Environment . 29 (14): 1705–1718. Bibcode:1995AtmEn..29.1705A. doi:10.1016/1352-2310(94)00367-T.
  13. 1 2 Daniel D. Lee & H. Sebastian Seung (1999). "Learning the parts of objects by non-negative matrix factorization". Nature . 401 (6755): 788–791. Bibcode:1999Natur.401..788L. doi:10.1038/44565. PMID   10548103. S2CID   4428232.
  14. 1 2 Daniel D. Lee & H. Sebastian Seung (2001). Algorithms for Non-negative Matrix Factorization (PDF). Advances in Neural Information Processing Systems 13: Proceedings of the 2000 Conference. MIT Press. pp. 556–562.
  15. 1 2 3 C. Ding, X. He, H.D. Simon (2005). "On the Equivalence of Nonnegative Matrix Factorization and Spectral Clustering". Proc. SIAM Int'l Conf. Data Mining, pp. 606-610. May 2005
  16. Ding C, Li Y, Peng W (2008). "On the equivalence between non-negative matrix factorization and probabilistic latent semantic indexing" (PDF). Computational Statistics & Data Analysis. 52 (8): 3913–3927. doi:10.1016/j.csda.2008.01.011. Archived from the original (PDF) on 2016-03-04.
  17. 1 2 C Ding, T Li, MI Jordan, Convex and semi-nonnegative matrix factorizations, IEEE Transactions on Pattern Analysis and Machine Intelligence, 32, 45-55, 2010
  18. Berman, A.; R.J. Plemmons (1974). "Inverses of nonnegative matrices". Linear and Multilinear Algebra. 2 (2): 161–172. doi:10.1080/03081087408817055.
  19. A. Berman; R.J. Plemmons (1994). Nonnegative matrices in the Mathematical Sciences. Philadelphia: SIAM.
  20. Thomas, L.B. (1974). "Problem 73-14, Rank factorization of nonnegative matrices". SIAM Rev. 16 (3): 393–394. doi:10.1137/1016064.
  21. Vavasis, S.A. (2009). "On the complexity of nonnegative matrix factorization". SIAM J. Optim. 20 (3): 1364–1377. arXiv: 0708.4149 . doi:10.1137/070709967. S2CID   7150400.
  22. Zhang, T.; Fang, B.; Liu, W.; Tang, Y. Y.; He, G.; Wen, J. (2008). "Total variation norm-based nonnegative matrix factorization for identifying discriminant representation of image patterns". Neurocomputing. 71 (10–12): 1824–1831. doi:10.1016/j.neucom.2008.01.022.
  23. Hoyer, Patrik O. (2002). Non-negative sparse coding. Proc. IEEE Workshop on Neural Networks for Signal Processing. arXiv:cs/0202009.
  24. Taslaman, Leo; Nilsson, Björn (2012). "A framework for regularized non-negative matrix factorization, with application to the analysis of gene expression data". PLOS One. 7 (11): e46331. Bibcode:2012PLoSO...746331T. doi:10.1371/journal.pone.0046331. PMC 3487913. PMID 23133590.
  25. Hsieh, C. J.; Dhillon, I. S. (2011). Fast coordinate descent methods with variable selection for non-negative matrix factorization (PDF). Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '11. p. 1064. doi:10.1145/2020408.2020577. ISBN 9781450308137.
  26. Fung, Yik-Hing; Li, Chun-Hung; Cheung, William K. (2 November 2007). Online Discussion Participation Prediction Using Non-negative Matrix Factorization. Wi-Iatw '07. IEEE Computer Society. pp. 284–287. ISBN 9780769530284 (via dl.acm.org).
  27. Guan, Naiyang; Tao, Dacheng; Luo, Zhigang; Yuan, Bo (July 2012). "Online Nonnegative Matrix Factorization With Robust Stochastic Approximation". IEEE Transactions on Neural Networks and Learning Systems. 23 (7): 1087–1099. doi:10.1109/TNNLS.2012.2197827. PMID 24807135. S2CID 8755408.
  28. Behnke, S. (2003). "Discovering hierarchical speech features using convolutional non-negative matrix factorization". Proceedings of the International Joint Conference on Neural Networks, 2003. Vol. 4. Portland, Oregon, USA: IEEE. pp. 2758–2763. doi:10.1109/IJCNN.2003.1224004. ISBN 978-0-7803-7898-8. S2CID 3109867.
  29. Lin, Chih-Jen (2007). "Projected Gradient Methods for Nonnegative Matrix Factorization" (PDF). Neural Computation. 19 (10): 2756–2779. CiteSeerX 10.1.1.308.9135. doi:10.1162/neco.2007.19.10.2756. PMID 17716011. S2CID 2295736.
  30. Lin, Chih-Jen (2007). "On the Convergence of Multiplicative Update Algorithms for Nonnegative Matrix Factorization". IEEE Transactions on Neural Networks. 18 (6): 1589–1596. CiteSeerX 10.1.1.407.318. doi:10.1109/TNN.2007.895831. S2CID 2183630.
  31. Kim, Hyunsoo; Park, Haesun (2008). "Nonnegative Matrix Factorization Based on Alternating Nonnegativity Constrained Least Squares and Active Set Method" (PDF). SIAM Journal on Matrix Analysis and Applications. 30 (2): 713–730. CiteSeerX 10.1.1.70.3485. doi:10.1137/07069239x.
  32. Guan, Naiyang; Tao, Dacheng; Luo, Zhigang; Yuan, Bo (June 2012). "NeNMF: An Optimal Gradient Method for Nonnegative Matrix Factorization". IEEE Transactions on Signal Processing. 60 (6): 2882–2898. Bibcode:2012ITSP...60.2882G. doi:10.1109/TSP.2012.2190406. S2CID 8143231.
  33. Kim, Jingu; Park, Haesun (2011). "Fast Nonnegative Matrix Factorization: An Active-set-like Method and Comparisons". SIAM Journal on Scientific Computing. 33 (6): 3261–3281. Bibcode:2011SJSC...33.3261K. CiteSeerX 10.1.1.419.798. doi:10.1137/110821172.
  34. Kim, Jingu; He, Yunlong; Park, Haesun (2013). "Algorithms for nonnegative matrix and tensor factorizations: A unified view based on block coordinate descent framework" (PDF). Journal of Global Optimization. 58 (2): 285–319. doi:10.1007/s10898-013-0035-4. S2CID 11197117.
  35. Ding, C.; He, X.; Simon, H. D. (2005). "On the equivalence of nonnegative matrix factorization and spectral clustering". Proc. SIAM Data Mining Conf. Vol. 4. pp. 606–610. doi:10.1137/1.9781611972757.70. ISBN 978-0-89871-593-4.
  36. Zhu, Guangtun B. (2016-12-19). "Nonnegative Matrix Factorization (NMF) with Heteroscedastic Uncertainties and Missing data". arXiv:1612.06037 [astro-ph.IM].
  37. Soummer, Rémi; Pueyo, Laurent; Larkin, James (2012). "Detection and Characterization of Exoplanets and Disks Using Projections on Karhunen-Loève Eigenimages". The Astrophysical Journal Letters. 755 (2): L28. arXiv:1207.4197. Bibcode:2012ApJ...755L..28S. doi:10.1088/2041-8205/755/2/L28. S2CID 51088743.
  38. Pueyo, Laurent (2016). "Detection and Characterization of Exoplanets using Projections on Karhunen Loeve Eigenimages: Forward Modeling". The Astrophysical Journal. 824 (2): 117. arXiv:1604.06097. Bibcode:2016ApJ...824..117P. doi:10.3847/0004-637X/824/2/117. S2CID 118349503.
  39. Campbell, S. L.; Poole, G. D. (1981). "Computing nonnegative rank factorizations". Linear Algebra Appl. 35: 175–182. doi:10.1016/0024-3795(81)90272-x.
  40. Kalofolias, V.; Gallopoulos, E. (2012). "Computing symmetric nonnegative rank factorizations" (PDF). Linear Algebra Appl. 436 (2): 421–435. doi:10.1016/j.laa.2011.03.016.
  41. Arora, Sanjeev; Ge, Rong; Halpern, Yoni; Mimno, David; Moitra, Ankur; Sontag, David; Wu, Yichen; Zhu, Michael (2013). A practical algorithm for topic modeling with provable guarantees. Proceedings of the 30th International Conference on Machine Learning. arXiv:1212.4777. Bibcode:2012arXiv1212.4777A.
  42. Lee, Daniel D.; Seung, H. Sebastian (1999). "Learning the parts of objects by non-negative matrix factorization" (PDF). Nature. 401 (6755): 788–791. Bibcode:1999Natur.401..788L. doi:10.1038/44565. PMID 10548103. S2CID 4428232.
  43. Buntine, Wray (2002). Variational Extensions to EM and Multinomial PCA (PDF). Proc. European Conference on Machine Learning (ECML-02). LNAI. Vol. 2430. pp. 23–34.
  44. Gaussier, Eric; Goutte, Cyril (2005). Relation between PLSA and NMF and Implications (PDF). Proc. 28th international ACM SIGIR conference on Research and development in information retrieval (SIGIR-05). pp. 601–602. Archived from the original (PDF) on 2007-09-28. Retrieved 2007-01-29.
  45. Zass, Ron; Shashua, Amnon (2005). "A Unifying Approach to Hard and Probabilistic Clustering". International Conference on Computer Vision (ICCV), Beijing, China, October 2005.
  46. Welling, Max; et al. (2004). Exponential Family Harmoniums with an Application to Information Retrieval. NIPS.
  47. Paatero, Pentti (1999). "The Multilinear Engine: A Table-Driven, Least Squares Program for Solving Multilinear Problems, including the n-Way Parallel Factor Analysis Model". Journal of Computational and Graphical Statistics. 8 (4): 854–888. doi:10.2307/1390831. JSTOR 1390831.
  48. Welling, Max; Weber, Markus (2001). "Positive Tensor Factorization". Pattern Recognition Letters. 22 (12): 1255–1261. Bibcode:2001PaReL..22.1255W. CiteSeerX 10.1.1.21.24. doi:10.1016/S0167-8655(01)00070-8.
  49. Kim, Jingu; Park, Haesun (2012). Fast Nonnegative Tensor Factorization with an Active-set-like Method (PDF). High-Performance Scientific Computing: Algorithms and Applications. Springer. pp. 311–326.
  50. Yilmaz, Kenan; Cemgil, A. Taylan; Simsekli, Umut (2011). Generalized Coupled Tensor Factorization (PDF). NIPS.
  51. Potluru, Vamsi K.; Plis, Sergey M.; Morup, Morten; Calhoun, Vince D.; Lane, Terran (2009). Efficient Multiplicative updates for Support Vector Machines. Proceedings of the 2009 SIAM Conference on Data Mining (SDM). pp. 1218–1229.
  52. Xu, Wei; Liu, Xin; Gong, Yihong (2003). Document clustering based on non-negative matrix factorization. Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval. New York: Association for Computing Machinery. pp. 267–273.
  53. Eggert, J.; Korner, E. (2004). "Sparse coding and NMF". 2004 IEEE International Joint Conference on Neural Networks (IEEE Cat. No.04CH37541). Vol. 4. pp. 2529–2533. doi:10.1109/IJCNN.2004.1381036. ISBN 978-0-7803-8359-3. S2CID 17923083.
  54. Berné, O.; Joblin, C.; Deville, Y.; Smith, J. D.; Rapacioli, M.; Bernard, J. P.; Thomas, J.; Reach, W.; Abergel, A. (2007-07-01). "Analysis of the emission of very small dust particles from Spitzer spectro-imagery data using blind signal separation methods". Astronomy & Astrophysics. 469 (2): 575–586. arXiv:astro-ph/0703072. Bibcode:2007A&A...469..575B. doi:10.1051/0004-6361:20066282. ISSN 0004-6361.
  55. Lafrenière, David; Marois, Christian; Doyon, René; Barman, Travis (2009). "HST/NICMOS Detection of HR 8799 b in 1998". The Astrophysical Journal Letters. 694 (2): L148. arXiv:0902.3247. Bibcode:2009ApJ...694L.148L. doi:10.1088/0004-637X/694/2/L148. S2CID 7332750.
  56. Amara, Adam; Quanz, Sascha P. (2012). "PYNPOINT: an image processing package for finding exoplanets". Monthly Notices of the Royal Astronomical Society. 427 (2): 948. arXiv:1207.6637. Bibcode:2012MNRAS.427..948A. doi:10.1111/j.1365-2966.2012.21918.x. S2CID 119200505.
  57. Wahhaj, Zahed; Cieza, Lucas A.; Mawet, Dimitri; Yang, Bin; Canovas, Hector; de Boer, Jozua; Casassus, Simon; Ménard, François; Schreiber, Matthias R.; Liu, Michael C.; Biller, Beth A.; Nielsen, Eric L.; Hayward, Thomas L. (2015). "Improving signal-to-noise in the direct imaging of exoplanets and circumstellar disks with MLOCI". Astronomy & Astrophysics. 581: A24. arXiv:1502.03092. Bibcode:2015A&A...581A..24W. doi:10.1051/0004-6361/201525837. S2CID 20174209.
  58. Nielsen, Finn Årup; Balslev, Daniela; Hansen, Lars Kai (2005). "Mining the posterior cingulate: segregation between memory and pain components" (PDF). NeuroImage. 27 (3): 520–522. doi:10.1016/j.neuroimage.2005.04.034. PMID 15946864. S2CID 18509039.
  59. Cohen, William (2005-04-04). "Enron Email Dataset". Retrieved 2008-08-26.
  60. Berry, Michael W.; Browne, Murray (2005). "Email Surveillance Using Non-negative Matrix Factorization". Computational and Mathematical Organization Theory. 11 (3): 249–264. doi:10.1007/s10588-005-5380-5. S2CID 16249147.
  61. Nielsen, Finn Årup (2008). Clustering of scientific citations in Wikipedia. Wikimania. arXiv:0805.1154.
  62. Hassani, Ali; Iranmanesh, Amir; Mansouri, Najme (2019-11-12). "Text Mining using Nonnegative Matrix Factorization and Latent Semantic Analysis". arXiv:1911.04705 [cs.LG].
  63. Berry, Michael W.; Browne, Murray; Langville, Amy N.; Pauca, V. Paul; Plemmons, Robert J. (15 September 2007). "Algorithms and Applications for Approximate Nonnegative Matrix Factorization". Computational Statistics & Data Analysis. 52 (1): 155–173. doi:10.1016/j.csda.2006.11.006.
  64. Mao, Yun; Saul, Lawrence; Smith, Jonathan M. (2006). "IDES: An Internet Distance Estimation Service for Large Networks". IEEE Journal on Selected Areas in Communications. 24 (12): 2273–2284. CiteSeerX 10.1.1.136.3837. doi:10.1109/JSAC.2006.884026. S2CID 12931155.
  65. Chen, Yang; Wang, Xiao; Shi, Cong; et al. (2011). "Phoenix: A Weight-based Network Coordinate System Using Matrix Factorization" (PDF). IEEE Transactions on Network and Service Management. 8 (4): 334–347. CiteSeerX 10.1.1.300.2851. doi:10.1109/tnsm.2011.110911.100079. S2CID 8079061. Archived from the original (PDF) on 2011-11-14.
  66. Schmidt, M. N.; Larsen, J.; Hsiao, F. T. (2007). "Wind noise reduction using non-negative sparse coding". Machine Learning for Signal Processing, IEEE Workshop on. pp. 431–436.
  67. Frichot, E.; Mathieu, F.; Trouillon, T.; Bouchard, G.; Francois, O. (2014). "Fast and efficient estimation of individual ancestry coefficients". Genetics. 196 (4): 973–983. doi:10.1534/genetics.113.160572. PMC 3982712. PMID 24496008.
  68. Devarajan, K. (2008). "Nonnegative Matrix Factorization: An Analytical and Interpretive Tool in Computational Biology". PLOS Computational Biology. 4 (7): e1000029. Bibcode:2008PLSCB...4E0029D. doi:10.1371/journal.pcbi.1000029. PMC 2447881. PMID 18654623.
  69. Kim, Hyunsoo; Park, Haesun (2007). "Sparse non-negative matrix factorizations via alternating non-negativity-constrained least squares for microarray data analysis". Bioinformatics. 23 (12): 1495–1502. doi:10.1093/bioinformatics/btm134. PMID 17483501.
  70. Schwalbe, E. (2013). "DNA methylation profiling of medulloblastoma allows robust sub-classification and improved outcome prediction using formalin-fixed biopsies". Acta Neuropathologica. 125 (3): 359–371. doi:10.1007/s00401-012-1077-2. PMC 4313078. PMID 23291781.
  71. Alexandrov, Ludmil B.; Nik-Zainal, Serena; Wedge, David C.; Campbell, Peter J.; Stratton, Michael R. (2013-01-31). "Deciphering signatures of mutational processes operative in human cancer". Cell Reports. 3 (1): 246–259. doi:10.1016/j.celrep.2012.12.008. ISSN 2211-1247. PMC 3588146. PMID 23318258.
  72. Stein-O’Brien, Genevieve L.; Arora, Raman; Culhane, Aedin C.; Favorov, Alexander V.; Garmire, Lana X.; Greene, Casey S.; Goff, Loyal A.; Li, Yifeng; Ngom, Aloune; Ochs, Michael F.; Xu, Yanxun (2018-10-01). "Enter the Matrix: Factorization Uncovers Knowledge from Omics". Trends in Genetics. 34 (10): 790–805. doi:10.1016/j.tig.2018.07.003. ISSN 0168-9525. PMC 6309559. PMID 30143323.
  73. Ding; Li; Peng; Park (2006). "Orthogonal nonnegative matrix t-factorizations for clustering". Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining. pp. 126–135. doi:10.1145/1150402.1150420. ISBN 1595933395. S2CID 165018.
  74. Ceddia; Pinoli; Ceri; Masseroli (2020). "Matrix factorization-based technique for drug repurposing predictions". IEEE Journal of Biomedical and Health Informatics. 24 (11): 3162–3172. doi:10.1109/JBHI.2020.2991763. PMID 32365039. S2CID 218504587.
  75. Pinoli; Ceddia; Ceri; Masseroli (2021). "Predicting drug synergism by means of non-negative matrix tri-factorization". IEEE/ACM Transactions on Computational Biology and Bioinformatics. PP (4): 1956–1967. doi:10.1109/TCBB.2021.3091814. PMID 34166199. S2CID 235634059.
  76. DiPaola; Bazin; Aubry; Aurengo; Cavailloles; Herry; Kahn (1982). "Handling of dynamic sequences in nuclear medicine". IEEE Trans Nucl Sci. 29 (4): 1310–21. Bibcode:1982ITNS...29.1310D. doi:10.1109/tns.1982.4332188. S2CID 37186516.
  77. Sitek; Gullberg; Huesman (2002). "Correction for ambiguous solutions in factor analysis using a penalized least squares objective". IEEE Trans Med Imaging. 21 (3): 216–25. doi:10.1109/42.996340. PMID 11989846. S2CID 6553527.
  78. Boutchko; Mitra; Baker; Jagust; Gullberg (2015). "Clustering Initiated Factor Analysis (CIFA) Application for Tissue Classification in Dynamic Brain PET". Journal of Cerebral Blood Flow and Metabolism. 35 (7): 1104–11. doi:10.1038/jcbfm.2015.69. PMC 4640278. PMID 25899294.
  79. Abdalah; Boutchko; Mitra; Gullberg (2015). "Reconstruction of 4-D Dynamic SPECT Images From Inconsistent Projections Using a Spline Initialized FADS Algorithm (SIFADS)". IEEE Trans Med Imaging. 34 (1): 216–18. doi:10.1109/TMI.2014.2352033. PMID 25167546. S2CID 11060831.
  80. Boutsidis, C.; Gallopoulos, E. (2008). "SVD based initialization: A head start for nonnegative matrix factorization". Pattern Recognition. 41 (4): 1350–1362. Bibcode:2008PatRe..41.1350B. CiteSeerX 10.1.1.137.8281. doi:10.1016/j.patcog.2007.09.010.
  81. Liu, Chao; Yang, Hung-chih; Fan, Jinliang; He, Li-Wei; Wang, Yi-Min (2010). "Distributed Nonnegative Matrix Factorization for Web-Scale Dyadic Data Analysis on MapReduce" (PDF). Proceedings of the 19th International World Wide Web Conference.
  82. Yin, Jiangtao; Gao, Lixin; Zhang, Zhongfei (Mark) (2014). "Scalable Nonnegative Matrix Factorization with Block-wise Updates" (PDF). Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases.
  83. "Apache Mahout". mahout.apache.org. Retrieved 2019-12-14.
  84. Wang, Dong; Vipperla, Ravichander; Evans, Nick; Zheng, Thomas Fang (2013). "Online Non-Negative Convolutive Pattern Learning for Speech Signals" (PDF). IEEE Transactions on Signal Processing. 61 (1): 44–56. Bibcode:2013ITSP...61...44W. CiteSeerX 10.1.1.707.7348. doi:10.1109/tsp.2012.2222381. S2CID 12530378. Archived from the original (PDF) on 2015-04-19. Retrieved 2015-04-19.
  85. He, Xiangnan; Kan, Min-Yen; Xie, Peichu; Chen, Xiao (2014). "Comment-based Multi-View Clustering of Web 2.0 Items" (PDF). Proceedings of the 23rd International World Wide Web Conference. Archived from the original (PDF) on 2015-04-02. Retrieved 2015-03-22.
  86. Liu, Jialu; Wang, Chi; Gao, Jing; Han, Jiawei (2013). "Multi-View Clustering via Joint Nonnegative Matrix Factorization". Proceedings of the 2013 SIAM International Conference on Data Mining (PDF). pp. 252–260. CiteSeerX 10.1.1.301.1771. doi:10.1137/1.9781611972832.28. ISBN 978-1-61197-262-7. S2CID 4968.
  87. Chistikov, Dmitry; Kiefer, Stefan; Marušić, Ines; Shirmohammadi, Mahsa; Worrell, James (2016-05-22). "Nonnegative Matrix Factorization Requires Irrationality". arXiv:1605.06848 [cs.CC].
