Tucker decomposition

In mathematics, Tucker decomposition decomposes a tensor into a set of matrices and one small core tensor. It is named after Ledyard R. Tucker,[1] although it goes back to Hitchcock in 1927.[2] Initially described as a three-mode extension of factor analysis and principal component analysis, it may be generalized to higher-mode analysis, which is also called higher-order singular value decomposition (HOSVD).

It may be regarded as a more flexible PARAFAC (parallel factor analysis) model. In PARAFAC the core tensor is restricted to be "diagonal".

In practice, Tucker decomposition is used as a modelling tool. For instance, it is used to model three-way (or higher-way) data by means of relatively small numbers of components for each of the three or more modes, and the components are linked to each other by a three- (or higher-) way core array. The model parameters are estimated in such a way that, given fixed numbers of components, the modelled data optimally resemble the actual data in the least squares sense. The model gives a summary of the information in the data, in the same way as principal components analysis does for two-way data.
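As a concrete usage sketch, the following assumes the open-source TensorLy library (an assumption of this example; exact API details may vary between versions), whose `tucker` routine fits such a model for fixed numbers of components:

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import tucker

# Synthetic three-way data: a 20 x 30 x 40 array.
X = tl.tensor(np.random.rand(20, 30, 40))

# Fit a Tucker model with a fixed number of components per mode.
core, factors = tucker(X, rank=[4, 5, 6])

# Reconstruct the modelled data and check the least-squares fit.
X_hat = tl.tucker_to_tensor((core, factors))
print(tl.norm(X - X_hat) / tl.norm(X))  # relative residual
```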

For a 3rd-order tensor $T \in F^{n_1 \times n_2 \times n_3}$, where $F$ is either $\mathbb{R}$ or $\mathbb{C}$, Tucker decomposition can be denoted as follows,

$$T = \mathcal{T} \times_1 U^{(1)} \times_2 U^{(2)} \times_3 U^{(3)},$$

where $\mathcal{T} \in F^{d_1 \times d_2 \times d_3}$ is the core tensor, a 3rd-order tensor that contains the 1-mode, 2-mode and 3-mode singular values of $T$, which are defined as the Frobenius norms of the 1-mode, 2-mode and 3-mode slices of tensor $\mathcal{T}$ respectively. $U^{(1)}, U^{(2)}, U^{(3)}$ are unitary matrices in $F^{n_1 \times d_1}, F^{n_2 \times d_2}, F^{n_3 \times d_3}$ respectively. The $j$-mode product ($j = 1, 2, 3$) of $\mathcal{T}$ by $U^{(j)}$ is denoted as $\mathcal{T} \times_j U^{(j)}$, with entries

$$\left(\mathcal{T} \times_1 U^{(1)} \times_2 U^{(2)} \times_3 U^{(3)}\right)_{i_1, i_2, i_3} = \sum_{j_1=1}^{d_1} \sum_{j_2=1}^{d_2} \sum_{j_3=1}^{d_3} \mathcal{T}_{j_1, j_2, j_3}\, U^{(1)}_{i_1, j_1}\, U^{(2)}_{i_2, j_2}\, U^{(3)}_{i_3, j_3}.$$

Taking $d_i = n_i$ for all $i$ is always sufficient to represent $T$ exactly, but often $T$ can be compressed or efficiently approximated by choosing $d_i < n_i$. A common choice is $d_1 = d_2 = d_3 = \min(n_1, n_2, n_3)$, which can be effective when the difference in dimension sizes is large.
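To make the definitions concrete, here is a minimal NumPy sketch of the truncated higher-order SVD, one standard way to obtain a Tucker decomposition. The helper names `unfold`, `mode_product` and `tucker_hosvd` are illustrative, not from any library, and the code uses 0-based axes for the modes 1, 2, 3 above:

```python
import numpy as np

def unfold(T, mode):
    """Mode-j unfolding: move axis `mode` to the front and flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_product(T, U, mode):
    """j-mode product T x_j U: contract axis `mode` of T with the columns of U."""
    return np.moveaxis(np.tensordot(U, T, axes=(1, mode)), 0, mode)

def tucker_hosvd(T, ranks):
    """Truncated HOSVD: factor matrices from mode-wise SVDs, then the core."""
    U = []
    for j, d in enumerate(ranks):
        # Leading left singular vectors of the mode-j unfolding.
        u, _, _ = np.linalg.svd(unfold(T, j), full_matrices=False)
        U.append(u[:, :d])
    # Core tensor: project T onto the factor subspaces.
    G = T
    for j, u in enumerate(U):
        G = mode_product(G, u.conj().T, j)
    return G, U

T = np.random.rand(10, 12, 14)
G, U = tucker_hosvd(T, ranks=(5, 5, 5))

# Reconstruction: T_hat = G x_1 U1 x_2 U2 x_3 U3.
T_hat = G
for j, u in enumerate(U):
    T_hat = mode_product(T_hat, u, j)
print(np.linalg.norm(T - T_hat) / np.linalg.norm(T))  # relative error
```

For $d_i < n_i$ the truncated HOSVD is quasi-optimal rather than exactly optimal in the least-squares sense; iterative schemes such as higher-order orthogonal iteration refine it further.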

There are two special cases of Tucker decomposition:

Tucker1: if $U^{(2)}$ and $U^{(3)}$ are identity, then $T = \mathcal{T} \times_1 U^{(1)}$.

Tucker2: if $U^{(3)}$ is identity, then $T = \mathcal{T} \times_1 U^{(1)} \times_2 U^{(2)}$.
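Continuing the hypothetical NumPy sketch above, these special cases simply omit the corresponding mode products:

```python
# Tucker1: only mode 1 (axis 0) is compressed; U2 and U3 are identities.
u1, _, _ = np.linalg.svd(unfold(T, 0), full_matrices=False)
U1 = u1[:, :5]
G1 = mode_product(T, U1.conj().T, 0)   # core of shape (5, 12, 14)
T1 = mode_product(G1, U1, 0)           # Tucker1 approximation of T

# Tucker2: modes 1 and 2 are compressed; U3 is the identity.
u2, _, _ = np.linalg.svd(unfold(T, 1), full_matrices=False)
U2 = u2[:, :5]
G2 = mode_product(G1, U2.conj().T, 1)  # core of shape (5, 5, 14)
T2 = mode_product(mode_product(G2, U1, 0), U2, 1)
```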

RESCAL decomposition [3] can be seen as a special case of Tucker where $U^{(3)}$ is identity and $U^{(1)}$ is equal to $U^{(2)}$.

Related Research Articles

In mathematics, a product is the result of multiplication, or an expression that identifies factors to be multiplied. For example, 30 is the product of 6 and 5, and $x \cdot (2 + x)$ is the product of $x$ and $(2 + x)$.

Principal component analysis: Method of data analysis

The principal components of a collection of points in a real coordinate space are a sequence of unit vectors, where the $i$-th vector is the direction of a line that best fits the data while being orthogonal to the first $i-1$ vectors. Here, a best-fitting line is defined as one that minimizes the average squared distance from the points to the line. These directions constitute an orthonormal basis in which different individual dimensions of the data are linearly uncorrelated. Principal component analysis (PCA) is the process of computing the principal components and using them to perform a change of basis on the data, sometimes using only the first few principal components and ignoring the rest.
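For instance, a common way to compute principal components is via the SVD of the centered data matrix; a minimal NumPy sketch, assuming rows are observations:

```python
import numpy as np

X = np.random.rand(100, 3)               # 100 points in 3 dimensions
Xc = X - X.mean(axis=0)                  # center the cloud of points

# Rows of Vh are the orthonormal principal directions.
U, s, Vh = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vh.T                       # change of basis onto the components
explained_var = s**2 / (len(X) - 1)      # variance along each component
```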

Singular value decomposition: Matrix decomposition

In linear algebra, the singular value decomposition (SVD) is a factorization of a real or complex matrix. It generalizes the eigendecomposition of a square normal matrix with an orthonormal eigenbasis to any matrix. It is related to the polar decomposition.
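In NumPy, for example, the factorization can be computed and verified directly (a minimal sketch):

```python
import numpy as np

A = np.random.rand(4, 3)
U, s, Vh = np.linalg.svd(A, full_matrices=False)
# Columns of U and rows of Vh are orthonormal; s holds the singular values.
assert np.allclose(A, U @ np.diag(s) @ Vh)
```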

Functional data analysis (FDA) is a branch of statistics that analyzes data providing information about curves, surfaces or anything else varying over a continuum. In its most general form, under an FDA framework, each sample element of functional data is considered to be a random function. The physical continuum over which these functions are defined is often time, but may also be spatial location, wavelength, probability, etc. Intrinsically, functional data are infinite dimensional. The high intrinsic dimensionality of these data brings challenges for theory as well as computation, where these challenges vary with how the functional data were sampled. However, the high or infinite dimensional structure of the data is a rich source of information and there are many interesting challenges for research and data analysis.

Autoencoder: Neural network that learns efficient data encoding in an unsupervised manner

An autoencoder is a type of artificial neural network used to learn efficient codings of unlabeled data. The encoding is validated and refined by attempting to regenerate the input from the encoding. The autoencoder learns a representation (encoding) for a set of data, typically for dimensionality reduction, by training the network to ignore insignificant data (“noise”).

In mathematics, the Schwartz kernel theorem is a foundational result in the theory of generalized functions, published by Laurent Schwartz in 1952. It states, in broad terms, that the generalized functions introduced by Schwartz have a two-variable theory that includes all reasonable bilinear forms on the space of test functions. The space itself consists of smooth functions of compact support.

Singular spectrum analysis: Nonparametric spectral estimation method

In time series analysis, singular spectrum analysis (SSA) is a nonparametric spectral estimation method. It combines elements of classical time series analysis, multivariate statistics, multivariate geometry, dynamical systems and signal processing. Its roots lie in the classical Karhunen (1946)–Loève spectral decomposition of time series and random fields and in the Mañé (1981)–Takens (1981) embedding theorem. SSA can be an aid in the decomposition of time series into a sum of components, each having a meaningful interpretation. The name "singular spectrum analysis" relates to the spectrum of eigenvalues in a singular value decomposition of a covariance matrix, and not directly to a frequency domain decomposition.

In multilinear algebra, the tensor rank decomposition or canonical polyadic decomposition (CPD) is one generalization of the matrix singular value decomposition (SVD) to tensors, and it has found application in statistics, signal processing, computer vision, computer graphics, psychometrics, linguistics and chemometrics. The tensor rank decomposition was introduced by Frank Lauren Hitchcock in 1927 and later rediscovered several times, notably in psychometrics. For this reason, it is often referred to as CANDECOMP, PARAFAC, or CANDECOMP/PARAFAC (CP).

In multilinear algebra, the higher-order singular value decomposition (HOSVD) of a tensor is a specific orthogonal Tucker decomposition. It may be regarded as one generalization of the matrix singular value decomposition. The HOSVD has applications in computer graphics, machine learning, scientific computing, and signal processing. Some key ingredients of the HOSVD can be traced as far back as F. L. Hitchcock in 1928, but it was L. R. Tucker who developed the general Tucker decomposition for third-order tensors in the 1960s, including the HOSVD. The HOSVD as a decomposition in its own right was further advocated by L. De Lathauwer et al. in 2000. Robust and L1-norm-based variants of the HOSVD have also been proposed.

In mathematics, the tensor product (TP) model transformation was proposed by Baranyi and Yam as a key concept for the higher-order singular value decomposition of functions. It transforms a function into TP function form if such a transformation is possible. If an exact transformation is not possible, the method determines a TP function that approximates the given function. Hence, the TP model transformation can provide a trade-off between approximation accuracy and complexity.

Multilinear subspace learning: Approach to dimensionality reduction

Multilinear subspace learning is an approach to dimensionality reduction. Dimensionality reduction can be performed on a data tensor whose observations have been vectorized, or whose observations are matrices concatenated into a data tensor. Examples of such data tensors include images (2D/3D), video sequences (3D/4D), and hyperspectral cubes (3D/4D).

Within statistics, multilinear principal component analysis (MPCA) is a multilinear extension of principal component analysis (PCA). MPCA is employed in the analysis of n-way arrays, i.e. a cube or hyper-cube of numbers, also informally referred to as a "data tensor". N-way arrays may be decomposed, analyzed, or modeled by linear tensor models such as CANDECOMP/PARAFAC, or by multilinear tensor models such as MPCA itself.

A CUR matrix approximation is a set of three matrices that, when multiplied together, closely approximate a given matrix. A CUR approximation can be used in the same way as the low-rank approximation of the singular value decomposition (SVD). CUR approximations are less accurate than the SVD, but they offer two key advantages, both stemming from the fact that the rows and columns come from the original matrix: the approximation can be computed with lower asymptotic complexity than the SVD, and its factors are more interpretable, since their rows and columns retain the meanings they have in the original matrix.
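A minimal sketch of the idea, assuming uniform sampling and a least-squares middle factor (practical algorithms select rows and columns more carefully, e.g. via leverage scores):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((50, 40))

# C and R are actual columns and rows of A (sampled uniformly here).
cols = rng.choice(A.shape[1], size=10, replace=False)
rows = rng.choice(A.shape[0], size=10, replace=False)
C, R = A[:, cols], A[rows, :]

# Middle factor minimizing ||A - C U R||_F, via pseudoinverses.
U = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)
print(np.linalg.norm(A - C @ U @ R) / np.linalg.norm(A))
```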

Baranyi and Yam proposed the TP model transformation as a new concept in quasi-LPV (qLPV) based control, which plays a central role in the highly desirable bridging between identification and polytopic systems theories. It is also used as a TS (Takagi–Sugeno) fuzzy model transformation. It is uniquely effective in manipulating the convex hull of polytopic forms and has thereby shown that convex hull manipulation is a necessary and crucial step in achieving optimal solutions and decreasing conservativeness in modern linear matrix inequality based control theory. Thus, although it is a transformation in a mathematical sense, it has established a conceptually new direction in control theory and has laid the ground for further new approaches towards optimality.

Based on the key idea of higher-order singular value decomposition (HOSVD) in tensor algebra, Baranyi and Yam proposed the concept of HOSVD-based canonical form of TP functions and quasi-LPV system models. Szeidl et al. proved that the TP model transformation is capable of numerically reconstructing this canonical form.

Functional principal component analysis (FPCA) is a statistical method for investigating the dominant modes of variation of functional data. Using this method, a random function is represented in the eigenbasis, which is an orthonormal basis of the Hilbert space L2 that consists of the eigenfunctions of the autocovariance operator. FPCA represents functional data in the most parsimonious way, in the sense that when using a fixed number of basis functions, the eigenfunction basis explains more variation than any other basis expansion. FPCA can be applied for representing random functions, or in functional regression and classification.

Sparse dictionary learning: Representation learning method

Sparse coding is a representation learning method which aims at finding a sparse representation of the input data in the form of a linear combination of basic elements, as well as those basic elements themselves. These elements are called atoms, and they compose a dictionary. Atoms in the dictionary are not required to be orthogonal, and they may form an over-complete spanning set. This problem setup also allows the dimensionality of the signals being represented to be higher than that of the signals being observed. The above two properties lead to seemingly redundant atoms that allow multiple representations of the same signal, but also provide an improvement in the sparsity and flexibility of the representation.

In signal processing, multidimensional empirical mode decomposition extends the one-dimensional (1-D) EMD algorithm to signals spanning multiple dimensions. Empirical mode decomposition (EMD) decomposes a signal into intrinsic mode functions; combined with Hilbert spectral analysis, the process is known as the Hilbert–Huang transform (HHT). Multidimensional EMD can be applied to image processing, audio signal processing, and various other multidimensional signals.

L1-norm principal component analysis

L1-norm principal component analysis (L1-PCA) is a general method for multivariate data analysis. L1-PCA is often preferred over standard L2-norm principal component analysis (PCA) when the analyzed data may contain outliers.

In statistics, modes of variation are a continuously indexed set of vectors or functions that are centered at a mean and are used to depict the variation in a population or sample. Typically, variation patterns in the data can be decomposed in descending order of eigenvalues, with the directions represented by the corresponding eigenvectors or eigenfunctions. Modes of variation provide a visualization of this decomposition and an efficient description of variation around the mean. Both in principal component analysis (PCA) and in functional principal component analysis (FPCA), modes of variation play an important role in visualizing and describing the variation in the data contributed by each eigencomponent. In real-world applications, the eigencomponents and associated modes of variation help to interpret complex data, especially in exploratory data analysis (EDA).

References

  1. Tucker, Ledyard R. (September 1966). "Some mathematical notes on three-mode factor analysis". Psychometrika. 31 (3): 279–311. doi:10.1007/BF02289464. PMID 5221127.
  2. Hitchcock, F. L. (1927). "The expression of a tensor or a polyadic as a sum of products". Journal of Mathematics and Physics. 6: 164–189.
  3. Nickel, Maximilian; Tresp, Volker; Kriegel, Hans-Peter (28 June 2011). "A Three-Way Model for Collective Learning on Multi-Relational Data". ICML. Vol. 11. pp. 809–816.