Modes of variation

In statistics, modes of variation [1] are a continuously indexed set of vectors or functions that are centered at a mean and are used to depict the variation in a population or sample. Typically, the variation patterns in the data are decomposed in descending order of eigenvalues, with the directions of variation represented by the corresponding eigenvectors or eigenfunctions. Modes of variation provide a visualization of this decomposition and an efficient description of variation around the mean. Both in principal component analysis (PCA) and in functional principal component analysis (FPCA), modes of variation play an important role in visualizing and describing the variation in the data contributed by each eigencomponent. [2] In real-world applications, the eigencomponents and associated modes of variation help interpret complex data, especially in exploratory data analysis (EDA).

Formulation

Modes of variation are a natural extension of PCA and FPCA.

Modes of variation in PCA

If a random vector $X = (X_1, X_2, \ldots, X_p)^\top$ has the mean vector $\mu$, and the covariance matrix $\Sigma$ with eigenvalues $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0$ and corresponding orthonormal eigenvectors $e_1, e_2, \ldots, e_p$, by eigendecomposition of a real symmetric matrix, the covariance matrix can be decomposed as

$$\Sigma = Q \Lambda Q^\top,$$

where $Q$ is an orthogonal matrix whose columns are the eigenvectors of $\Sigma$, and $\Lambda$ is a diagonal matrix whose entries are the eigenvalues of $\Sigma$. By the Karhunen–Loève expansion for random vectors, one can express the centered random vector in the eigenbasis

$$X - \mu = \sum_{k=1}^p \xi_k e_k,$$

where $\xi_k = e_k^\top (X - \mu)$ is the principal component [3] associated with the $k$-th eigenvector $e_k$, with the properties

$$\mathrm{E}(\xi_k) = 0, \qquad \mathrm{Var}(\xi_k) = \lambda_k,$$

and

$$\mathrm{Cov}(\xi_k, \xi_l) = 0 \quad \text{for } l \ne k.$$

Then the $k$-th mode of variation of $X$ is the set of vectors, indexed by $\alpha$,

$$m_{k,\alpha} = \mu + \alpha \sqrt{\lambda_k}\, e_k, \qquad \alpha \in [-A, A],$$

where $A$ is typically selected as $2$ or $3$.
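
To make the construction concrete, here is a minimal Python sketch that builds the modes of variation from a given mean vector and covariance matrix; the numerical values and names (mu, Sigma) are illustrative, not from the source.

```python
import numpy as np

# Illustrative population mean vector and covariance matrix.
mu = np.array([1.0, 2.0, 3.0])
Sigma = np.array([[4.0, 2.0, 0.5],
                  [2.0, 3.0, 1.0],
                  [0.5, 1.0, 2.0]])

# Eigendecomposition of the real symmetric covariance matrix.
# np.linalg.eigh returns eigenvalues in ascending order, so reverse
# both arrays to get lambda_1 >= lambda_2 >= ... with matching eigenvectors.
lam, e = np.linalg.eigh(Sigma)
lam, e = lam[::-1], e[:, ::-1]

def mode_of_variation(k, alpha):
    """k-th mode of variation m_{k,alpha} = mu + alpha * sqrt(lambda_k) * e_k
    (k is zero-based here)."""
    return mu + alpha * np.sqrt(lam[k]) * e[:, k]

# View the first mode over a grid of alpha values in [-A, A] with A = 2.
for alpha in np.linspace(-2.0, 2.0, 5):
    print(alpha, mode_of_variation(0, alpha))
```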

Modes of variation in FPCA

For a square-integrable random function $X(t)$, $t \in \mathcal{T}$, where $\mathcal{T}$ is an interval, typically $\mathcal{T} = [0, 1]$, denote the mean function by $\mu(t) = \mathrm{E}(X(t))$, and the covariance function by

$$G(s, t) = \mathrm{Cov}(X(s), X(t)) = \sum_{k=1}^\infty \lambda_k \varphi_k(s) \varphi_k(t),$$

where $\lambda_1 \ge \lambda_2 \ge \cdots \ge 0$ are the eigenvalues and $\varphi_1, \varphi_2, \ldots$ are the orthonormal eigenfunctions of the linear Hilbert–Schmidt operator

$$G: L^2(\mathcal{T}) \to L^2(\mathcal{T}), \qquad G(f) = \int_{\mathcal{T}} G(s, t) f(s) \, ds.$$

By the Karhunen–Loève theorem, one can express the centered function in the eigenbasis,

$$X(t) - \mu(t) = \sum_{k=1}^\infty \xi_k \varphi_k(t),$$

where

$$\xi_k = \int_{\mathcal{T}} (X(t) - \mu(t)) \varphi_k(t) \, dt$$

is the $k$-th principal component with the properties

$$\mathrm{E}(\xi_k) = 0, \qquad \mathrm{Var}(\xi_k) = \lambda_k,$$

and

$$\mathrm{Cov}(\xi_k, \xi_l) = 0 \quad \text{for } l \ne k.$$

Then the $k$-th mode of variation of $X(t)$ is the set of functions, indexed by $\alpha$,

$$m_{k,\alpha}(t) = \mu(t) + \alpha \sqrt{\lambda_k}\, \varphi_k(t), \qquad t \in \mathcal{T},\ \alpha \in [-A, A],$$

that are viewed simultaneously over the range of $\alpha$, usually for $A = 2$ or $3$. [2]
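
For a closed-form population example, the Karhunen–Loève expansion of standard Brownian motion on $[0, 1]$ has $\mu(t) = 0$, $\lambda_k = 1/((k - \tfrac{1}{2})^2 \pi^2)$, and $\varphi_k(t) = \sqrt{2} \sin((k - \tfrac{1}{2}) \pi t)$, so its modes of variation can be written down directly. The sketch below (an illustration, not part of the source) evaluates them on a grid.

```python
import numpy as np

# Standard Brownian motion on [0, 1]:
#   mu(t) = 0,
#   lambda_k = 1 / ((k - 1/2)^2 * pi^2),
#   phi_k(t) = sqrt(2) * sin((k - 1/2) * pi * t),  k = 1, 2, ...
t = np.linspace(0.0, 1.0, 101)

def mode_of_variation(k, alpha):
    """k-th mode of variation m_{k,alpha}(t) of Brownian motion on the grid t."""
    lam_k = 1.0 / ((k - 0.5) ** 2 * np.pi ** 2)
    phi_k = np.sqrt(2.0) * np.sin((k - 0.5) * np.pi * t)
    return alpha * np.sqrt(lam_k) * phi_k  # the mean function is identically zero

print(mode_of_variation(1, 2.0)[:5])  # first mode at alpha = 2
```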

Estimation

The formulation above is derived from properties of the population. In real-world applications the population quantities are unknown and must be estimated from data; the key step is to estimate the mean and the covariance, from which the eigencomponents follow.

Modes of variation in PCA

Suppose the data $X_1, X_2, \ldots, X_n$ represent $n$ independent drawings from some $p$-dimensional population with mean vector $\mu$ and covariance matrix $\Sigma$. These data yield the sample mean vector $\bar{X}$, and the sample covariance matrix $S$ with eigenvalue–eigenvector pairs $(\hat{\lambda}_1, \hat{e}_1), (\hat{\lambda}_2, \hat{e}_2), \ldots, (\hat{\lambda}_p, \hat{e}_p)$. Then the $k$-th mode of variation of $X$ can be estimated by

$$\hat{m}_{k,\alpha} = \bar{X} + \alpha \sqrt{\hat{\lambda}_k}\, \hat{e}_k, \qquad \alpha \in [-A, A].$$

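A minimal sketch of this estimation step, assuming the observations are the rows of an $n \times p$ NumPy array (all data and names are illustrative):

```python
import numpy as np

# Draw n = 200 observations from an illustrative p = 3 population.
rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=[1.0, 2.0, 3.0],
                            cov=[[4.0, 2.0, 0.5],
                                 [2.0, 3.0, 1.0],
                                 [0.5, 1.0, 2.0]],
                            size=200)

x_bar = X.mean(axis=0)              # sample mean vector
S = np.cov(X, rowvar=False)         # sample covariance matrix
lam_hat, e_hat = np.linalg.eigh(S)  # eigenvalues in ascending order
lam_hat, e_hat = lam_hat[::-1], e_hat[:, ::-1]  # reorder to descending

def mode_hat(k, alpha):
    """Estimated k-th mode of variation (k is zero-based here)."""
    return x_bar + alpha * np.sqrt(lam_hat[k]) * e_hat[:, k]

print(mode_hat(0, 2.0))  # estimated first mode at alpha = 2
```
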
Modes of variation in FPCA

Consider $n$ realizations $X_1(t), X_2(t), \ldots, X_n(t)$ of a square-integrable random function $X(t)$ with the mean function $\mu(t)$ and the covariance function $G(s, t)$. Functional principal component analysis provides methods for estimating $\mu(t)$ and $G(s, t)$, often involving pointwise estimation followed by interpolation or smoothing. Substituting the estimates $\hat{\mu}(t)$, $\hat{\lambda}_k$, and $\hat{\varphi}_k(t)$ for the unknown quantities, the $k$-th mode of variation of $X(t)$ can be estimated by

$$\hat{m}_{k,\alpha}(t) = \hat{\mu}(t) + \alpha \sqrt{\hat{\lambda}_k}\, \hat{\varphi}_k(t), \qquad t \in \mathcal{T},\ \alpha \in [-A, A].$$

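In the simplest setting, where the $n$ curves are observed on a common dense grid, the estimation can be sketched as follows. A discretized eigendecomposition stands in for a full FPCA with smoothing, and all data and names are illustrative.

```python
import numpy as np

# n = 50 synthetic curves observed on a common grid of m = 101 points over [0, 1].
rng = np.random.default_rng(1)
m, n = 101, 50
t = np.linspace(0.0, 1.0, m)
dt = t[1] - t[0]
X = (np.sin(2 * np.pi * t)
     + rng.standard_normal((n, 1)) * np.cos(np.pi * t)
     + 0.5 * rng.standard_normal((n, 1)) * np.sin(3 * np.pi * t))

mu_hat = X.mean(axis=0)          # pointwise estimate of the mean function
G_hat = np.cov(X, rowvar=False)  # pointwise estimate of the covariance surface

# Discretizing the integral operator scales it by dt; the unit-norm
# eigenvectors approximate the eigenfunctions up to a factor sqrt(dt),
# so rescale them to be orthonormal in L^2.
lam, phi = np.linalg.eigh(G_hat * dt)
lam, phi = np.clip(lam[::-1], 0.0, None), phi[:, ::-1] / np.sqrt(dt)

def mode_hat(k, alpha):
    """Estimated k-th mode of variation on the grid t (k is zero-based here)."""
    return mu_hat + alpha * np.sqrt(lam[k]) * phi[:, k]

print(mode_hat(0, 1.0)[:5])  # estimated first mode at alpha = 1
```
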
Applications

[Figure: The first and second modes of variation of female mortality data from 41 countries in 2003 (Movfemale.svg)]
[Figure: The first and second modes of variation of male mortality data from 41 countries in 2003 (Movmale.svg)]

Modes of variation are useful for visualizing and describing the variation patterns in the data, sorted by the eigenvalues. In real-world applications, the modes of variation associated with the eigencomponents make it possible to interpret complex data, such as the evolution of functional traits [5] and other infinite-dimensional data. [6] To illustrate how modes of variation work in practice, two examples are shown in the graphs to the right, which display the first two modes of variation. The solid curve represents the sample mean function. The dashed, dot-dashed, and dotted curves correspond to modes of variation with $\alpha = \pm 1$, $\pm 2$, and $\pm 3$, respectively.

The first graph displays the first two modes of variation of female mortality data from 41 countries in 2003. [4] The object of interest is the log hazard function between ages 0 and 100 years. The first mode of variation suggests that the variation of female mortality is smaller for ages around 0 or 100, and larger for ages around 25. An appropriate and intuitive interpretation is that mortality around age 25 is driven mainly by accidental death, while mortality around 0 or 100 is related to congenital disease or natural death.

Compared to the female mortality data, the modes of variation of the male mortality data show higher mortality after around age 20, possibly related to the fact that life expectancy for women is higher than that for men.
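
A display in the style just described can be reproduced with a short matplotlib sketch; the mean function and eigenfunction below are synthetic stand-ins, not the mortality data.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-ins for the estimated mean function and first eigenfunction.
t = np.linspace(0.0, 100.0, 201)     # age grid
mu_hat = -4.0 + 0.03 * t             # illustrative mean log hazard
phi_hat = np.sin(np.pi * t / 100.0)  # illustrative first eigenfunction
lam_hat = 0.5                        # illustrative first eigenvalue

plt.plot(t, mu_hat, "k-", label="mean function")
for alpha, ls in [(1, "--"), (2, "-."), (3, ":")]:
    for sign in (1, -1):             # plot mu +/- alpha * sqrt(lam) * phi
        plt.plot(t, mu_hat + sign * alpha * np.sqrt(lam_hat) * phi_hat, ls)
plt.xlabel("age")
plt.ylabel("log hazard")
plt.legend()
plt.show()
```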


References

  1. Castro, P. E.; Lawton, W. H.; Sylvestre, E. A. (November 1986). "Principal Modes of Variation for Processes with Continuous Sample Curves". Technometrics. 28 (4): 329. doi:10.2307/1268982. ISSN 0040-1706. JSTOR 1268982.
  2. Wang, Jane-Ling; Chiou, Jeng-Min; Müller, Hans-Georg (June 2016). "Functional Data Analysis". Annual Review of Statistics and Its Application. 3 (1): 257–295. doi:10.1146/annurev-statistics-041715-033624. ISSN 2326-8298.
  3. Kleffe, Jürgen (January 1973). "Principal components of random variables with values in a separable Hilbert space". Mathematische Operationsforschung und Statistik. 4 (5): 391–406. doi:10.1080/02331887308801137. ISSN 0047-6277.
  4. "Human Mortality Database". www.mortality.org. Retrieved 2020-03-12.
  5. Kirkpatrick, Mark; Heckman, Nancy (August 1989). "A quantitative genetic model for growth, shape, reaction norms, and other infinite-dimensional characters". Journal of Mathematical Biology. 27 (4): 429–450. doi:10.1007/bf00290638. ISSN 0303-6812. PMID 2769086. S2CID 46336613.
  6. Jones, M. C.; Rice, John A. (May 1992). "Displaying the Important Features of Large Collections of Similar Curves". The American Statistician. 46 (2): 140–145. doi:10.1080/00031305.1992.10475870. ISSN 0003-1305.