Functional principal component analysis

Functional principal component analysis (FPCA) is a statistical method for investigating the dominant modes of variation of functional data. Using this method, a random function is represented in the eigenbasis, which is an orthonormal basis of the Hilbert space L2 consisting of the eigenfunctions of the autocovariance operator. FPCA represents functional data in the most parsimonious way, in the sense that when using a fixed number of basis functions, the eigenfunction basis explains more variation than any other basis expansion. FPCA can be applied to represent random functions, [1] or in functional regression [2] and classification.

Formulation

For a square-integrable stochastic process X(t), t ∈ 𝒯, let

\mu(t) = \operatorname{E}(X(t))

and

G(s, t) = \operatorname{Cov}(X(s), X(t)) = \sum_{k=1}^{\infty} \lambda_k \varphi_k(s) \varphi_k(t),

where λ1 ≥ λ2 ≥ … ≥ 0 are the eigenvalues and φ1, φ2, … are the orthonormal eigenfunctions of the linear Hilbert–Schmidt operator

G \colon L^2(\mathcal{T}) \to L^2(\mathcal{T}), \qquad G(f) = \int_{\mathcal{T}} G(s, t) f(s) \, ds.

By the Karhunen–Loève theorem, one can express the centered process in the eigenbasis,

X(t) - \mu(t) = \sum_{k=1}^{\infty} \xi_k \varphi_k(t),

where

\xi_k = \int_{\mathcal{T}} \bigl( X(t) - \mu(t) \bigr) \varphi_k(t) \, dt

is the principal component associated with the k-th eigenfunction φk, with the properties

\operatorname{E}(\xi_k) = 0, \qquad \operatorname{Var}(\xi_k) = \lambda_k, \qquad \operatorname{E}(\xi_k \xi_l) = 0 \ \text{for } k \neq l.

The centered process is then equivalent to the sequence ξ1, ξ2, .... A common assumption is that X can be represented by only the first few eigenfunctions (after subtracting the mean function), i.e.

X_K(t) = \mu(t) + \sum_{k=1}^{K} \xi_k \varphi_k(t),

where the truncation error satisfies

\operatorname{E} \int_{\mathcal{T}} \bigl( X(t) - X_K(t) \bigr)^2 \, dt = \sum_{j > K} \lambda_j \to 0 \quad \text{as } K \to \infty.
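The truncated expansion above is also the basis for computation. The following is a minimal sketch in Python/NumPy of FPCA for curves observed on a dense regular grid, under the assumption that the discretized covariance matrix stands in for the covariance operator; the function name fpca and all variable names are illustrative rather than taken from any particular library.

    import numpy as np

    def fpca(X, t, K):
        """FPCA sketch: X is (n, m), row i = curve i on the common grid t."""
        dt = t[1] - t[0]                      # regular grid spacing
        mu = X.mean(axis=0)                   # cross-sectional mean function
        Xc = X - mu                           # centered curves
        G = (Xc.T @ Xc) / len(X)              # discretized covariance surface
        # Eigendecomposition of the discretized covariance operator G * dt
        evals, evecs = np.linalg.eigh(G * dt)
        order = np.argsort(evals)[::-1][:K]   # K largest eigenvalues first
        lam = evals[order]
        phi = evecs[:, order] / np.sqrt(dt)   # eigenfunctions with unit L2 norm
        xi = Xc @ phi * dt                    # scores xi_ik = <X_i - mu, phi_k>
        return mu, lam, phi, xi

    # Example: curves built from two orthonormal modes on [0, 1].
    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 1.0, 101)
    scores = rng.normal(size=(200, 2)) * np.sqrt([2.0, 0.5])
    modes = np.stack([np.sqrt(2) * np.sin(np.pi * t),
                      np.sqrt(2) * np.sin(2 * np.pi * t)])
    mu, lam, phi, xi = fpca(scores @ modes, t, K=2)

The rescaling by the grid spacing makes the discrete eigenvectors approximate unit-norm eigenfunctions in L2, so the recovered eigenvalues approximate λk.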

Interpretation of eigenfunctions

The first eigenfunction φ1 depicts the dominant mode of variation of X:

\varphi_1 = \underset{\Vert \varphi \Vert = 1}{\operatorname{arg\,max}} \, \operatorname{Var}(\langle X - \mu, \varphi \rangle),

where

\Vert \varphi \Vert = \left( \int_{\mathcal{T}} \varphi(t)^2 \, dt \right)^{1/2}.

The k-th eigenfunction φk is the dominant mode of variation orthogonal to φ1, φ2, ..., φk−1:

\varphi_k = \underset{\Vert \varphi \Vert = 1, \ \langle \varphi, \varphi_j \rangle = 0 \text{ for } j = 1, \dots, k-1}{\operatorname{arg\,max}} \, \operatorname{Var}(\langle X - \mu, \varphi \rangle),

where

\langle \varphi, \varphi_j \rangle = \int_{\mathcal{T}} \varphi(t) \varphi_j(t) \, dt.
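In practice, the eigenfunctions are often interpreted by plotting the corresponding modes of variation, i.e., the mean function perturbed by multiples of √λk φk. A brief illustrative sketch, reusing t, mu, lam, and phi from the FPCA sketch above and assuming matplotlib is available:

    import matplotlib.pyplot as plt

    # k-th mode of variation: mu(t) + alpha * sqrt(lambda_k) * phi_k(t)
    k = 0                                         # first component (0-indexed)
    for alpha in (-2, -1, 1, 2):
        plt.plot(t, mu + alpha * np.sqrt(lam[k]) * phi[:, k], color="gray")
    plt.plot(t, mu, color="black", linewidth=2)   # the mean function itself
    plt.title("Mode of variation for the first eigenfunction")
    plt.show()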

Estimation

Let Yij = Xi(tij) + εij be the observations made at locations (usually time points) tij, where Xi is the i-th realization of the smooth stochastic process that generates the data, and the εij are independently and identically distributed normal random variables with mean 0 and variance σ2, j = 1, 2, ..., mi. To obtain an estimate of the mean function μ(tij), if a dense sample on a regular grid is available, one may take the average across subjects at each location tij:

\hat{\mu}(t_{ij}) = \frac{1}{n} \sum_{i=1}^{n} Y_{ij}.

If the observations are sparse, one needs to smooth the data pooled from all observations to obtain the mean estimate, [3] using smoothing methods like local linear smoothing or spline smoothing.
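One concrete choice for the sparse case is a pooled local linear smoother. The sketch below is a hedged illustration, assuming a Gaussian kernel and a user-chosen bandwidth h; the function name and arguments are hypothetical:

    import numpy as np

    def local_linear_mean(t_pooled, y_pooled, t_out, h):
        """Local linear estimate of mu at points t_out from pooled data."""
        mu_hat = np.empty_like(t_out, dtype=float)
        for j, t0 in enumerate(t_out):
            d = t_pooled - t0
            w = np.exp(-0.5 * (d / h) ** 2)          # Gaussian kernel weights
            s0, s1, s2 = w.sum(), (w * d).sum(), (w * d * d).sum()
            r0, r1 = (w * y_pooled).sum(), (w * d * y_pooled).sum()
            # Intercept of the weighted local line fit at t0
            mu_hat[j] = (s2 * r0 - s1 * r1) / (s0 * s2 - s1 ** 2)
        return mu_hat

The bandwidth h governs the bias–variance trade-off and is typically chosen by cross-validation.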

Then the estimate of the covariance function is obtained by averaging (in the dense case) or smoothing (in the sparse case) the raw covariances

G_i(t_{ij}, t_{il}) = \bigl( Y_{ij} - \hat{\mu}(t_{ij}) \bigr) \bigl( Y_{il} - \hat{\mu}(t_{il}) \bigr), \quad j, l = 1, 2, \dots, m_i.

Note that the diagonal elements of Gi should be removed because they contain measurement error. [4]
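A hedged sketch of assembling the pooled off-diagonal raw covariances for the sparse case, with times and values given as per-subject arrays and mu_hat a callable mean estimate (all names illustrative); the resulting scattered points (s, t, g) would then be passed to a two-dimensional smoother:

    import numpy as np

    def raw_covariances(times, values, mu_hat):
        s_list, t_list, g_list = [], [], []
        for t_i, y_i in zip(times, values):
            r = y_i - mu_hat(t_i)                # centered observations
            for j in range(len(t_i)):
                for l in range(len(t_i)):
                    if j == l:
                        continue                 # drop diagonal: inflated by noise
                    s_list.append(t_i[j])
                    t_list.append(t_i[l])
                    g_list.append(r[j] * r[l])   # raw covariance G_i(t_ij, t_il)
        return np.array(s_list), np.array(t_list), np.array(g_list)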

In practice, the estimated covariance surface is discretized to an equally spaced dense grid, and the estimation of the eigenvalues λk and eigenvectors vk is carried out by numerical linear algebra. [5] The eigenfunction estimates can then be obtained by interpolating the eigenvectors.

The fitted covariance should be positive definite and symmetric and is then obtained as

\tilde{G}(s, t) = \sum_{\lambda_k > 0} \hat{\lambda}_k \hat{\varphi}_k(s) \hat{\varphi}_k(t).
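A hedged sketch of these two steps, reusing t, lam, and phi from the earlier FPCA sketch and using SciPy interpolation (scipy.interpolate.interp1d) for the eigenfunctions:

    import numpy as np
    from scipy.interpolate import interp1d

    # Interpolate the first discrete eigenvector to an eigenfunction estimate.
    phi_1 = interp1d(t, phi[:, 0], kind="cubic")

    # Rebuild a symmetric, positive semi-definite fitted covariance
    # from the positive eigenpairs only.
    pos = lam > 0
    G_fit = (phi[:, pos] * lam[pos]) @ phi[:, pos].T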

Let \hat{V}(t) be a smoothed version of the diagonal elements Gi(tij, tij) of the raw covariance matrices. Then \hat{V}(t) is an estimate of G(t, t) + σ2, and an estimate of σ2 is obtained by

\hat{\sigma}^2 = \frac{2}{|\mathcal{T}|} \int_{\mathcal{T}_1} \bigl( \hat{V}(t) - \tilde{G}(t, t) \bigr) \, dt

if this quantity is positive, and \hat{\sigma}^2 = 0 otherwise, where \mathcal{T}_1 is the middle half of the domain \mathcal{T}.
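A sketch of this estimate on a grid, assuming arrays V_hat and G_diag hold V̂(t) and G̃(t, t) on the grid t, and taking 𝒯1 to be the middle half of the domain (an assumption following the sparse-data literature cited above):

    import numpy as np

    def sigma2_hat(t, V_hat, G_diag):
        T = t[-1] - t[0]
        inner = (t >= t[0] + T / 4) & (t <= t[-1] - T / 4)   # middle half
        val = (2.0 / T) * np.trapz(V_hat[inner] - G_diag[inner], t[inner])
        return max(val, 0.0)    # truncate at zero if the estimate is negative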

If the observations Xi(tij), j = 1, 2, ..., mi, are dense in 𝒯, then the k-th FPC ξk can be estimated by numerical integration, implementing

\hat{\xi}_k = \langle X - \hat{\mu}, \hat{\varphi}_k \rangle.

However, if the observations are sparse, this method will not work. Instead, one can use best linear unbiased predictors, [3] yielding

\hat{\xi}_{ik} = \hat{\lambda}_k \hat{\varphi}_{ik}^{T} \hat{\Sigma}_{Y_i}^{-1} (Y_i - \hat{\mu}_i),

where

\hat{\Sigma}_{Y_i} = \tilde{G}_i + \hat{\sigma}^2 I_{m_i}, \qquad Y_i = (Y_{i1}, \dots, Y_{im_i})^{T}, \qquad \hat{\mu}_i = \bigl( \hat{\mu}(t_{i1}), \dots, \hat{\mu}(t_{im_i}) \bigr)^{T}, \qquad \hat{\varphi}_{ik} = \bigl( \hat{\varphi}_k(t_{i1}), \dots, \hat{\varphi}_k(t_{im_i}) \bigr)^{T},

and \tilde{G}_i is \tilde{G} evaluated at the grid points generated by tij, j = 1, 2, ..., mi. This algorithm, PACE (Principal Analysis by Conditional Expectation), is available as a Matlab package [6] and an R package. [7]
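A hedged sketch of this score computation for one sparsely observed subject, assuming the estimates from the previous steps are available on that subject's time points (y_i, mu_i, G_i as G̃ evaluated at the subject's grid, phi_i with one column per component, lam, sigma2; all names illustrative):

    import numpy as np

    def blup_scores(y_i, mu_i, G_i, phi_i, lam, sigma2):
        """xi_ik = lam_k * phi_ik^T Sigma_Yi^{-1} (Y_i - mu_i), for each k."""
        Sigma_Yi = G_i + sigma2 * np.eye(len(y_i))   # covariance of noisy Y_i
        resid = y_i - mu_i
        return lam * (phi_i.T @ np.linalg.solve(Sigma_Yi, resid))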

Asymptotic convergence properties of these estimates have been investigated. [3] [8] [9]

Applications

FPCA can be applied for displaying the modes of functional variation, [1] [10] in scatterplots of FPCs against each other or of responses against FPCs, for modeling sparse longitudinal data, [3] or for functional regression and classification (e.g., functional linear regression). [2] Scree plots and other methods can be used to determine the number of components to include.

Functional principal component analysis has varied applications in time series analysis. At present, the method is being adapted from traditional multivariate techniques to analyze financial data sets such as stock market indices and to generate implied volatility graphs. [11] A good example of the advantages of the functional approach is the smoothed FPCA (SPCA), developed by Silverman [1996] and studied by Pezzulli and Silverman [1993], which combines FPCA with a general smoothing approach and thereby makes it possible to use the information stored in linear differential operators.

An important application of FPCA, already known from multivariate PCA, is motivated by the Karhunen–Loève decomposition of a random function into a set of functional parameters: factor functions and corresponding factor loadings (scalar random variables). This application is much more important than in standard multivariate PCA, since the distribution of a random function is in general too complex to be analyzed directly, and the Karhunen–Loève decomposition reduces the analysis to the interpretation of the factor functions and the distribution of the scalar random variables. Due to its dimensionality reduction as well as its accuracy in representing data, there is wide scope for further developments of functional principal component techniques in the financial field.

Connection with principal component analysis

The following table shows a comparison of various elements of principal component analysis (PCA) and FPCA. The two methods are both used for dimensionality reduction. In implementations, FPCA uses a PCA step.

However, PCA and FPCA differ in some critical aspects. First, the order of multivariate data in PCA can be permuted, which has no effect on the analysis, but the order of functional data carries time or space information and cannot be reordered. Second, the spacing of observations in FPCA matters, while there is no spacing issue in PCA. Third, regular PCA does not work for high-dimensional data without regularization, while FPCA has a built-in regularization due to the smoothness of the functional data and the truncation to a finite number of included components.

Element | In PCA | In FPCA
Data | X \in \mathbb{R}^p | X \in L^2(\mathcal{T})
Dimension | p < \infty | \infty
Mean | \mu = \operatorname{E}(X) | \mu(t) = \operatorname{E}(X(t))
Covariance | \operatorname{Cov}(X) = \Sigma | \operatorname{Cov}(X(s), X(t)) = G(s, t)
Eigenvalues | \lambda_1, \lambda_2, \dots, \lambda_p | \lambda_1, \lambda_2, \dots
Eigenvectors/Eigenfunctions | v_1, v_2, \dots, v_p | \varphi_1(t), \varphi_2(t), \dots
Inner Product | \langle X, Y \rangle = \sum_{j=1}^{p} X_j Y_j | \langle X, Y \rangle = \int_{\mathcal{T}} X(t) Y(t) \, dt
Principal Components | z_k = \langle X - \mu, v_k \rangle, \ k = 1, \dots, p | \xi_k = \langle X - \mu, \varphi_k \rangle, \ k = 1, 2, \dots

Notes

  1. Jones, M. C.; Rice, J. A. (1992). "Displaying the Important Features of Large Collections of Similar Curves". The American Statistician. 46 (2): 140. doi:10.1080/00031305.1992.10475870.
  2. Yao, F.; Müller, H. G.; Wang, J. L. (2005). "Functional linear regression analysis for longitudinal data". The Annals of Statistics. 33 (6): 2873. arXiv:math/0603132. doi:10.1214/009053605000000660.
  3. Yao, F.; Müller, H. G.; Wang, J. L. (2005). "Functional Data Analysis for Sparse Longitudinal Data". Journal of the American Statistical Association. 100 (470): 577. doi:10.1198/016214504000001745.
  4. Staniswalis, J. G.; Lee, J. J. (1998). "Nonparametric Regression Analysis of Longitudinal Data". Journal of the American Statistical Association. 93 (444): 1403. doi:10.1080/01621459.1998.10473801.
  5. Rice, John; Silverman, B. (1991). "Estimating the Mean and Covariance Structure Nonparametrically When the Data are Curves". Journal of the Royal Statistical Society. Series B (Methodological). 53 (1): 233–243. doi:10.1111/j.2517-6161.1991.tb01821.x.
  6. "PACE: Principal Analysis by Conditional Expectation".
  7. "fdapace: Functional Data Analysis and Empirical Dynamics". 2018-02-25.
  8. Hall, P.; Müller, H. G.; Wang, J. L. (2006). "Properties of principal component methods for functional and longitudinal data analysis". The Annals of Statistics. 34 (3): 1493. arXiv:math/0608022. doi:10.1214/009053606000000272.
  9. Li, Y.; Hsing, T. (2010). "Uniform convergence rates for nonparametric regression and principal component analysis in functional/longitudinal data". The Annals of Statistics. 38 (6): 3321. arXiv:1211.2137. doi:10.1214/10-AOS813.
  10. Madrigal, Pedro; Krajewski, Paweł (2015). "Uncovering correlated variability in epigenomic datasets using the Karhunen-Loeve transform". BioData Mining. 8: 20. doi:10.1186/s13040-015-0051-7. PMC 4488123. PMID 26140054.
  11. Benko, Michal. Functional Data Analysis with Applications in Finance.
