Functional regression

Functional regression is a version of regression analysis in which responses or covariates include functional data. Functional regression models can be classified into four types depending on whether the responses or covariates are functional or scalar: (i) scalar responses with functional covariates, (ii) functional responses with scalar covariates, (iii) functional responses with functional covariates, and (iv) scalar or functional responses with both functional and scalar covariates. In addition, functional regression models can be linear, partially linear, or nonlinear. In particular, functional polynomial models, functional single and multiple index models, and functional additive models are three special cases of functional nonlinear models.

Functional linear models (FLMs)

Functional linear models (FLMs) are an extension of linear models (LMs). A linear model with scalar response $Y \in \mathbb{R}$ and scalar covariates $X \in \mathbb{R}^p$ can be written as

$$Y = \beta_0 + \langle X, \beta \rangle + \varepsilon, \qquad (1)$$

where $\langle \cdot, \cdot \rangle$ denotes the inner product in Euclidean space, $\beta_0 \in \mathbb{R}$ and $\beta \in \mathbb{R}^p$ denote the regression coefficients, and $\varepsilon$ is a random error with mean zero and finite variance. FLMs can be divided into two types based on the responses.

Functional linear models with scalar responses

Functional linear models with scalar responses can be obtained by replacing the scalar covariates $X$ and the coefficient vector $\beta$ in model (1) by a centered functional covariate $X^c(t) = X(t) - \mathbb{E}(X(t))$ and a coefficient function $\beta = \beta(t)$ with domain $\mathcal{T}$, respectively, and replacing the inner product in Euclidean space by that in the Hilbert space $L^2$,

$$Y = \beta_0 + \langle X^c, \beta \rangle + \varepsilon = \beta_0 + \int_{\mathcal{T}} X^c(t)\beta(t)\,dt + \varepsilon, \qquad (2)$$

where $\langle \cdot, \cdot \rangle$ here denotes the inner product in $L^2$. One approach to estimating $\beta_0$ and $\beta(t)$ is to expand the centered covariate $X^c$ and the coefficient function $\beta$ in the same functional basis, for example, a B-spline basis or the eigenbasis used in the Karhunen–Loève expansion. Suppose $\{\phi_k\}_{k=1}^{\infty}$ is an orthonormal basis of $L^2$. Expanding $X^c$ and $\beta$ in this basis, $X^c(t) = \sum_{k=1}^{\infty} x_k \phi_k(t)$, $\beta(t) = \sum_{k=1}^{\infty} \beta_k \phi_k(t)$, model (2) becomes

$$Y = \beta_0 + \sum_{k=1}^{\infty} \beta_k x_k + \varepsilon.$$

For implementation, regularization is needed and can be done through truncation, $L^2$ penalization, or $L^1$ penalization.[1] In addition, a reproducing kernel Hilbert space (RKHS) approach can also be used to estimate $\beta_0$ and $\beta$ in model (2).[2]
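
The truncation approach lends itself to a short numerical illustration. The following sketch is not the method of any particular cited reference, and all variable names are illustrative: it simulates densely observed curves on a common grid, estimates the eigenbasis by functional principal component analysis (FPCA) of the sample covariance, and regresses the scalar response on the first $K$ scores to obtain a truncated estimate of $\beta(t)$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, K = 200, 101, 3              # curves, grid points, truncation level
t = np.linspace(0.0, 1.0, m)       # common observation grid on T = [0, 1]
dt = t[1] - t[0]

# Simulate functional covariates from a three-term expansion in an
# orthonormal basis of L^2([0, 1]).
phi = np.stack([np.ones(m),
                np.sqrt(2.0) * np.sin(2 * np.pi * t),
                np.sqrt(2.0) * np.cos(2 * np.pi * t)])
scores = rng.normal(size=(n, 3)) * np.array([2.0, 1.0, 0.5])
X = scores @ phi

# Scalar responses from model (2) with a known coefficient function.
beta_true = phi[0] + 0.5 * phi[1] - 0.3 * phi[2]
Xc = X - X.mean(axis=0)            # centered functional covariate
Y = 0.7 + Xc @ beta_true * dt + rng.normal(scale=0.1, size=n)

# Estimate the eigenbasis by FPCA of the sample covariance ...
eigval, eigvec = np.linalg.eigh(Xc.T @ Xc / n)
phi_hat = eigvec[:, ::-1][:, :K].T / np.sqrt(dt)   # approx. L^2-orthonormal

# ... compute the scores x_ik = <X_i^c, phi_k> by numerical integration,
# and regress Y on the first K scores (regularization by truncation).
xi = Xc @ phi_hat.T * dt
design = np.column_stack([np.ones(n), xi])
coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
beta0_hat = coef[0]
beta_hat = coef[1:] @ phi_hat      # truncated estimate of beta(t) on the grid
```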

Adding multiple functional and scalar covariates, model (2) can be extended to

$$Y = \sum_{k=1}^{q} Z_k \alpha_k + \sum_{j=1}^{p} \int_{\mathcal{T}_j} X^c_j(t) \beta_j(t)\,dt + \varepsilon, \qquad (3)$$

where $Z_1, \ldots, Z_q$ are scalar covariates with $Z_1 = 1$, $\alpha_1, \ldots, \alpha_q$ are regression coefficients for $Z_1, \ldots, Z_q$, respectively, $X^c_j$ is a centered functional covariate given by $X^c_j(t) = X_j(t) - \mathbb{E}(X_j(t))$, $\beta_j$ is the regression coefficient function for $X^c_j$, and $\mathcal{T}_j$ is the domain of $X_j$ and $\beta_j$, for $j = 1, \ldots, p$. However, due to the parametric component $\sum_{k=1}^{q} Z_k \alpha_k$, the estimation methods for model (2) cannot be used in this case,[3] and alternative estimation methods for model (3) are available.[4][5]
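
Under the truncation approach, model (3) likewise reduces to least squares on a design matrix that concatenates the scalar covariates with the basis scores of each functional covariate. A minimal continuation of the previous sketch, ignoring the convergence-rate issues just noted (all names illustrative):

```python
# Continuing the previous sketch: add scalar covariates Z (with Z_1 = 1)
# and fit the partially functional model jointly by least squares.
Z = np.column_stack([np.ones(n), rng.normal(size=n)])  # q = 2 scalar covariates
Y3 = Y + 0.4 * Z[:, 1]                                 # response following model (3)

design3 = np.column_stack([Z, xi])                     # [Z_1, ..., Z_q, x_1, ..., x_K]
coef3, *_ = np.linalg.lstsq(design3, Y3, rcond=None)
alpha_hat = coef3[:2]                                  # estimates of alpha_1, alpha_2
beta3_hat = coef3[2:] @ phi_hat                        # estimate of beta_1(t)
```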

Functional linear models with functional responses

For a functional response $Y(s)$ with domain $\mathcal{S}$ and a functional covariate $X(t)$ with domain $\mathcal{T}$, two FLMs regressing $Y$ on $X$ have been considered.[3][6] One of these two models is of the form

$$Y(s) = \beta_0(s) + \int_{\mathcal{T}} \beta(s,t) X^c(t)\,dt + \varepsilon(s), \quad \text{for } s \in \mathcal{S}, \qquad (4)$$

where $X^c(t) = X(t) - \mathbb{E}(X(t))$ is still the centered functional covariate, $\beta_0(s)$ and $\beta(s,t)$ are coefficient functions, and $\varepsilon(s)$ is usually assumed to be a random process with mean zero and finite variance. In this case, at any given time $s \in \mathcal{S}$, the value of $Y$, i.e., $Y(s)$, depends on the entire trajectory of $X^c$. Model (4), for any given time $s$, is an extension of multivariate linear regression with the inner product in Euclidean space replaced by that in $L^2$. An estimating equation motivated by multivariate linear regression is

$$r_{XY} = R_{XX}\beta,$$

where $r_{XY}(t,s) = \operatorname{cov}(X(t), Y(s))$, and $R_{XX}$ is the operator defined as $(R_{XX}\beta)(t,s) = \int_{\mathcal{T}} r_{XX}(t,w)\beta(w,s)\,dw$ with $r_{XX}(t,w) = \operatorname{cov}(X(t), X(w))$ for $t, w \in \mathcal{T}$.[3] Regularization is needed and can be done through truncation, $L^2$ penalization, or $L^1$ penalization.[1] Various estimation methods for model (4) are available.[7][8]
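
For densely observed curves, the estimating equation above can be solved after projecting both $X$ and $Y$ onto a finite basis, where it reduces to least squares of the response scores on the covariate scores. A minimal sketch under these assumptions, with the basis taken as known for brevity (in practice it would be estimated, e.g., by FPCA of each sample); all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, K, L = 300, 101, 3, 3
t = np.linspace(0.0, 1.0, m)       # shared grid for the domains T and S
dt = t[1] - t[0]

phi = np.stack([np.ones(m),
                np.sqrt(2.0) * np.sin(2 * np.pi * t),
                np.sqrt(2.0) * np.cos(2 * np.pi * t)])

# Simulate model (4) with beta(s, t) = sum_{k,l} B[k, l] phi_k(t) phi_l(s).
x_scores = rng.normal(size=(n, K)) * np.array([2.0, 1.0, 0.5])
x_scores -= x_scores.mean(axis=0)  # centered covariate scores
B_true = np.array([[1.0, 0.3, 0.0],
                   [0.2, -0.5, 0.1],
                   [0.0, 0.1, 0.4]])
X = x_scores @ phi
Y = (x_scores @ B_true) @ phi + rng.normal(scale=0.1, size=(n, m))

# Since X is centered, beta_0(s) is estimated by the mean response curve.
beta0_hat = Y.mean(axis=0)
Yc = Y - beta0_hat

# Project both samples onto the basis and solve the score-level version
# of r_XY = R_XX beta by least squares.
xi = X @ phi.T * dt                # covariate scores <X_i, phi_k>
zeta = Yc @ phi.T * dt             # response scores <Y_i^c, phi_l>
B_hat, *_ = np.linalg.lstsq(xi, zeta, rcond=None)
beta_hat = phi.T @ B_hat @ phi     # estimate of beta on the grid (t, s)
```
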
When $Y$ and $X$ are concurrently observed, i.e., $\mathcal{S} = \mathcal{T}$,[9] it is reasonable to consider a historical functional linear model, where the current value of $Y$ only depends on the history of $X$, i.e., $\beta(s,t) = 0$ for $s < t$ in model (4).[3][10] A simpler version of the historical functional linear model is the functional concurrent model (see below).
Adding multiple functional covariates, model (4) can be extended to

$$Y(s) = \beta_0(s) + \sum_{j=1}^{p} \int_{\mathcal{T}_j} \beta_j(s,t) X^c_j(t)\,dt + \varepsilon(s), \quad \text{for } s \in \mathcal{S}, \qquad (5)$$

where, for $j = 1, \ldots, p$, $X^c_j(t) = X_j(t) - \mathbb{E}(X_j(t))$ is a centered functional covariate with domain $\mathcal{T}_j$, and $\beta_j$ is the corresponding coefficient function, defined for $s \in \mathcal{S}$ and $t \in \mathcal{T}_j$.[3] In particular, taking $X^c_j$ as a constant function yields a special case of model (5), an FLM with functional responses and scalar covariates.

Functional concurrent models

Assuming that , another model, known as the functional concurrent model, sometimes also referred to as the varying-coefficient model, is of the form

where and are coefficient functions. Note that model ( 6 ) assumes the value of at time , i.e., , only depends on that of at the same time, i.e., . Various estimation methods can be applied to model ( 6 ). [11] [12] [13]
Adding multiple functional covariates, model (6) can also be extended to

$$Y(t) = \beta_0(t) + \sum_{j=1}^{p} \beta_j(t) X_j(t) + \varepsilon(t), \quad \text{for } t \in \mathcal{T},$$

where $X_1, \ldots, X_p$ are multiple functional covariates with domain $\mathcal{T}$ and $\beta_0, \beta_1, \ldots, \beta_p$ are the coefficient functions with the same domain.[3]
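
When all curves are observed on a common dense grid, a simple estimator of model (6) fits a separate least squares regression at each grid point; smoothness of the coefficient functions across $t$ can then be enforced by smoothing the resulting sequences. A minimal sketch under these assumptions (all names illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 200, 101
t = np.linspace(0.0, 1.0, m)

# Simulate the concurrent model Y(t) = beta_0(t) + beta(t) X(t) + eps(t).
beta0_true = np.sin(2 * np.pi * t)
beta_true = 1.0 + 0.5 * t
X = np.cos(2 * np.pi * t) + rng.normal(size=(n, m))
Y = beta0_true + beta_true * X + rng.normal(scale=0.2, size=(n, m))

# Pointwise OLS: at each grid point t_j, regress Y[:, j] on [1, X[:, j]].
beta0_hat = np.empty(m)
beta_hat = np.empty(m)
for j in range(m):
    design = np.column_stack([np.ones(n), X[:, j]])
    coef, *_ = np.linalg.lstsq(design, Y[:, j], rcond=None)
    beta0_hat[j], beta_hat[j] = coef
# Optionally smooth beta0_hat and beta_hat across t (e.g., a running mean).
```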

Functional nonlinear models

Functional polynomial models

Functional polynomial models are an extension of the FLMs with scalar responses, analogous to extending linear regression to polynomial regression. For a scalar response $Y$ and a functional covariate $X(t)$ with domain $\mathcal{T}$, the simplest example of functional polynomial models is functional quadratic regression,[14]

$$Y = \alpha + \int_{\mathcal{T}} \beta(t) X^c(t)\,dt + \int_{\mathcal{T}}\int_{\mathcal{T}} \gamma(s,t) X^c(s) X^c(t)\,ds\,dt + \varepsilon,$$

where $X^c(t) = X(t) - \mathbb{E}(X(t))$ is the centered functional covariate, $\alpha$ is a scalar coefficient, $\beta(t)$ and $\gamma(s,t)$ are coefficient functions with domains $\mathcal{T}$ and $\mathcal{T} \times \mathcal{T}$, respectively, and $\varepsilon$ is a random error with mean zero and finite variance. By analogy to FLMs with scalar responses, estimation of functional polynomial models can be obtained through expanding both the centered covariate $X^c$ and the coefficient functions $\beta$ and $\gamma$ in an orthonormal basis.[14]
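
In score coordinates the quadratic model becomes a linear model in the basis scores and their pairwise products, so truncated estimation again reduces to least squares. A minimal sketch in the style of the earlier examples (all names illustrative):

```python
import numpy as np
from itertools import combinations_with_replacement

rng = np.random.default_rng(3)
n, m, K = 400, 101, 2
t = np.linspace(0.0, 1.0, m)

phi = np.stack([np.sqrt(2.0) * np.sin(2 * np.pi * t),
                np.sqrt(2.0) * np.cos(2 * np.pi * t)])
xi = rng.normal(size=(n, K)) * np.array([1.5, 0.8])    # scores of X^c

# Response with a linear part <beta, X^c> and a quadratic part whose
# true coefficients of x_0^2 and x_0 x_1 are 0.3 and -0.2.
Y = (0.5 + xi @ np.array([1.0, -0.5])
     + 0.3 * xi[:, 0] ** 2 - 0.2 * xi[:, 0] * xi[:, 1]
     + rng.normal(scale=0.1, size=n))

# Design matrix: intercept, scores, and all pairwise score products.
pairs = list(combinations_with_replacement(range(K), 2))
quad = np.column_stack([xi[:, k] * xi[:, l] for k, l in pairs])
design = np.column_stack([np.ones(n), xi, quad])
coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
alpha_hat = coef[0]
beta_hat = coef[1:K + 1] @ phi         # estimate of beta(t) on the grid

# Reassemble a symmetric estimate of gamma(s, t) on the grid.
gamma_hat = np.zeros((m, m))
for (k, l), c in zip(pairs, coef[K + 1:]):
    gamma_hat += c * np.outer(phi[k], phi[l])
gamma_hat = (gamma_hat + gamma_hat.T) / 2
```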

Functional single and multiple index models

A functional multiple index model is given by

$$Y = g\left(\int_{\mathcal{T}} X^c(t)\beta_1(t)\,dt, \ldots, \int_{\mathcal{T}} X^c(t)\beta_p(t)\,dt\right) + \varepsilon.$$

Taking $p = 1$ yields a functional single index model. However, for $p > 1$, this model is problematic due to the curse of dimensionality. With $p > 1$ and relatively small sample sizes, the estimator given by this model often has large variance.[15] An alternative $p$-component functional multiple index model can be expressed as

$$Y = g_1\left(\int_{\mathcal{T}} X^c(t)\beta_1(t)\,dt\right) + \cdots + g_p\left(\int_{\mathcal{T}} X^c(t)\beta_p(t)\,dt\right) + \varepsilon.$$

Estimation methods for functional single and multiple index models are available.[15][16]
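
A toy illustration of profile estimation for a functional single index model follows: for each candidate index direction (in the coordinates of a truncated basis), the link $g$ is fitted nonparametrically and the residual sum of squares is minimized over the direction. This is only a sketch, not the procedure of the cited references; the polynomial fit is a crude stand-in for the kernel or spline smoothers used in the literature, SciPy is assumed available, and all names are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
n, m, K = 300, 101, 2
t = np.linspace(0.0, 1.0, m)

phi = np.stack([np.sqrt(2.0) * np.sin(2 * np.pi * t),
                np.sqrt(2.0) * np.cos(2 * np.pi * t)])
xi = rng.normal(size=(n, K))          # truncated scores of X^c
b_true = np.array([0.8, 0.6])         # coordinates of beta in the basis
Y = np.sin(xi @ b_true) + rng.normal(scale=0.1, size=n)

def profile_rss(b):
    """RSS after fitting the link g nonparametrically for a fixed index."""
    b = b / np.linalg.norm(b)         # identifiability: ||beta|| = 1
    u = xi @ b                        # single index <X^c, beta>
    g_hat = np.poly1d(np.polyfit(u, Y, deg=5))   # crude smoother for g
    return np.sum((Y - g_hat(u)) ** 2)

res = minimize(profile_rss, x0=np.array([1.0, 0.0]), method="Nelder-Mead")
b_hat = res.x / np.linalg.norm(res.x)
beta_hat = b_hat @ phi                # estimated index function beta(t)
```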

Functional additive models (FAMs)

Given an expansion of a functional covariate $X$ with domain $\mathcal{T}$ in an orthonormal basis $\{\phi_k\}_{k=1}^{\infty}$, $X(t) = \sum_{k=1}^{\infty} x_k \phi_k(t)$, a functional linear model with scalar responses shown in model (2) can be written as

$$\mathbb{E}(Y \mid X) = \mathbb{E}(Y) + \sum_{k=1}^{\infty} \beta_k x_k.$$

One form of FAMs is obtained by replacing the linear function of $x_k$, i.e., $\beta_k x_k$, by a general smooth function $f_k$,

$$\mathbb{E}(Y \mid X) = \mathbb{E}(Y) + \sum_{k=1}^{\infty} f_k(x_k),$$

where $f_k$ satisfies $\mathbb{E}(f_k(x_k)) = 0$ for $k \in \mathbb{N}$.[3][17] Another form of FAMs consists of a sequence of time-additive models:

$$\mathbb{E}(Y \mid X) = \sum_{j=1}^{p} f_j\big(X(t_j)\big),$$

where $\{t_1, \ldots, t_p\}$ is a dense grid on $\mathcal{T}$ with increasing size $p \in \mathbb{N}$, and $f_j(x) = g(t_j, x)$ with $g$ a smooth function, for $j = 1, \ldots, p$.[3][18]
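
The first form of FAM can be fitted by backfitting on the score coordinates: each component function is repeatedly re-estimated by smoothing the partial residuals against the corresponding score, then centered so that $\mathbb{E}(f_k(x_k)) = 0$. A toy sketch, using a polynomial fit as a stand-in for the nonparametric smoothers used in the literature (all names illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
n, K = 500, 2
xi = rng.normal(size=(n, K))                     # truncated scores of X^c
Y = np.sin(xi[:, 0]) + xi[:, 1] ** 2 - 1.0 + rng.normal(scale=0.1, size=n)

# Backfitting: cycle through components, smoothing the partial residuals
# against each score, and center each fitted component.
f_hat = [np.poly1d([0.0]) for _ in range(K)]     # start from zero functions
mu_hat = Y.mean()                                # estimate of E(Y)
for _ in range(20):                              # backfitting sweeps
    for k in range(K):
        partial = Y - mu_hat - sum(f_hat[j](xi[:, j])
                                   for j in range(K) if j != k)
        fit = np.poly1d(np.polyfit(xi[:, k], partial, deg=5))
        f_hat[k] = fit - fit(xi[:, k]).mean()    # enforce E[f_k(x_k)] = 0
```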

Extensions

A direct extension of FLMs with scalar responses shown in model (2) is to add a link function, creating a generalized functional linear model (GFLM), by analogy with extending linear regression to the generalized linear model (GLM). The three components of the GFLM, illustrated by the sketch following this list, are:

  1. Linear predictor $\eta = \beta_0 + \int_{\mathcal{T}} X^c(t)\beta(t)\,dt$;
  2. Variance function $\operatorname{Var}(Y \mid X) = V(\mu)$, where $\mu = \mathbb{E}(Y \mid X)$ is the conditional mean;
  3. Link function $g$ connecting the conditional mean $\mu$ and the linear predictor $\eta$ through $\mu = g(\eta)$.
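
As an illustration, the following is a minimal sketch of functional logistic regression, a special case of the GFLM with the logit link and variance function $V(\mu) = \mu(1 - \mu)$. It assumes the functional covariate has already been reduced to truncated basis scores, as in the earlier examples, so that the fitted coefficients coef[1:] are the basis coordinates of $\beta(t)$; all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)
n, K = 400, 3
xi = rng.normal(size=(n, K))                   # truncated scores of X^c
coef_true = np.array([0.5, 1.0, -0.8, 0.4])    # [beta_0, beta_1, ..., beta_K]
design = np.column_stack([np.ones(n), xi])
p_true = 1.0 / (1.0 + np.exp(-design @ coef_true))
Y = rng.binomial(1, p_true)                    # binary responses

# Fit the score-level logistic model by Newton-Raphson (equivalently,
# iteratively reweighted least squares).
coef = np.zeros(K + 1)
for _ in range(25):
    eta = design @ coef                        # linear predictor
    mu = 1.0 / (1.0 + np.exp(-eta))            # inverse logit link, mu = g(eta)
    W = mu * (1.0 - mu)                        # variance function V(mu)
    grad = design.T @ (Y - mu)
    hess = design.T @ (design * W[:, None])
    coef = coef + np.linalg.solve(hess, grad)
```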


References

  1. Morris, Jeffrey S. (2015). "Functional Regression". Annual Review of Statistics and Its Application. 2 (1): 321–359. arXiv:1406.4068. doi:10.1146/annurev-statistics-010814-020413.
  2. Yuan and Cai (2010). "A reproducing kernel Hilbert space approach to functional linear regression". The Annals of Statistics. 38 (6): 3412–3444. doi:10.1214/09-AOS772.
  3. Wang, Jane-Ling; Chiou, Jeng-Min; Müller, Hans-Georg (2016). "Functional Data Analysis". Annual Review of Statistics and Its Application. 3 (1): 257–295. doi:10.1146/annurev-statistics-041715-033624.
  4. Kong, Xue, Yao and Zhang (2016). "Partially functional linear regression in high dimensions". Biometrika. 103 (1): 147–159. doi:10.1093/biomet/asv062.
  5. Hu, Wang and Carroll (2004). "Profile-kernel versus backfitting in the partially linear models for longitudinal/clustered data". Biometrika. 91 (2): 251–262. doi:10.1093/biomet/91.2.251.
  6. Ramsay and Silverman (2005). Functional Data Analysis, 2nd ed. New York: Springer. ISBN 0-387-40080-X.
  7. Ramsay and Dalzell (1991). "Some tools for functional data analysis". Journal of the Royal Statistical Society, Series B (Methodological). 53 (3): 539–572. https://www.jstor.org/stable/2345586.
  8. Yao, Müller and Wang (2005). "Functional linear regression analysis for longitudinal data". The Annals of Statistics. 33 (6): 2873–2903. doi:10.1214/009053605000000660.
  9. Grenander (1950). "Stochastic processes and statistical inference". Arkiv för Matematik. 1 (3): 195–277. doi:10.1007/BF02590638.
  10. Malfait and Ramsay (2003). "The historical functional linear model". Canadian Journal of Statistics. 31 (2): 115–128. doi:10.2307/3316063.
  11. Fan and Zhang (1999). "Statistical estimation in varying coefficient models". The Annals of Statistics. 27 (5): 1491–1518. doi:10.1214/aos/1017939139.
  12. Huang, Wu and Zhou (2004). "Polynomial spline estimation and inference for varying coefficient models with longitudinal data". Statistica Sinica. 14 (3): 763–788. https://www.jstor.org/stable/24307415.
  13. Şentürk and Müller (2010). "Functional varying coefficient models for longitudinal data". Journal of the American Statistical Association. 105 (491): 1256–1264. doi:10.1198/jasa.2010.tm09228.
  14. Yao and Müller (2010). "Functional quadratic regression". Biometrika. 97 (1): 49–64. doi:10.1093/biomet/asp069.
  15. Chen, Hall and Müller (2011). "Single and multiple index functional regression models with nonparametric link". The Annals of Statistics. 39 (3): 1720–1747. doi:10.1214/11-AOS882.
  16. Jiang and Wang (2011). "Functional single index models for longitudinal data". The Annals of Statistics. 39 (1): 362–388. doi:10.1214/10-AOS845.
  17. Müller and Yao (2008). "Functional additive models". Journal of the American Statistical Association. 103 (484): 1534–1544. doi:10.1198/016214508000000751.
  18. Fan, James and Radchenko (2015). "Functional additive regression". The Annals of Statistics. 43 (5): 2296–2325. doi:10.1214/15-AOS1346.