Generalized functional linear model

Last updated November 25, 2024

The generalized functional linear model (GFLM) is an extension of the generalized linear model (GLM) that allows one to regress univariate responses of various types (continuous or discrete) on functional predictors, which are mostly random trajectories generated by a square-integrable stochastic process. As in the GLM, a link function relates the expected value of the response variable to a linear predictor, which in the case of the GFLM is obtained by forming the scalar product of the random predictor function $X$ with a smooth parameter function $\beta$. Functional linear regression, functional Poisson regression and functional binomial regression, with the important functional logistic regression included, are special cases of the GFLM. Applications of the GFLM include classification and discrimination of stochastic processes and functional data.^[1]

Overview

A key aspect of the GFLM is estimation and inference for the smooth parameter function $\beta$, which is usually obtained by dimension reduction of the infinite-dimensional functional predictor. A common method is to expand the predictor function $X$ in an orthonormal basis of the L² space, the Hilbert space of square-integrable functions, and to expand the parameter function in the same basis. This representation is then combined with a truncation step that reduces the contribution of the parameter function $\beta$ in the linear predictor to a finite number of regression coefficients. Functional principal component analysis (FPCA), which employs the Karhunen–Loève expansion, is a common and parsimonious approach to accomplish this. Other orthogonal expansions, such as Fourier expansions and B-spline expansions, may also be employed for the dimension reduction step. The Akaike information criterion (AIC) can be used for selecting the number of included components. Minimization of cross-validation prediction errors is another criterion often used in classification applications. Once the dimension of the predictor process has been reduced, the simplified linear predictor allows the use of GLM and quasi-likelihood estimation techniques to obtain estimates of the finite-dimensional regression coefficients, which in turn provide an estimate of the parameter function $\beta$ in the GFLM.

Model components

Linear predictor

The predictor functions $X(t),t\in T$, are typically square-integrable stochastic processes on a real interval $T$, and the unknown smooth parameter function $\beta (t),t\in T$, is assumed to be square integrable on $T$. Given a real measure $dw$ on $T$, the linear predictor is given by $\eta =\alpha +\int X^{c}(t)\beta (t)\,dw(t)$, where $X^{c}(t)=X(t)-{\text{E}}(X(t))$ is the centered predictor process and $\alpha$ is a scalar that serves as an intercept.
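When the predictor curves are observed on a common grid, the integral in the linear predictor can be approximated by numerical quadrature. The following is a minimal sketch, assuming $dw(t)=dt$, the trapezoidal rule, and centering by the sample mean function; the helper names `trapezoid_weights` and `linear_predictor` and the simulated curves are illustrative assumptions, not part of the model definition.

```python
import numpy as np

def trapezoid_weights(t):
    """Quadrature weights w so that sum(w * f(t)) approximates the integral of f over the grid t."""
    w = np.zeros_like(t, dtype=float)
    dt = np.diff(t)
    w[:-1] += dt / 2.0
    w[1:] += dt / 2.0
    return w

def linear_predictor(X, beta, t, alpha=0.0):
    """eta_i = alpha + integral of X_i^c(t) * beta(t) dt for curves stored row-wise in X."""
    Xc = X - X.mean(axis=0)          # sample mean function stands in for E(X(t))
    w = trapezoid_weights(t)
    return alpha + (Xc * beta) @ w

# Illustrative data (assumption): 50 random curves on [0, 1]
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 101)
X = rng.normal(size=(50, 1)) * np.sin(2 * np.pi * t) + rng.normal(size=(50, 1)) * t
beta = np.cos(2 * np.pi * t)
eta = linear_predictor(X, beta, t, alpha=0.5)
```

Because the curves are centered by their sample mean, the average of the resulting `eta` values equals the intercept `alpha`.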

Response variable and variance function

The outcome $Y$ is typically a real-valued random variable which may be either continuous or discrete. Often the conditional distribution of $Y$ given the predictor process is specified within the exponential family. However, it is also sufficient to consider the functional quasi-likelihood setup, where instead of the distribution of the response one specifies the conditional variance function, $\operatorname {Var} (Y\mid X)=\sigma ^{2}(\mu )$, as a function of the conditional mean, $\operatorname {E} (Y\mid X)=\mu$.

Link function

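The link function $g$ is a smooth invertible function that relates the conditional mean of the response, $\operatorname {E} (Y\mid X)=\mu$, to the linear predictor $\eta =\alpha +\int X^{c}(t)\beta (t)\,dw(t)$. The relationship is given by $\mu =g(\eta )$, so in this notation $g$ maps the linear predictor to the mean and plays the role of the inverse of the link function in the usual GLM convention.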

Formulation

In order to implement the necessary dimension reduction, the centered predictor process $X^{c}(t)$ and the parameter function $\beta (t)$ are expanded as,

X^{c}(t)=\sum _{j=1}^{\infty }\xi _{j}\rho _{j}(t){\text{ and }}\beta (t)=\sum _{j=1}^{\infty }\beta _{j}\rho _{j}(t),

where $\rho _{j},j=1,2,\ldots$ is an orthonormal basis of the function space $L^{2}(dw),$ such that $\int _{T}\rho _{j}(t)\rho _{k}(t)\,dw(t)=\delta _{jk}$ where $\delta _{jk}=1$ if $j=k$ and $0$ otherwise.

The random variables $\xi _{j}$ are given by $\xi _{j}=\int X^{c}(t)\rho _{j}(t)\,dw(t)$ and the coefficients $\beta _{j}$ as $\beta _{j}=\int \beta (t)\rho _{j}(t)\,dw(t)$ for $j=1,2,\ldots$ .

It follows that ${\text{E}}(\xi _{j})=0$ and $\sum _{j=1}^{\infty }\beta _{j}^{2}<\infty$. Denoting $\sigma _{j}^{2}={\text{Var}}(\xi _{j})={\text{E}}(\xi _{j}^{2})$, one also has $\sum _{j=1}^{\infty }\sigma _{j}^{2}=\int {\text{E}}\left((X^{c}(t))^{2}\right)\,dw(t)<\infty$.

From the orthonormality of the basis functions $\rho _{j}$ , it follows immediately that $\int X^{c}(t)\beta (t)\,dw(t)=\sum _{j=1}^{\infty }\beta _{j}\xi _{j}$ .

The key step is then approximating $\eta =\alpha +\int X^{c}(t)\beta (t)\,dw(t)=\alpha +\sum _{j=1}^{\infty }\beta _{j}\xi _{j}$ by $\eta \approx \alpha +\sum _{j=1}^{p}\beta _{j}\xi _{j}$ for a suitably chosen truncation point $p$ .
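The identity behind the truncation step can be checked numerically. The sketch below assumes $T=[0,1]$, $dw(t)=dt$, the orthonormal Fourier basis, and a predictor curve and parameter function that lie in the span of the first nine basis functions; the coefficient values are made up for illustration. On a uniform periodic grid the Riemann sum integrates these low-order trigonometric products exactly, up to rounding.

```python
import numpy as np

def fourier_basis(t, n_basis):
    """First n_basis functions of the orthonormal Fourier basis of L2[0, 1]."""
    rho = [np.ones_like(t)]
    k = 1
    while len(rho) < n_basis:
        rho.append(np.sqrt(2.0) * np.cos(2 * np.pi * k * t))
        if len(rho) < n_basis:
            rho.append(np.sqrt(2.0) * np.sin(2 * np.pi * k * t))
        k += 1
    return np.array(rho)                     # shape (n_basis, len(t))

m = 256
t = np.arange(m) / m                         # periodic grid, quadrature weight 1/m
rho = fourier_basis(t, 9)

rng = np.random.default_rng(1)
xi_true = rng.normal(size=9) / np.arange(1, 10)    # illustrative scores
beta_true = 1.0 / np.arange(1, 10) ** 2            # illustrative coefficients
Xc = xi_true @ rho                                 # a centered predictor curve
beta = beta_true @ rho                             # the parameter function

xi = rho @ Xc / m                  # xi_j = integral of X^c * rho_j
beta_j = rho @ beta / m            # beta_j = integral of beta * rho_j
integral = np.sum(Xc * beta) / m   # integral of X^c * beta

full_sum = np.sum(beta_j * xi)               # sum over all nine components
truncated = np.cumsum(beta_j * xi)           # partial sums for p = 1, ..., 9
```

Here `truncated[p - 1]` is the $p$-truncated approximation of the integral, and the last partial sum reproduces `integral`.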

FPCA gives the most parsimonious approximation of the linear predictor for a given number of basis functions as the eigenfunction basis explains more of the variation than any other set of basis functions.

For a differentiable link function with bounded first derivative, the approximation error of the $p$-truncated model, i.e. the model whose linear predictor is truncated to the sum of the first $p$ components, is bounded by a constant multiple of ${\text{Var}}\left(\sum _{j=p+1}^{\infty }\beta _{j}\xi _{j}\right)={\text{E}}\left(\left(\sum _{j=p+1}^{\infty }\beta _{j}\xi _{j}\right)^{2}\right)$, which equals $\sum _{j=p+1}^{\infty }\beta _{j}^{2}\sigma _{j}^{2}$ when the $\xi _{j}$ are uncorrelated, as is the case for the eigenfunction basis.

A heuristic motivation for the truncation strategy derives from the fact that ${\text{E}}\left(\left(\sum _{j=p+1}^{\infty }\beta _{j}\xi _{j}\right)^{2}\right)\leq \sum _{j=p+1}^{\infty }\beta _{j}^{2}\ \sum _{j=p+1}^{\infty }\sigma _{j}^{2}$, which is a consequence of the Cauchy–Schwarz inequality applied to $\left(\sum _{j=p+1}^{\infty }\beta _{j}\xi _{j}\right)^{2}\leq \sum _{j=p+1}^{\infty }\beta _{j}^{2}\ \sum _{j=p+1}^{\infty }\xi _{j}^{2}$ followed by taking expectations. The right-hand side of the inequality converges to 0 as $p\rightarrow \infty$, since both $\sum _{j=1}^{\infty }\beta _{j}^{2}$ and $\sum _{j=1}^{\infty }\sigma _{j}^{2}$ are finite.

For the special case of the eigenfunction basis, the sequence $\sigma _{j}^{2},j=1,2,\ldots$ corresponds to the sequence of the eigenvalues of the covariance kernel $G(s,t)={\text{Cov}}(X(s),X(t)),\ s,t\in T$ .
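In practice the eigenfunction basis has to be estimated from the sample. One simple possibility, sketched below under the assumptions of densely observed curves on a uniform grid, $dw(t)=dt$ and Riemann-sum quadrature, is an eigendecomposition of the discretized sample covariance kernel. The function name `fpca` and the simulated two-component process are illustrative; sparse or noisy designs require smoothing steps that are not shown.

```python
import numpy as np

def fpca(X, t, p):
    """Estimate the first p eigenvalues, eigenfunctions and scores on a uniform grid."""
    dt = t[1] - t[0]
    Xc = X - X.mean(axis=0)
    G = Xc.T @ Xc / X.shape[0]               # sample covariance kernel G(s, t) on the grid
    evals, evecs = np.linalg.eigh(G * dt)    # discretized covariance operator
    order = np.argsort(evals)[::-1][:p]      # largest eigenvalues first
    lam = evals[order]
    rho = evecs[:, order].T / np.sqrt(dt)    # rescale so that sum(rho_j ** 2) * dt = 1
    scores = Xc @ rho.T * dt                 # xi_j^i = integral of X_i^c * rho_j
    return lam, rho, scores

# Illustrative data (assumption): a process with two modes of variation
rng = np.random.default_rng(2)
t = np.linspace(0.0, 1.0, 101)
dt = t[1] - t[0]
X = (rng.normal(scale=2.0, size=(200, 1)) * np.sin(np.pi * t)
     + rng.normal(scale=1.0, size=(200, 1)) * np.cos(np.pi * t))

lam, rho, scores = fpca(X, t, p=2)
```

With this construction the sample variances of the score columns coincide with the estimated eigenvalues `lam`, which mirrors the statement that $\sigma _{j}^{2}$ are the eigenvalues of the covariance kernel.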

For data with $n$ i.i.d. observations, setting $\xi _{0}^{i}=1$, $\beta _{0}=\alpha$ and $\xi _{j}^{i}=\int X_{i}^{c}(t)\rho _{j}(t)\,dw(t)$ for $j=1,\ldots ,p$, where $X_{i}^{c}$ is the centered $i$-th predictor function, the approximated linear predictors can be represented as $\eta _{i}=\sum _{j=0}^{p}\beta _{j}\xi _{j}^{i},i=1,2,\ldots ,n$, which are related to the means through $\mu _{i}=g(\eta _{i})$.
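The truncation point $p$ can be selected by comparing the AIC of the $p$-truncated models, as mentioned in the overview. The sketch below assumes a Gaussian response with identity link, so that each truncated model is an ordinary least-squares fit on the scores, and uses the AIC up to an additive constant; the function name `aic_for_truncation` and the simulated scores and responses are illustrative assumptions.

```python
import numpy as np

def aic_for_truncation(scores, y, p_max):
    """Gaussian AIC (up to an additive constant) of the p-truncated model, p = 1..p_max."""
    n = len(y)
    aic = []
    for p in range(1, p_max + 1):
        # The column of ones carries the intercept beta_0 = alpha
        Z = np.column_stack([np.ones(n), scores[:, :p]])
        coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
        rss = np.sum((y - Z @ coef) ** 2)
        aic.append(n * np.log(rss / n) + 2 * (p + 1))
    return np.array(aic)

# Illustrative data (assumption): only the first two components carry signal
rng = np.random.default_rng(3)
n, p_max = 300, 6
sd = 1.0 / np.arange(1, p_max + 1)           # decaying score standard deviations
scores = rng.normal(size=(n, p_max)) * sd
y = 1.0 + 2.0 * scores[:, 0] - 3.0 * scores[:, 1] + rng.normal(scale=0.5, size=n)

aic = aic_for_truncation(scores, y, p_max)
best_p = int(np.argmin(aic)) + 1             # truncation point with the smallest AIC
```

For non-Gaussian responses the residual sum of squares would be replaced by the deviance of the fitted truncated GLM.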

Estimation

The main aim is to estimate the parameter function $\beta$ .

Once $p$ has been fixed, standard GLM and quasi-likelihood methods can be used for the $p$ -truncated model to estimate ${\boldsymbol {\beta }}^{T}=(\beta _{0},\beta _{1},\ldots ,\beta _{p})$ by solving the estimating equation or the score equation $U(\beta )=0.$

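The vector-valued score function turns out to be $U(\beta )=\sum _{i=1}^{n}(Y_{i}-\mu _{i})g'(\eta _{i})\xi _{i}/\sigma ^{2}(\mu _{i})$, where $\xi _{i}=(\xi _{0}^{i},\xi _{1}^{i},\ldots ,\xi _{p}^{i})^{T}$ is the vector of scores of the $i$-th observation; it depends on ${\boldsymbol {\beta }}$ through $\mu _{i}$ and $\eta _{i}$.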

Just as in the GLM, the equation $U(\beta )=0$ is solved using iterative methods such as Newton–Raphson (NR), Fisher scoring (FS) or iteratively reweighted least squares (IWLS) to obtain the estimate ${\boldsymbol {\hat {\beta }}}$ of the regression coefficients. The intercept is estimated by ${\hat {\alpha }}={\hat {\beta }}_{0}$, and the parameter function, in keeping with the expansion $\beta (t)=\sum _{j}\beta _{j}\rho _{j}(t)$, by ${\hat {\beta }}(t)=\sum _{j=1}^{p}{\hat {\beta }}_{j}\rho _{j}(t)$. When using the canonical link function, these methods are equivalent.
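The iteration can be sketched as follows for the $p$-truncated model. This is a minimal Fisher scoring (IWLS) loop written in the notation of this article, with $\mu =g(\eta )$, weights $g'(\eta _{i})^{2}/\sigma ^{2}(\mu _{i})$ and working response $\eta _{i}+(Y_{i}-\mu _{i})/g'(\eta _{i})$. The function names, the simulated scores and the choice of a logistic model are illustrative assumptions; no step-halving or other safeguards are included.

```python
import numpy as np

def fit_truncated_gflm(scores, y, g, g_prime, var_fun, n_iter=25, tol=1e-10):
    """Fisher scoring / IWLS for mu = g(eta), eta = Z @ beta, Z = [1, xi_1, ..., xi_p]."""
    Z = np.column_stack([np.ones(len(y)), scores])
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        eta = Z @ beta
        mu = g(eta)
        d = g_prime(eta)                     # d mu / d eta
        W = d ** 2 / var_fun(mu)             # IWLS weights
        z = eta + (y - mu) / d               # working response
        beta_new = np.linalg.solve(Z.T @ (W[:, None] * Z), Z.T @ (W * z))
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta

def score_function(beta, scores, y, g, g_prime, var_fun):
    """U(beta) = sum_i (Y_i - mu_i) g'(eta_i) xi_i / sigma^2(mu_i)."""
    Z = np.column_stack([np.ones(len(y)), scores])
    eta = Z @ beta
    mu = g(eta)
    return Z.T @ ((y - mu) * g_prime(eta) / var_fun(mu))

def expit(eta):
    return 1.0 / (1.0 + np.exp(-eta))

def expit_prime(eta):
    return expit(eta) * (1.0 - expit(eta))

def bernoulli_var(mu):
    return mu * (1.0 - mu)

# Illustrative data (assumption): three scores and a logistic response
rng = np.random.default_rng(4)
n = 500
scores = rng.normal(size=(n, 3)) * np.array([1.0, 0.7, 0.4])
beta_true = np.array([-0.5, 1.0, -1.0, 0.5])
eta_true = np.column_stack([np.ones(n), scores]) @ beta_true
y = rng.binomial(1, expit(eta_true))

beta_hat = fit_truncated_gflm(scores, y, expit, expit_prime, bernoulli_var)
U = score_function(beta_hat, scores, y, expit, expit_prime, bernoulli_var)
```

At convergence the score vector `U` is numerically zero. The first entry of `beta_hat` estimates the intercept, and the remaining entries are the coefficients of the basis functions.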

Results are available in the literature on $p$-truncated models as $p\rightarrow \infty$ that provide asymptotic inference for the deviation of the estimated parameter function from the true parameter function, as well as asymptotic tests for regression effects and asymptotic confidence regions.

Exponential family response

If the response variable $Y_{i}$, given $X_{i}\in L^{2}(T)$, follows a one-parameter exponential family distribution, then its probability density function or probability mass function (as the case may be) is

f(y_{i}\mid X_{i})=\exp \left({\frac {y_{i}\theta _{i}-b(\theta _{i})}{\phi }}+c(y_{i},\phi )\right)

for some functions $b$ and $c$ , where $\theta _{i}$ is the canonical parameter, and $\phi$ is a dispersion parameter which is typically assumed to be positive.

In the canonical set up, $\eta _{i}=\alpha +\int X_{i}^{c}(t)\beta (t)\,dw(t)=\theta _{i}$ and from the properties of exponential family,

\mu _{i}=b'(\theta _{i}),{\text{ and so }}\mu _{i}=b'(\eta _{i}).

Hence $b'$ serves as a link function and is called the canonical link function.

${\text{Var}}(y_{i})=\phi b''(\theta _{i})=\phi b''(\eta _{i})=\phi g'(\eta _{i})=\phi g'(g^{-1}(\mu _{i}))$ is the corresponding variance function and $\phi$ the dispersion parameter.
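As a worked illustration of these relations, the Bernoulli and Poisson distributions can be written in the exponential family form above with $\phi =1$. The following standard calculation shows how $b'$ yields the mean and $b''$ the variance function in each case.

```latex
\begin{aligned}
\text{Bernoulli:}\quad & f(y\mid \mu )=\mu ^{y}(1-\mu )^{1-y}
   =\exp \left(y\theta -\log (1+e^{\theta })\right),\qquad \theta =\log \frac{\mu }{1-\mu },\\
 & b(\theta )=\log (1+e^{\theta }),\qquad b'(\theta )=\frac{e^{\theta }}{1+e^{\theta }}=\mu ,\qquad
   b''(\theta )=\mu (1-\mu ).\\[4pt]
\text{Poisson:}\quad & f(y\mid \mu )=\frac{\mu ^{y}e^{-\mu }}{y!}
   =\exp \left(y\theta -e^{\theta }-\log y!\right),\qquad \theta =\log \mu ,\\
 & b(\theta )=e^{\theta },\qquad b'(\theta )=e^{\theta }=\mu ,\qquad b''(\theta )=\mu .
\end{aligned}
```

In the canonical setup $\theta _{i}=\eta _{i}$, so the Bernoulli case gives $\mu _{i}=e^{\eta _{i}}/(1+e^{\eta _{i}})$ with variance $\mu _{i}(1-\mu _{i})$, and the Poisson case gives $\mu _{i}=e^{\eta _{i}}$ with variance $\mu _{i}$.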

Special cases

Functional linear regression (FLR)

Functional linear regression, one of the most useful tools of functional data analysis, is an example of GFLM where the response variable is continuous and is often assumed to have a Normal distribution. The variance function is a constant function and the link function is identity. Under these assumptions the GFLM reduces to the FLR,

\mu =\operatorname {E} (Y\mid X)=\eta =\alpha +\int X^{c}(t)\beta (t)\,dw(t)

Without the normality assumption, the constant variance function motivates the use of quasi-normal techniques.

Functional binary regression

When the response variable has binary outcomes, i.e., 0 or 1, the distribution is usually chosen as Bernoulli, and then $\mu _{i}=P(Y_{i}=1\mid X_{i})$. Popular choices for $g$ are the expit function, which is the inverse of the logit function (functional logistic regression), and the standard normal distribution function, which is the inverse of the probit function (functional probit regression). Any cumulative distribution function $F$ takes values in [0,1], which is the range of the Bernoulli mean, and so can be chosen as a link function. Another link in this context is the complementary log–log link, which is asymmetric. The variance function for binary data is given by $\operatorname {Var} (Y_{i})=\phi \mu _{i}(1-\mu _{i})$, where the dispersion parameter $\phi$ is taken as 1 or, alternatively, the quasi-likelihood approach is used.
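The three choices can be compared directly as maps from the linear predictor to the mean. The sketch below evaluates the expit function, the standard normal distribution function and the inverse of the complementary log–log function on a small grid of linear predictor values; the function names and the grid are illustrative assumptions.

```python
import math
import numpy as np

def expit(eta):
    """Inverse of the logit function."""
    return 1.0 / (1.0 + np.exp(-eta))

def normal_cdf(eta):
    """Standard normal distribution function, the inverse of the probit function."""
    return np.array([0.5 * (1.0 + math.erf(e / math.sqrt(2.0))) for e in np.atleast_1d(eta)])

def inv_cloglog(eta):
    """Inverse of the complementary log-log function."""
    return 1.0 - np.exp(-np.exp(eta))

eta = np.linspace(-3.0, 3.0, 7)              # -3, -2, ..., 3
mu = {"logistic": expit(eta), "probit": normal_cdf(eta), "cloglog": inv_cloglog(eta)}
variance = {name: m * (1.0 - m) for name, m in mu.items()}   # Var(Y_i) with phi = 1
```

The logistic and probit choices are symmetric about $\eta =0$, where both give $\mu =0.5$, whereas the complementary log–log choice gives $\mu =1-e^{-1}\approx 0.632$ at $\eta =0$.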

Functional Poisson regression

Another special case of the GFLM occurs when the outcomes are counts, so that the distribution of the responses is assumed to be Poisson. The mean $\mu _{i}$ is typically linked to the linear predictor $\eta _{i}$ via a log link, which is also the canonical link. The variance function is $\operatorname {Var} (Y_{i})=\phi \mu _{i}$, where the dispersion parameter $\phi$ is 1, except when the data are over-dispersed, in which case the quasi-Poisson approach is used.

Extensions

Extensions of GFLM have been proposed for the cases where there are multiple predictor functions.^[2] Another generalization is called the Semi Parametric Quasi-likelihood Regression (SPQR)^[1] which considers the situation where the link and the variance functions are unknown and are estimated non-parametrically from the data. This situation can also be handled by single or multiple index models, using for example Sliced Inverse Regression (SIR).

Another extension in this domain is the functional generalized additive model (FGAM),^[3] which is a generalization of the generalized additive model (GAM) where

g^{-1}(\operatorname {E} (Y\mid X))=\alpha +\sum _{j=1}^{p}f_{j}(\xi _{j}),

where $\xi _{j}$ are the expansion coefficients of the random predictor function $X$ and each $f_{j}$ is an unknown smooth function that has to be estimated and where ${\text{E}}(f_{j}(\xi _{j}))=0$.

In general, estimation in FGAM requires combining IWLS with backfitting. However, if the expansion coefficients are obtained as functional principal components, then in some cases (e.g. Gaussian predictor function $X$ ), they will be independent in which case backfitting is not needed, and one can use popular smoothing methods for estimating the unknown parameter functions $f_{j}$ .

References

[1] Müller and Stadtmüller (2005). "Generalized Functional Linear Models". The Annals of Statistics. 33 (2): 774–805. arXiv:math/0505638. doi:10.1214/009053604000001156.
[2] James (2002). "Generalized linear models with functional predictors". Journal of the Royal Statistical Society, Series B. 64 (3): 411–432. CiteSeerX 10.1.1.165.1333. doi:10.1111/1467-9868.00342.
[3] Müller and Yao (2008). "Functional Additive Models". Journal of the American Statistical Association. 103 (484): 1534–1544. doi:10.1198/016214508000000751.