Generalized linear mixed model

Last updated

In statistics, a generalized linear mixed model (GLMM) is an extension to the generalized linear model (GLM) in which the linear predictor contains random effects in addition to the usual fixed effects. [1] [2] [3] They also inherit from GLMs the idea of extending linear mixed models to non-normal data.

Contents

GLMMs provide a broad range of models for the analysis of grouped data, since the differences between groups can be modelled as a random effect. These models are useful in the analysis of many kinds of data, including longitudinal data. [4]

Model

GLMMs are generally defined such that, conditioned on the random effects , the dependent variable is distributed according to the exponential family with its expectation related to the linear predictor via a link function :

.

Here and are the fixed effects design matrix, and fixed effects respectively; and are the random effects design matrix and random effects respectively. To understand this very brief definition you will first need to understand the definition of a generalized linear model and of a mixed model.

Generalized linear mixed models are a special cases of hierarchical generalized linear models in which the random effects are normally distributed.

The complete likelihood [5]

has no general closed form, and integrating over the random effects is usually extremely computationally intensive. In addition to numerically approximating this integral(e.g. via Gauss–Hermite quadrature), methods motivated by Laplace approximation have been proposed. [6] For example, the penalized quasi-likelihood method, which essentially involves repeatedly fitting (i.e. doubly iterative) a weighted normal mixed model with a working variate, [7] is implemented by various commercial and open source statistical programs.

Fitting a model

Fitting GLMMs via maximum likelihood (as via AIC) involves integrating over the random effects. In general, those integrals cannot be expressed in analytical form. Various approximate methods have been developed, but none has good properties for all possible models and data sets (e.g. ungrouped binary data are particularly problematic). For this reason, methods involving numerical quadrature or Markov chain Monte Carlo have increased in use, as increasing computing power and advances in methods have made them more practical.

The Akaike information criterion (AIC) is a common criterion for model selection. Estimates of AIC for GLMMs based on certain exponential family distributions have recently been obtained. [8]

Software

See also

Related Research Articles

The likelihood function is the joint probability mass of observed data viewed as a function of the parameters of a statistical model. Intuitively, the likelihood function is the probability of observing data assuming is the actual parameter.

<span class="mw-page-title-main">Gamma distribution</span> Probability distribution

In probability theory and statistics, the gamma distribution is a versatile two-parameter family of continuous probability distributions. The exponential distribution, Erlang distribution, and chi-squared distribution are special cases of the gamma distribution. There are two equivalent parameterizations in common use:

  1. With a shape parameter k and a scale parameter θ
  2. With a shape parameter and an inverse scale parameter , called a rate parameter.
<span class="mw-page-title-main">Logistic regression</span> Statistical model for a binary dependent variable

In statistics, the logistic model is a statistical model that models the log-odds of an event as a linear combination of one or more independent variables. In regression analysis, logistic regression is estimating the parameters of a logistic model. Formally, in binary logistic regression there is a single binary dependent variable, coded by an indicator variable, where the two values are labeled "0" and "1", while the independent variables can each be a binary variable or a continuous variable. The corresponding probability of the value labeled "1" can vary between 0 and 1, hence the labeling; the function that converts log-odds to probability is the logistic function, hence the name. The unit of measurement for the log-odds scale is called a logit, from logistic unit, hence the alternative names. See § Background and § Definition for formal mathematics, and § Example for a worked example.

In signal processing, independent component analysis (ICA) is a computational method for separating a multivariate signal into additive subcomponents. This is done by assuming that at most one subcomponent is Gaussian and that the subcomponents are statistically independent from each other. ICA was invented by Jeanny Hérault and Christian Jutten in 1985. ICA is a special case of blind source separation. A common example application of ICA is the "cocktail party problem" of listening in on one person's speech in a noisy room.

The Akaike information criterion (AIC) is an estimator of prediction error and thereby relative quality of statistical models for a given set of data. Given a collection of models for the data, AIC estimates the quality of each model, relative to each of the other models. Thus, AIC provides a means for model selection.

In statistics, a generalized linear model (GLM) is a flexible generalization of ordinary linear regression. The GLM generalizes linear regression by allowing the linear model to be related to the response variable via a link function and by allowing the magnitude of the variance of each measurement to be a function of its predicted value.

The general linear model or general multivariate regression model is a compact way of simultaneously writing several multiple linear regression models. In that sense it is not a separate statistical linear model. The various multiple linear regression models may be compactly written as

In statistics, a generalized additive model (GAM) is a generalized linear model in which the linear response variable depends linearly on unknown smooth functions of some predictor variables, and interest focuses on inference about these smooth functions.

Multilevel models are statistical models of parameters that vary at more than one level. An example could be a model of student performance that contains measures for individual students as well as measures for classrooms within which the students are grouped. These models can be seen as generalizations of linear models, although they can also extend to non-linear models. These models became much more popular after sufficient computing power and software became available.

In statistics, binomial regression is a regression analysis technique in which the response has a binomial distribution: it is the number of successes in a series of independent Bernoulli trials, where each trial has probability of success . In binomial regression, the probability of a success is related to explanatory variables: the corresponding concept in ordinary regression is to relate the mean value of the unobserved response to explanatory variables.

A mixed model, mixed-effects model or mixed error-component model is a statistical model containing both fixed effects and random effects. These models are useful in a wide variety of disciplines in the physical, biological and social sciences. They are particularly useful in settings where repeated measurements are made on the same statistical units, or where measurements are made on clusters of related statistical units. Mixed models are often preferred over traditional analysis of variance regression models because of their flexibility in dealing with missing values and uneven spacing of repeated measurements. The Mixed model analysis allows measurements to be explicitly modeled in a wider variety of correlation and variance-covariance structures.

In statistics, model specification is part of the process of building a statistical model: specification consists of selecting an appropriate functional form for the model and choosing which variables to include. For example, given personal income together with years of schooling and on-the-job experience , we might specify a functional relationship as follows:

<span class="mw-page-title-main">Quantile regression</span> Statistics concept

Quantile regression is a type of regression analysis used in statistics and econometrics. Whereas the method of least squares estimates the conditional mean of the response variable across values of the predictor variables, quantile regression estimates the conditional median of the response variable. Quantile regression is an extension of linear regression used when the conditions of linear regression are not met.

In statistics, a generalized estimating equation (GEE) is used to estimate the parameters of a generalized linear model with a possible unmeasured correlation between observations from different timepoints. Although some believe that Generalized estimating equations are robust in everything even with the wrong choice of working-correlation matrix, Generalized estimating equations are only robust to loss of consistency with the wrong choice.

In statistics, hierarchical generalized linear models extend generalized linear models by relaxing the assumption that error components are independent. This allows models to be built in situations where more than one error term is necessary and also allows for dependencies between error terms. The error components can be correlated and not necessarily follow a normal distribution. When there are different clusters, that is, groups of observations, the observations in the same cluster are correlated. In fact, they are positively correlated because observations in the same cluster share some common features. In this situation, using generalized linear models and ignoring the correlations may cause problems.

In statistics, the variance function is a smooth function that depicts the variance of a random quantity as a function of its mean. The variance function is a measure of heteroscedasticity and plays a large role in many settings of statistical modelling. It is a main ingredient in the generalized linear model framework and a tool used in non-parametric regression, semiparametric regression and functional data analysis. In parametric modeling, variance functions take on a parametric form and explicitly describe the relationship between the variance and the mean of a random quantity. In a non-parametric setting, the variance function is assumed to be a smooth function.

The generalized functional linear model (GFLM) is an extension of the generalized linear model (GLM) that allows one to regress univariate responses of various types on functional predictors, which are mostly random trajectories generated by a square-integrable stochastic processes. Similarly to GLM, a link function relates the expected value of the response variable to a linear predictor, which in case of GFLM is obtained by forming the scalar product of the random predictor function with a smooth parameter function . Functional Linear Regression, Functional Poisson Regression and Functional Binomial Regression, with the important Functional Logistic Regression included, are special cases of GFLM. Applications of GFLM include classification and discrimination of stochastic processes and functional data.

In statistics, linear regression is a statistical model which estimates the linear relationship between a scalar response and one or more explanatory variables. The case of one explanatory variable is called simple linear regression; for more than one, the process is called multiple linear regression. This term is distinct from multivariate linear regression, where multiple correlated dependent variables are predicted, rather than a single scalar variable. If the explanatory variables are measured with error then errors-in-variables models are required, also known as measurement error models.

In statistics, the class of vector generalized linear models (VGLMs) was proposed to enlarge the scope of models catered for by generalized linear models (GLMs). In particular, VGLMs allow for response variables outside the classical exponential family and for more than one parameter. Each parameter can be transformed by a link function. The VGLM framework is also large enough to naturally accommodate multiple responses; these are several independent responses each coming from a particular statistical distribution with possibly different parameter values.

Nonlinear mixed-effects models constitute a class of statistical models generalizing linear mixed-effects models. Like linear mixed-effects models, they are particularly useful in settings where there are multiple measurements within the same statistical units or when there are dependencies between measurements on related statistical units. Nonlinear mixed-effects models are applied in many fields including medicine, public health, pharmacology, and ecology.

References

  1. Breslow, N. E.; Clayton, D. G. (1993), "Approximate Inference in Generalized Linear Mixed Models", Journal of the American Statistical Association , 88 (421): 9–25, doi:10.2307/2290687, JSTOR   2290687
  2. Stroup, W.W. (2012), Generalized Linear Mixed Models, CRC Press
  3. Jiang, J. (2007), Linear and Generalized Linear Mixed Models and Their Applications, Springer
  4. Fitzmaurice, G. M.; Laird, N. M.; Ware, J.. (2011), Applied Longitudinal Analysis (2nd ed.), John Wiley & Sons, ISBN   978-0-471-21487-8
  5. Pawitan, Yudi. In All Likelihood: Statistical Modelling and Inference Using Likelihood (Paperbackition ed.). OUP Oxford. p. 459. ISBN   978-0199671229.
  6. Breslow, N. E.; Clayton, D. G. (20 December 2012). "Approximate Inference in Generalized Linear Mixed Models". Journal of the American Statistical Association. 88 (421): 9–25. doi:10.1080/01621459.1993.10594284.
  7. Wolfinger, Russ; O'connell, Michael (December 1993). "Generalized linear mixed models a pseudo-likelihood approach". Journal of Statistical Computation and Simulation. 48 (3–4): 233–243. doi:10.1080/00949659308811554.
  8. Saefken, B.; Kneib, T.; van Waveren, C.-S.; Greven, S. (2014), "A unifying approach to the estimation of the conditional Akaike information in generalized linear mixed models" (PDF), Electronic Journal of Statistics , 8: 201–225, doi: 10.1214/14-EJS881
  9. Pinheiro, J. C.; Bates, D. M. (2000), Mixed-effects models in S and S-PLUS, Springer, New York
  10. Berridge, D. M.; Crouchley, R. (2011), Multivariate Generalized Linear Mixed Models Using R, CRC Press
  11. "lme4 package - RDocumentation". www.rdocumentation.org. Retrieved 15 September 2022.
  12. "glmm package - RDocumentation". www.rdocumentation.org. Retrieved 15 September 2022.
  13. "IBM Knowledge Center". www.ibm.com. Retrieved 6 December 2017.
  14. "Statsmodels Documentation". www.statsmodels.org. Retrieved 17 March 2021.
  15. "Details of the parameter estimation · MixedModels". juliastats.org. Retrieved 16 June 2021.
  16. Installing, loading and citing the package , retrieved 2022-08-24