Mixed model


A mixed model, mixed-effects model or mixed error-component model is a statistical model containing both fixed effects and random effects. [1] [2] These models are useful in a wide variety of disciplines in the physical, biological and social sciences. They are particularly useful in settings where repeated measurements are made on the same statistical units (see also longitudinal study), or where measurements are made on clusters of related statistical units. [2] Mixed models are often preferred over traditional analysis of variance regression models because they do not rely on the assumption of independent observations and because of their flexibility in handling missing values and unevenly spaced repeated measurements. [3] Mixed model analysis allows measurements to be explicitly modeled under a wider variety of correlation and variance–covariance structures, avoiding biased estimation.


This article mainly discusses linear mixed-effects models rather than generalized linear mixed models or nonlinear mixed-effects models. [4]

Qualitative Description

Linear mixed models (LMMs) are statistical models that incorporate fixed and random effects to accurately represent non-independent data structures. LMMs are an alternative to analysis of variance (ANOVA). ANOVA typically assumes the independence of observations within each group; however, this assumption may not hold for non-independent data, such as multilevel/hierarchical, longitudinal, or correlated datasets.

Non-independent data are those in which the variability between outcomes is due to correlations within groups or between groups. Mixed models properly account for nested/hierarchical data structures, in which observations are influenced by the groups they are nested in. For example, when studying education methods involving multiple schools, there are multiple levels of variables to consider. The individual (lower) level comprises the individual students or teachers within a school, and the observations obtained from a given student or teacher are nested within that school: Student A is a unit within School A. The next higher level is the school itself, which contains multiple students and teachers and influences the observations obtained from them; School A and School B each have their own sets of students and teachers. This represents a hierarchical data scheme, and linear mixed models provide a way to model such hierarchical data.

Figure: Representation of how data related to the education system is non-independent and structured in nested/hierarchical levels.

LMMs allow us to understand the important effects between and within levels while correcting standard errors for the non-independence embedded in the data structure. [4] [5]

The Fixed Effect

Fixed effects encapsulate the tendencies or trends that are consistent at the levels of primary interest. These effects are considered fixed because they are non-random and assumed to be constant for the population being studied. [5] For example, when studying education, a fixed effect could represent overall school-level effects that are consistent across all schools.

While the hierarchy of the data set is typically obvious, the specific fixed effects that affect the average responses for all subjects must be specified. Some fixed-effect coefficients are sufficient without corresponding random effects, whereas other fixed coefficients represent only an average around which the individual units vary randomly. These may be determined by incorporating random intercepts and slopes. [6] [7] [8]
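
For instance, in the school example, a random intercept lets each school shift the overall response level, while a random slope lets the effect of a student-level predictor vary by school. A schematic sketch of the two specifications, using illustrative symbols (student $i$ in school $j$, predictor $x_{ij}$, school-level effects $u_{0j}$ and $u_{1j}$) rather than notation from any particular reference, is

\[ y_{ij} = \beta_0 + \beta_1 x_{ij} + u_{0j} + \epsilon_{ij} \qquad \text{(random intercept)} \]

\[ y_{ij} = \beta_0 + \beta_1 x_{ij} + u_{0j} + u_{1j} x_{ij} + \epsilon_{ij} \qquad \text{(random intercept and slope)} \]

where the $u$ terms are school-level random effects and $\epsilon_{ij}$ is the residual error.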

In most situations, several related models are considered and the model that best represents the data is adopted.

The Random Effect, ε

A key component of the mixed model is the incorporation of random effects alongside the fixed effects. Fixed effects are fitted to represent the underlying population-level model; in linear mixed models, the true population regression is linear with coefficients β, and this fixed part is fitted at the highest level. Random effects introduce statistical variability at different levels of the data hierarchy. They account for unmeasured sources of variance that affect certain groups in the data, for example, the differences between student 1 and student 2 in the same class, or the differences between class 1 and class 2 in the same school. [6] [7] [8]

History and current status

Figure: Representation of biased vs. unbiased data and the differences between fitted estimates using least squares regression (LSR) and a linear mixed model (LMM).

Ronald Fisher introduced random effects models to study the correlations of trait values between relatives. [9] In the 1950s, Charles Roy Henderson provided best linear unbiased estimates of fixed effects and best linear unbiased predictions of random effects. [10] [11] [12] [13] Subsequently, mixed modeling has become a major area of statistical research, including work on computation of maximum likelihood estimates, non-linear mixed effects models, missing data in mixed effects models, and Bayesian estimation of mixed effects models. Mixed models are applied in many disciplines where multiple correlated measurements are made on each unit of interest. They are prominently used in research involving human and animal subjects in fields ranging from genetics to marketing, and have also been used in baseball [14] and industrial statistics. [15] Mixed-model association methods have improved the prevention of false-positive associations. Populations are deeply interconnected, and the relatedness structure of population dynamics is extremely difficult to model without the use of mixed models. Linear mixed models may not, however, be the only solution: LMMs rely on a constant-residual-variance assumption that is sometimes violated when accounting for deeply associated continuous and binary traits. [16]

Definition

In matrix notation a linear mixed model can be represented as

\[ y = X \beta + Z u + \epsilon \]

where

$y$ is a known vector of observations, with mean $E(y) = X\beta$;
$\beta$ is an unknown vector of fixed effects;
$u$ is an unknown vector of random effects, with mean $E(u) = 0$ and variance–covariance matrix $\operatorname{var}(u) = G$;
$\epsilon$ is an unknown vector of random errors, with mean $E(\epsilon) = 0$ and variance $\operatorname{var}(\epsilon) = R$;
$X$ and $Z$ are known design matrices relating the observations $y$ to $\beta$ and $u$, respectively.

For example, if each observation can belong to zero or more of k categories then Z, which has one row per observation, can be chosen to have k columns, where a value of 1 for a matrix element of Z indicates that an observation is known to belong to a category and a value of 0 indicates that an observation is known to not belong to a category. The inferred value of u for a category is then a category-specific intercept. If Z has additional columns, where the non-zero values are instead the value of an independent variable for an observation, then the corresponding inferred value of u is a category-specific slope for that independent variable. The prior distribution for the category intercepts and slopes is described by the covariance matrix G.
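
As a concrete illustration, the following short Python sketch builds such a Z for a toy dataset; the data values and variable names are purely hypothetical, and the block only shows how indicator columns (category-specific intercepts) and value-carrying columns (category-specific slopes) are laid out.

```python
# Illustrative construction of a random-effects design matrix Z (toy data).
import numpy as np

category = np.array([0, 0, 1, 1, 2, 2])        # category membership of 6 observations
x = np.array([1.5, 2.0, 0.5, 3.0, 2.5, 1.0])   # an independent variable
k = 3                                           # number of categories

# Indicator columns: 1 marks membership in a category, 0 marks non-membership.
Z_intercepts = np.zeros((6, k))
Z_intercepts[np.arange(6), category] = 1.0

# Columns whose non-zero entries hold the value of x give category-specific slopes.
Z_slopes = Z_intercepts * x[:, None]

Z = np.hstack([Z_intercepts, Z_slopes])         # u then stacks k intercepts and k slopes
print(Z)
```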

Estimation

The joint density of $y$ and $u$ can be written as $f(y, u) = f(y \mid u) \, f(u)$. Assuming normality, $u \sim \mathcal{N}(0, G)$, $\epsilon \sim \mathcal{N}(0, R)$, and $\mathrm{Cov}(u, \epsilon) = 0$, maximizing the joint density over $\beta$ and $u$ gives Henderson's "mixed model equations" (MME) for linear mixed models: [10] [12] [17]

\[
\begin{pmatrix} X' R^{-1} X & X' R^{-1} Z \\ Z' R^{-1} X & Z' R^{-1} Z + G^{-1} \end{pmatrix}
\begin{pmatrix} \hat{\beta} \\ \hat{u} \end{pmatrix}
=
\begin{pmatrix} X' R^{-1} y \\ Z' R^{-1} y \end{pmatrix}
\]

where, for example, $X'$ is the matrix transpose of $X$ and $R^{-1}$ is the matrix inverse of $R$.

The solutions to the MME, $\hat{\beta}$ and $\hat{u}$, are best linear unbiased estimates and predictors for $\beta$ and $u$, respectively. This is a consequence of the Gauss–Markov theorem when the conditional variance of the outcome is not a scalar multiple of the identity matrix. When the conditional variance is known, the inverse-variance weighted least squares estimate is the best linear unbiased estimate. However, the conditional variance is rarely, if ever, known, so it is desirable to jointly estimate the variance components and the weighted parameter estimates when solving the MMEs.
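
When G and R are taken as known, the MME can be solved directly as a block linear system. The following Python sketch illustrates this on simulated data; the data, the assumed variance values, and all variable names are illustrative rather than drawn from any reference.

```python
# Minimal numerical sketch of Henderson's mixed model equations (MME),
# assuming known (illustrative) variance components.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 12 observations in 3 groups (e.g., students in schools).
n, k = 12, 3
group = np.repeat(np.arange(k), n // k)

X = np.column_stack([np.ones(n), rng.normal(size=n)])   # fixed-effects design (intercept, slope)
Z = np.zeros((n, k))                                     # random-intercept design (one column per group)
Z[np.arange(n), group] = 1.0

beta_true = np.array([1.0, 2.0])
u_true = rng.normal(scale=0.8, size=k)
y = X @ beta_true + Z @ u_true + rng.normal(scale=0.5, size=n)

# Assumed (known) covariances: R = sigma_e^2 * I, G = sigma_u^2 * I.
R_inv = np.eye(n) / 0.25
G_inv = np.eye(k) / 0.64

# Assemble and solve the block system
#   [X'R^-1 X      X'R^-1 Z       ] [beta_hat]   [X'R^-1 y]
#   [Z'R^-1 X      Z'R^-1 Z + G^-1] [u_hat   ] = [Z'R^-1 y]
top = np.hstack([X.T @ R_inv @ X, X.T @ R_inv @ Z])
bot = np.hstack([Z.T @ R_inv @ X, Z.T @ R_inv @ Z + G_inv])
lhs = np.vstack([top, bot])
rhs = np.concatenate([X.T @ R_inv @ y, Z.T @ R_inv @ y])

sol = np.linalg.solve(lhs, rhs)
beta_hat, u_hat = sol[:2], sol[2:]
print("beta_hat:", beta_hat)   # estimate of the fixed effects
print("u_hat:", u_hat)         # prediction of the random intercepts
```

Because the variance components are assumed known here, the solution corresponds to the BLUE/BLUP case described above; in practice the variances themselves must also be estimated.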

One method used to fit such mixed models is that of the expectation–maximization (EM) algorithm, in which the variance components are treated as unobserved nuisance parameters in the joint likelihood. [18] Currently, this is the method implemented in statistical software such as Python (statsmodels package) and SAS (proc mixed), and as the initial step only in R's nlme package function lme(). The solution to the mixed model equations is a maximum likelihood estimate when the distribution of the errors is normal. [19] [20]
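
As a usage illustration, a random-intercept model can be fitted in Python with the statsmodels package roughly as follows; the simulated dataset and the column names (score, hours, school) are hypothetical.

```python
# Minimal sketch: fitting a random-intercept linear mixed model with statsmodels.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
schools = np.repeat([f"school_{i}" for i in range(10)], 20)
school_effect = np.repeat(rng.normal(scale=2.0, size=10), 20)   # school-level random intercepts
hours = rng.uniform(0, 10, size=200)
score = 50 + 3 * hours + school_effect + rng.normal(scale=4.0, size=200)
df = pd.DataFrame({"score": score, "hours": hours, "school": schools})

# Fixed effect of study hours; random intercept for each school.
model = smf.mixedlm("score ~ hours", df, groups=df["school"])
result = model.fit()
print(result.summary())
```

The summary reports the fixed-effect coefficients along with the estimated group (random-intercept) variance.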

Figure: Fixed, mixed, and random effects influence linear regression models.

There are several other methods to fit mixed models, including using the EM algorithm initially and then Newton–Raphson (used by the lme() function of R's nlme package [21]); penalized least squares to obtain a profiled log likelihood depending only on the (low-dimensional) variance–covariance parameters of $u$, i.e., its covariance matrix $G$, followed by modern direct optimization of that reduced objective function (used by the lmer() function of R's lme4 package [22] and by the Julia package MixedModels.jl); and direct optimization of the likelihood (used by, e.g., R's glmmTMB). Notably, while the canonical form proposed by Henderson is useful for theory, many popular software packages use a different formulation for numerical computation in order to take advantage of sparse matrix methods (e.g. lme4 and MixedModels.jl).
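
To illustrate the general idea of reducing the problem to a low-dimensional objective, the following naive Python sketch profiles $\beta$ out of the (dense) marginal likelihood of a random-intercept model and then optimizes over the two variance parameters. This is only a conceptual, assumed setup with illustrative names; production implementations such as lme4 instead use a penalized-least-squares formulation with sparse matrices.

```python
# Naive dense sketch: profile beta out of the marginal likelihood and optimize
# over the variance parameters (illustrative only; not how lme4 is implemented).
import numpy as np
from scipy.optimize import minimize

def profiled_neg2_loglik(log_var, y, X, Z):
    """-2 * profiled (ML) log-likelihood as a function of log variance components."""
    sigma2_u, sigma2_e = np.exp(log_var)
    n = len(y)
    V = sigma2_u * (Z @ Z.T) + sigma2_e * np.eye(n)               # marginal covariance of y
    V_inv = np.linalg.inv(V)
    beta_hat = np.linalg.solve(X.T @ V_inv @ X, X.T @ V_inv @ y)  # beta profiled out in closed form
    resid = y - X @ beta_hat
    _, logdet = np.linalg.slogdet(V)
    return n * np.log(2 * np.pi) + logdet + resid @ V_inv @ resid

# Toy random-intercept data (hypothetical values).
rng = np.random.default_rng(0)
n, k = 30, 5
group = np.repeat(np.arange(k), n // k)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = np.zeros((n, k))
Z[np.arange(n), group] = 1.0
y = X @ np.array([1.0, 2.0]) + Z @ rng.normal(scale=0.8, size=k) + rng.normal(scale=0.5, size=n)

# Direct optimization over the two (log) variance parameters.
res = minimize(profiled_neg2_loglik, x0=np.log([1.0, 1.0]), args=(y, X, Z), method="Nelder-Mead")
print("estimated variances (u, e):", np.exp(res.x))
```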

See also

Related Research Articles


The method of least squares is a parameter estimation method in regression analysis based on minimizing the sum of the squares of the residuals made in the results of each individual equation.

In statistics, the Gauss–Markov theorem states that the ordinary least squares (OLS) estimator has the lowest sampling variance within the class of linear unbiased estimators, if the errors in the linear regression model are uncorrelated, have equal variances and expectation value of zero. The errors do not need to be normal, nor do they need to be independent and identically distributed. The requirement that the estimator be unbiased cannot be dropped, since biased estimators exist with lower variance. See, for example, the James–Stein estimator, ridge regression, or simply any degenerate estimator.

In statistics, a generalized linear model (GLM) is a flexible generalization of ordinary linear regression. The GLM generalizes linear regression by allowing the linear model to be related to the response variable via a link function and by allowing the magnitude of the variance of each measurement to be a function of its predicted value.

In statistics, the theory of minimum norm quadratic unbiased estimation (MINQUE) was developed by C. R. Rao. MINQUE is a theory alongside other estimation methods in estimation theory, such as the method of moments or maximum likelihood estimation. Similar to the theory of best linear unbiased estimation, MINQUE is specifically concerned with linear regression models. The method was originally conceived to estimate heteroscedastic error variance in multiple linear regression. MINQUE estimators also provide an alternative to maximum likelihood estimators or restricted maximum likelihood estimators for variance components in mixed effects models. MINQUE estimators are quadratic forms of the response variable and are used to estimate a linear function of the variances.


In statistics, ordinary least squares (OLS) is a type of linear least squares method for choosing the unknown parameters in a linear regression model by the principle of least squares: minimizing the sum of the squares of the differences between the observed dependent variable in the input dataset and the output of the (linear) function of the independent variable. Some sources consider OLS to be linear regression.

In statistics, generalized least squares (GLS) is a method used to estimate the unknown parameters in a linear regression model. It is used when there is a non-zero amount of correlation between the residuals in the regression model. GLS is employed to improve statistical efficiency and reduce the risk of drawing erroneous inferences, as compared to conventional least squares and weighted least squares methods. It was first described by Alexander Aitken in 1935.

Multilevel models are statistical models of parameters that vary at more than one level. An example could be a model of student performance that contains measures for individual students as well as measures for classrooms within which the students are grouped. These models can be seen as generalizations of linear models, although they can also extend to non-linear models. These models became much more popular after sufficient computing power and software became available.

In statistics, binomial regression is a regression analysis technique in which the response has a binomial distribution: it is the number of successes in a series of independent Bernoulli trials, where each trial has probability of success p. In binomial regression, the probability of a success is related to explanatory variables: the corresponding concept in ordinary regression is to relate the mean value of the unobserved response to explanatory variables.

In statistics, Bayesian multivariate linear regression is a Bayesian approach to multivariate linear regression, i.e. linear regression where the predicted outcome is a vector of correlated random variables rather than a single scalar random variable. A more general treatment of this approach can be found in the article MMSE estimator.

The topic of heteroskedasticity-consistent (HC) standard errors arises in statistics and econometrics in the context of linear regression and time series analysis. These are also known as heteroskedasticity-robust standard errors or Eicker–Huber–White standard errors, recognizing the contributions of Friedhelm Eicker, Peter J. Huber, and Halbert White.

In statistics, the projection matrix, sometimes also called the influence matrix or hat matrix, maps the vector of response values to the vector of fitted values. It describes the influence each response value has on each fitted value. The diagonal elements of the projection matrix are the leverages, which describe the influence each response value has on the fitted value for that same observation.

In statistics, a generalized linear mixed model (GLMM) is an extension to the generalized linear model (GLM) in which the linear predictor contains random effects in addition to the usual fixed effects. They also inherit from generalized linear models the idea of extending linear mixed models to non-normal data.

In statistics, principal component regression (PCR) is a regression analysis technique that is based on principal component analysis (PCA). More specifically, PCR is used for estimating the unknown regression coefficients in a standard linear regression model.

Linear least squares (LLS) is the least squares approximation of linear functions to data. It is a set of formulations for solving statistical problems involved in linear regression, including variants for ordinary (unweighted), weighted, and generalized (correlated) residuals. Numerical methods for linear least squares include inverting the matrix of the normal equations and orthogonal decomposition methods.

Numerical methods for linear least squares entail the numerical analysis of linear least squares problems.

In statistics and in machine learning, a linear predictor function is a linear function of a set of coefficients and explanatory variables, whose value is used to predict the outcome of a dependent variable. This sort of function usually comes in linear regression, where the coefficients are called regression coefficients. However, they also occur in various types of linear classifiers, as well as in various other models, such as principal component analysis and factor analysis. In many of these models, the coefficients are referred to as "weights".

In statistics, linear regression is a statistical model which estimates the linear relationship between a scalar response and one or more explanatory variables. The case of one explanatory variable is called simple linear regression; for more than one, the process is called multiple linear regression. This term is distinct from multivariate linear regression, where multiple correlated dependent variables are predicted, rather than a single scalar variable. If the explanatory variables are measured with error then errors-in-variables models are required, also known as measurement error models.

In statistics, the class of vector generalized linear models (VGLMs) was proposed to enlarge the scope of models catered for by generalized linear models (GLMs). In particular, VGLMs allow for response variables outside the classical exponential family and for more than one parameter. Each parameter can be transformed by a link function. The VGLM framework is also large enough to naturally accommodate multiple responses; these are several independent responses each coming from a particular statistical distribution with possibly different parameter values.

A partially linear model is a form of semiparametric model, since it contains both parametric and nonparametric elements. Least squares estimators can be applied to the partially linear model if the nonparametric element is assumed to be known. Partially linear equations were first used in the analysis of the relationship between temperature and electricity usage by Engle, Granger, Rice and Weiss (1986). A typical application of the partially linear model in microeconomics was presented by Tripathi (1997) in a study of the profitability of firms' production. The partially linear model has also been applied successfully in other academic fields: in 1994, Zeger and Diggle introduced it into biometrics, and in environmental science, Parda-Sanchez et al. used it to analyze collected data in 2000. The partially linear model has also been refined with other statistical methods: in 1988, Robinson applied a Nadaraya–Watson kernel estimator to the nonparametric element in order to build a least-squares estimator, and in 1997 a local linear method was introduced by Truong.

Nonlinear mixed-effects models constitute a class of statistical models generalizing linear mixed-effects models. Like linear mixed-effects models, they are particularly useful in settings where there are multiple measurements within the same statistical units or when there are dependencies between measurements on related statistical units. Nonlinear mixed-effects models are applied in many fields including medicine, public health, pharmacology, and ecology.

References

  1. Baltagi, Badi H. (2008). Econometric Analysis of Panel Data (Fourth ed.). New York: Wiley. pp. 54–55. ISBN 978-0-470-51886-1.
  2. Gomes, Dylan G.E. (20 January 2022). "Should I use fixed effects or random effects when I have fewer than five levels of a grouping factor in a mixed-effects model?". PeerJ. 10: e12794. doi:10.7717/peerj.12794. PMC 8784019. PMID 35116198.
  3. Yang, Jian; Zaitlen, NA; Goddard, ME; Visscher, PM; Price, AL (29 January 2014). "Advantages and pitfalls in the application of mixed-model association methods". Nat Genet. 46 (2): 100–106. doi:10.1038/ng.2876. PMC 3989144. PMID 24473328.
  4. Seltman, Howard (2016). Experimental Design and Analysis. Vol. 1. pp. 357–378.
  5. "Introduction to Linear Mixed Models". Advanced Research Computing Statistical Methods and Data Analytics. UCLA Statistical Consulting Group. 2021.
  6. Kreft, I.; de Leeuw, J. Introducing Multilevel Modeling. London: Sage.
  7. Raudenbush, S.W.; Bryk, A.S. (2002). Hierarchical Linear Models: Applications and Data Analysis Methods. Thousand Oaks, CA: Sage.
  8. Snijders, T.A.B.; Bosker, R.J. (2012). Multilevel Analysis: An Introduction to Basic and Advanced Multilevel Modeling (2nd ed.). London: Sage.
  9. Fisher, RA (1918). "The correlation between relatives on the supposition of Mendelian inheritance". Transactions of the Royal Society of Edinburgh. 52 (2): 399–433. doi:10.1017/S0080456800012163. S2CID 181213898.
  10. Robinson, G.K. (1991). "That BLUP is a Good Thing: The Estimation of Random Effects". Statistical Science. 6 (1): 15–32. doi:10.1214/ss/1177011926. JSTOR 2245695.
  11. Henderson, C. R.; Kempthorne, Oscar; Searle, S. R.; von Krosigk, C. M. (1959). "The Estimation of Environmental and Genetic Trends from Records Subject to Culling". Biometrics. 15 (2). International Biometric Society: 192–218. doi:10.2307/2527669. JSTOR 2527669.
  12. Van Vleck, L. Dale. "Charles Roy Henderson, April 1, 1911 – March 14, 1989" (PDF). United States National Academy of Sciences.
  13. McLean, Robert A.; Sanders, William L.; Stroup, Walter W. (1991). "A Unified Approach to Mixed Linear Models". The American Statistician. 45 (1). American Statistical Association: 54–64. doi:10.2307/2685241. JSTOR 2685241.
  14. Anderson, R.J. (2016). "MLB analytics guru who could be the next Nate Silver has a revolutionary new stat".
  15. Obenchain, Bob (Eli Lilly) (1993). "Data Analysis and Information Visualization" (PDF). MWSUG.
  16. Chen, H; Wang, C; Conomos, MP; Stilp, AM; Li, Z; Sofer, T; Szpiro, AA; Chen, W; Brehm, JM; Celedón, JC; Redline, S; Papanicolaou, GJ; Thornton, TA; Laurie, CC; Rice, K; Lin, X (7 April 2016). "Control for Population Structure and Relatedness for Binary Traits in Genetic Association Studies via Logistic Mixed Models". Am J Hum Genet. 98 (4): 653–666. doi:10.1016/j.ajhg.2016.02.012. PMC 4833218. PMID 27018471.
  17. Henderson, C R (1973). "Sire evaluation and genetic trends" (PDF). Journal of Animal Science. 1973. American Society of Animal Science: 10–41. doi:10.1093/ansci/1973.Symposium.10. Retrieved 17 August 2014.
  18. Lindstrom, ML; Bates, DM (1988). "Newton–Raphson and EM algorithms for linear mixed-effects models for repeated-measures data". Journal of the American Statistical Association. 83 (404): 1014–1021. doi:10.1080/01621459.1988.10478693.
  19. Laird, Nan M.; Ware, James H. (1982). "Random-Effects Models for Longitudinal Data". Biometrics. 38 (4). International Biometric Society: 963–974. doi:10.2307/2529876. JSTOR 2529876. PMID 7168798.
  20. Fitzmaurice, Garrett M.; Laird, Nan M.; Ware, James H. (2004). Applied Longitudinal Analysis. John Wiley & Sons. pp. 326–328.
  21. Pinheiro, J; Bates, DM (2006). Mixed-Effects Models in S and S-PLUS. Statistics and Computing. New York: Springer Science & Business Media. doi:10.1007/b98882. ISBN 0-387-98957-9.
  22. Bates, D.; Maechler, M.; Bolker, B.; Walker, S. (2015). "Fitting Linear Mixed-Effects Models Using lme4". Journal of Statistical Software. 67 (1). doi:10.18637/jss.v067.i01. hdl:2027.42/146808.

Further reading