# Mixed model

A mixed model, mixed-effects model or mixed error-component model is a statistical model containing both fixed effects and random effects. [1] These models are useful in a wide variety of disciplines in the physical, biological and social sciences. They are particularly useful in settings where repeated measurements are made on the same statistical units (as in a longitudinal study), or where measurements are made on clusters of related statistical units. Because of their advantage in dealing with missing values, mixed-effects models are often preferred over more traditional approaches such as repeated-measures analysis of variance.

This page mainly discusses linear mixed-effects models (LMEMs) rather than generalized linear mixed models or nonlinear mixed-effects models.

## History and current status

Ronald Fisher introduced random effects models to study the correlations of trait values between relatives. [2] In the 1950s, Charles Roy Henderson provided best linear unbiased estimates of fixed effects and best linear unbiased predictions of random effects. [3] [4] [5] [6] Subsequently, mixed modeling has become a major area of statistical research, including work on computation of maximum likelihood estimates, non-linear mixed effects models, missing data in mixed effects models, and Bayesian estimation of mixed effects models. Mixed models are applied in many disciplines where multiple correlated measurements are made on each unit of interest. They are prominently used in research involving human and animal subjects in fields ranging from genetics to marketing, and have also been used in baseball [7] and industrial statistics. [8]

## Definition

In matrix notation a linear mixed model can be represented as

${\displaystyle {\boldsymbol {y}}=X{\boldsymbol {\beta }}+Z{\boldsymbol {u}}+{\boldsymbol {\epsilon }}}$

where

• ${\displaystyle {\boldsymbol {y}}}$ is a known vector of observations, with mean ${\displaystyle E({\boldsymbol {y}})=X{\boldsymbol {\beta }}}$;
• ${\displaystyle {\boldsymbol {\beta }}}$ is an unknown vector of fixed effects;
• ${\displaystyle {\boldsymbol {u}}}$ is an unknown vector of random effects, with mean ${\displaystyle E({\boldsymbol {u}})={\boldsymbol {0}}}$ and variance–covariance matrix ${\displaystyle \operatorname {var} ({\boldsymbol {u}})=G}$;
• ${\displaystyle {\boldsymbol {\epsilon }}}$ is an unknown vector of random errors, with mean ${\displaystyle E({\boldsymbol {\epsilon }})={\boldsymbol {0}}}$ and variance ${\displaystyle \operatorname {var} ({\boldsymbol {\epsilon }})=R}$;
• ${\displaystyle X}$ and ${\displaystyle Z}$ are known design matrices relating the observations ${\displaystyle {\boldsymbol {y}}}$ to ${\displaystyle {\boldsymbol {\beta }}}$ and ${\displaystyle {\boldsymbol {u}}}$, respectively.
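As an illustration of the notation, the following minimal sketch builds the pieces of a random-intercept model with NumPy and simulates one response vector. The group layout, the covariate, and the parameter values (`beta`, `sigma_u2`, `sigma_e2`) are hypothetical choices made for the example, and the last line assumes that ${\displaystyle {\boldsymbol {u}}}$ and ${\displaystyle {\boldsymbol {\epsilon }}}$ are uncorrelated, in which case the marginal covariance of ${\displaystyle {\boldsymbol {y}}}$ is ${\displaystyle ZGZ'+R}$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layout: 4 groups with 3 observations each (n = 12).
n_groups, n_per_group = 4, 3
n = n_groups * n_per_group
group = np.repeat(np.arange(n_groups), n_per_group)

# X: intercept column plus one covariate (fixed-effects design).
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

# Z: one indicator column per group (random-intercept design).
Z = np.zeros((n, n_groups))
Z[np.arange(n), group] = 1.0

# Assumed parameter values, for illustration only.
beta = np.array([1.0, 2.0])
sigma_u2, sigma_e2 = 0.5, 1.0
G = sigma_u2 * np.eye(n_groups)   # var(u)
R = sigma_e2 * np.eye(n)          # var(epsilon)

# Draw u and epsilon independently, then assemble y = X beta + Z u + epsilon.
u = rng.multivariate_normal(np.zeros(n_groups), G)
eps = rng.multivariate_normal(np.zeros(n), R)
y = X @ beta + Z @ u + eps

# Marginal covariance of y when u and epsilon are uncorrelated.
V = Z @ G @ Z.T + R
```

In this layout `V` is block diagonal: observations in the same group share the covariance `sigma_u2`, while observations in different groups are uncorrelated.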

## Estimation

The joint density of ${\displaystyle {\boldsymbol {y}}}$ and ${\displaystyle {\boldsymbol {u}}}$ can be written as: ${\displaystyle f({\boldsymbol {y}},{\boldsymbol {u}})=f({\boldsymbol {y}}|{\boldsymbol {u}})\,f({\boldsymbol {u}})}$. Assuming normality, ${\displaystyle {\boldsymbol {u}}\sim {\mathcal {N}}({\boldsymbol {0}},G)}$, ${\displaystyle {\boldsymbol {\epsilon }}\sim {\mathcal {N}}({\boldsymbol {0}},R)}$ and ${\displaystyle \mathrm {Cov} ({\boldsymbol {u}},{\boldsymbol {\epsilon }})={\boldsymbol {0}}}$, and maximizing the joint density over ${\displaystyle {\boldsymbol {\beta }}}$ and ${\displaystyle {\boldsymbol {u}}}$, gives Henderson's "mixed model equations" (MME) for linear mixed models: [3] [5] [9]

${\displaystyle {\begin{pmatrix}X'R^{-1}X&X'R^{-1}Z\\Z'R^{-1}X&Z'R^{-1}Z+G^{-1}\end{pmatrix}}{\begin{pmatrix}{\hat {\boldsymbol {\beta }}}\\{\hat {\boldsymbol {u}}}\end{pmatrix}}={\begin{pmatrix}X'R^{-1}{\boldsymbol {y}}\\Z'R^{-1}{\boldsymbol {y}}\end{pmatrix}}}$

The solutions to the MME, ${\displaystyle \textstyle {\hat {\boldsymbol {\beta }}}}$ and ${\displaystyle \textstyle {\hat {\boldsymbol {u}}}}$, are the best linear unbiased estimates (BLUE) and best linear unbiased predictors (BLUP) of ${\displaystyle {\boldsymbol {\beta }}}$ and ${\displaystyle {\boldsymbol {u}}}$, respectively. This is a consequence of the Gauss–Markov theorem when the conditional variance of the outcome is not scalable to the identity matrix. When the conditional variance is known, the inverse-variance weighted least squares estimate is the best linear unbiased estimate. However, the conditional variance is rarely, if ever, known, so it is desirable to jointly estimate the variance and the weighted parameter estimates when solving the MME.
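The mixed model equations can be checked numerically. The sketch below assumes that ${\displaystyle G}$ and ${\displaystyle R}$ are known, uses a hypothetical random-intercept design with simulated data, solves the MME directly, and compares the result with the generalized least squares estimate ${\displaystyle (X'V^{-1}X)^{-1}X'V^{-1}{\boldsymbol {y}}}$ and the predictor ${\displaystyle GZ'V^{-1}({\boldsymbol {y}}-X{\hat {\boldsymbol {\beta }}})}$, where ${\displaystyle V=ZGZ'+R}$. All numerical values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical random-intercept design: 5 groups, 4 observations each.
q, m = 5, 4
n = q * m
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = np.kron(np.eye(q), np.ones((m, 1)))  # n x q group-indicator matrix

# Variance components are assumed known in this sketch.
G = 0.8 * np.eye(q)
R = 1.2 * np.eye(n)

# Simulate one response vector from the model.
u = rng.normal(scale=np.sqrt(0.8), size=q)
eps = rng.normal(scale=np.sqrt(1.2), size=n)
y = X @ np.array([1.0, -0.5]) + Z @ u + eps

Rinv = np.linalg.inv(R)
Ginv = np.linalg.inv(G)

# Coefficient matrix and right-hand side of Henderson's MME.
lhs = np.block([
    [X.T @ Rinv @ X, X.T @ Rinv @ Z],
    [Z.T @ Rinv @ X, Z.T @ Rinv @ Z + Ginv],
])
rhs = np.concatenate([X.T @ Rinv @ y, Z.T @ Rinv @ y])

solution = np.linalg.solve(lhs, rhs)
p = X.shape[1]
beta_hat, u_hat = solution[:p], solution[p:]

# Cross-check against the marginal (GLS) formulation with V = Z G Z' + R.
V = Z @ G @ Z.T + R
Vinv = np.linalg.inv(V)
beta_gls = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
u_blup = G @ Z.T @ Vinv @ (y - X @ beta_gls)

print(np.allclose(beta_hat, beta_gls), np.allclose(u_hat, u_blup))
```

One practical reason for working with the MME is that they involve the inverses of ${\displaystyle R}$ and ${\displaystyle G}$, which are often diagonal or otherwise structured, rather than the inverse of the full ${\displaystyle n\times n}$ matrix ${\displaystyle V}$.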

One method used to fit such mixed models is the expectation–maximization (EM) algorithm, in which the variance components are treated as unobserved nuisance parameters in the joint likelihood. [10] Mixed models can be fitted in the major statistical software packages, including R (lme in the nlme package, or lmer in the lme4 package), Python (statsmodels package), Julia (MixedModels.jl package), and SAS (proc mixed), although these implementations do not all rely on EM; lme4 and MixedModels.jl, for example, optimize a profiled likelihood obtained by penalized least squares. The solution to the mixed model equations is a maximum likelihood estimate when the distribution of the errors is normal. [11] [12]
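To make the EM idea concrete, here is a minimal sketch for the special case ${\displaystyle G=\sigma _{u}^{2}I}$ and ${\displaystyle R=\sigma _{e}^{2}I}$. It is one simple maximum-likelihood variant, in which ${\displaystyle {\boldsymbol {\beta }}}$ is updated by generalized least squares at each iteration, and it is not meant to reproduce the algorithm of any of the packages named above. The function names (`gls`, `em_fit`), the starting values, and the simulated data are assumptions made for the example.

```python
import numpy as np

def gls(y, X, Z, sigma_u2, sigma_e2):
    """GLS estimate of beta for given variance components, with V^{-1} and log|V|."""
    n = len(y)
    V = sigma_u2 * (Z @ Z.T) + sigma_e2 * np.eye(n)
    Vinv = np.linalg.inv(V)
    beta = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
    _, logdet = np.linalg.slogdet(V)
    return beta, Vinv, logdet

def em_fit(y, X, Z, n_iter=100):
    """EM-type iterations for y = X beta + Z u + e with G = sigma_u2*I, R = sigma_e2*I."""
    n, q = Z.shape
    sigma_u2, sigma_e2 = 1.0, 1.0  # arbitrary starting values
    loglik = []
    for _ in range(n_iter):
        beta, Vinv, logdet = gls(y, X, Z, sigma_u2, sigma_e2)
        r = y - X @ beta
        # Marginal log-likelihood at the current parameter values.
        loglik.append(-0.5 * (n * np.log(2 * np.pi) + logdet + r @ Vinv @ r))
        # E-step: conditional mean and covariance of u given y.
        u_hat = sigma_u2 * (Z.T @ Vinv @ r)
        var_u = sigma_u2 * np.eye(q) - sigma_u2**2 * (Z.T @ Vinv @ Z)
        # M-step: update variances from expected sums of squares.
        e_hat = r - Z @ u_hat
        sigma_u2 = (u_hat @ u_hat + np.trace(var_u)) / q
        sigma_e2 = (e_hat @ e_hat + np.trace(Z @ var_u @ Z.T)) / n
    # Final beta at the last variance estimates.
    beta, _, _ = gls(y, X, Z, sigma_u2, sigma_e2)
    return beta, sigma_u2, sigma_e2, np.array(loglik)

# Simulated data (hypothetical values): 30 groups of 5 observations.
rng = np.random.default_rng(2)
q, m = 30, 5
n = q * m
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = np.kron(np.eye(q), np.ones((m, 1)))
u = rng.normal(scale=np.sqrt(2.0), size=q)
y = X @ np.array([1.0, 0.5]) + Z @ u + rng.normal(size=n)

beta_hat, sigma_u2_hat, sigma_e2_hat, loglik = em_fit(y, X, Z)
print(beta_hat, sigma_u2_hat, sigma_e2_hat)
```

The recorded log-likelihood values should never decrease from one iteration to the next, which is a useful sanity check on any EM implementation. The sketch inverts the full ${\displaystyle n\times n}$ matrix ${\displaystyle V}$ for clarity; this would be too costly for large data sets.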

## References

1. Baltagi, Badi H. (2008). Econometric Analysis of Panel Data (Fourth ed.). New York: Wiley. pp. 54–55. ISBN 978-0-470-51886-1.
2. Fisher, RA (1918). "The correlation between relatives on the supposition of Mendelian inheritance". Transactions of the Royal Society of Edinburgh. 52 (2): 399–433. doi:10.1017/S0080456800012163.
3. Robinson, G.K. (1991). "That BLUP is a Good Thing: The Estimation of Random Effects". Statistical Science. 6 (1): 15–32. JSTOR 2245695.
4. C. R. Henderson; Oscar Kempthorne; S. R. Searle; C. M. von Krosigk (1959). "The Estimation of Environmental and Genetic Trends from Records Subject to Culling". Biometrics. International Biometric Society. 15 (2): 192–218. doi:10.2307/2527669. JSTOR 2527669.
5. L. Dale Van Vleck. "Charles Roy Henderson, April 1, 1911 – March 14, 1989" (PDF). United States National Academy of Sciences.
6. McLean, Robert A.; Sanders, William L.; Stroup, Walter W. (1991). "A Unified Approach to Mixed Linear Models". The American Statistician. American Statistical Association. 45 (1): 54–64. doi:10.2307/2685241. JSTOR 2685241.
7. Henderson, C R (1973). "Sire evaluation and genetic trends" (PDF). Journal of Animal Science. American Society of Animal Science. 1973: 10–41. doi:10.1093/ansci/1973.Symposium.10. Retrieved 17 August 2014.
8. Lindstrom, ML; Bates, DM (1988). "Newton–Raphson and EM algorithms for linear mixed-effects models for repeated-measures data". Journal of the American Statistical Association. 83 (404): 1014–1021. doi:10.1080/01621459.1988.10478693.
9. Laird, Nan M.; Ware, James H. (1982). "Random-Effects Models for Longitudinal Data". Biometrics. International Biometric Society. 38 (4): 963–974. doi:10.2307/2529876. JSTOR 2529876. PMID 7168798.
10. Fitzmaurice, Garrett M.; Laird, Nan M.; Ware, James H. (2004). Applied Longitudinal Analysis. John Wiley & Sons. pp. 326–328.

## Further reading

• Gałecki, Andrzej; Burzykowski, Tomasz (2013). Linear Mixed-Effects Models Using R: A Step-by-Step Approach. New York: Springer. ISBN 978-1-4614-3900-4.
• Milliken, G. A.; Johnson, D. E. (1992). Analysis of Messy Data: Vol. I. Designed Experiments. New York: Chapman & Hall.
• West, B. T.; Welch, K. B.; Galecki, A. T. (2007). Linear Mixed Models: A Practical Guide Using Statistical Software. New York: Chapman & Hall/CRC.