Nonlinear mixed-effects model

Last updated

Nonlinear mixed-effects models constitute a class of statistical models generalizing linear mixed-effects models. Like linear mixed-effects models, they are particularly useful in settings where there are multiple measurements within the same statistical units or when there are dependencies between measurements on related statistical units. Nonlinear mixed-effects models are applied in many fields including medicine, public health, pharmacology, and ecology. [1] [2]

Contents

Definition

While any statistical model containing both fixed effects and random effects is an example of a nonlinear mixed-effects model, the most commonly used models are members of the class of nonlinear mixed-effects models for repeated measures [1]

where

Estimation

When the model is only nonlinear in fixed effects and the random effects are Gaussian, maximum-likelihood estimation can be done using nonlinear least squares methods, although asymptotic properties of estimators and test statistics may differ from the conventional general linear model. In the more general setting, there exist several methods for doing maximum-likelihood estimation or maximum a posteriori estimation in certain classes of nonlinear mixed-effects models – typically under the assumption of normally distributed random variables. A popular approach is the Lindstrom-Bates algorithm [3] which relies on iteratively optimizing a nonlinear problem, locally linearizing the model around this optimum and then employing conventional methods from linear mixed-effects models to do maximum likelihood estimation. Stochastic approximation of the expectation-maximization algorithm gives an alternative approach for doing maximum-likelihood estimation. [4]

Applications

Example: Disease progression modeling

Nonlinear mixed-effects models have been used for modeling progression of disease. [5] In progressive disease, the temporal patterns of progression on outcome variables may follow a nonlinear temporal shape that is similar between patients. However, the stage of disease of an individual may not be known or only partially known from what can be measured. Therefore, a latent time variable that describe individual disease stage (i.e. where the patient is along the nonlinear mean curve) can be included in the model.

Example: Modeling cognitive decline in Alzheimer's disease

Example of disease progression modeling of longitudinal ADAS-Cog scores using the progmod R package. ADAS-Cog disease progression modeling.gif
Example of disease progression modeling of longitudinal ADAS-Cog scores using the progmod R package.

Alzheimer's disease is characterized by a progressive cognitive deterioration. However, patients may differ widely in cognitive ability and reserve, so cognitive testing at a single time point can often only be used to coarsely group individuals in different stages of disease. Now suppose we have a set of longitudinal cognitive data from individuals that are each categorized as having either normal cognition (CN), mild cognitive impairment (MCI) or dementia (DEM) at the baseline visit (time corresponding to measurement ). These longitudinal trajectories can be modeled using a nonlinear mixed effects model that allows differences in disease state based on baseline categorization:

where

An example of such a model with an exponential mean function fitted to longitudinal measurements of the Alzheimer's Disease Assessment Scale-Cognitive Subscale (ADAS-Cog) is shown in the box. As shown, the inclusion of fixed effects of baseline categorization (MCI or dementia relative to normal cognition) and the random effect of individual continuous disease stage aligns the trajectories of cognitive deterioration to reveal a common pattern of cognitive decline.

Example: Growth analysis

Estimation of a mean height curve for boys from the Berkeley Growth Study with and without warping. Warping model is fitted as a nonlinear mixed-effects model using the pavpop R package. Estimation of a mean height curve for boys from the Berkeley Growth Study with and without warping.gif
Estimation of a mean height curve for boys from the Berkeley Growth Study with and without warping. Warping model is fitted as a nonlinear mixed-effects model using the pavpop R package.

Growth phenomena often follow nonlinear patters (e.g. logistic growth, exponential growth, and hyperbolic growth). Factors such as nutrient deficiency may both directly affect the measured outcome (e.g. organisms with lack of nutrients end up smaller), but possibly also timing (e.g. organisms with lack of nutrients grow at a slower pace). If a model fails to account for the differences in timing, the estimated population-level curves may smooth out finer details due to lack of synchronization between organisms. Nonlinear mixed-effects models enable simultaneous modeling of individual differences in growth outcomes and timing.

Example: Modeling human height

Models for estimating the mean curves of human height and weight as a function of age and the natural variation around the mean are used to create growth charts. The growth of children can however become desynchronized due to both genetic and environmental factors. For example, age at onset of puberty and its associated height spurt can vary several years between adolescents. Therefore, cross-sectional studies may underestimate the magnitude of the pubertal height spurt because age is not synchronized with biological development. The differences in biological development can be modeled using random effects that describe a mapping of observed age to a latent biological age using a so-called warping function. A simple nonlinear mixed-effects model with this structure is given by

where

There exists several methods and software packages for fitting such models. The so-called SITAR model [7] can fit such models using warping functions that are affine transformations of time (i.e. additive shifts in biological age and differences in rate of maturation), while the so-called pavpop model [6] can fit models with smoothly-varying warping functions. An example of the latter is shown in the box.

Example: Population Pharmacokinetic/pharmacodynamic modeling

Basic pharmacokinetic processes affecting the fate of ingested substances. Nonlinear mixed-effects modeling can be used to estimate the population-level effects of these processes while also modeling the individual variation between subjects. Pharmacokinetics.svg
Basic pharmacokinetic processes affecting the fate of ingested substances. Nonlinear mixed-effects modeling can be used to estimate the population-level effects of these processes while also modeling the individual variation between subjects.

PK/PD models for describing exposure-response relationships such as the Emax model can be formulated as nonlinear mixed-effects models. [8] The mixed-model approach allows modeling of both population level and individual differences in effects that have a nonlinear effect on the observed outcomes, for example the rate at which a compound is being metabolized or distributed in the body.

Example: COVID-19 epidemiological modeling

Extrapolated infection trajectories of 40 countries severely affected by COVID-19 and grand (population) average through May 14th COVID 19 Outbreak.jpg
Extrapolated infection trajectories of 40 countries severely affected by COVID-19 and grand (population) average through May 14th

The platform of the nonlinear mixed effect models can be used to describe infection trajectories of subjects and understand some common features shared across the subjects. In epidemiological problems, subjects can be countries, states, or counties, etc. This can be particularly useful in estimating a future trend of the epidemic in an early stage of pendemic where nearly little information is known regarding the disease. [9]

Example: Prediction of oil production curve of shale oil wells at a new location with latent kriging

Prediction of oil production rate decline curve obtained by latent kriging. 324 training wells and two test wells in the Eagle Ford Shale Reservoir of South Texas (top left); A schematic example of a hydraulically fractured horizontal well (bottom left); Predicted curves at test wells via latent kriging method (right) Eagle Ford Oil Prediction.jpg
Prediction of oil production rate decline curve obtained by latent kriging. 324 training wells and two test wells in the Eagle Ford Shale Reservoir of South Texas (top left); A schematic example of a hydraulically fractured horizontal well (bottom left); Predicted curves at test wells via latent kriging method (right)

The eventual success of petroleum development projects relies on a large degree of well construction costs. As for unconventional oil and gas reservoirs, because of very low permeability, and a flow mechanism very different from that of conventional reservoirs, estimates for the well construction cost often contain high levels of uncertainty, and oil companies need to make heavy investment in the drilling and completion phase of the wells. The overall recent commercial success rate of horizontal wells in the United States is known to be 65%, which implies that only 2 out of 3 drilled wells will be commercially successful. For this reason, one of the crucial tasks of petroleum engineers is to quantify the uncertainty associated with oil or gas production from shale reservoirs, and further, to predict an approximated production behavior of a new well at a new location given specific completion data before actual drilling takes place to save a large degree of well construction costs.

The platform of the nonlinear mixed effect models can be extended to consider the spatial association by incorporating the geostatistical processes such as Gaussian process on the second stage of the model as follows: [10]

where

The Gaussian process regressions used on the latent level (the second stage) eventually produce kriging predictors for the curve parameters that dictate the shape of the mean curve on the date level (the first level). As the kriging techniques have been employed in the latent level, this technique is called latent kriging. The right panels show the prediction results of the latent kriging method applied to the two test wells in the Eagle Ford Shale Reservoir of South Texas.

Bayesian nonlinear mixed-effects model

Bayesian research cycle using Bayesian nonlinear mixed effects model: (a) standard research cycle and (b) Bayesian-specific workflow. Bayesian research cycle.png
Bayesian research cycle using Bayesian nonlinear mixed effects model: (a) standard research cycle and (b) Bayesian-specific workflow.

The framework of Bayesian hierarchical modeling is frequently used in diverse applications. Particularly, Bayesian nonlinear mixed-effects models have recently received significant attention. A basic version of the Bayesian nonlinear mixed-effects models is represented as the following three-stage:

Stage 1: Individual-Level Model

Stage 2: Population Model

Stage 3: Prior

Here, denotes the continuous response of the -th subject at the time point , and is the -th covariate of the -th subject. Parameters involved in the model are written in Greek letters. is a known function parameterized by the -dimensional vector . Typically, is a `nonlinear' function and describes the temporal trajectory of individuals. In the model, and describe within-individual variability and between-individual variability, respectively. If Stage 3: Prior is not considered, then the model reduces to a frequentist nonlinear mixed-effect model.


A central task in the application of the Bayesian nonlinear mixed-effect models is to evaluate the posterior density:


The panel on the right displays Bayesian research cycle using Bayesian nonlinear mixed-effects model. [12] A research cycle using the Bayesian nonlinear mixed-effects model comprises two steps: (a) standard research cycle and (b) Bayesian-specific workflow. Standard research cycle involves literature review, defining a problem and specifying the research question and hypothesis. Bayesian-specific workflow comprises three sub-steps: (b)–(i) formalizing prior distributions based on background knowledge and prior elicitation; (b)–(ii) determining the likelihood function based on a nonlinear function ; and (b)–(iii) making a posterior inference. The resulting posterior inference can be used to start a new research cycle.

See also

Related Research Articles

In statistics, the Gauss–Markov theorem states that the ordinary least squares (OLS) estimator has the lowest sampling variance within the class of linear unbiased estimators, if the errors in the linear regression model are uncorrelated, have equal variances and expectation value of zero. The errors do not need to be normal for the theorem to apply, nor do they need to be independent and identically distributed.

In continuum mechanics, the infinitesimal strain theory is a mathematical approach to the description of the deformation of a solid body in which the displacements of the material particles are assumed to be much smaller than any relevant dimension of the body; so that its geometry and the constitutive properties of the material at each point of space can be assumed to be unchanged by the deformation.

<span class="mw-page-title-main">Logistic regression</span> Statistical model for a binary dependent variable

In statistics, the logistic model is a statistical model that models the log-odds of an event as a linear combination of one or more independent variables. In regression analysis, logistic regression is estimating the parameters of a logistic model. Formally, in binary logistic regression there is a single binary dependent variable, coded by an indicator variable, where the two values are labeled "0" and "1", while the independent variables can each be a binary variable or a continuous variable. The corresponding probability of the value labeled "1" can vary between 0 and 1, hence the labeling; the function that converts log-odds to probability is the logistic function, hence the name. The unit of measurement for the log-odds scale is called a logit, from logistic unit, hence the alternative names. See § Background and § Definition for formal mathematics, and § Example for a worked example.

Linear elasticity is a mathematical model of how solid objects deform and become internally stressed due to prescribed loading conditions. It is a simplification of the more general nonlinear theory of elasticity and a branch of continuum mechanics.

<span class="mw-page-title-main">Expectation–maximization algorithm</span> Iterative method for finding maximum likelihood estimates in statistical models

In statistics, an expectation–maximization (EM) algorithm is an iterative method to find (local) maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models, where the model depends on unobserved latent variables. The EM iteration alternates between performing an expectation (E) step, which creates a function for the expectation of the log-likelihood evaluated using the current estimate for the parameters, and a maximization (M) step, which computes parameters maximizing the expected log-likelihood found on the E step. These parameter-estimates are then used to determine the distribution of the latent variables in the next E step. It can be used, for example, to estimate a mixture of gaussians, or to solve the multiple linear regression problem.

The general linear model or general multivariate regression model is a compact way of simultaneously writing several multiple linear regression models. In that sense it is not a separate statistical linear model. The various multiple linear regression models may be compactly written as

<span class="mw-page-title-main">Dirichlet distribution</span> Probability distribution

In probability and statistics, the Dirichlet distribution (after Peter Gustav Lejeune Dirichlet), often denoted , is a family of continuous multivariate probability distributions parameterized by a vector of positive reals. It is a multivariate generalization of the beta distribution, hence its alternative name of multivariate beta distribution (MBD). Dirichlet distributions are commonly used as prior distributions in Bayesian statistics, and in fact, the Dirichlet distribution is the conjugate prior of the categorical distribution and multinomial distribution.

In mechanics, virtual work arises in the application of the principle of least action to the study of forces and movement of a mechanical system. The work of a force acting on a particle as it moves along a displacement is different for different displacements. Among all the possible displacements that a particle may follow, called virtual displacements, one will minimize the action. This displacement is therefore the displacement followed by the particle according to the principle of least action.

The work of a force on a particle along a virtual displacement is known as the virtual work.

<span class="mw-page-title-main">Chiral model</span> Model of mesons in the massless quark limit

In nuclear physics, the chiral model, introduced by Feza Gürsey in 1960, is a phenomenological model describing effective interactions of mesons in the chiral limit (where the masses of the quarks go to zero), but without necessarily mentioning quarks at all. It is a nonlinear sigma model with the principal homogeneous space of a Lie group as its target manifold. When the model was originally introduced, this Lie group was the SU(N), where N is the number of quark flavors. The Riemannian metric of the target manifold is given by a positive constant multiplied by the Killing form acting upon the Maurer–Cartan form of SU(N).

In statistics, the theory of minimum norm quadratic unbiased estimation (MINQUE) was developed by C. R. Rao. MINQUE is a theory alongside other estimation methods in estimation theory, such as the method of moments or maximum likelihood estimation. Similar to the theory of best linear unbiased estimation, MINQUE is specifically concerned with linear regression models. The method was originally conceived to estimate heteroscedastic error variance in multiple linear regression. MINQUE estimators also provide an alternative to maximum likelihood estimators or restricted maximum likelihood estimators for variance components in mixed effects models. MINQUE estimators are quadratic forms of the response variable and are used to estimate a linear function of the variances.

In statistics, ordinary least squares (OLS) is a type of linear least squares method for choosing the unknown parameters in a linear regression model by the principle of least squares: minimizing the sum of the squares of the differences between the observed dependent variable in the input dataset and the output of the (linear) function of the independent variable.

Functional data analysis (FDA) is a branch of statistics that analyses data providing information about curves, surfaces or anything else varying over a continuum. In its most general form, under an FDA framework, each sample element of functional data is considered to be a random function. The physical continuum over which these functions are defined is often time, but may also be spatial location, wavelength, probability, etc. Intrinsically, functional data are infinite dimensional. The high intrinsic dimensionality of these data brings challenges for theory as well as computation, where these challenges vary with how the functional data were sampled. However, the high or infinite dimensional structure of the data is a rich source of information and there are many interesting challenges for research and data analysis.

Multilevel models are statistical models of parameters that vary at more than one level. An example could be a model of student performance that contains measures for individual students as well as measures for classrooms within which the students are grouped. These models can be seen as generalizations of linear models, although they can also extend to non-linear models. These models became much more popular after sufficient computing power and software became available.

In natural language processing, latent Dirichlet allocation (LDA) is a Bayesian network for modeling automatically extracted topics in textual corpora. The LDA is an example of a Bayesian topic model. In this, observations are collected into documents, and each word's presence is attributable to one of the document's topics. Each document will contain a small number of topics.

In statistics, Bayesian multivariate linear regression is a Bayesian approach to multivariate linear regression, i.e. linear regression where the predicted outcome is a vector of correlated random variables rather than a single scalar random variable. A more general treatment of this approach can be found in the article MMSE estimator.

<span class="mw-page-title-main">Dirichlet process</span> Family of stochastic processes

In probability theory, Dirichlet processes are a family of stochastic processes whose realizations are probability distributions. In other words, a Dirichlet process is a probability distribution whose range is itself a set of probability distributions. It is often used in Bayesian inference to describe the prior knowledge about the distribution of random variables—how likely it is that the random variables are distributed according to one or another particular distribution.

<span class="mw-page-title-main">Intraclass correlation</span> Descriptive statistic

In statistics, the intraclass correlation, or the intraclass correlation coefficient (ICC), is a descriptive statistic that can be used when quantitative measurements are made on units that are organized into groups. It describes how strongly units in the same group resemble each other. While it is viewed as a type of correlation, unlike most other correlation measures, it operates on data structured as groups rather than data structured as paired observations.

The purpose of this page is to provide supplementary materials for the ordinary least squares article, reducing the load of the main article with mathematics and improving its accessibility, while at the same time retaining the completeness of exposition.

Bayesian hierarchical modelling is a statistical model written in multiple levels that estimates the parameters of the posterior distribution using the Bayesian method. The sub-models combine to form the hierarchical model, and Bayes' theorem is used to integrate them with the observed data and account for all the uncertainty that is present. The result of this integration is the posterior distribution, also known as the updated probability estimate, as additional evidence on the prior distribution is acquired.

<span class="mw-page-title-main">Lode coordinates</span>

Lode coordinates or Haigh–Westergaard coordinates. are a set of tensor invariants that span the space of real, symmetric, second-order, 3-dimensional tensors and are isomorphic with respect to principal stress space. This right-handed orthogonal coordinate system is named in honor of the German scientist Dr. Walter Lode because of his seminal paper written in 1926 describing the effect of the middle principal stress on metal plasticity. Other examples of sets of tensor invariants are the set of principal stresses or the set of kinematic invariants . The Lode coordinate system can be described as a cylindrical coordinate system within principal stress space with a coincident origin and the z-axis parallel to the vector .

References

  1. 1 2 Pinheiro, J; Bates, DM (2006). Mixed-effects models in S and S-PLUS. Statistics and Computing. New York: Springer Science & Business Media. doi:10.1007/b98882. ISBN   0-387-98957-9.
  2. Bolker, BM (2008). Ecological models and data in R. Princeton University Press.{{cite book}}: |website= ignored (help)
  3. Lindstrom, MJ; Bates, DM (1990). "Nonlinear mixed effects models for repeated measures data". Biometrics. 46 (3): 673–687. doi:10.2307/2532087. JSTOR   2532087. PMID   2242409.
  4. Kuhn, E; Lavielle, M (2005). "Maximum likelihood estimation in nonlinear mixed effects models". Computational Statistics & Data Analysis. 49 (4): 1020–1038. doi:10.1016/j.csda.2004.07.002.
  5. 1 2 Raket, LL (2020). "Statistical disease progression modeling in Alzheimer's disease". Frontiers in Big Data. 3: 24. doi: 10.3389/fdata.2020.00024 . PMC   7931952 . PMID   33693397. S2CID   221105601.
  6. 1 2 Raket LL, Sommer S, Markussen B (2014). "A nonlinear mixed-effects model for simultaneous smoothing and registration of functional data". Pattern Recognition Letters. 38: 1–7. doi:10.1016/j.patrec.2013.10.018.
  7. Cole TJ, Donaldson MD, Ben-Shlomo Y (2010). "SITAR—a useful instrument for growth curve analysis". International Journal of Epidemiology. 39 (6): 1558–66. doi: 10.1093/ije/dyq115 . PMC   2992626 . PMID   20647267. S2CID   17816715.
  8. Jonsson, EN; Karlsson, MO; Wade, JR (2000). "Nonlinearity detection: advantages of nonlinear mixed-effects modeling". AAPS PharmSci. 2 (3): E32. doi:10.1208/ps020332. PMC   2761142 . PMID   11741248.
  9. Lee, Se Yoon; Lei, Bowen; Mallick, Bani (2020). "Estimation of COVID-19 spread curves integrating global data and borrowing information". PLOS ONE. 15 (7): e0236860. arXiv: 2005.00662 . doi: 10.1371/journal.pone.0236860 . PMC   7390340 . PMID   32726361.
  10. Lee, Se Yoon; Mallick, Bani (2021). "Bayesian Hierarchical Modeling: Application Towards Production Results in the Eagle Ford Shale of South Texas". Sankhya B. 84: 1–43. doi:10.1007/s13571-020-00245-8.
  11. Lee, Se Yoon (2022). "Bayesian Nonlinear Models for Repeated Measurement Data: An Overview, Implementation, and Applications". Mathematics. 10 (6): 898. arXiv: 2201.12430 . doi: 10.3390/math10060898 .
  12. Lee, Se Yoon (2022). "Bayesian Nonlinear Models for Repeated Measurement Data: An Overview, Implementation, and Applications". Mathematics. 10 (6): 898. arXiv: 2201.12430 . doi: 10.3390/math10060898 .