Multilevel modeling for repeated measures

One application of multilevel modeling (MLM) is the analysis of repeated measures data. Multilevel modeling for repeated measures data is most often discussed in the context of modeling change over time (i.e. growth curve modeling for longitudinal designs); however, it may also be used for repeated measures data in which time is not a factor. [1]

In multilevel modeling, an overall change function (e.g. linear, quadratic, or cubic) is fitted to the whole sample and, just as in multilevel modeling for clustered data, the slope and intercept may be allowed to vary. For example, in a study of income growth with age, individuals might be assumed to show linear growth over time. However, the intercept and slope could be allowed to vary across individuals (i.e. be defined as random coefficients).
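As a sketch, a linear growth model with a random intercept and a random slope is often written in two levels as shown below. The notation is an assumption chosen for illustration and is not taken from the cited sources: Y_{ti} is the outcome for individual i at occasion t, and Time_{ti} is the time variable.

```latex
\begin{align}
\text{Level 1 (occasions):}\quad Y_{ti} &= \pi_{0i} + \pi_{1i}\,\mathrm{Time}_{ti} + e_{ti} \\
\text{Level 2 (individuals):}\quad \pi_{0i} &= \beta_{00} + u_{0i} \\
\pi_{1i} &= \beta_{10} + u_{1i}
\end{align}
```

Here \beta_{00} and \beta_{10} are the average intercept and slope for the whole sample, u_{0i} and u_{1i} are individual i's deviations from those averages, and e_{ti} is the occasion-level residual. Treating the slope as fixed rather than random amounts to dropping u_{1i}.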

Multilevel modeling with repeated measures employs the same statistical techniques as MLM with clustered data. In multilevel modeling for repeated measures data, the measurement occasions are nested within cases (e.g. individual or subject). Thus, level-1 units consist of the repeated measures for each subject, and the level-2 unit is the individual or subject. In addition to estimating overall parameters, MLM allows regression equations at the level of the individual. Thus, as a growth curve modeling technique, it allows the estimation of inter-individual differences in intra-individual change over time by modeling the variances and covariances. [2] In other words, it allows the testing of individual differences in patterns of responses over time (i.e. growth curves). This characteristic of multilevel modeling makes it preferable to other repeated measures statistical techniques such as repeated measures analysis of variance (RM-ANOVA) for certain research questions.
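The idea of inter-individual differences in intra-individual change can be illustrated descriptively by fitting a separate straight line to each subject. The sketch below is not a multilevel fit (a multilevel model estimates all trajectories jointly and pools information across subjects). It uses the four-subject example data from the Data Structure section, and the helper name `ols_line` is hypothetical.

```python
# Long-form example data: (subject, time, dep_var)
rows = [
    (1, 0, 12), (1, 1, 8), (1, 2, 4),
    (2, 0, 11), (2, 1, 7), (2, 2, 6),
    (3, 0, 15), (3, 1, 12), (3, 2, 10),
    (4, 0, 11), (4, 1, 10), (4, 2, 9),
]


def ols_line(times, values):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(values) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, values))
    sxx = sum((t - t_mean) ** 2 for t in times)
    slope = sxy / sxx
    return y_mean - slope * t_mean, slope


# Group the level-1 observations by their level-2 unit (the subject).
by_subject = {}
for subject, time, dep_var in rows:
    times, values = by_subject.setdefault(subject, ([], []))
    times.append(time)
    values.append(dep_var)

# One (intercept, slope) pair per subject: the individual trajectories.
trajectories = {s: ols_line(t, y) for s, (t, y) in by_subject.items()}
for s, (intercept, slope) in trajectories.items():
    print(f"subject {s}: intercept={intercept:.2f}, slope={slope:.2f}")
```

The spread of the per-subject intercepts and slopes is what the random-coefficient variances in a multilevel model are meant to capture.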

Assumptions

The assumptions of MLM that hold for clustered data also apply to repeated measures:

(1) Random components are assumed to have a normal distribution with a mean of zero.
(2) The dependent variable is assumed to be normally distributed. However, binary and discrete dependent variables may be examined in MLM using specialized procedures (e.g. by employing different link functions). [3]

One of the assumptions of using MLM for growth curve modeling is that all subjects show the same relationship over time (e.g. linear, quadratic etc.). Another assumption of MLM for growth curve modeling is that the observed changes are related to the passage of time. [4]

Statistics & Interpretation

Mathematically, multilevel analysis with repeated measures is very similar to the analysis of data in which subjects are clustered in groups. However, one point to note is that time-related predictors must be explicitly entered into the model to evaluate trend analyses and to obtain an overall test of the repeated measure. Furthermore, interpretation of these analyses is dependent on the scale of the time variable (i.e. how it is coded).
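A small sketch of why the coding of time matters: the same three observations are regressed on time coded from the first occasion, centered on the middle occasion, and coded from the last occasion. Ordinary least squares on a single subject is used only to keep the example short; the helper `ols_line` and the three codings are assumptions made for illustration.

```python
def ols_line(times, values):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(values) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, values))
    sxx = sum((t - t_mean) ** 2 for t in times)
    slope = sxy / sxx
    return y_mean - slope * t_mean, slope


dep_var = [12, 8, 4]  # one subject measured on three occasions

codings = {
    "from_start": [0, 1, 2],    # intercept = expected value at the first occasion
    "centered": [-1, 0, 1],     # intercept = expected value at the middle occasion
    "from_end": [-2, -1, 0],    # intercept = expected value at the last occasion
}

fits = {name: ols_line(times, dep_var) for name, times in codings.items()}
for name, (intercept, slope) in fits.items():
    print(f"{name}: intercept={intercept:.1f}, slope={slope:.1f}")
```

The slope is the same under all three codings, but the intercept (and, in a multilevel model, the intercept variance and the intercept-slope covariance) refers to a different point in time under each one.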

Extensions

  • Non-linear trends (quadratic, cubic, etc.) may be evaluated in MLM by adding the products of Time (Time × Time, Time × Time × Time, etc.) as either random or fixed effects to the model.
  • Covariance Structure: Multilevel software provides several different covariance or error structures to choose from for the analysis of multilevel data (e.g. autoregressive). These may be applied to the growth model as appropriate.
  • Dependent Variable: Dichotomous dependent variables may be analyzed with multilevel analysis by using more specialized analysis (e.g. using the logit or probit link functions).
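For the non-linear trends described in the first item above, the product terms are simply extra predictor columns in the long-form data. A minimal sketch follows; the column names `Time2` and `Time3` are hypothetical.

```python
# Long-form rows for one subject from the example data.
rows = [
    {"Subject": 1, "Time": 0, "DepVar": 12},
    {"Subject": 1, "Time": 1, "DepVar": 8},
    {"Subject": 1, "Time": 2, "DepVar": 4},
]

# Add the products of Time as additional predictors.
for row in rows:
    row["Time2"] = row["Time"] ** 2  # Time x Time (quadratic trend)
    row["Time3"] = row["Time"] ** 3  # Time x Time x Time (cubic trend)

for row in rows:
    print(row)
```

Whether each new column enters the model as a fixed effect only or also as a random effect is a modeling decision made when the model is specified, not when the columns are built.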

Multilevel modeling versus other statistical techniques for repeated measures

Multilevel Modeling versus RM-ANOVA

Repeated measures analysis of variance (RM-ANOVA) has traditionally been used for the analysis of repeated measures designs. However, violation of the assumptions of RM-ANOVA can be problematic. Multilevel modeling (MLM) is commonly used for repeated measures designs because it presents an alternative approach to analyzing this type of data, with several advantages over RM-ANOVA: [5]

1. MLM has Less Stringent Assumptions: MLM can be used if the RM-ANOVA assumptions of constant variances (homogeneity of variance, or homoscedasticity), constant covariances (compound symmetry), or constant variances of difference scores (sphericity) are violated. MLM allows the variance-covariance matrix to be modeled from the data; thus, unlike in RM-ANOVA, these assumptions are not necessary. [6]
2. MLM Allows Hierarchical Structure: MLM can be used for higher-order sampling procedures, whereas RM-ANOVA is limited to examining two-level sampling procedures. In other words, MLM can look at repeated measures within subjects, within a third level of analysis etc., whereas RM-ANOVA is limited to repeated measures within subjects.
3. MLM can Handle Missing Data: Missing data is permitted in MLM without causing additional complications. With RM-ANOVA, a subject's data must be excluded if they are missing a single data point. Missing data and attempts to resolve missing data (e.g. using the subject's mean for non-missing data) can raise additional problems in RM-ANOVA.
4. MLM can also handle data in which there is variation in the exact timing of data collection (i.e. variable timing versus fixed timing). For example, a longitudinal study may attempt to collect measurements at age 6 months, 9 months, 12 months, and 15 months. However, participant availability, bank holidays, and other scheduling issues may result in variation in when the data are actually collected. This variation may be addressed in MLM by adding “age” into the regression equation. There is also no need for equal intervals between measurement points in MLM.
5. MLM is relatively easily extended to discrete data. [7]
Note: Although missing data is permitted in MLM, it is assumed to be missing at random. Thus, systematically missing data can present problems. [5] [8] [9]
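The practical difference in how missing data are treated can be sketched by counting what survives each approach. The missingness pattern below is invented for illustration (`None` marks a missed occasion) and is not part of the original example.

```python
# Hypothetical wide-form data: subject -> values at Time 0, 1, 2.
wide = {
    1: [12, 8, 4],
    2: [11, None, 6],
    3: [15, 12, 10],
    4: [11, 10, None],
}

# Listwise deletion, as in RM-ANOVA: keep only subjects with no missing value.
complete_cases = {s: v for s, v in wide.items() if None not in v}

# Long form, as used in MLM: keep every observed (subject, time, value) row.
long_rows = [
    (s, time, value)
    for s, values in wide.items()
    for time, value in enumerate(values)
    if value is not None
]
subjects_in_long = {s for s, _, _ in long_rows}

print(f"complete cases: {len(complete_cases)} subjects, "
      f"{3 * len(complete_cases)} observations")
print(f"long form: {len(subjects_in_long)} subjects, {len(long_rows)} observations")
```

Retaining the partially observed subjects does not by itself make the estimates trustworthy; as the note above states, the missing values are still assumed to be missing at random.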

Multilevel Modeling versus Structural Equation Modeling (SEM; Latent Growth Model)

An alternative method of growth curve analysis is latent growth curve modeling using structural equation modeling (SEM). This approach will provide the same estimates as the multilevel modeling approach, provided that the model is specified identically in SEM. However, there are circumstances in which one of the two approaches is preferable: [4] [6]

Multilevel modeling approach:
  • For designs with a large number of unequal intervals between time points (SEM cannot manage data with a lot of variation in time points)
  • When there are many data points per subject
  • When the growth model is nested in additional levels of analysis (i.e. hierarchical structure)
  • Multilevel modeling programs have more options for handling non-continuous dependent variables (link functions) and for allowing different error structures
Structural equation modeling approach:
  • Better suited for extended models in which the model is embedded into a larger path model, or the intercept and slope are used as predictors for other variables. In this way, SEM allows greater flexibility.

The distinction between multilevel modeling and latent growth curve analysis has become less defined. Some statistical programs incorporate multilevel features within their structural equation modeling software, and some multilevel modeling software is beginning to add latent growth curve features.

Data Structure

Multilevel modeling with repeated measures data is computationally complex. Computer software capable of performing these analyses may require data to be represented in “long form” as opposed to “wide form” prior to analysis. In long form, each subject’s data is represented in several rows – one for every “time” point (observation of the dependent variable). This is opposed to wide form in which there is one row per subject, and the repeated measures are represented in separate columns. Also note that, in long form, time invariant variables are repeated across rows for each subject. See below for an example of wide form data transposed into long form:

Wide form:

Subject  Group  Time0  Time1  Time2
1        1      12     8      4
2        1      11     7      6
3        2      15     12     10
4        2      11     10     9

Long form:

Subject  Group  Time  DepVar
1        1      0     12
1        1      1     8
1        1      2     4
...      ...    ...   ...
4        2      0     11
4        2      1     10
4        2      2     9
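The wide-to-long transposition can be sketched in a few lines. The version below uses only plain Python data structures; the variable names (`wide`, `time_columns`, `long_rows`) are chosen here for illustration, and statistical packages typically offer their own reshaping utilities for the same job.

```python
# Wide form: one row per subject, one column per measurement occasion.
wide = [
    {"Subject": 1, "Group": 1, "Time0": 12, "Time1": 8, "Time2": 4},
    {"Subject": 2, "Group": 1, "Time0": 11, "Time1": 7, "Time2": 6},
    {"Subject": 3, "Group": 2, "Time0": 15, "Time1": 12, "Time2": 10},
    {"Subject": 4, "Group": 2, "Time0": 11, "Time1": 10, "Time2": 9},
]
time_columns = ["Time0", "Time1", "Time2"]

# Long form: one row per subject per occasion.
long_rows = []
for record in wide:
    for time, column in enumerate(time_columns):
        long_rows.append({
            "Subject": record["Subject"],
            "Group": record["Group"],  # time-invariant, so repeated on every row
            "Time": time,
            "DepVar": record[column],
        })

for row in long_rows:
    print(row["Subject"], row["Group"], row["Time"], row["DepVar"])
```

Four subjects with three occasions each yield twelve long-form rows, with `Group` repeated across the rows of each subject.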

Notes

  1. Hoffman, Lesa; Rovine, Michael J. (2007). "Multilevel models for the experimental psychologist: Foundations and illustrative examples". Behavior Research Methods. 39 (1): 101–117. doi:10.3758/BF03192848. PMID 17552476.
  2. Curran, Patrick J.; Obeidat, Khawla; Losardo, Diane (2010). "Twelve Frequently Asked Questions About Growth Curve Modeling". Journal of Cognition and Development. 11 (2): 121–136. doi:10.1080/15248371003699969. PMC 3131138. PMID 21743795.
  3. Snijders, Tom A.B.; Bosker, Roel J. (2002). Multilevel analysis: an introduction to basic and advanced multilevel modeling (Reprint ed.). London: Sage Publications. ISBN 978-0761958901.
  4. Hox, Joop (2005). Multilevel and SEM Approaches to Growth Curve Modeling (PDF) (Repr. ed.). Chichester: Wiley. ISBN 978-0-470-86080-9.
  5. Quené, Hugo; van den Bergh, Huub (2004). "On multi-level modeling of data from repeated measures designs: a tutorial". Speech Communication. 43 (1–2): 103–121. CiteSeerX 10.1.1.2.8982. doi:10.1016/j.specom.2004.02.004.
  6. Cohen, Jacob; Cohen, Patricia; West, Stephen G.; Aiken, Leona S. (2003-10-03). Applied multiple regression/correlation analysis for the behavioral sciences (3rd ed.). Mahwah, NJ: Erlbaum. ISBN 9780805822236.
  7. Molenberghs, Geert (2005). Models for discrete longitudinal data. New York: Springer Science+Business Media, Inc. ISBN 978-0387251448.
  8. Overall, John E.; Tonidandel, Scott (2007). "Analysis of Data from a Controlled Repeated Measurements Design with Baseline-Dependent Dropouts". Methodology: European Journal of Research Methods for the Behavioral and Social Sciences. 3 (2): 58–66. doi:10.1027/1614-2241.3.2.58.
  9. Overall, John; Ahn, Chul; Shivakumar, C.; Kalburgi, Yallapa (1999). "Problematic formulations of SAS PROC.MIXED models for repeated measurements". Journal of Biopharmaceutical Statistics. 9 (1): 189–216. doi:10.1081/BIP-100101008. PMID 10091918.
