Growth curve (statistics)

Table of height and weight for boys over time. The growth curve model (also known as GMANOVA) is used to analyze data such as this, where multiple observations are made on collections of individuals over time.

The growth curve model in statistics is a specific multivariate linear model, also known as GMANOVA (generalized multivariate analysis of variance). [1] It generalizes MANOVA by allowing post-matrices, as seen in the definition.

Definition

Growth curve model: [2] Let X be a p×n random matrix corresponding to the observations, A a p×q within-individuals design matrix with q ≤ p, B a q×k parameter matrix, C a k×n between-individuals design matrix with rank(C) + p ≤ n, and Σ a positive-definite p×p matrix. Then

$$X = ABC + \Sigma^{1/2}E$$

defines the growth curve model, where A and C are known, B and Σ are unknown, and E is a random matrix distributed as N_{p,n}(0, I_{p,n}).
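
As an illustration of the definition, the following minimal sketch simulates data from the model X = ABC + Σ^{1/2}E and then recovers B. Everything specific here is an assumption made for the example and not part of the definition: the four time points, the linear within-individual design, the two groups, the numeric values of B and Σ, and the function names `simulate_growth_curve` and `estimate_B`. The estimator used is the maximum likelihood estimator commonly given in the growth curve literature for full-rank A and C, B̂ = (AᵀS⁻¹A)⁻¹AᵀS⁻¹XCᵀ(CCᵀ)⁻¹ with S = X(I − Cᵀ(CCᵀ)⁻¹C)Xᵀ. A Cholesky factor is used in place of the symmetric square root Σ^{1/2}, which yields the same distribution for X.

```python
import numpy as np


def simulate_growth_curve(A, B, C, Sigma, rng):
    """Draw X = A B C + L E, where L L^T = Sigma and E has i.i.d. N(0, 1) entries."""
    p, n = A.shape[0], C.shape[1]
    L = np.linalg.cholesky(Sigma)      # stands in for Sigma^{1/2} (same distribution)
    E = rng.standard_normal((p, n))
    return A @ B @ C + L @ E


def estimate_B(X, A, C):
    """Maximum likelihood estimate of B, assuming A and C have full rank."""
    n = X.shape[1]
    CCt_inv = np.linalg.inv(C @ C.T)
    P = C.T @ CCt_inv @ C              # projection onto the row space of C
    S = X @ (np.eye(n) - P) @ X.T      # residual sums-of-squares matrix (p x p)
    S_inv_A = np.linalg.solve(S, A)    # S^{-1} A, without forming S^{-1}
    return np.linalg.solve(A.T @ S_inv_A, S_inv_A.T @ X @ C.T @ CCt_inv)


# Hypothetical setup: p = 4 time points, q = 2 (intercept and slope),
# k = 2 groups, n = 200 individuals (100 per group).
t = np.array([1.0, 2.0, 3.0, 4.0])
A = np.column_stack([np.ones_like(t), t])           # p x q within-individuals design
C = np.kron(np.eye(2), np.ones((1, 100)))           # k x n group indicators
B = np.array([[10.0, 12.0],                         # intercepts per group
              [2.0, 3.0]])                          # slopes per group
Sigma = 0.5 * np.eye(4) + 0.5 * np.ones((4, 4))     # positive-definite covariance

rng = np.random.default_rng(0)
X = simulate_growth_curve(A, B, C, Sigma, rng)
B_hat = estimate_B(X, A, C)
print(B_hat)
```

Each column of X holds one individual's four measurements, so the columns of A describe how the mean changes over time and the rows of C assign individuals to groups. With 100 individuals per group, the printed estimate should lie close to the assumed B.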

This differs from standard MANOVA by the addition of C, a "postmatrix". [3]

History

Many writers have considered growth curve analysis, among them Wishart (1938), [4] Box (1950) [5] and Rao (1958). [6] Potthoff and Roy (1964) [3] were the first to analyze longitudinal data using GMANOVA models.

Applications

GMANOVA is frequently used for the analysis of surveys, clinical trials, and agricultural data, [7] and more recently in the context of adaptive radar detection. [8] [9]

Other uses

In mathematical statistics, growth curves such as those used in biology are often modeled as continuous stochastic processes, e.g. as sample paths that almost surely solve stochastic differential equations. [10] Growth curves have also been applied in forecasting market development. [11] When variables are measured with error, latent growth modeling, a structural equation modeling (SEM) technique, can be used.
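
To make the idea of a growth curve as the sample path of a stochastic differential equation concrete, here is a small sketch that simulates one such path with the Euler–Maruyama scheme. The particular equation, a logistic growth model with multiplicative noise, dX = rX(1 − X/K) dt + σX dW, is an assumption chosen for illustration and is not taken from the cited sources; the function name `simulate_logistic_sde` and all parameter values are likewise hypothetical.

```python
import numpy as np


def simulate_logistic_sde(x0, r, K, sigma, T, n_steps, rng):
    """Euler-Maruyama path for dX = r X (1 - X/K) dt + sigma X dW on [0, T]."""
    dt = T / n_steps
    x = np.empty(n_steps + 1)
    x[0] = x0
    # Brownian increments: independent N(0, dt) draws
    dW = rng.normal(0.0, np.sqrt(dt), size=n_steps)
    for i in range(n_steps):
        drift = r * x[i] * (1.0 - x[i] / K)   # deterministic logistic growth
        diffusion = sigma * x[i]              # noise proportional to current size
        x[i + 1] = x[i] + drift * dt + diffusion * dW[i]
    return x


rng = np.random.default_rng(1)
# One noisy growth curve, and the noise-free curve for comparison (sigma = 0).
noisy_path = simulate_logistic_sde(5.0, 0.5, 100.0, 0.1, 40.0, 4000, rng)
smooth_path = simulate_logistic_sde(5.0, 0.5, 100.0, 0.0, 40.0, 4000, rng)
```

With σ = 0 the scheme reduces to the ordinary Euler method for the logistic equation, so `smooth_path` rises monotonically towards the carrying capacity K, while `noisy_path` fluctuates around that trend.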

Footnotes

  1. Kim, Kevin; Timm, Neil (2007). "Restricted MGLM and growth curve model" (Chapter 7). Univariate and multivariate general linear models: Theory and applications with SAS (with 1 CD-ROM for Windows and UNIX). Statistics: Textbooks and Monographs (Second ed.). Boca Raton, Florida: Chapman & Hall/CRC. ISBN 978-1-58488-634-1.
  2. Kollo, Tõnu; von Rosen, Dietrich (2005). "Multivariate linear models" (Chapter 4), especially "The Growth curve model and extensions" (Chapter 4.1). Advanced multivariate statistics with matrices. Mathematics and its applications. Vol. 579. Dordrecht: Springer. ISBN 978-1-4020-3418-3.
  3. Potthoff, R. F.; Roy, S. N. (1964). "A generalized multivariate analysis of variance model useful especially for growth curve problems". Biometrika. 51: 313–326.
  4. Wishart, John (1938). "Growth rate determinations in nutrition studies with the bacon pig, and their analysis". Biometrika. 30 (1–2): 16–28. doi:10.1093/biomet/30.1-2.16.
  5. Box, G. E. P. (1950). "Problems in the analysis of growth and wear curves". Biometrics. 6 (4): 362–389. doi:10.2307/3001781. JSTOR 3001781. PMID 14791573.
  6. Rao, C. Radhakrishna (1958). "Some statistical methods for comparison of growth curves". Biometrics. 14 (1): 1–17. doi:10.2307/2527726. JSTOR 2527726.
  7. Pan, Jian-Xin; Fang, Kai-Tai (2002). Growth curve models and statistical diagnostics. Springer Series in Statistics. New York: Springer-Verlag. ISBN 0-387-95053-2.
  8. Ciuonzo, D.; De Maio, A.; Orlando, D. (2016). "A Unifying Framework for Adaptive Radar Detection in Homogeneous plus Structured Interference-Part I: On the Maximal Invariant Statistic". IEEE Transactions on Signal Processing. 64: 2894–2906. arXiv:1507.05263. Bibcode:2016ITSP...64.2894C. doi:10.1109/TSP.2016.2519003. S2CID 5473094.
  9. Ciuonzo, D.; De Maio, A.; Orlando, D. (2016). "A Unifying Framework for Adaptive Radar Detection in Homogeneous plus Structured Interference-Part II: Detectors Design". IEEE Transactions on Signal Processing. 64: 2907–2919. arXiv:1507.05266. Bibcode:2016ITSP...64.2907C. doi:10.1109/TSP.2016.2519005. S2CID 12069007.
  10. Seber, G. A. F.; Wild, C. J. (1989). "Growth models" (Chapter 7). Nonlinear regression. Wiley Series in Probability and Mathematical Statistics: Probability and Mathematical Statistics. New York: John Wiley & Sons, Inc. pp. 325–367. ISBN 0-471-61760-1.
  11. Meade, Nigel (1984). "The use of growth curves in forecasting market development—a review and appraisal". Journal of Forecasting. 3 (4): 429–451. doi:10.1002/for.3980030406.
