Ordination (statistics)

Ordination or gradient analysis, in multivariate analysis, is a method complementary to data clustering, used mainly in exploratory data analysis (rather than in hypothesis testing). In contrast to cluster analysis, ordination orders objects in a (usually lower-dimensional) latent space. In the ordination space, objects that are near each other share attributes (i.e., are similar to some degree), and dissimilar objects are farther from each other. Such relationships between the objects, on each of several axes or latent variables, are then characterized numerically and/or graphically in a biplot.

The first ordination method, principal components analysis, was suggested by Karl Pearson in 1901.

Methods

Ordination methods can broadly be categorized into eigenvector-, algorithm-, or model-based methods. Many classical ordination techniques, including principal components analysis, correspondence analysis (CA) and its derivatives (detrended correspondence analysis, canonical correspondence analysis, and redundancy analysis), belong to the first group.
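As an illustration of the eigenvector-based group, the following is a minimal sketch of principal components analysis computed from the eigendecomposition of a covariance matrix. It assumes NumPy is available; the function name `pca_ordination` and the toy data are hypothetical and chosen only for this example, and dedicated ordination software will typically add scaling options and diagnostics that are omitted here.

```python
import numpy as np


def pca_ordination(X, n_axes=2):
    """Sketch of PCA as an eigenvector-based ordination.

    X is an (objects x variables) array. Returns object scores on the
    first n_axes axes, the variable loadings, and the proportion of
    variance carried by each retained axis.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                 # centre every variable (column)
    cov = np.cov(Xc, rowvar=False)          # variables x variables covariance
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_axes]  # keep the largest n_axes
    loadings = eigvecs[:, order]            # one column per ordination axis
    scores = Xc @ loadings                  # object coordinates in ordination space
    explained = eigvals[order] / eigvals.sum()
    return scores, loadings, explained


# Hypothetical toy data: six objects described by three variables,
# forming two loose groups (rows 0-2 and rows 3-5).
X = np.array([
    [1.0, 2.0, 1.0],
    [2.0, 1.0, 1.0],
    [1.0, 1.0, 2.0],
    [8.0, 9.0, 7.0],
    [9.0, 8.0, 8.0],
    [8.0, 8.0, 9.0],
])

scores, loadings, explained = pca_ordination(X, n_axes=2)
print(np.round(scores, 3))
print(np.round(explained, 3))
```

In the resulting two-dimensional space, objects from the same group should lie close together and the two groups should be separated mainly along the first axis, which is the behaviour described for ordination spaces in general.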

The second group includes some distance-based methods such as non-metric multidimensional scaling, and machine learning methods such as t-distributed stochastic neighbor embedding and other nonlinear dimensionality reduction techniques.

The third group includes model-based ordination methods, which can be considered multivariate extensions of generalized linear models. [1] [2] [3] [4] Model-based ordination methods are more flexible in their application than classical ordination methods; for example, they can include random effects. [5] Unlike in the first two groups, there is no (implicit or explicit) distance measure in the ordination. Instead, a distribution must be specified for the responses, as is typical for statistical models. These and other assumptions, such as the assumed mean-variance relationship, can be checked with residual diagnostics, which the other groups of ordination methods do not offer.
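To make "specifying a distribution for the responses" more concrete, one common way of writing a model-based unconstrained ordination is as a generalized linear latent variable model. The notation below is assumed for this sketch and is not taken from the article: $y_{ij}$ is the response of variable $j$ on object $i$, $F$ is the chosen response distribution with mean $\mu_{ij}$ and optional dispersion $\phi_j$, $g$ is a link function, $z_i$ is a vector of $d$ latent variables (the ordination axes), and $\beta_{0j}$ and $\gamma_j$ are a variable-specific intercept and loadings.

```latex
\begin{aligned}
y_{ij} \mid z_i &\sim F(\mu_{ij}, \phi_j), \\
g(\mu_{ij}) &= \beta_{0j} + z_i^{\top} \gamma_j, \\
z_i &\sim \mathcal{N}(0, I_d).
\end{aligned}
```

For count data, for instance, $F$ might be taken as Poisson or negative binomial with a log link; the estimated $z_i$ then play the role of object scores and the $\gamma_j$ the role of loadings in a biplot.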

Applications

Ordination can be used in the analysis of any set of multivariate objects. It is frequently used in the environmental and ecological sciences, particularly plant community ecology. It is also used in genetics and systems biology for microarray data analysis, and in psychometrics.

References

  1. Hui, Francis K. C.; Taskinen, Sara; Pledger, Shirley; Foster, Scott D.; Warton, David I. (2015). O'Hara, Robert B. (ed.). "Model-based approaches to unconstrained ordination". Methods in Ecology and Evolution. 6 (4): 399–411. doi:10.1111/2041-210X.12236. ISSN 2041-210X. S2CID 62624917.
  2. Warton, David I.; Blanchet, F. Guillaume; O'Hara, Robert B.; Ovaskainen, Otso; Taskinen, Sara; Walker, Steven C.; Hui, Francis K. C. (2015-12-01). "So Many Variables: Joint Modeling in Community Ecology". Trends in Ecology & Evolution. 30 (12): 766–779. doi:10.1016/j.tree.2015.09.007. ISSN 0169-5347. PMID 26519235.
  3. Yee, Thomas W. (2004). "A New Technique for Maximum-Likelihood Canonical Gaussian Ordination". Ecological Monographs. 74 (4): 685–701. doi:10.1890/03-0078. ISSN 0012-9615.
  4. Hawinkel, Stijn; Kerckhof, Frederiek-Maarten; Bijnens, Luc; Thas, Olivier (2019-02-13). "A unified framework for unconstrained and constrained ordination of microbiome read count data". PLOS ONE. 14 (2): e0205474. doi:10.1371/journal.pone.0205474. ISSN 1932-6203. PMC 6373939. PMID 30759084.
  5. van der Veen, Bert; Hui, Francis K. C.; Hovstad, Knut A.; O'Hara, Robert B. (2023). "Concurrent ordination: Simultaneous unconstrained and constrained latent variable modelling". Methods in Ecology and Evolution. 14 (2): 683–695. doi:10.1111/2041-210X.14035. hdl:11250/3050891. ISSN 2041-210X.
