Generalized Procrustes analysis

Generalized Procrustes analysis (GPA) is a method of statistical analysis that can be used to compare the shapes of objects, or the results of surveys, interviews, or panels. It was developed for analysing the results of free-choice profiling, a survey technique that allows respondents (such as sensory panelists) to describe a range of products in their own words or language. GPA is one way to make sense of free-choice profiling data; [1] alternatives include multiple factor analysis (MFA) [2] [3] and the STATIS method. [4] The method was first published by J. C. Gower in 1975. [5]

Generalized Procrustes analysis estimates a scaling factor for each respondent's use of the scale. This factor acts as a weight that compensates for individual differences in scale usage. Unlike methods such as principal component analysis, GPA uses individual-level data, and a measure of variance is used in the analysis.

The Procrustes distance provides a metric to minimize in order to superimpose a pair of shape instances annotated by landmark points. GPA applies the Procrustes analysis method to superimpose a population of shapes instead of only two shape instances.
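As a minimal sketch of the pairwise case, the code below computes a Procrustes distance between two 2D shapes. It rests on several assumptions that are not taken from the article: landmarks are stored as Python complex numbers (x + yj), both shapes list corresponding landmarks in the same order, only translation, uniform scaling and rotation are removed (reflection is not), and the function names are illustrative.

```python
import math

def normalize(shape):
    """Center the landmarks at the origin and scale them to unit size."""
    centroid = sum(shape) / len(shape)
    centered = [z - centroid for z in shape]
    # Assumes the landmarks are not all identical, so size is non-zero.
    size = math.sqrt(sum(abs(z) ** 2 for z in centered))
    return [z / size for z in centered]

def rotate_onto(shape, reference):
    """Rotate a normalized shape to its least-squares fit on the reference."""
    # The optimal rotation is the unit complex number pointing along
    # the inner product sum(reference_i * conj(shape_i)).
    inner = sum(r * z.conjugate() for z, r in zip(shape, reference))
    rotation = inner / abs(inner) if abs(inner) > 0 else 1
    return [rotation * z for z in shape]

def procrustes_distance(a, b):
    """Root sum of squared landmark differences after superimposition."""
    a_norm, b_norm = normalize(a), normalize(b)
    a_fit = rotate_onto(a_norm, b_norm)
    return math.sqrt(sum(abs(p - q) ** 2 for p, q in zip(a_fit, b_norm)))

# A unit square, and the same square scaled, rotated and translated.
square = [0 + 0j, 1 + 0j, 1 + 1j, 0 + 1j]
moved = [3 * (0.6 + 0.8j) * z + (2 - 5j) for z in square]
print(procrustes_distance(square, moved))  # approximately 0, up to rounding
```

Because position, size and orientation are removed before the comparison, the two squares are treated as the same shape, while a genuinely different outline such as a rectangle gives a clearly positive distance.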

The algorithm can be outlined as follows:

  1. arbitrarily choose a reference shape (typically by selecting it among the available instances)
  2. superimpose all instances onto the current reference shape
  3. compute the mean shape of the current set of superimposed shapes
  4. if the Procrustes distance between the mean shape and the reference is above a certain threshold, set the reference to the mean shape and return to step 2; otherwise stop.
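The four steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a reference implementation: shapes are 2D and stored as lists of complex landmarks in corresponding order, reflections are not removed, every shape is rescaled to unit size (so no per-shape weighting is estimated), and the names `generalized_procrustes`, `tolerance` and `max_iterations` are hypothetical.

```python
import math

def normalize(shape):
    """Center the landmarks at the origin and scale them to unit size."""
    centroid = sum(shape) / len(shape)
    centered = [z - centroid for z in shape]
    size = math.sqrt(sum(abs(z) ** 2 for z in centered))  # assumed non-zero
    return [z / size for z in centered]

def rotate_onto(shape, reference):
    """Rotate a normalized shape to its least-squares fit on the reference."""
    inner = sum(r * z.conjugate() for z, r in zip(shape, reference))
    rotation = inner / abs(inner) if abs(inner) > 0 else 1
    return [rotation * z for z in shape]

def distance(a, b):
    """Root sum of squared differences between corresponding landmarks."""
    return math.sqrt(sum(abs(p - q) ** 2 for p, q in zip(a, b)))

def generalized_procrustes(shapes, tolerance=1e-10, max_iterations=100):
    """Return (mean_shape, aligned_shapes) for a non-empty list of shapes."""
    aligned = [normalize(s) for s in shapes]
    reference = aligned[0]  # step 1: pick one instance as the reference
    for _ in range(max_iterations):
        # Step 2: superimpose every instance on the current reference.
        aligned = [rotate_onto(s, reference) for s in aligned]
        # Step 3: landmark-wise mean of the superimposed shapes.
        count = len(aligned)
        mean = normalize([sum(points) / count for points in zip(*aligned)])
        # Step 4: stop once the mean no longer moves away from the reference.
        change = distance(rotate_onto(mean, reference), reference)
        reference = mean
        if change <= tolerance:
            break
    return reference, aligned

# Three copies of one triangle under different similarity transforms.
triangle = [0 + 0j, 4 + 0j, 1 + 3j]
shapes = [
    triangle,
    [2 * 1j * z + 3 for z in triangle],
    [0.5 * (0.6 + 0.8j) * z - (1 + 1j) for z in triangle],
]
mean_shape, aligned_shapes = generalized_procrustes(shapes)
```

In this toy input all three instances are the same shape, so the mean coincides with the normalized triangle and the loop stops after the first pass. With noisy data the reference is replaced by the mean until the change falls below `tolerance` or `max_iterations` is reached.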

References

  1. Meullenet, Jean-François; Xiong, Rui; Findlay, Christopher J., eds. (2007). Multivariate and Probabilistic Analyses of Sensory Science Problems. doi:10.1002/9780470277539. ISBN 9780470277539.
  2. Escofier, B.; Pagès, J. (1994). "Multiple factor analysis (AFMULT package)". Computational Statistics & Data Analysis. 18: 121–140. doi:10.1016/0167-9473(94)90135-X.
  3. A comparison of GPA and MFA with sensory data forms a chapter of the book: Pagès, Jérôme (2014). Multiple Factor Analysis by Example Using R. Chapman & Hall/CRC The R Series. London. 272 p.
  4. Lavit, C.; Escoufier, Y.; Sabatier, R.; Traissac, P. (1994). "The ACT (STATIS method)". Computational Statistics & Data Analysis. 18: 97–119. doi:10.1016/0167-9473(94)90134-1.
  5. Gower, J. C. (1975). "Generalized procrustes analysis". Psychometrika. 40: 33–51. doi:10.1007/BF02291478. hdl:10.1007/BF02291478. S2CID 122244491.