# Thurstonian model

A Thurstonian model is a stochastic transitivity model with latent variables for describing the mapping of some continuous scale onto discrete, possibly ordered categories of response. In the model, each of these categories of response corresponds to a latent variable whose value is drawn from a normal distribution, independently of the other response variables and with constant variance. Developments over the last two decades, however, have led to Thurstonian models that allow unequal variances and non-zero covariance terms. Thurstonian models have been used as an alternative to generalized linear models in the analysis of sensory discrimination tasks. [1] They have also been used to model long-term memory in ranking tasks of ordered alternatives, such as the order of the amendments to the US Constitution. [2] Their main advantage over other models of ranking tasks is that they account for non-independence of alternatives. [3]

Ennis [4] provides a comprehensive account of the derivation of Thurstonian models for a wide variety of behavioral tasks including preferential choice, ratings, triads, tetrads, dual pair, same-different and degree of difference, ranks, first-last choice, and applicability scoring. Chapter 6 of the book links discrimination, identification and preferential choice through a common multivariate model in the form of weighted sums of central F distribution functions and allows a general variance-covariance matrix for the items. Chapter 7 of the book [citation needed] gives a closed-form expression, derived in 1988, for a Euclidean-Gaussian similarity model; this addresses the well-known problem that many Thurstonian models are computationally complex, often involving multiple integration. Chapter 10 presents a simple form for ranking tasks that involves only the product of univariate normal distribution functions and includes rank-induced dependency parameters. A theorem is proven showing that this particular form of the dependency parameters is the only one that makes the simplification possible.

## Definition

Consider a set of m options to be ranked by n independent judges. The ranking given by judge i can be represented by the ordering vector ri = (ri1, ri2, ..., rim).

Rankings are assumed to be derived from real-valued latent variables zij, representing the evaluation of option j by judge i. Rankings ri are derived deterministically from zi such that zi(ri1) < zi(ri2) < ... < zi(rim).
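The deterministic step from latent values to a ranking can be sketched in a few lines of Python. This is only an illustration: the function name `ranking_from_latent` and the numeric values are hypothetical, and options are indexed from 0 rather than from 1.

```python
def ranking_from_latent(z):
    """Return the ordering vector r with z[r[0]] < z[r[1]] < ... < z[r[-1]]."""
    # Sort option indices by their latent value, smallest first.
    return sorted(range(len(z)), key=lambda j: z[j])


# Hypothetical latent evaluations of four options by one judge.
z_i = [0.3, -1.2, 2.5, 0.9]
r_i = ranking_from_latent(z_i)
print(r_i)  # [1, 0, 3, 2]
```

Option 1 has the smallest latent value and option 2 the largest, so the ordering vector lists them first and last respectively.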

The zi are assumed to be derived from an underlying ground truth value μ for each option. In the most general case, they are multivariate-normal:

${\displaystyle \mathbf {z} _{i}\ \sim \ {\mathcal {N}}({\boldsymbol {\mu }},\mathbf {\Sigma } _{i})}$

One common simplification is to assume an isotropic Gaussian distribution, with a single standard deviation parameter for each judge:

${\displaystyle \mathbf {z} _{i}\ \sim \ {\mathcal {N}}({\boldsymbol {\mu }},\sigma _{i}^{2}\mathbf {I} ).}$
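As a rough illustration of the isotropic simplification, the sketch below simulates rankings from several judges who share the same ground-truth values but differ in their noise level. The function name, the ground-truth vector and the per-judge standard deviations are all assumptions chosen for the example, not values from any cited study.

```python
import random


def simulate_rankings(mu, sigmas, seed=0):
    """Simulate one ranking per judge under the isotropic Gaussian model.

    mu     -- assumed ground-truth value of each option (shared by all judges)
    sigmas -- assumed standard deviation of each judge
    """
    rng = random.Random(seed)
    rankings = []
    for sigma_i in sigmas:
        # Independent latent evaluations with the judge's own noise level.
        z_i = [rng.gauss(m, sigma_i) for m in mu]
        # The ranking orders options from smallest to largest latent value.
        rankings.append(sorted(range(len(mu)), key=lambda j: z_i[j]))
    return rankings


mu = [0.0, 1.0, 2.0, 3.0]   # hypothetical ground truth for four options
sigmas = [0.1, 0.5, 2.0]    # hypothetical judges, from precise to noisy
for sigma_i, r_i in zip(sigmas, simulate_rankings(mu, sigmas)):
    print(sigma_i, r_i)
```

Judges with a small standard deviation tend to reproduce the ground-truth order, while noisier judges produce more transpositions.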

## Inference

The Gibbs-sampler-based approach to estimating model parameters is due to Yao and Böckenholt (1999). [3]

• Step 1: Given β, Σ, and ri, sample zi.

The zij must be sampled from a truncated multivariate normal distribution to preserve their rank ordering. Hajivassiliou's Truncated Multivariate Normal Gibbs sampler can be used to sample efficiently. [5] [6]
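A simplified sketch of this step is shown below. It is not Hajivassiliou's sampler: it assumes a diagonal covariance matrix, so that each component can be drawn from a univariate normal truncated to the interval between its rank neighbours. With a general Σ, the conditional mean and variance of each component given the others would take the place of `mu[j]` and `sd[j]`. All function names are hypothetical.

```python
import random
from statistics import NormalDist


def sample_truncated_normal(mean, sd, lower, upper, rng):
    """Draw from N(mean, sd^2) restricted to (lower, upper) by CDF inversion."""
    dist = NormalDist(mean, sd)
    p_lo = 0.0 if lower == float("-inf") else dist.cdf(lower)
    p_hi = 1.0 if upper == float("inf") else dist.cdf(upper)
    u = rng.uniform(p_lo, p_hi)
    # inv_cdf is undefined at exactly 0 and 1, so keep u strictly inside.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    x = dist.inv_cdf(u)
    # Numerical guard: keep the draw inside the interval in the far tails.
    return min(max(x, lower), upper)


def gibbs_update_latent(z, ranking, mu, sd, rng):
    """One Gibbs sweep over z, preserving z[r[0]] < z[r[1]] < ... (diagonal case)."""
    z = list(z)
    last = len(ranking) - 1
    for pos, j in enumerate(ranking):
        # Bounds come from the current values of the neighbours in rank order.
        lower = z[ranking[pos - 1]] if pos > 0 else float("-inf")
        upper = z[ranking[pos + 1]] if pos < last else float("inf")
        z[j] = sample_truncated_normal(mu[j], sd[j], lower, upper, rng)
    return z


rng = random.Random(0)
ranking = [2, 0, 1]        # observed order: option 2 lowest, option 1 highest
z = [0.0, 1.0, -1.0]       # starting values consistent with that order
for _ in range(100):
    z = gibbs_update_latent(z, ranking, [0.0, 1.0, 2.0], [1.0, 1.0, 1.0], rng)
print(z)
```

The starting values must already respect the observed ranking; every later sweep then keeps the order intact.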

• Step 2: Given Σ, zi, sample β.

β is sampled from a normal distribution:

${\displaystyle \beta \ \sim \ {\mathcal {N}}(\beta ^{*},\Sigma ^{*}),}$

where β* and Σ* are the current estimates of the mean vector and its covariance matrix.
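As an illustration, one standard way to obtain such a pair is the conjugate normal update. The form below is an assumption made for the example rather than the exact specification of the cited paper: it supposes a prior β ~ N(β0, Σ0), with hypothetical hyperparameters β0 and Σ0, and a model without covariates, so that each zi is a direct noisy observation of β with covariance Σ.

${\displaystyle \Sigma ^{*}=\left(\Sigma _{0}^{-1}+n\Sigma ^{-1}\right)^{-1},\qquad \beta ^{*}=\Sigma ^{*}\left(\Sigma _{0}^{-1}\beta _{0}+\Sigma ^{-1}\sum _{i=1}^{n}\mathbf {z} _{i}\right).}$

Under these assumptions, as the prior becomes flat (Σ0⁻¹ → 0), β* reduces to the sample mean of the zi and Σ* to Σ/n.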

• Step 3: Given β, zi, sample Σ.

Σ⁻¹ is sampled from a Wishart posterior, combining a Wishart prior with the data likelihood from the samples εi = zi − β.
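For concreteness, the usual conjugate form of this update can be written out. The prior degrees of freedom ν0 and scale matrix V0 are hypothetical hyperparameters introduced for the example, and the Wishart distribution is parameterized so that its mean is the degrees of freedom times the scale matrix. With a prior Σ⁻¹ ~ W(ν0, V0) and n judges:

${\displaystyle \Sigma ^{-1}\mid \beta ,\mathbf {z} _{1},\dots ,\mathbf {z} _{n}\ \sim \ {\mathcal {W}}\left(\nu _{0}+n,\ \left(V_{0}^{-1}+\sum _{i=1}^{n}\varepsilon _{i}\varepsilon _{i}^{\mathsf {T}}\right)^{-1}\right).}$

Each judge adds one degree of freedom, and the residual outer products accumulate in the inverse scale matrix.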

## History

Thurstonian models were introduced by Louis Leon Thurstone to describe the law of comparative judgment. [7] Prior to 1999, Thurstonian models were rarely used for modeling tasks involving more than four options because of the high-dimensional integration required to estimate the model's parameters. In 1999, Yao and Böckenholt introduced their Gibbs-sampler-based approach to estimating model parameters. [3] This limitation, however, applied only to ranking tasks; Thurstonian models with a much broader range of applications were developed before 1999. For instance, a multivariate Thurstonian model for preferential choice with a general variance-covariance structure, based on papers published in 1993 and 1994, is discussed in Chapter 6 of Ennis (2016). Even earlier, a closed form for a Thurstonian multivariate model of similarity with arbitrary covariance matrices was published in 1988, as discussed in Chapter 7 of Ennis (2016). This model has numerous applications and is not limited to any particular number of items or individuals.

## Applications to sensory discrimination

Thurstonian models have been applied to a range of sensory discrimination tasks, including auditory, taste, and olfactory discrimination, to estimate sensory distance between stimuli that range along some sensory continuum. [8] [9] [10]

The Thurstonian approach motivated Frijters's (1979) explanation of Gridgeman's paradox, also known as the paradox of discriminatory nondiscriminators: [1] [9] [11] [12] People perform better in a three-alternative forced choice task when told in advance which dimension of the stimulus to attend to. (For example, people are better at identifying which one of three drinks is different from the other two when told in advance that the difference will be in degree of sweetness.) This result is accounted for by differing cognitive strategies: when the relevant dimension is known in advance, people can estimate values along that particular dimension. When the relevant dimension is not known in advance, they must rely on a more general, multi-dimensional measure of sensory distance.

The above paragraph contains a common misunderstanding of the Thurstonian resolution of Gridgeman's paradox. Although it is true that different decision rules (cognitive strategies) are used in making a choice among three alternatives, the mere fact of knowing an attribute in advance does not explain the paradox, nor are subjects required to rely on a more general, multidimensional measure of sensory difference. In the triangular method, for instance, the subject is instructed to choose the most different of three items, two of which are putatively identical. The items may differ on a unidimensional scale and the subject may be made aware of the nature of the scale in advance. Gridgeman's paradox will still be observed. This occurs because of the sampling process combined with a distance-based decision rule as opposed to a magnitude-based decision rule assumed to model the results of the 3-alternative forced choice task.
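The effect of the two decision rules can be illustrated with a small Monte Carlo sketch. It assumes a unidimensional, equal-variance model in which the two identical items are drawn from N(0, 1) and the odd item from N(d′, 1); the function name, trial count and value of d′ are choices made for the example.

```python
import random


def simulate_proportions_correct(d_prime, n_trials=20000, seed=0):
    """Estimate proportion correct for 3-AFC and triangle rules at the same d'."""
    rng = random.Random(seed)
    afc_correct = 0
    triangle_correct = 0
    for _ in range(n_trials):
        # Percepts: two identical items and one odd (stronger) item.
        a1 = rng.gauss(0.0, 1.0)
        a2 = rng.gauss(0.0, 1.0)
        b = rng.gauss(d_prime, 1.0)
        # Magnitude-based rule (3-AFC): choose the item with the largest percept.
        if b > a1 and b > a2:
            afc_correct += 1
        # Distance-based rule (triangle): the odd item is the one left out of
        # the closest pair, so the choice is correct when a1 and a2 are closest.
        d_same = abs(a1 - a2)
        if d_same < abs(a1 - b) and d_same < abs(a2 - b):
            triangle_correct += 1
    return afc_correct / n_trials, triangle_correct / n_trials


p_afc, p_triangle = simulate_proportions_correct(d_prime=1.0)
print(f"3-AFC: {p_afc:.3f}  triangle: {p_triangle:.3f}")
```

Both rules operate on the same simulated percepts, yet the magnitude-based rule yields a noticeably higher proportion correct, which is the pattern the paradox describes. At d′ = 0 both rules fall to chance, one third.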

## References

1. Lundahl, David (1997). "Thurstonian Models — an Answer to Gridgeman's Paradox?". CAMO Software Statistical Methods.
2. Lee, Michael; Steyvers, Mark; de Young, Mindy; Miller, Brent (2011). "A Model-Based Approach to Measuring Expertise in Ranking Tasks" (PDF). CogSci 2011 Proceedings. ISBN 978-0-9768318-7-7.
3. Yao, G.; Böckenholt, U. (1999). "Bayesian estimation of Thurstonian ranking models based on the Gibbs sampler". British Journal of Mathematical and Statistical Psychology. 52: 19–92. doi:10.1348/000711099158973.
4. Ennis, Daniel (2016). Thurstonian Models: Categorical Decision Making in the Presence of Noise. Richmond: The Institute for Perception. ISBN 978-0-9906446-0-6.
5. Hajivassiliou, V.A. (1993). "Simulation estimation methods for limited dependent variable models". In Maddala, G.S.; Rao, C.R.; Vinod, H.D. (eds.). Econometrics. Handbook of Statistics. Vol. 11. Amsterdam: Elsevier. ISBN 0444895779.
6. Hajivassiliou, V.A.; McFadden, D.; Ruud, P. (1996). "Simulation of multivariate normal rectangle probabilities and their derivatives. Theoretical and computational results". Journal of Econometrics. 72 (1–2): 85–134. doi:10.1016/0304-4076(94)01716-6.
7. Thurstone, Louis Leon (1927). "A Law of Comparative Judgment". Psychological Review. 34 (4): 273–286. doi:10.1037/h0070288. Reprinted: Thurstone, L. L. (1994). "A law of comparative judgment". Psychological Review. 101 (2): 266–270. doi:10.1037/0033-295X.101.2.266.
8. Durlach, N.I.; Braida, L.D. (1969). "Intensity Perception. I. Preliminary Theory of Intensity Resolution". Journal of the Acoustical Society of America. 46 (2): 372–383. Bibcode:1969ASAJ...46..372D. doi:10.1121/1.1911699. PMID 5804107.
9. Dessirier, Jean-Marc; O’Mahony, Michael (9 October 1998). "Comparison of d′ values for the 2-AFC (paired comparison) and 3-AFC discrimination methods: Thurstonian models, sequential sensitivity analysis and power". Food Quality and Preference. 10 (1): 51–58. doi:10.1016/S0950-3293(98)00037-8.
10. Frijters, J.E.R. (1980). "Three-stimulus procedures in olfactory psychophysics: an experimental comparison of Thurstone-Ura and three-alternative forced choice models of signal detection theory". Perception & Psychophysics. 28 (5): 390–7. PMID 7208248.
11. Gridgeman, N.T. (1970). "A Reexamination of the Two-Stage Triangle Test for the Perception of Sensory Differences". Journal of Food Science. 35 (1): 87–91. doi:10.1111/j.1365-2621.1970.tb12376.x.
12. Frijters, J.E.R. (1979). "The paradox of discriminatory nondiscriminators resolved". Chemical Senses & Flavor. 4 (4): 355–8. doi:10.1093/chemse/4.4.355.