Ridit scoring

In statistics, ridit scoring is a method used to analyze ordered qualitative measurements. The tools of ridit analysis were developed and first applied by Bross,[1] who coined the term "ridit" by analogy with other statistical transformations such as probit and logit. A ridit describes how the distribution of the dependent variable in row i of a contingency table compares with an identified reference distribution (e.g., the marginal distribution of the dependent variable).

Calculation of ridit scores

Choosing a reference data set

Since ridit scoring is used to compare two or more sets of ordered qualitative data, one set is designated as a reference against which the other sets can be compared. In econometric studies, for example, the ridit scores from taste surveys of a competing or historically important product are often used as the reference data set against which taste surveys of new products are compared. Absent a convenient reference data set, pooled data from several sets, or even an artificial or hypothetical set, can be used.

Determining the probability function

After a reference data set has been chosen, it must be converted to a probability function. To do this, let $x_1, x_2, \ldots, x_n$ denote the ordered categories of the preference scale. For each $j$, $x_j$ represents a choice or judgment. Then, let the probability function $p$ be defined with respect to the reference data set as

$$p_j = p(x_j) = \frac{\text{number of reference respondents choosing } x_j}{\text{total number of reference respondents}}, \qquad j = 1, \ldots, n.$$
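As a concrete sketch (in Python, using hypothetical taste-survey counts rather than any real data set), the reference counts can be converted into the probability function $p$ as follows:

```python
def reference_probabilities(counts):
    """Turn ordered category counts from the reference data set into the
    probability function p (relative frequencies; dicts preserve insertion
    order in Python 3.7+, which encodes the category ordering here)."""
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}

# Hypothetical reference data set: respondent counts per ordered category.
reference_counts = {"poor": 10, "fair": 25, "good": 40, "very good": 20, "excellent": 5}
p = reference_probabilities(reference_counts)
# p == {"poor": 0.10, "fair": 0.25, "good": 0.40, "very good": 0.20, "excellent": 0.05}
```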

Determining ridits

The ridit scores, or simply ridits, of the reference data set are then easily calculated as

$$w_j = \frac{1}{2}\,p_j + \sum_{i=1}^{j-1} p_i, \qquad j = 1, \ldots, n.$$

Each category of the reference data set is then associated with a ridit score. More formally, for each $j \in \{1, \ldots, n\}$, the value $w_j$ is the ridit score of the choice $x_j$.
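Continuing the hypothetical example above, a minimal sketch of the ridit calculation (not the interface of any published package):

```python
def ridit_scores(p):
    """Compute w_j = (1/2) p_j + sum of p_i over all categories below x_j,
    walking the ordered categories of the probability function p."""
    ridits, cumulative = {}, 0.0
    for category, prob in p.items():
        ridits[category] = cumulative + prob / 2
        cumulative += prob
    return ridits

w = ridit_scores(p)  # p as computed in the previous sketch
# w == {"poor": 0.05, "fair": 0.225, "good": 0.55, "very good": 0.85, "excellent": 0.975}
```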

Interpretation and examples

Intuitively, ridit scores can be understood as a modified notion of percentile ranks. For any $j$, if $x_j$ has a low (close to 0) ridit score, one can conclude that

$$\sum_{i=1}^{j-1} p_i$$

is very small, which is to say that very few respondents have chosen a category "lower" than $x_j$.
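This percentile-rank reading can be checked numerically: each ridit equals the cumulative probability up to and including $x_j$, minus half the probability of $x_j$ itself (a "mid-distribution" percentile). Continuing the hypothetical example:

```python
# Verify w_j = F(x_j) - p_j / 2, where F is the cumulative distribution
# of the reference data set (so w_j is a mid-distribution percentile).
cumulative = 0.0
for category, prob in p.items():
    cumulative += prob
    assert abs(w[category] - (cumulative - prob / 2)) < 1e-12

# E.g., w["poor"] == 0.05: only 5% of the reference distribution sits at or
# below the midpoint of "poor", so almost no respondent chose a lower category.
```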

Applications

Ridit scoring has found use primarily in the health sciences (including nursing and epidemiology) and in econometric preference studies.[citation needed]

A mathematical approach

Besides its intuitive appeal, ridit scoring can also be derived by mathematically rigorous methods. Brockett and Levine[2] derived the ridit score equations above from several intuitively uncontroversial mathematical postulates.

Notes

R statistical computing package for Ridit Analysis: https://cran.r-project.org/package=Ridit

1. Bross, Irwin D. J. (1958). "How to Use Ridit Analysis". Biometrics. 14 (1): 18–38. JSTOR 2527727.
2. Brockett, Patrick L.; Levine, Arnold (1977). "On a Characterization of Ridits". The Annals of Statistics. 5 (6): 1245–1248. JSTOR 2958658.

Further reading

Donaldson, G. W. (1998). "Ridit scores for analysis and interpretation of ordinal pain data". European Journal of Pain. 2 (3): 221–227. doi:10.1016/S1090-3801(98)90018-0. PMID 15102382. S2CID 37751388.
