Scott's Pi

Last updated April 23, 2023

Scott's pi (named after William A. Scott) is a statistic for measuring inter-rater reliability for nominal data in communication studies. Textual entities are annotated with categories by different annotators, and various measures, Scott's pi among them, are used to assess the extent of agreement between the annotators. Automatic text annotation is a common problem in natural language processing, where the goal is for the program under development to agree with human annotators. Assessing how far humans agree with each other is therefore important for establishing a reasonable upper limit on computer performance.

Introduction

Scott's pi is similar to Cohen's kappa in that both improve on simple observed agreement by factoring in the extent of agreement that might be expected by chance. However, the two statistics calculate the expected agreement slightly differently. Scott's pi assumes that the annotators have the same distribution of responses; Cohen's kappa does not make this assumption, which makes it slightly more informative. Fleiss' kappa extends Scott's pi to more than two annotators.

The equation for Scott's pi, as in Cohen's kappa, is:

\pi ={\frac {\Pr(a)-\Pr(e)}{1-\Pr(e)}},

where Pr(a) is the observed agreement and Pr(e) is the agreement expected by chance. Unlike in Cohen's kappa, Pr(e) is calculated using squared "joint proportions", which are squared arithmetic means of the marginal proportions (Cohen's kappa uses squared geometric means of them).
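
Written out, the two chance-agreement terms differ as follows. The symbols are introduced here for illustration and are not part of the original notation: p_{k,1} and p_{k,2} denote the proportions of items that the first and second annotator assign to category k.

\Pr(e)_{\text{Scott}}=\sum _{k}\left({\frac {p_{k,1}+p_{k,2}}{2}}\right)^{2},\qquad \Pr(e)_{\text{Cohen}}=\sum _{k}p_{k,1}\,p_{k,2}=\sum _{k}\left({\sqrt {p_{k,1}\,p_{k,2}}}\right)^{2}.

By the inequality of arithmetic and geometric means, Scott's expected agreement is never smaller than Cohen's, and the two coincide when both annotators have identical marginal distributions.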

Worked example

Consider a confusion matrix for two annotators, three categories {Yes, No, Maybe} and 45 rated items (90 ratings in total). Columns correspond to the first annotator's labels and rows to the second annotator's:

	Yes	No	Maybe	Marginal Sum
Yes	1	2	3	6
No	4	5	6	15
Maybe	7	8	9	24
Marginal Sum	12	15	18	45

To calculate the expected agreement, sum marginals across annotators and divide by the total number of ratings to obtain joint proportions. Square and total these:

	Ann1	Ann2	Joint Proportion	JP Squared
Yes	12	6	(12 + 6)/90 = 0.2	0.04
No	15	15	(15 + 15)/90 = 0.333	0.111
Maybe	18	24	(18 + 24)/90 = 0.467	0.218
Total				0.369
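
The joint-proportion table can be reproduced with a few lines of Python. This is a minimal sketch rather than a reference implementation; the variable names (`ann1`, `ann2`, `joint`, `expected`) are chosen here for illustration, and exact fractions are used so that rounding happens only when printing.

```python
from fractions import Fraction

# Marginal counts per category for each annotator, copied from the table.
ann1 = {"Yes": 12, "No": 15, "Maybe": 18}
ann2 = {"Yes": 6, "No": 15, "Maybe": 24}

# Two annotators rating 45 items each give 90 ratings in total.
total_ratings = sum(ann1.values()) + sum(ann2.values())

# Joint proportion: share of all ratings that fall into each category.
joint = {c: Fraction(ann1[c] + ann2[c], total_ratings) for c in ann1}

# Expected agreement: sum of the squared joint proportions.
expected = sum(p * p for p in joint.values())

for category, p in joint.items():
    print(f"{category}: joint proportion {float(p):.3f}, squared {float(p * p):.3f}")
print(f"Pr(e) = {float(expected):.3f}")  # Pr(e) = 0.369
```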

To calculate observed agreement, divide the number of items on which annotators agreed by the total number of items. In this case,

\Pr(a)={\frac {1+5+9}{45}}=0.333.

Given that Pr(e) = 0.369, Scott's pi is then

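\pi ={\frac {0.333-0.369}{1-0.369}}\approx -0.057.

This figure uses the rounded intermediate values. With unrounded inputs (Pr(a) = 1/3 and Pr(e) = 83/225), the exact value is −4/71 ≈ −0.056. The negative sign indicates agreement slightly below what would be expected by chance.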
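
The whole calculation can also be run directly from the confusion matrix. The sketch below is illustrative only: the helper names `agreement_stats` and `chance_corrected` are hypothetical, and Cohen's kappa is computed alongside Scott's pi purely to show the effect of the different expected-agreement terms. Because no intermediate values are rounded, pi comes out at about −0.056 rather than the −0.057 obtained from three-decimal inputs.

```python
def agreement_stats(matrix):
    """Return (observed, scott_expected, cohen_expected) for a square
    confusion matrix of counts (rows: one annotator, columns: the other)."""
    k = len(matrix)
    n = sum(sum(row) for row in matrix)  # number of rated items
    row_p = [sum(row) / n for row in matrix]
    col_p = [sum(matrix[i][j] for i in range(k)) / n for j in range(k)]
    # Observed agreement: share of items on the diagonal.
    observed = sum(matrix[i][i] for i in range(k)) / n
    # Scott: squared arithmetic means of the two marginal proportions.
    scott_e = sum(((r + c) / 2) ** 2 for r, c in zip(row_p, col_p))
    # Cohen: products of the two marginal proportions.
    cohen_e = sum(r * c for r, c in zip(row_p, col_p))
    return observed, scott_e, cohen_e


def chance_corrected(observed, expected):
    """Shared form of both statistics: (Pr(a) - Pr(e)) / (1 - Pr(e))."""
    return (observed - expected) / (1 - expected)


# Confusion matrix from the worked example (Yes, No, Maybe).
matrix = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]

observed, scott_e, cohen_e = agreement_stats(matrix)
scott_pi = chance_corrected(observed, scott_e)
cohen_kappa = chance_corrected(observed, cohen_e)

print(f"Pr(a)         = {observed:.4f}")
print(f"Scott's pi    = {scott_pi:.4f}")
print(f"Cohen's kappa = {cohen_kappa:.4f}")
```

On this matrix the two annotators' marginal distributions differ, so Scott's expected agreement is larger than Cohen's and pi is slightly lower than kappa.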

References

Scott, W. (1955). "Reliability of content analysis: The case of nominal scale coding." Public Opinion Quarterly, 19(3), 321–325.
Krippendorff, K. (2004). "Reliability in content analysis: Some common misconceptions and recommendations." Human Communication Research, 30, 411–433.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
