Kendall's W

Kendall's W (also known as Kendall's coefficient of concordance) is a non-parametric statistic for rank correlation. It is a normalization of the statistic of the Friedman test, and can be used for assessing agreement among raters and in particular inter-rater reliability. Kendall's W ranges from 0 (no agreement) to 1 (complete agreement).

Suppose, for instance, that a number of people have been asked to rank a list of political concerns, from the most important to the least important. Kendall's W can be calculated from these data. If the test statistic W is 1, then all the survey respondents have been unanimous, and each respondent has assigned the same order to the list of concerns. If W is 0, then there is no overall trend of agreement among the respondents, and their responses may be regarded as essentially random. Intermediate values of W indicate a greater or lesser degree of unanimity among the various responses.

While tests using the standard Pearson correlation coefficient assume normally distributed values and compare two sequences of outcomes simultaneously, Kendall's W makes no assumptions regarding the nature of the probability distribution and can handle any number of distinct outcomes.

Steps of Kendall's W

Suppose that object i is given the rank r_{i,j} by judge number j, where there are in total n objects and m judges. Then the total rank given to object i is

R_i = \sum_{j=1}^{m} r_{i,j},

and the mean value of these total ranks is

\bar{R} = \tfrac{1}{2} m (n + 1).

The sum of squared deviations, S, is defined as

S = \sum_{i=1}^{n} \left( R_i - \bar{R} \right)^2,

and then Kendall's W is defined as [1]

W = \frac{12 S}{m^2 (n^3 - n)}.
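
For illustration, here is a minimal Python sketch of the computation above (the function name and the n × m array layout are our own, not a standard API):

```python
import numpy as np

def kendalls_w(ranks: np.ndarray) -> float:
    """Kendall's W for an n x m matrix of complete, untied ranks
    (ranks[i, j] = rank given to object i by judge j)."""
    n, m = ranks.shape
    R = ranks.sum(axis=1)              # total rank R_i of each object
    S = ((R - R.mean()) ** 2).sum()    # sum of squared deviations
    return 12 * S / (m**2 * (n**3 - n))

# Three judges ranking four objects in perfect agreement -> W = 1.0
ranks = np.tile([[1], [2], [3], [4]], (1, 3))
print(kendalls_w(ranks))  # 1.0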

If the test statistic W is 1, then all the judges or survey respondents have been unanimous, and each judge or respondent has assigned the same order to the list of objects or concerns. If W is 0, then there is no overall trend of agreement among the respondents, and their responses may be regarded as essentially random. Intermediate values of W indicate a greater or lesser degree of unanimity among the various judges or respondents.

Kendall and Gibbons (1990) also show that W is linearly related to the mean value \bar{r}_s of the Spearman's rank correlation coefficients taken over all m(m − 1)/2 possible pairs of judges' rankings:

\bar{r}_s = \frac{m W - 1}{m - 1}.
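
This identity can be checked numerically; the following sketch uses SciPy's spearmanr (the random-data setup is purely illustrative):

```python
from itertools import combinations
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n, m = 6, 4                                # n objects, m judges
ranks = np.column_stack([rng.permutation(n) + 1 for _ in range(m)])

R = ranks.sum(axis=1)
W = 12 * ((R - R.mean()) ** 2).sum() / (m**2 * (n**3 - n))

mean_rho = np.mean([spearmanr(ranks[:, a], ranks[:, b]).correlation
                    for a, b in combinations(range(m), 2)])
print(mean_rho, (m * W - 1) / (m - 1))     # the two values coincide
```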

Incomplete Blocks

Kendall's W can still be applied when the judges evaluate only some subset of the n objects, provided the corresponding block design is a (n, m, r, p, λ)-design (note the different notation). In other words, when

  1. each judge ranks the same number p of objects, for some p < n,
  2. every object is ranked exactly the same total number r of times,
  3. and each pair of objects is presented together to some judge a total of exactly λ times, with λ ≥ 1 a constant for all pairs.

Then Kendall's W is defined as [2]

W = \frac{12 \sum_{i=1}^{n} R_i^2 - 3 r^2 n (p + 1)^2}{\lambda^2 n (n^2 - 1)}.

If p = n and r = λ = m, so that each judge ranks all n objects, the formula above reduces to the original one.
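
A small sketch of this formula follows; the helper and the example design are hypothetical, chosen only to satisfy the three conditions above:

```python
def kendalls_w_incomplete(R, r, p, lam):
    """Kendall's W from the total ranks R_i of an (n, m, r, p, lam)-design."""
    n = len(R)
    num = 12 * sum(Ri**2 for Ri in R) - 3 * r**2 * n * (p + 1) ** 2
    return num / (lam**2 * n * (n**2 - 1))

# n = 3 objects; each judge ranks p = 2 of them; each object is ranked
# r = 2 times; each pair meets lam = 1 time. Fully consistent judges give
# total ranks R = [2, 3, 4] and hence W = 1.
print(kendalls_w_incomplete([2, 3, 4], r=2, p=2, lam=1))  # 1.0
```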

Correction for Ties

When tied values occur, they are each given the average of the ranks that would have been given had no ties occurred. For example, the data set {80,76,34,80,73,80} has values of 80 tied for 4th, 5th, and 6th place; since the mean of {4,5,6} = 5, ranks would be assigned to the raw data values as follows: {5,3,1,5,2,5}.
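
SciPy's rankdata applies exactly this average-rank convention:

```python
from scipy.stats import rankdata

print(rankdata([80, 76, 34, 80, 73, 80]))  # [5. 3. 1. 5. 2. 5.]
```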

The effect of ties is to reduce the value of W; however, this effect is small unless there are a large number of ties. To correct for ties, assign ranks to tied values as above and compute the correction factors

T_j = \sum_{i=1}^{g_j} (t_i^3 - t_i),

where t_i is the number of tied ranks in the i-th group of tied ranks (a group being a set of values having constant (tied) rank) and g_j is the number of groups of ties in the set of ranks (ranging from 1 to n) for judge j. Thus, T_j is the correction factor required for the set of ranks for judge j, i.e. the j-th set of ranks. Note that if there are no tied ranks for judge j, T_j equals 0.

With the correction for ties, the formula for W becomes

W = \frac{12 \sum_{i=1}^{n} R_i^2 - 3 m^2 n (n + 1)^2}{m^2 n (n^2 - 1) - m \sum_{j=1}^{m} T_j},

where R_i is the sum of the ranks for object i, and \sum_{j=1}^{m} T_j is the sum of the values of T_j over all m sets of ranks. [3]
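
A sketch of the tie-corrected computation, starting from raw scores (the function is our own illustration, not a library routine):

```python
import numpy as np
from scipy.stats import rankdata

def kendalls_w_ties(raw: np.ndarray) -> float:
    """Tie-corrected Kendall's W; raw[i, j] = score given to object i by judge j."""
    n, m = raw.shape
    ranks = np.column_stack([rankdata(raw[:, j]) for j in range(m)])
    R = ranks.sum(axis=1)
    T = 0.0
    for j in range(m):                     # correction factor T_j per judge
        _, counts = np.unique(ranks[:, j], return_counts=True)
        T += (counts**3 - counts).sum()
    num = 12 * (R**2).sum() - 3 * m**2 * n * (n + 1) ** 2
    return num / (m**2 * n * (n**2 - 1) - m * T)

raw = np.array([[9.0, 8.5], [7.5, 8.5], [3.0, 4.0],
                [9.0, 9.0], [7.0, 5.0], [9.0, 9.5]])
print(kendalls_w_ties(raw))
```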

Steps of Weighted Kendall's W

In some cases, the importance of the raters (experts) is not the same, and the Weighted Kendall's W should be used. [4] Suppose that object i is given the rank r_{i,j} by judge number j, where there are in total n objects and m judges. Also, the weight of judge j is denoted by ω_j (in real-world situations, the importance of each rater can differ), with the weights normalized so that

\sum_{j=1}^{m} \omega_j = 1.

Then the total rank given to object i is

R_i = \sum_{j=1}^{m} \omega_j r_{i,j},

and the mean value of these total ranks is

\bar{R} = \frac{1}{n} \sum_{i=1}^{n} R_i.

The sum of squared deviations, S, is defined as

S = \sum_{i=1}^{n} \left( R_i - \bar{R} \right)^2,

and then the Weighted Kendall's W is defined as

W = \frac{12 S}{n^3 - n}.

The above formula is suitable when there are no tied ranks.
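
Under the reconstruction above (weights summing to 1), a minimal sketch of the weighted statistic (again our own helper, not a library API):

```python
import numpy as np

def weighted_kendalls_w(ranks: np.ndarray, w: np.ndarray) -> float:
    """Weighted Kendall's W for untied ranks; w holds the m judge weights (sum 1)."""
    n, m = ranks.shape
    R = ranks @ w                      # weighted total rank of each object
    S = ((R - R.mean()) ** 2).sum()
    return 12 * S / (n**3 - n)

# With equal weights this reproduces the ordinary Kendall's W (see below).
ranks = np.array([[1, 1, 2], [2, 2, 1], [3, 3, 3], [4, 4, 4]])
print(weighted_kendalls_w(ranks, np.array([1/3, 1/3, 1/3])))
```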

Correction for Ties

When tied ranks occur, they must be accounted for in the above formula. To correct for ties, compute the correction factors

T_j = \sum_{i=1}^{g_j} (t_i^3 - t_i),

where t_i represents the number of tied ranks in the i-th group of tied ranks for judge j, and T_j gives the total tie correction for judge j. With the correction for ties, the formula for the Weighted Kendall's W becomes

W = \frac{12 S}{(n^3 - n) - \sum_{j=1}^{m} \omega_j T_j}.

If the weights of the raters are equal (i.e. the distribution of the weights is uniform), the values of the Weighted Kendall's W and Kendall's W are equal. [4]
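
Extending the previous sketch with the tie correction, under the same reconstruction assumptions (weights summing to 1, T_j as defined above):

```python
import numpy as np
from scipy.stats import rankdata

def weighted_kendalls_w_ties(raw: np.ndarray, w: np.ndarray) -> float:
    """Tie-corrected Weighted Kendall's W from raw scores and judge weights (sum 1)."""
    n, m = raw.shape
    ranks = np.column_stack([rankdata(raw[:, j]) for j in range(m)])
    R = ranks @ w
    S = ((R - R.mean()) ** 2).sum()
    T = np.empty(m)
    for j in range(m):                 # tie correction T_j per judge
        _, counts = np.unique(ranks[:, j], return_counts=True)
        T[j] = (counts**3 - counts).sum()
    return 12 * S / ((n**3 - n) - w @ T)
```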

Significance Tests

In the case of complete ranks, a commonly used significance test for W against a null hypothesis of no agreement (i.e. random rankings) is given by Kendall and Gibbons (1990) [5]

\chi^2 = m (n - 1) W,

where the test statistic follows a chi-squared distribution with n − 1 degrees of freedom.

In the case of incomplete rankings (see above), this becomes

\chi^2 = \frac{\lambda (n^2 - 1)}{p + 1} W,

where again there are n − 1 degrees of freedom.

Legendre [6] compared via simulation the power of the chi-square and permutation testing approaches to determining significance for Kendall's W. Results indicated the chi-square method was overly conservative compared to a permutation test when the number of judges m was small. Marozzi [7] extended this by also considering the F test, as proposed in the original publication introducing the W statistic by Kendall & Babington Smith (1939):

F = \frac{(m - 1) W}{1 - W},

where the test statistic follows an F distribution with \nu_1 = n - 1 - 2/m and \nu_2 = (m - 1) \nu_1 degrees of freedom. Marozzi found that the F test performs approximately as well as the permutation test method, and may be preferred when m is small, as it is computationally simpler.
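
A hedged sketch of both tests using SciPy's reference distributions (the helper name is ours; chi2.sf and f.sf are survival functions, i.e. upper-tail p-values):

```python
from scipy.stats import chi2, f

def significance_of_w(W: float, m: int, n: int):
    """p-values for Kendall's W under the chi-squared and F tests above."""
    p_chi2 = chi2.sf(m * (n - 1) * W, df=n - 1)
    nu1 = n - 1 - 2 / m
    p_f = f.sf((m - 1) * W / (1 - W), dfn=nu1, dfd=(m - 1) * nu1)
    return p_chi2, p_f

print(significance_of_w(W=0.8, m=5, n=6))
```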

Software

Kendall's W and Weighted Kendall's W are implemented in MATLAB, [8] SPSS, R, [9] and other statistical software packages.

Notes

  1. Dodge (2003): see "concordance, coefficient of"
  2. Gibbons & Chakraborti (2003)
  3. Siegel & Castellan (1988, p. 266)
  4. Mahmoudi, Amin; Abbasi, Mehdi; Yuan, Jingfeng; Li, Lingzhi (2022). "Large-scale group decision-making (LSGDM) for performance measurement of healthcare construction projects: Ordinal Priority Approach". Applied Intelligence. 52 (12): 13781–13802. doi:10.1007/s10489-022-04094-y. ISSN 1573-7497. PMC 9449288. PMID 36091930.
  5. Kendall, Maurice G.; Gibbons, Jean Dickinson (1990). Rank Correlation Methods (5th ed.). London: E. Arnold. ISBN 0-19-520837-4. OCLC 21195423.
  6. Legendre (2005)
  7. Marozzi, Marco (2014). "Testing for concordance between several criteria". Journal of Statistical Computation and Simulation. 84 (9): 1843–1850. doi:10.1080/00949655.2013.766189. S2CID 119577430.
  8. "Weighted Kendall's W". www.mathworks.com. Retrieved 2022-10-06.
  9. "Kendall's coefficient of concordance W – generalized for randomly incomplete datasets". The R Project for Statistical Computing.
