In statistics, the **Kendall rank correlation coefficient**, commonly referred to as **Kendall's τ coefficient** (after the Greek letter τ, tau), is a statistic used to measure the ordinal association between two measured quantities. A **τ test** is a non-parametric hypothesis test for statistical dependence based on the τ coefficient.

- Definition
- Properties
- Hypothesis test
- Accounting for ties
- Tau-a
- Tau-b
- Tau-c
- Significance tests
- Algorithms
- Software Implementations
- See also
- References
- Further reading
- External links

It is a measure of rank correlation: the similarity of the orderings of the data when ranked by each of the quantities. It is named after Maurice Kendall, who developed it in 1938,^{ [1] } though Gustav Fechner had proposed a similar measure in the context of time series in 1897.^{ [2] }

Intuitively, the Kendall correlation between two variables will be high when observations have a similar rank (i.e., relative position label of the observations within the variable: 1st, 2nd, 3rd, etc.) in the two variables, with a correlation of 1 when the ranks are identical, and low when observations have dissimilar ranks in the two variables, with a correlation of −1 when one ranking is the exact reverse of the other.

Both Kendall's τ and Spearman's ρ can be formulated as special cases of a more general correlation coefficient.

Let $(x_1, y_1), \ldots, (x_n, y_n)$ be a set of observations of the joint random variables *X* and *Y*, such that all the values of ($x_i$) and ($y_i$) are unique (ties are neglected for simplicity). Any pair of observations $(x_i, y_i)$ and $(x_j, y_j)$, where $i < j$, are said to be *concordant* if the sort order of $(x_i, x_j)$ and $(y_i, y_j)$ agrees: that is, if either both $x_i > x_j$ and $y_i > y_j$ hold or both $x_i < x_j$ and $y_i < y_j$; otherwise they are said to be *discordant*.

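The Kendall τ coefficient is defined as:^{ [3] }

$$\tau = \frac{(\text{number of concordant pairs}) - (\text{number of discordant pairs})}{\binom{n}{2}}$$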

where $\binom{n}{2} = \frac{n(n-1)}{2}$ is the binomial coefficient for the number of ways to choose two items from $n$ items.

The denominator is the total number of pair combinations, so the coefficient must be in the range −1 ≤ *τ* ≤ 1.

- If the agreement between the two rankings is perfect (i.e., the two rankings are the same) the coefficient has value 1.
- If the disagreement between the two rankings is perfect (i.e., one ranking is the reverse of the other) the coefficient has value −1.
- If *X* and *Y* are independent, then we would expect the coefficient to be approximately zero.
- An explicit expression for Kendall's rank coefficient is $\tau = \frac{2}{n(n-1)} \sum_{i<j} \operatorname{sgn}(x_i - x_j) \operatorname{sgn}(y_i - y_j)$.
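
The definition can be checked on a small example. For $x = (1, 2, 3, 4, 5)$ and $y = (3, 1, 2, 5, 4)$ there are 7 concordant and 3 discordant pairs out of $\binom{5}{2} = 10$, so $\tau = (7 - 3)/10 = 0.4$. The following is a minimal sketch, assuming numeric inputs with no ties; the function name is illustrative, not taken from any library.

```python
from itertools import combinations


def kendall_tau_no_ties(x, y):
    """Kendall's tau for paired samples with all values distinct.

    Counts concordant and discordant pairs straight from the definition,
    so it runs in O(n^2) time.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    if len(set(x)) != n or len(set(y)) != n:
        raise ValueError("this sketch assumes there are no ties")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        # Concordant: x and y order observations i and j the same way.
        if (x[i] - x[j]) * (y[i] - y[j]) > 0:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


print(kendall_tau_no_ties([1, 2, 3, 4, 5], [3, 1, 2, 5, 4]))  # prints 0.4
```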

The Kendall rank coefficient is often used as a test statistic in a statistical hypothesis test to establish whether two variables may be regarded as statistically dependent. This test is non-parametric, as it does not rely on any assumptions on the distributions of *X* or *Y* or the distribution of (*X*,*Y*).

Under the null hypothesis of independence of *X* and *Y*, the sampling distribution of *τ* has an expected value of zero. The precise distribution cannot be characterized in terms of common distributions, but may be calculated exactly for small samples; for larger samples, it is common to use an approximation to the normal distribution, with mean zero and variance

$$\operatorname{Var}(\tau) = \frac{2(2n+5)}{9n(n-1)}.$$
^{ [4] }
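
For small samples the exact null distribution can be obtained by brute force: if the variables are continuous (so ties occur with probability zero) and independent, every ordering of the *y*-ranks relative to the *x*-ranks is equally likely. The sketch below, with hypothetical helper names, enumerates all $n!$ orderings using exact fractions and compares the resulting variance of τ with the formula above.

```python
from fractions import Fraction
from itertools import combinations, permutations


def tau_from_ranks(ranks):
    """Kendall's tau between the identity ranking 0..n-1 and `ranks` (no ties)."""
    n = len(ranks)
    s = 0
    for i, j in combinations(range(n), 2):
        # x is the identity ranking, so the pair is concordant iff ranks[i] < ranks[j].
        s += 1 if ranks[i] < ranks[j] else -1
    return Fraction(2 * s, n * (n - 1))


def exact_null_variance(n):
    """Variance of tau under independence, averaging over all n! equally likely rankings."""
    taus = [tau_from_ranks(p) for p in permutations(range(n))]
    mean = sum(taus) / len(taus)
    return sum((t - mean) ** 2 for t in taus) / len(taus)


def formula_variance(n):
    """The large-sample variance 2(2n + 5) / (9n(n - 1)), as an exact fraction."""
    return Fraction(2 * (2 * n + 5), 9 * n * (n - 1))


for n in range(2, 7):
    print(n, exact_null_variance(n), formula_variance(n))
```

For these small values of $n$ the two columns agree exactly; what the normal approximation adds for larger samples is the shape of the distribution, not its variance.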

A pair $\{(x_i, y_i), (x_j, y_j)\}$ is said to be *tied* if $x_i = x_j$ or $y_i = y_j$; a tied pair is neither concordant nor discordant. When tied pairs arise in the data, the coefficient may be modified in a number of ways to keep it in the range [−1, 1]:

The Tau-a statistic tests the strength of association of the cross tabulations. Both variables have to be ordinal. Tau-a makes no adjustment for ties. It is defined as:

$$\tau_A = \frac{n_c - n_d}{n_0}$$

where *n*_{c}, *n*_{d} and *n*_{0} are defined as in the next section.

The Tau-b statistic, unlike Tau-a, makes adjustments for ties.^{ [5] } Values of Tau-b range from −1 (100% negative association, or perfect inversion) to +1 (100% positive association, or perfect agreement). A value of zero indicates the absence of association.

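The Kendall Tau-b coefficient is defined as:

$$\tau_B = \frac{n_c - n_d}{\sqrt{(n_0 - n_1)(n_0 - n_2)}}$$

where

- $n_0 = n(n-1)/2$
- $n_1 = \sum_i t_i (t_i - 1)/2$
- $n_2 = \sum_j u_j (u_j - 1)/2$
- $n_c$ = number of concordant pairs
- $n_d$ = number of discordant pairs
- $t_i$ = number of tied values in the $i$th group of ties for the first quantity
- $u_j$ = number of tied values in the $j$th group of ties for the second quantity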

A simple algorithm developed in BASIC computes the Tau-b coefficient using an alternative formula.^{ [6] }

Be aware that some statistical packages, e.g. SPSS, use alternative formulas for computational efficiency, with double the 'usual' number of concordant and discordant pairs.^{ [7] }
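
A minimal, unoptimized sketch of Tau-b following $\tau_B = (n_c - n_d)/\sqrt{(n_0 - n_1)(n_0 - n_2)}$, where $n_1$ and $n_2$ count the pairs tied in the first and second quantity. It assumes numeric inputs and counts pairs directly in $O(n^2)$ time; it is neither the BASIC algorithm nor the SPSS formulation mentioned above, and the function names are illustrative. It fails with a division by zero if either variable is constant.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def tied_pairs(values):
    """Pairs tied on `values`: sum of t*(t-1)/2 over groups of equal values."""
    return sum(t * (t - 1) // 2 for t in Counter(values).values())


def kendall_tau_b(x, y):
    """Kendall's tau-b, counting concordant and discordant pairs directly."""
    n = len(x)
    n_c = n_d = 0
    for i, j in combinations(range(n), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        if prod > 0:
            n_c += 1
        elif prod < 0:
            n_d += 1
        # prod == 0: the pair is tied in x or y and counts as neither.
    n_0 = n * (n - 1) // 2
    n_1 = tied_pairs(x)
    n_2 = tied_pairs(y)
    return (n_c - n_d) / sqrt((n_0 - n_1) * (n_0 - n_2))


print(kendall_tau_b([1, 2, 2, 3, 4], [1, 3, 2, 2, 4]))  # prints 0.6666666666666666
```

In the example there are 7 concordant pairs, 1 discordant pair and one tie in each variable, so $\tau_B = 6/\sqrt{9 \cdot 9} = 2/3$. Without ties, $n_1 = n_2 = 0$ and the result coincides with Tau-a.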

Tau-c (also called Stuart-Kendall Tau-c)^{ [8] } is more suitable than Tau-b for the analysis of data based on non-square (i.e. rectangular) contingency tables.^{ [8] }^{ [9] } So use Tau-b if the underlying scale of both variables has the same number of possible values (before ranking) and Tau-c if they differ. For instance, one variable might be scored on a 5-point scale (very good, good, average, bad, very bad), whereas the other might be based on a finer 10-point scale.

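The Kendall Tau-c coefficient is defined as:^{ [9] }

$$\tau_C = \frac{2(n_c - n_d)}{n^2 \frac{m-1}{m}} = \tau_A \frac{n-1}{n} \frac{m}{m-1}$$

where

- $n_c$ = number of concordant pairs
- $n_d$ = number of discordant pairs
- $r$ = number of rows
- $c$ = number of columns
- $m = \min(r, c)$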
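
A sketch of Tau-c computed from an $r \times c$ contingency table of counts, using $\tau_C = 2(n_c - n_d)/(n^2 (m-1)/m)$ with $m = \min(r, c)$. It assumes that rows and columns are listed in increasing order of their ordinal categories; the function name is hypothetical, and the cell-by-cell pair counting is chosen for clarity rather than speed.

```python
def kendall_tau_c(table):
    """Stuart-Kendall tau-c from an r x c contingency table of counts.

    Rows and columns must be listed in increasing order of their ordinal
    categories. Pairs are counted cell by cell in O((r*c)^2) time.
    """
    r, c = len(table), len(table[0])
    m = min(r, c)
    if m < 2:
        raise ValueError("tau-c needs at least two rows and two columns")
    n = sum(sum(row) for row in table)
    n_c = n_d = 0
    for i in range(r):
        for j in range(c):
            # Compare cell (i, j) only with cells in strictly later rows, so each
            # pair of observations is counted once; same-row pairs are tied.
            for i2 in range(i + 1, r):
                for j2 in range(c):
                    if j2 > j:
                        n_c += table[i][j] * table[i2][j2]  # higher on both variables
                    elif j2 < j:
                        n_d += table[i][j] * table[i2][j2]  # higher row, lower column
    return 2 * (n_c - n_d) / (n ** 2 * (m - 1) / m)


# Rows: a 2-level ordinal variable; columns: a 3-level ordinal variable.
print(kendall_tau_c([[10, 5, 0], [0, 5, 10]]))  # 8/9, about 0.889
```

In the example $n = 30$, $n_c = 200$, $n_d = 0$ and $m = 2$, so $\tau_C = 400/450 = 8/9$.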

When two quantities are statistically independent, the distribution of $\tau$ is not easily characterizable in terms of known distributions. However, for $\tau_A$ the following statistic, $z_A$, is approximately distributed as a standard normal when the variables are statistically independent:

$$z_A = \frac{3(n_c - n_d)}{\sqrt{n(n-1)(2n+5)/2}}$$

Thus, to test whether two variables are statistically dependent, one computes $z_A$ and finds the cumulative probability for a standard normal distribution at $-|z_A|$. For a two-tailed test, multiply that number by two to obtain the *p*-value. If the *p*-value is below a given significance level, one rejects the null hypothesis (at that significance level) that the quantities are statistically independent.
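
The test just described can be sketched in a few lines, assuming untied numeric data. The statistic is $z_A = 3(n_c - n_d)/\sqrt{n(n-1)(2n+5)/2}$, and the two-sided *p*-value $2\Phi(-|z_A|)$ is written with the complementary error function, since $2\Phi(-|z|) = \operatorname{erfc}(|z|/\sqrt{2})$. The function name is illustrative.

```python
from itertools import combinations
from math import erfc, sqrt


def kendall_z_test(x, y):
    """Large-sample two-sided test of independence based on z_A (assumes no ties)."""
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")
    s = 0  # s accumulates n_c - n_d
    for i, j in combinations(range(n), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        s += (prod > 0) - (prod < 0)
    z = 3 * s / sqrt(n * (n - 1) * (2 * n + 5) / 2)
    # Two-sided p-value 2 * Phi(-|z|), expressed via erfc.
    p = erfc(abs(z) / sqrt(2))
    return z, p


z, p = kendall_z_test(list(range(10)), list(range(10)))
print(z, p)
```

With ten perfectly concordant observations, $z_A = 135/\sqrt{1125} \approx 4.02$ and the *p*-value is far below any usual significance level. For samples this small, an exact distribution is usually preferable to the normal approximation.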

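Several adjustments to $z_A$ are needed when accounting for ties. The following statistic, $z_B$, has the same distribution as the $\tau_B$ distribution, and is again approximately distributed as a standard normal when the quantities are statistically independent:

$$z_B = \frac{n_c - n_d}{\sqrt{v}}$$

where

- $v = (v_0 - v_t - v_u)/18 + v_1 + v_2$
- $v_0 = n(n-1)(2n+5)$
- $v_t = \sum_i t_i (t_i - 1)(2t_i + 5)$
- $v_u = \sum_j u_j (u_j - 1)(2u_j + 5)$
- $v_1 = \sum_i t_i (t_i - 1) \sum_j u_j (u_j - 1) / (2n(n-1))$
- $v_2 = \sum_i t_i (t_i - 1)(t_i - 2) \sum_j u_j (u_j - 1)(u_j - 2) / (9n(n-1)(n-2))$

and $t_i$ and $u_j$ are the sizes of the $i$th group of tied values of the first quantity and of the $j$th group of tied values of the second quantity.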

This is sometimes referred to as the Mann-Kendall test.^{ [10] }
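
A sketch of the tie-corrected statistic, assuming numeric inputs and $n \ge 3$. It computes $z_B = (n_c - n_d)/\sqrt{v}$ with $v = (v_0 - v_t - v_u)/18 + v_1 + v_2$, where $v_0 = n(n-1)(2n+5)$ and the other terms are sums over the sizes of the tie groups in each variable, as spelled out in the comments. Groups of size one contribute nothing, so with no ties $z_B$ reduces to $z_A$. The function name is illustrative.

```python
from collections import Counter
from itertools import combinations
from math import erfc, sqrt


def kendall_z_b(x, y):
    """Tie-corrected statistic z_B and its two-sided normal-approximation p-value."""
    n = len(x)
    if n < 3:
        raise ValueError("need at least three observations")
    s = 0  # n_c - n_d; pairs tied in x or y contribute 0
    for i, j in combinations(range(n), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        s += (prod > 0) - (prod < 0)
    t = list(Counter(x).values())  # sizes t_i of the tie groups in x
    u = list(Counter(y).values())  # sizes u_j of the tie groups in y
    v0 = n * (n - 1) * (2 * n + 5)
    vt = sum(ti * (ti - 1) * (2 * ti + 5) for ti in t)
    vu = sum(uj * (uj - 1) * (2 * uj + 5) for uj in u)
    # v1 and v2: products of per-variable tie sums, scaled by n.
    v1 = sum(ti * (ti - 1) for ti in t) * sum(uj * (uj - 1) for uj in u) / (2 * n * (n - 1))
    v2 = (sum(ti * (ti - 1) * (ti - 2) for ti in t)
          * sum(uj * (uj - 1) * (uj - 2) for uj in u)
          / (9 * n * (n - 1) * (n - 2)))
    v = (v0 - vt - vu) / 18 + v1 + v2
    z = s / sqrt(v)
    return z, erfc(abs(z) / sqrt(2))  # p = 2 * Phi(-|z|)


print(kendall_z_b([1, 2, 2, 3, 4], [1, 3, 2, 2, 4]))
```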

The direct computation of the numerator $n_c - n_d$ involves two nested iterations, as characterized by the following pseudocode:

    numer := 0
    for i := 2..N do
        for j := 1..(i − 1) do
            numer := numer + sign(x[i] − x[j]) × sign(y[i] − y[j])
    return numer
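
A direct Python translation of this pseudocode, shifted to 0-based indices. Because `sign` returns 0 for equal values, tied pairs contribute nothing, so the loop yields $n_c - n_d$ with or without ties. The function names are illustrative.

```python
def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def kendall_numerator(x, y):
    """n_c - n_d by the direct O(n^2) double loop; tied pairs contribute 0."""
    numer = 0
    for i in range(1, len(x)):  # pseudocode i := 2..N
        for j in range(i):      # pseudocode j := 1..(i - 1)
            numer += sign(x[i] - x[j]) * sign(y[i] - y[j])
    return numer


print(kendall_numerator([1, 2, 3, 4, 5], [3, 1, 2, 5, 4]))  # prints 4
```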

Although quick to implement, this algorithm is $O(n^2)$ in complexity and becomes very slow on large samples. A more sophisticated algorithm^{ [11] } built upon the Merge Sort algorithm can be used to compute the numerator in $O(n \log n)$ time.

Begin by ordering your data points by the first quantity, $x$, and secondarily (among ties in $x$) by the second quantity, $y$. With this initial ordering, $y$ is not sorted, and the core of the algorithm consists of computing how many steps a Bubble Sort would take to sort this initial $y$. An enhanced Merge Sort algorithm, with $O(n \log n)$ complexity, can be applied to compute the number of swaps, $S(y)$, that would be required by a Bubble Sort to sort $y$. Then the numerator for $\tau$ is computed as:

$$n_c - n_d = n_0 - n_1 - n_2 + n_3 - 2S(y),$$

where $n_3$ is computed like $n_1$ and $n_2$, but with respect to the joint ties in $x$ and $y$.

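A Merge Sort partitions the data to be sorted, $y$, into two roughly equal halves, $y_\mathrm{left}$ and $y_\mathrm{right}$, then sorts each half recursively, and then merges the two sorted halves into a fully sorted vector. The number of Bubble Sort swaps is equal to:

$$S(y) = S(y_\mathrm{left}) + S(y_\mathrm{right}) + M(Y_\mathrm{left}, Y_\mathrm{right}),$$

where $Y_\mathrm{left}$ and $Y_\mathrm{right}$ are the sorted versions of $y_\mathrm{left}$ and $y_\mathrm{right}$, and $M(\cdot, \cdot)$ characterizes the Bubble Sort swap-equivalent for a merge operation. $M(\cdot, \cdot)$ is computed as depicted in the following pseudocode: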

    function M(L[1..n], R[1..m]) is
        i := 1
        j := 1
        nSwaps := 0
        while i ≤ n and j ≤ m do
            if R[j] < L[i] then
                nSwaps := nSwaps + n − i + 1
                j := j + 1
            else
                i := i + 1
        return nSwaps

A side effect of the above steps is that you end up with both a sorted version of $x$ and a sorted version of $y$. With these, the factors $t_i$ and $u_j$ used to compute $\tau_B$ are easily obtained in a single linear-time pass through the sorted arrays.
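
A compact Python sketch of the merge-sort approach described above, assuming comparable (e.g. numeric) inputs. Sorting the $(x, y)$ tuples orders the data by $x$ and then by $y$; the merge step counts swaps the same way as $M$, using a strict `<` so that equal values never count as swaps. The helper names are hypothetical, and for brevity the tie counts $n_1$, $n_2$ and $n_3$ are taken from `collections.Counter` rather than from the linear pass over sorted arrays that the text describes.

```python
from collections import Counter


def _tied_pairs(group_sizes):
    """Sum of t*(t-1)/2 over tie-group sizes t."""
    return sum(t * (t - 1) // 2 for t in group_sizes)


def _sort_count_swaps(a):
    """Merge-sort `a`; return (sorted copy, number of Bubble Sort swaps S(a))."""
    if len(a) <= 1:
        return list(a), 0
    mid = len(a) // 2
    left, s_left = _sort_count_swaps(a[:mid])
    right, s_right = _sort_count_swaps(a[mid:])
    merged, swaps = [], 0
    i = j = 0
    while i < len(left) and j < len(right):
        if right[j] < left[i]:
            # right[j] jumps ahead of every remaining element of `left` (as in M).
            swaps += len(left) - i
            merged.append(right[j])
            j += 1
        else:
            merged.append(left[i])
            i += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, s_left + s_right + swaps


def kendall_numerator_fast(x, y):
    """n_c - n_d = n_0 - n_1 - n_2 + n_3 - 2 S(y), in O(n log n) time."""
    n = len(x)
    pairs = sorted(zip(x, y))  # by x, then by y among ties in x
    _, swaps = _sort_count_swaps([yy for _, yy in pairs])
    n_0 = n * (n - 1) // 2
    n_1 = _tied_pairs(Counter(x).values())
    n_2 = _tied_pairs(Counter(y).values())
    n_3 = _tied_pairs(Counter(pairs).values())  # joint ties in x and y
    return n_0 - n_1 - n_2 + n_3 - 2 * swaps


print(kendall_numerator_fast([1, 2, 2, 3, 4], [1, 3, 2, 2, 4]))  # prints 6
```

The identity holds because pairs tied in neither variable number $n_0 - n_1 - n_2 + n_3 = n_c + n_d$, and after the initial sort each discordant pair appears as exactly one inversion of $y$, so $S(y) = n_d$.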

- R implements the test as `cor.test(x, y, method = "kendall")` in its base "stats" package (`cor(x, y, method = "kendall")` also works, but does not return the p-value).
- For Python, the SciPy library implements the computation of $\tau_B$ in `scipy.stats.kendalltau`.

- Correlation
- Kendall tau distance
- Kendall's W
- Spearman's rank correlation coefficient
- Goodman and Kruskal's gamma
- Theil–Sen estimator
- Mann–Whitney U test - it is equivalent to Kendall's tau correlation coefficient if one of the variables is binary.


1. Kendall, M. (1938). "A New Measure of Rank Correlation". *Biometrika*. **30** (1–2): 81–89. doi:10.1093/biomet/30.1-2.81. JSTOR 2332226.
2. Kruskal, W. H. (1958). "Ordinal Measures of Association". *Journal of the American Statistical Association*. **53** (284): 814–861. doi:10.2307/2281954. JSTOR 2281954. MR 0100941.
3. Nelsen, R. B. (2001) [1994]. "Kendall tau metric". *Encyclopedia of Mathematics*. EMS Press.
4. Prokhorov, A. V. (2001) [1994]. "Kendall coefficient of rank correlation". *Encyclopedia of Mathematics*. EMS Press.
5. Agresti, A. (2010). *Analysis of Ordinal Categorical Data* (Second ed.). New York: John Wiley & Sons. ISBN 978-0-470-08289-8.
6. Brophy, Alfred. "An algorithm and program for calculation of Kendall's rank correlation coefficient".
7. IBM (2016). *IBM SPSS Statistics 24 Algorithms*. IBM. p. 168. Retrieved 31 August 2017.
8. Berry, K. J.; Johnston, J. E.; Zahran, S.; Mielke, P. W. (2009). "Stuart's tau measure of effect size for ordinal variables: Some methodological considerations". *Behavior Research Methods*. **41** (4): 1144–1148. doi:10.3758/brm.41.4.1144. PMID 19897822.
9. Stuart, A. (1953). "The Estimation and Comparison of Strengths of Association in Contingency Tables". *Biometrika*. **40** (1–2): 105–110. doi:10.2307/2333101. JSTOR 2333101.
10. Glen_b. "Relationship between Mann-Kendall and Kendall Tau-b".
11. Knight, W. (1966). "A Computer Method for Calculating Kendall's Tau with Ungrouped Data". *Journal of the American Statistical Association*. **61** (314): 436–439. doi:10.2307/2282833. JSTOR 2282833.

- Abdi, H. (2007). "Kendall rank correlation". In Salkind, N. J. (ed.). *Encyclopedia of Measurement and Statistics*. Thousand Oaks (CA): Sage.
- Daniel, Wayne W. (1990). "Kendall's tau". *Applied Nonparametric Statistics* (2nd ed.). Boston: PWS-Kent. pp. 365–377. ISBN 978-0-534-91976-4.
- Kendall, Maurice; Gibbons, Jean Dickinson (1990) [First published 1948]. *Rank Correlation Methods*. Charles Griffin Book Series (5th ed.). Oxford: Oxford University Press. ISBN 978-0195208375.
- Bonett, Douglas G.; Wright, Thomas A. (2000). "Sample size requirements for estimating Pearson, Kendall, and Spearman correlations". *Psychometrika*. **65** (1): 23–28. doi:10.1007/BF02294183.

This page is based on this Wikipedia article

Text is available under the CC BY-SA 4.0 license; additional terms may apply.

Images, videos and audio are available under their respective licenses.
