Rand index

[Figure: Example clusterings for a dataset with the k-means (left) and mean shift (right) algorithms. The calculated adjusted Rand index for these two clusterings is $ARI \approx 0.94$.]

The Rand index [1] or Rand measure (named after William M. Rand) in statistics, and in particular in data clustering, is a measure of the similarity between two data clusterings. A form of the Rand index may be defined that is adjusted for the chance grouping of elements; this is the adjusted Rand index. The Rand index can be seen as the accuracy of deciding, over all pairs of elements, whether a pair belongs to the same cluster or not.

Rand index

Definition

Given a set of $n$ elements $S = \{o_1, \ldots, o_n\}$ and two partitions of $S$ to compare, $X = \{X_1, \ldots, X_r\}$, a partition of $S$ into $r$ subsets, and $Y = \{Y_1, \ldots, Y_s\}$, a partition of $S$ into $s$ subsets, define the following:

- $a$, the number of pairs of elements in $S$ that are in the same subset in $X$ and in the same subset in $Y$
- $b$, the number of pairs of elements in $S$ that are in different subsets in $X$ and in different subsets in $Y$
- $c$, the number of pairs of elements in $S$ that are in the same subset in $X$ and in different subsets in $Y$
- $d$, the number of pairs of elements in $S$ that are in different subsets in $X$ and in the same subset in $Y$

The Rand index, $R$, is: [1] [2]

$$R = \frac{a + b}{a + b + c + d} = \frac{a + b}{\binom{n}{2}}$$

Intuitively, $a + b$ can be considered as the number of agreements between $X$ and $Y$, and $c + d$ as the number of disagreements between $X$ and $Y$.

Since the denominator is the total number of pairs, the Rand index represents the frequency of occurrence of agreements over the total pairs, or the probability that $X$ and $Y$ will agree on a randomly chosen pair.

$\binom{n}{2}$ is calculated as $n(n-1)/2$.

Similarly, one can also view the Rand index as a measure of the percentage of correct decisions made by the algorithm. It can be computed using the following formula:

$$RI = \frac{TP + TN}{TP + TN + FP + FN}$$

where $TP$ is the number of true positives, $TN$ is the number of true negatives, $FP$ is the number of false positives, and $FN$ is the number of false negatives.
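
To make the pair counting concrete, the following is a minimal Python sketch (not part of the original article); the function name `rand_index` and the example labels are illustrative choices. It computes the Rand index directly from two label assignments by iterating over all pairs of elements.

```python
from itertools import combinations

def rand_index(labels_x, labels_y):
    """Rand index between two clusterings given as per-element label lists."""
    assert len(labels_x) == len(labels_y)
    same_same = 0   # a / TP: pair in the same cluster in both X and Y
    diff_diff = 0   # b / TN: pair in different clusters in both X and Y
    n_pairs = 0
    for i, j in combinations(range(len(labels_x)), 2):
        n_pairs += 1
        in_x = labels_x[i] == labels_x[j]
        in_y = labels_y[i] == labels_y[j]
        if in_x and in_y:
            same_same += 1
        elif not in_x and not in_y:
            diff_diff += 1
    return (same_same + diff_diff) / n_pairs

# Example: two clusterings of 6 elements
print(rand_index([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2]))  # 10/15 ≈ 0.667
```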

Properties

The Rand index has a value between 0 and 1, with 0 indicating that the two data clusterings do not agree on any pair of points and 1 indicating that the data clusterings are exactly the same.

In mathematical terms, a, b, c, d are defined as follows:

- $a = |S_a|$, where $S_a = \{ (o_i, o_j) \mid o_i, o_j \in X_k,\ o_i, o_j \in Y_l \}$
- $b = |S_b|$, where $S_b = \{ (o_i, o_j) \mid o_i \in X_{k_1}, o_j \in X_{k_2},\ o_i \in Y_{l_1}, o_j \in Y_{l_2} \}$
- $c = |S_c|$, where $S_c = \{ (o_i, o_j) \mid o_i, o_j \in X_k,\ o_i \in Y_{l_1}, o_j \in Y_{l_2} \}$
- $d = |S_d|$, where $S_d = \{ (o_i, o_j) \mid o_i \in X_{k_1}, o_j \in X_{k_2},\ o_i, o_j \in Y_l \}$

for some $1 \le i < j \le n$, $1 \le k \le r$, $1 \le l \le s$, $1 \le k_1 \neq k_2 \le r$, and $1 \le l_1 \neq l_2 \le s$.

Relationship with classification accuracy

The Rand index can also be viewed through the prism of binary classification accuracy over the pairs of elements in $S$. The two class labels are "$o_i$ and $o_j$ are in the same subset in $X$ and $Y$" and "$o_i$ and $o_j$ are in different subsets in $X$ and $Y$".

In that setting, $a$ is the number of pairs correctly labeled as belonging to the same subset (true positives), and $b$ is the number of pairs correctly labeled as belonging to different subsets (true negatives).

Adjusted Rand index

The adjusted Rand index is the corrected-for-chance version of the Rand index. [1] [2] [3] Such a correction for chance establishes a baseline by using the expected similarity of all pair-wise comparisons between clusterings specified by a random model. Traditionally, the Rand index was corrected using the permutation model for clusterings (the number and size of clusters within a clustering are fixed, and all random clusterings are generated by shuffling the elements between the fixed clusters). However, the premises of the permutation model are frequently violated; in many clustering scenarios, either the number of clusters or the size distribution of those clusters varies drastically. For example, in K-means the number of clusters is fixed by the practitioner, but the sizes of those clusters are inferred from the data. Variations of the adjusted Rand index account for different models of random clusterings. [4]

Though the Rand Index may only yield a value between 0 and +1, the adjusted Rand index can yield negative values if the index is less than the expected index. [5]

The contingency table

Given a set $S$ of $n$ elements, and two groupings or partitions (e.g. clusterings) of these elements, namely $X = \{X_1, X_2, \ldots, X_r\}$ and $Y = \{Y_1, Y_2, \ldots, Y_s\}$, the overlap between $X$ and $Y$ can be summarized in a contingency table $[n_{ij}]$, where each entry $n_{ij}$ denotes the number of objects in common between $X_i$ and $Y_j$: $n_{ij} = |X_i \cap Y_j|$. The row sums $a_i = \sum_j n_{ij}$ and column sums $b_j = \sum_i n_{ij}$ give the sizes of the clusters $X_i$ and $Y_j$, respectively.
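
As an illustration of this bookkeeping, here is a small Python sketch (an addition of this rewrite, not from the source) that builds the contingency counts $n_{ij}$ and the marginals from two label lists using standard-library counters.

```python
from collections import Counter

def contingency_table(labels_x, labels_y):
    """Return n_ij counts plus row sums a_i and column sums b_j."""
    n_ij = Counter(zip(labels_x, labels_y))   # pairs (cluster in X, cluster in Y)
    a = Counter(labels_x)                     # row sums: sizes of the X clusters
    b = Counter(labels_y)                     # column sums: sizes of the Y clusters
    return n_ij, a, b

n_ij, a, b = contingency_table([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2])
print(dict(n_ij))  # {(0, 0): 2, (0, 1): 1, (1, 1): 1, (1, 2): 2}
```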

Definition

The original adjusted Rand index using the permutation model is

$$ARI = \frac{\sum_{ij} \binom{n_{ij}}{2} - \left[\sum_i \binom{a_i}{2} \sum_j \binom{b_j}{2}\right] \big/ \binom{n}{2}}{\frac{1}{2}\left[\sum_i \binom{a_i}{2} + \sum_j \binom{b_j}{2}\right] - \left[\sum_i \binom{a_i}{2} \sum_j \binom{b_j}{2}\right] \big/ \binom{n}{2}}$$

where $n_{ij}$, $a_i$, $b_j$ are values from the contingency table.
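
A minimal Python sketch of this formula follows; it reuses the contingency counts from above and is an illustrative assumption of this rewrite rather than a reference implementation. For practical use, `sklearn.metrics.adjusted_rand_score` computes the same quantity.

```python
from math import comb
from collections import Counter

def adjusted_rand_index(labels_x, labels_y):
    """Adjusted Rand index under the permutation model."""
    n = len(labels_x)
    n_ij = Counter(zip(labels_x, labels_y))   # contingency table entries
    a = Counter(labels_x)                     # row sums a_i
    b = Counter(labels_y)                     # column sums b_j

    sum_ij = sum(comb(c, 2) for c in n_ij.values())
    sum_a = sum(comb(c, 2) for c in a.values())
    sum_b = sum(comb(c, 2) for c in b.values())
    expected = sum_a * sum_b / comb(n, 2)     # expected index under chance
    max_index = (sum_a + sum_b) / 2
    return (sum_ij - expected) / (max_index - expected)

print(adjusted_rand_index([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2]))  # ≈ 0.242
```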

References

  1. W. M. Rand (1971). "Objective criteria for the evaluation of clustering methods". Journal of the American Statistical Association. American Statistical Association. 66 (336): 846–850. doi:10.2307/2284239. JSTOR 2284239.
  2. Lawrence Hubert and Phipps Arabie (1985). "Comparing partitions". Journal of Classification. 2 (1): 193–218. doi:10.1007/BF01908075.
  3. Nguyen Xuan Vinh, Julien Epps and James Bailey (2009). "Information Theoretic Measures for Clustering Comparison: Is a Correction for Chance Necessary?" (PDF). ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning. ACM. pp. 1073–1080.
  4. Alexander J Gates and Yong-Yeol Ahn (2017). "The Impact of Random Models on Clustering Similarity" (PDF). Journal of Machine Learning Research. 18: 1–28.
  5. "Comparing Clusterings - An Overview" (PDF).