Rand index

[Figure: Example clusterings for a dataset with the k-means (left) and mean shift (right) algorithms. The calculated adjusted Rand index for these two clusterings is $ARI \approx 0.94$.]

The Rand index [1] or Rand measure (named after William M. Rand) in statistics, and in particular in data clustering, is a measure of the similarity between two data clusterings. A form of the Rand index may be defined that is adjusted for the chance grouping of elements; this is the adjusted Rand index. The Rand index can be seen as the accuracy of deciding, over all pairs of elements, whether a pair belongs to the same cluster or not.

Rand index

Definition

Given a set of $n$ elements $S = \{o_1, \ldots, o_n\}$ and two partitions of $S$ to compare, $X = \{X_1, \ldots, X_r\}$, a partition of $S$ into $r$ subsets, and $Y = \{Y_1, \ldots, Y_s\}$, a partition of $S$ into $s$ subsets, define the following:

- $a$, the number of pairs of elements in $S$ that are in the same subset in $X$ and in the same subset in $Y$
- $b$, the number of pairs of elements in $S$ that are in different subsets in $X$ and in different subsets in $Y$
- $c$, the number of pairs of elements in $S$ that are in the same subset in $X$ and in different subsets in $Y$
- $d$, the number of pairs of elements in $S$ that are in different subsets in $X$ and in the same subset in $Y$

The Rand index, $R$, is: [1] [2]

$$R = \frac{a + b}{a + b + c + d} = \frac{a + b}{\binom{n}{2}}$$

Intuitively, $a + b$ can be considered as the number of agreements between $X$ and $Y$, and $c + d$ as the number of disagreements between $X$ and $Y$.

Since the denominator is the total number of pairs, the Rand index represents the frequency of occurrence of agreements over the total pairs, or the probability that $X$ and $Y$ will agree on a randomly chosen pair.

$\binom{n}{2}$ is calculated as $n(n-1)/2$.

Similarly, one can also view the Rand index as a measure of the percentage of correct decisions made by the algorithm. It can be computed using the following formula:

$$RI = \frac{TP + TN}{TP + TN + FP + FN}$$

where $TP$ is the number of true positives, $TN$ is the number of true negatives, $FP$ is the number of false positives, and $FN$ is the number of false negatives.
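
To make the pair counting concrete, the following is a minimal Python sketch (not part of the original article); the function name `rand_index` and the example labels are illustrative choices. It computes the Rand index directly from two label assignments by iterating over all pairs of elements.

```python
from itertools import combinations

def rand_index(labels_x, labels_y):
    """Rand index between two clusterings given as per-element label lists."""
    assert len(labels_x) == len(labels_y)
    same_same = 0   # a / TP: pair in the same cluster in both X and Y
    diff_diff = 0   # b / TN: pair in different clusters in both X and Y
    n_pairs = 0
    for i, j in combinations(range(len(labels_x)), 2):
        n_pairs += 1
        in_x = labels_x[i] == labels_x[j]
        in_y = labels_y[i] == labels_y[j]
        if in_x and in_y:
            same_same += 1
        elif not in_x and not in_y:
            diff_diff += 1
    return (same_same + diff_diff) / n_pairs

# Example: two clusterings of 6 elements
print(rand_index([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2]))  # 10/15 ≈ 0.667
```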

Properties

The Rand index has a value between 0 and 1, with 0 indicating that the two data clusterings do not agree on any pair of points and 1 indicating that the data clusterings are exactly the same.

In mathematical terms, a, b, c, d are defined as follows:

- $a = |S_a|$, where $S_a = \{ (o_i, o_j) \mid o_i, o_j \in X_k,\ o_i, o_j \in Y_l \}$
- $b = |S_b|$, where $S_b = \{ (o_i, o_j) \mid o_i \in X_{k_1}, o_j \in X_{k_2},\ o_i \in Y_{l_1}, o_j \in Y_{l_2} \}$
- $c = |S_c|$, where $S_c = \{ (o_i, o_j) \mid o_i, o_j \in X_k,\ o_i \in Y_{l_1}, o_j \in Y_{l_2} \}$
- $d = |S_d|$, where $S_d = \{ (o_i, o_j) \mid o_i \in X_{k_1}, o_j \in X_{k_2},\ o_i, o_j \in Y_l \}$

for some $1 \le i < j \le n$, $1 \le k \le r$, $1 \le l \le s$, $1 \le k_1 \neq k_2 \le r$, and $1 \le l_1 \neq l_2 \le s$.

Relationship with classification accuracy

The Rand index can also be viewed through the prism of binary classification accuracy over the pairs of elements in $S$. The two class labels are "$o_i$ and $o_j$ are in the same subset in $X$ and $Y$" and "$o_i$ and $o_j$ are in different subsets in $X$ and $Y$".

In that setting, $a$ is the number of pairs correctly labeled as belonging to the same subset (true positives), and $b$ is the number of pairs correctly labeled as belonging to different subsets (true negatives).

Adjusted Rand index

The adjusted Rand index is the corrected-for-chance version of the Rand index. [1] [2] [3] Such a correction for chance establishes a baseline by using the expected similarity of all pair-wise comparisons between clusterings specified by a random model. Traditionally, the Rand index was corrected using the permutation model for clusterings (the number and size of clusters within a clustering are fixed, and all random clusterings are generated by shuffling the elements between the fixed clusters). However, the premises of the permutation model are frequently violated; in many clustering scenarios, either the number of clusters or the size distribution of those clusters varies drastically. For example, in K-means the number of clusters is fixed by the practitioner, but the sizes of those clusters are inferred from the data. Variations of the adjusted Rand index account for different models of random clusterings. [4]

Though the Rand Index may only yield a value between 0 and +1, the adjusted Rand index can yield negative values if the index is less than the expected index. [5]

The contingency table

Given a set $S$ of $n$ elements, and two groupings or partitions (e.g. clusterings) of these elements, namely $X = \{X_1, X_2, \ldots, X_r\}$ and $Y = \{Y_1, Y_2, \ldots, Y_s\}$, the overlap between $X$ and $Y$ can be summarized in a contingency table $[n_{ij}]$, where each entry $n_{ij}$ denotes the number of objects in common between $X_i$ and $Y_j$: $n_{ij} = |X_i \cap Y_j|$. The row sums $a_i = \sum_j n_{ij}$ and column sums $b_j = \sum_i n_{ij}$ give the sizes of the clusters $X_i$ and $Y_j$, respectively.
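
As an illustration of this bookkeeping, here is a small Python sketch (an addition of this rewrite, not from the source) that builds the contingency counts $n_{ij}$ and the marginals from two label lists using standard-library counters.

```python
from collections import Counter

def contingency_table(labels_x, labels_y):
    """Return n_ij counts plus row sums a_i and column sums b_j."""
    n_ij = Counter(zip(labels_x, labels_y))   # pairs (cluster in X, cluster in Y)
    a = Counter(labels_x)                     # row sums: sizes of the X clusters
    b = Counter(labels_y)                     # column sums: sizes of the Y clusters
    return n_ij, a, b

n_ij, a, b = contingency_table([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2])
print(dict(n_ij))  # {(0, 0): 2, (0, 1): 1, (1, 1): 1, (1, 2): 2}
```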

Definition

The original adjusted Rand index using the permutation model is

$$ARI = \frac{\sum_{ij} \binom{n_{ij}}{2} - \left[\sum_i \binom{a_i}{2} \sum_j \binom{b_j}{2}\right] \big/ \binom{n}{2}}{\frac{1}{2}\left[\sum_i \binom{a_i}{2} + \sum_j \binom{b_j}{2}\right] - \left[\sum_i \binom{a_i}{2} \sum_j \binom{b_j}{2}\right] \big/ \binom{n}{2}}$$

where $n_{ij}$, $a_i$, $b_j$ are values from the contingency table.
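
A minimal Python sketch of this formula follows; it reuses the contingency counts from above and is an illustrative assumption of this rewrite rather than a reference implementation. For practical use, `sklearn.metrics.adjusted_rand_score` computes the same quantity.

```python
from math import comb
from collections import Counter

def adjusted_rand_index(labels_x, labels_y):
    """Adjusted Rand index under the permutation model."""
    n = len(labels_x)
    n_ij = Counter(zip(labels_x, labels_y))   # contingency table entries
    a = Counter(labels_x)                     # row sums a_i
    b = Counter(labels_y)                     # column sums b_j

    sum_ij = sum(comb(c, 2) for c in n_ij.values())
    sum_a = sum(comb(c, 2) for c in a.values())
    sum_b = sum(comb(c, 2) for c in b.values())
    expected = sum_a * sum_b / comb(n, 2)     # expected index under chance
    max_index = (sum_a + sum_b) / 2
    return (sum_ij - expected) / (max_index - expected)

print(adjusted_rand_index([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2]))  # ≈ 0.242
```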

References

  1. W. M. Rand (1971). "Objective criteria for the evaluation of clustering methods". Journal of the American Statistical Association. American Statistical Association. 66 (336): 846–850. doi:10.2307/2284239. JSTOR 2284239.
  2. Lawrence Hubert and Phipps Arabie (1985). "Comparing partitions". Journal of Classification. 2 (1): 193–218. doi:10.1007/BF01908075.
  3. Nguyen Xuan Vinh, Julien Epps and James Bailey (2009). "Information Theoretic Measures for Clustering Comparison: Is a Correction for Chance Necessary?" (PDF). ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning. ACM. pp. 1073–1080.
  4. Alexander J Gates and Yong-Yeol Ahn (2017). "The Impact of Random Models on Clustering Similarity" (PDF). Journal of Machine Learning Research. 18: 1–28.
  5. "Comparing Clusterings - An Overview" (PDF).