Rand index

Figure: Example clusterings of a dataset with the k-means (left) and mean shift (right) algorithms. The calculated adjusted Rand index for these two clusterings is $ARI \approx 0.94$.

The Rand index [1] or Rand measure (named after William M. Rand) in statistics, and in particular in data clustering, is a measure of the similarity between two data clusterings. A form of the Rand index may be defined that is adjusted for the chance grouping of elements; this is the adjusted Rand index. Viewed over pairs of elements, the Rand index is the accuracy of determining whether a pair of elements is linked within a cluster or not.


Rand index

Definition

Given a set of $n$ elements $S = \{o_1, \ldots, o_n\}$ and two partitions of $S$ to compare, $X = \{X_1, \ldots, X_r\}$, a partition of $S$ into $r$ subsets, and $Y = \{Y_1, \ldots, Y_s\}$, a partition of $S$ into $s$ subsets, define the following:

- $a$, the number of pairs of elements in $S$ that are in the same subset in $X$ and in the same subset in $Y$
- $b$, the number of pairs of elements in $S$ that are in different subsets in $X$ and in different subsets in $Y$
- $c$, the number of pairs of elements in $S$ that are in the same subset in $X$ and in different subsets in $Y$
- $d$, the number of pairs of elements in $S$ that are in different subsets in $X$ and in the same subset in $Y$

The Rand index, $R$, is: [1] [2]

$$R = \frac{a + b}{a + b + c + d} = \frac{a + b}{\binom{n}{2}}$$

Intuitively, $a + b$ can be considered as the number of agreements between $X$ and $Y$, and $c + d$ as the number of disagreements between $X$ and $Y$.

Since the denominator is the total number of pairs, the Rand index represents the frequency of occurrence of agreements over the total pairs, or the probability that $X$ and $Y$ will agree on a randomly chosen pair.

The total number of pairs, $\binom{n}{2}$, is calculated as $n(n-1)/2$.

Similarly, one can also view the Rand index as a measure of the percentage of correct decisions made by the algorithm. It can be computed using the following formula:

$$RI = \frac{TP + TN}{TP + TN + FP + FN}$$

where $TP$ is the number of true positives, $TN$ is the number of true negatives, $FP$ is the number of false positives, and $FN$ is the number of false negatives.
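As a concrete illustration, the following Python sketch computes the Rand index directly from this pair-counting view. The function name and example labelings are our own, chosen only for illustration:

```python
from itertools import combinations

def rand_index(labels_x, labels_y):
    """Rand index of two clusterings given as per-element label sequences."""
    assert len(labels_x) == len(labels_y)
    agree, total = 0, 0
    for i, j in combinations(range(len(labels_x)), 2):
        same_x = labels_x[i] == labels_x[j]  # pair together in X?
        same_y = labels_y[i] == labels_y[j]  # pair together in Y?
        agree += same_x == same_y            # TP (together in both) or TN (apart in both)
        total += 1                           # all pairs: n(n-1)/2
    return agree / total

print(rand_index([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2]))  # 10/15 ≈ 0.667
```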

Properties

The Rand index has a value between 0 and 1, with 0 indicating that the two data clusterings do not agree on any pair of points and 1 indicating that the data clusterings are exactly the same.

In mathematical terms, $a, b, c, d$ are defined as follows:

- $a = |S_a|$, where $S_a = \{(o_i, o_j) \mid o_i, o_j \in X_k \text{ and } o_i, o_j \in Y_l\}$
- $b = |S_b|$, where $S_b = \{(o_i, o_j) \mid o_i \in X_{k_1}, o_j \in X_{k_2} \text{ and } o_i \in Y_{l_1}, o_j \in Y_{l_2}\}$
- $c = |S_c|$, where $S_c = \{(o_i, o_j) \mid o_i, o_j \in X_k \text{ and } o_i \in Y_{l_1}, o_j \in Y_{l_2}\}$
- $d = |S_d|$, where $S_d = \{(o_i, o_j) \mid o_i \in X_{k_1}, o_j \in X_{k_2} \text{ and } o_i, o_j \in Y_l\}$

for some $1 \leq i < j \leq n$, $1 \leq k \leq r$, $1 \leq l \leq s$, $k_1 \neq k_2$, and $l_1 \neq l_2$.

Relationship with classification accuracy

The Rand index can also be viewed through the prism of binary classification accuracy over the pairs of elements in $S$. The two class labels are "$o_i$ and $o_j$ are in the same subset in $X$ and $Y$" and "$o_i$ and $o_j$ are in different subsets in $X$ and $Y$".

In that setting, $a$ is the number of pairs correctly labeled as belonging to the same subset (true positives), and $b$ is the number of pairs correctly labeled as belonging to different subsets (true negatives).
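scikit-learn exposes both the pair-level counts and the index itself; the sketch below assumes scikit-learn 0.24 or later, where rand_score and pair_confusion_matrix were introduced. Note that pair_confusion_matrix counts ordered pairs, so each entry is twice the corresponding unordered-pair count, which leaves the ratio unchanged:

```python
from sklearn.metrics import rand_score
from sklearn.metrics.cluster import pair_confusion_matrix

X = [0, 0, 0, 1, 1, 1]  # first clustering
Y = [0, 0, 1, 1, 2, 2]  # second clustering

C = pair_confusion_matrix(X, Y)         # 2x2 matrix of pair-level decisions
tn, fp, fn, tp = C[0, 0], C[0, 1], C[1, 0], C[1, 1]
print((tp + tn) / (tp + tn + fp + fn))  # ≈ 0.667
print(rand_score(X, Y))                 # same value, computed directly
```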

Adjusted Rand index

The adjusted Rand index is the corrected-for-chance version of the Rand index. [1] [2] [3] Such a correction for chance establishes a baseline by using the expected similarity of all pair-wise comparisons between clusterings specified by a random model. Traditionally, the Rand index was corrected using the permutation model for clusterings (the number and size of clusters within a clustering are fixed, and all random clusterings are generated by shuffling the elements between the fixed clusters). However, the premises of the permutation model are frequently violated; in many clustering scenarios, either the number of clusters or the size distribution of those clusters varies drastically. For example, in k-means the number of clusters is fixed by the practitioner, but the sizes of those clusters are inferred from the data. Variations of the adjusted Rand index account for different models of random clusterings. [4]

Though the Rand index may only yield a value between 0 and +1, the adjusted Rand index can yield negative values if the index is less than the expected index. [5]

The contingency table

Given a set $S$ of $n$ elements, and two groupings or partitions (e.g. clusterings) of these elements, namely $X = \{X_1, \ldots, X_r\}$ and $Y = \{Y_1, \ldots, Y_s\}$, the overlap between $X$ and $Y$ can be summarized in a contingency table $[n_{ij}]$ where each entry $n_{ij}$ denotes the number of objects in common between $X_i$ and $Y_j$: $n_{ij} = |X_i \cap Y_j|$. The row sums $a_i = \sum_j n_{ij}$ and column sums $b_j = \sum_i n_{ij}$ give the sizes of the clusters $X_i$ and $Y_j$.
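A minimal sketch of building such a table in Python (the helper name is our own; cluster labels are assumed to be hashable):

```python
import numpy as np

def contingency_table(labels_x, labels_y):
    """n_ij = |X_i ∩ Y_j|: objects shared by cluster i of X and cluster j of Y."""
    xs = sorted(set(labels_x))
    ys = sorted(set(labels_y))
    table = np.zeros((len(xs), len(ys)), dtype=int)
    for lx, ly in zip(labels_x, labels_y):
        table[xs.index(lx), ys.index(ly)] += 1
    return table

T = contingency_table([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2])
print(T)              # [[2 1 0] [0 1 2]]
print(T.sum(axis=1))  # row sums a_i: [3 3]
print(T.sum(axis=0))  # column sums b_j: [2 2 2]
```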

Definition

The original adjusted Rand index, using the permutation model, is

$$ARI = \frac{\sum_{ij} \binom{n_{ij}}{2} - \frac{\sum_i \binom{a_i}{2} \sum_j \binom{b_j}{2}}{\binom{n}{2}}}{\frac{1}{2} \left[ \sum_i \binom{a_i}{2} + \sum_j \binom{b_j}{2} \right] - \frac{\sum_i \binom{a_i}{2} \sum_j \binom{b_j}{2}}{\binom{n}{2}}}$$

where $n_{ij}$, $a_i$, and $b_j$ are values from the contingency table.
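The following Python sketch evaluates this formula from a contingency table. The helper name is our own, not a reference implementation; as a cross-check, scikit-learn's adjusted_rand_score computes the same quantity directly from two label sequences.

```python
from math import comb

import numpy as np

def adjusted_rand_index(table):
    """Adjusted Rand index under the permutation model, from the table n_ij."""
    table = np.asarray(table)
    n = int(table.sum())
    sum_ij = sum(comb(int(nij), 2) for nij in table.flatten())  # sum over all cells
    sum_a = sum(comb(int(ai), 2) for ai in table.sum(axis=1))   # from row sums a_i
    sum_b = sum(comb(int(bj), 2) for bj in table.sum(axis=0))   # from column sums b_j
    expected = sum_a * sum_b / comb(n, 2)  # chance baseline under the permutation model
    max_index = (sum_a + sum_b) / 2        # maximum achievable index
    return (sum_ij - expected) / (max_index - expected)

# Contingency table of the two example clusterings used above
print(adjusted_rand_index([[2, 1, 0], [0, 1, 2]]))  # ≈ 0.242
```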



References

  1. Rand, W. M. (1971). "Objective criteria for the evaluation of clustering methods". Journal of the American Statistical Association. 66 (336): 846–850. doi:10.2307/2284239. JSTOR 2284239.
  2. Hubert, Lawrence; Arabie, Phipps (1985). "Comparing partitions". Journal of Classification. 2 (1): 193–218. doi:10.1007/BF01908075.
  3. Vinh, Nguyen Xuan; Epps, Julien; Bailey, James (2009). "Information Theoretic Measures for Clustering Comparison: Is a Correction for Chance Necessary?" (PDF). ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning. ACM. pp. 1073–1080.
  4. Gates, Alexander J.; Ahn, Yong-Yeol (2017). "The Impact of Random Models on Clustering Similarity" (PDF). Journal of Machine Learning Research. 18: 1–28. arXiv:1701.06508.
  5. "Comparing Clusterings – An Overview" (PDF).