Median trick

Last updated January 31, 2025

The median trick is a generic approach that increases the probability that a probabilistic algorithm succeeds.^[1] Apparently first used in 1986^[2] by Jerrum et al.^[3] for approximate counting algorithms, the technique was later applied to a wide range of classification and regression problems.^[2]

The idea of the median trick is simple: run a randomized algorithm with numeric output several times independently, and return the median of the results as the final answer. For example, a sublinear-time algorithm can be run repeatedly (or in parallel) on independent random subsets of the input data; by the Chernoff bound, the probability that the median is far from the correct answer decreases exponentially in the number of runs.^[4] For algorithms that are sublinear in space (e.g., counting the distinct elements of a stream), the repeated runs are made over the same data, so each run must use a different randomization of the algorithm (say, a different hash function).^[5]
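
A minimal Python sketch of the generic procedure follows. It assumes the randomized algorithm is wrapped as a function that receives its own random number generator and returns a number. The names `median_trick` and `sampled_mean` are illustrative and do not come from the cited sources; `sampled_mean` is a hypothetical stand-in for a sublinear-time algorithm that inspects only a random sample of the input.

```python
import random
import statistics
from typing import Callable, Optional, Sequence


def median_trick(estimator: Callable[[random.Random], float],
                 runs: int,
                 seed: Optional[int] = None) -> float:
    """Run `estimator` `runs` times with separate randomness; return the median."""
    if runs < 1:
        raise ValueError("runs must be positive")
    master = random.Random(seed)
    # Each run gets its own freshly seeded generator, so runs use separate random streams.
    results = [estimator(random.Random(master.getrandbits(64))) for _ in range(runs)]
    return statistics.median(results)


def sampled_mean(data: Sequence[float],
                 sample_size: int) -> Callable[[random.Random], float]:
    """Hypothetical sublinear-time estimator: the average of `sample_size`
    elements of `data` drawn uniformly at random with replacement."""
    def estimate(rng: random.Random) -> float:
        total = sum(data[rng.randrange(len(data))] for _ in range(sample_size))
        return total / sample_size
    return estimate


if __name__ == "__main__":
    # 30% ones and 70% zeros, so the true mean is 0.3.
    data = [1.0] * 30_000 + [0.0] * 70_000
    estimate = median_trick(sampled_mean(data, sample_size=200), runs=31, seed=1)
    print(f"median-of-31 estimate: {estimate:.3f}")
```

Each call to `sampled_mean`'s estimator touches only 200 of the 100,000 elements. For a sublinear-space streaming algorithm the wrapper would look the same, except that each run would be handed a different hash function over the same stream instead of a different random sample.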

Statement

Let ${\textstyle X_{1},\dots ,X_{n}}$ be independent random variables, and let ${\textstyle Y}$ be an unknown deterministic number.

Suppose that each random variable ${\textstyle X_{i}}$ falls within ${\textstyle [Y\pm \epsilon ]}$ with probability ${\textstyle \geq p}$, where ${\textstyle p>1/2}$ is a constant. Then the median trick states that ${\textstyle Med(X_{i})\in [Y\pm \epsilon ]}$ with probability ${\textstyle \geq 1-e^{-2n(p-1/2)^{2}}}$.

In other words, to ensure that ${\textstyle Y\in [Med(X_{i})\pm \epsilon ]}$ with probability ${\textstyle \geq 1-\delta }$, it suffices to take ${\textstyle n\geq {\frac {\ln {\frac {1}{\delta }}}{2(p-1/2)^{2}}}}$ samples, since then ${\textstyle e^{-2n(p-1/2)^{2}}\leq \delta }$.
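
The sample-size formula can be evaluated directly. The helper name `runs_needed` below is illustrative; it returns the smallest integer n with n ≥ ln(1/δ) / (2(p − 1/2)^2).

```python
import math


def runs_needed(p: float, delta: float) -> int:
    """Smallest n with n >= ln(1/delta) / (2 * (p - 1/2)**2)."""
    if not 0.5 < p <= 1:
        raise ValueError("p must satisfy 1/2 < p <= 1")
    if not 0 < delta < 1:
        raise ValueError("delta must lie in (0, 1)")
    return math.ceil(math.log(1 / delta) / (2 * (p - 0.5) ** 2))


if __name__ == "__main__":
    # A single run succeeds with probability 3/4.
    print(runs_needed(0.75, 0.01))   # 37
    print(runs_needed(0.75, 1e-6))   # 111
```

Shrinking the failure probability from 10^-2 to 10^-6 only triples the number of runs, because the count grows with ln(1/δ).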

Proof

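Let ${\textstyle Z_{i}}$ be the indicator variable for the event that ${\textstyle X_{i}\in [Y\pm \epsilon ]}$, so that ${\textstyle \Pr[Z_{i}=1]\geq p}$. If strictly more than half of the ${\textstyle X_{i}}$ lie in ${\textstyle [Y\pm \epsilon ]}$, then so does their median. Hence the event ${\textstyle Med(X_{i})\in [Y\pm \epsilon ]}$ fails to occur only if at most half of the ${\textstyle Z_{i}}$ equal 1, that is, ${\textstyle {\frac {1}{n}}\sum _{i}Z_{i}\leq 1/2}$.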

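The ${\textstyle Z_{i}}$ are independent, take values in ${\textstyle [0,1]}$, and satisfy ${\textstyle \mathbb {E} \left[{\frac {1}{n}}\sum _{i}Z_{i}\right]\geq p}$, so this event requires the sample average to fall at least ${\textstyle p-1/2}$ below its expectation. By Hoeffding's inequality, this happens with probability ${\textstyle \leq e^{-2n(p-1/2)^{2}}}$.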
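
The argument can be checked numerically. The Python sketch below assumes a worst-case model that is not part of the original statement: each X_i equals Y with probability p and otherwise lands at Y + 2ε, so every bad sample falls on the same side of the interval. It compares the simulated failure rate of the median, the exact binomial probability that at most half of the Z_i equal 1 (which, for odd n, is exactly the failure probability in this model), and the Hoeffding bound. All function names are illustrative.

```python
import math
import random
import statistics


def simulated_failure_rate(n: int, p: float, trials: int, seed: int = 0) -> float:
    """Fraction of trials in which the median of n samples misses [Y - eps, Y + eps]."""
    rng = random.Random(seed)
    y, eps = 0.0, 1.0
    failures = 0
    for _ in range(trials):
        # Assumed worst-case model: good samples hit Y exactly,
        # bad samples all land at Y + 2*eps, on the same side.
        xs = [y if rng.random() < p else y + 2 * eps for _ in range(n)]
        if abs(statistics.median(xs) - y) > eps:
            failures += 1
    return failures / trials


def exact_failure_probability(n: int, p: float) -> float:
    """P(Z_1 + ... + Z_n <= n/2) for independent Z_i ~ Bernoulli(p)."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n // 2 + 1))


def hoeffding_bound(n: int, p: float) -> float:
    """The bound exp(-2 n (p - 1/2)^2) from the proof."""
    return math.exp(-2 * n * (p - 0.5) ** 2)


if __name__ == "__main__":
    n, p = 21, 0.7
    print("simulated:", simulated_failure_rate(n, p, trials=10_000))
    print("exact:    ", exact_failure_probability(n, p))
    print("Hoeffding:", hoeffding_bound(n, p))
```

For n = 21 and p = 0.7 the Hoeffding bound is about 0.19, while the exact tail is under 0.03: the bound is valid but loose, and it is used for its simple closed form rather than its sharpness.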

References

1. Kogler & Traxler 2017, p. 378.
2. Kogler & Traxler 2017, p. 380.
3. Jerrum, Valiant & Vazirani 1986, p. 182, Lemma 6.1.
4. Wang & Han 2015, p. 11.
5. Wang & Han 2015, pp. 17–18, Median Trick in Boosting Confidence.

Sources

Kogler, Alexander; Traxler, Patrick (2017). "Parallel and Robust Empirical Risk Minimization via the Median Trick". Mathematical Aspects of Computer and Information Sciences. Cham: Springer International Publishing. doi:10.1007/978-3-319-72453-9_31. ISBN 978-3-319-72452-2. ISSN 0302-9743.
Jerrum, Mark R.; Valiant, Leslie G.; Vazirani, Vijay V. (1986). "Random generation of combinatorial structures from a uniform distribution". Theoretical Computer Science. 43. Elsevier BV: 169–188. doi:10.1016/0304-3975(86)90174-x. ISSN 0304-3975.
Wang, Dan; Han, Zhu (2015). "Basics for Sublinear Algorithms". Sublinear Algorithms for Big Data Applications. Cham: Springer International Publishing. doi:10.1007/978-3-319-20448-2_2. ISBN 978-3-319-20447-5. ISSN 2191-5768.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
