Median trick

The median trick is a generic technique for increasing the probability that a probabilistic algorithm succeeds. [1] Apparently first used in 1986 [2] by Jerrum et al. [3] for approximate counting algorithms, the technique was later applied to a broad range of classification and regression problems. [2]

The idea of the median trick is simple: run the randomized algorithm, which produces a numeric output, multiple times, and use the median of the obtained results as the final answer. For example, an algorithm that is sublinear in time can be run repeatedly (or in parallel) over random subsets of the input data, and, by the Chernoff bound, the median of the results converges to the true value quickly. [4] For algorithms that are sublinear in space (e.g., counting the distinct elements of a stream), different randomizations of the algorithm (say, with different hash functions) should be used for the repeated runs over the same data. [5]
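
As a minimal illustrative sketch (not taken from the cited sources), the following Python snippet applies the trick to a hypothetical randomized estimator; the estimator, the number of runs, and the toy data are all assumptions made for this example.

```python
import random
import statistics

def median_trick(randomized_estimator, runs=11):
    """Run a randomized numeric estimator several times independently and
    return the median of the results (the median trick)."""
    estimates = [randomized_estimator() for _ in range(runs)]
    return statistics.median(estimates)

# Hypothetical use: estimate the mean of a large data set from small random
# samples. Any single sample may be far off, but the median of several
# independent estimates is close to the true mean with high probability.
data = [random.gauss(100, 20) for _ in range(100_000)]

def sample_mean_estimate(sample_size=1_000):
    sample = random.sample(data, sample_size)
    return sum(sample) / len(sample)

print(median_trick(sample_mean_estimate))
```

Taking the median rather than the mean is what provides the boost: a single outlier run cannot drag the median far, so the failure probability drops exponentially with the number of runs.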

Related Research Articles

The #P-complete problems form a complexity class in computational complexity theory. The problems in this complexity class are defined by having the following two properties: the problem is in #P, and every problem in #P can be reduced to it by a polynomial-time counting reduction.

Component (graph theory): Maximal subgraph whose vertices can reach each other

In graph theory, a component of an undirected graph is a connected subgraph that is not part of any larger connected subgraph. The components of any graph partition its vertices into disjoint sets, and are the induced subgraphs of those sets. A graph that is itself connected has exactly one component, consisting of the whole graph. Components are sometimes called connected components.
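
As a small illustration of this definition (a sketch written for this article, using a made-up example graph), the components of an undirected graph can be found with a breadth-first search in Python:

```python
from collections import deque

def connected_components(adjacency):
    """Partition the vertices of an undirected graph, given as an adjacency
    dict {vertex: iterable of neighbours}, into connected components via BFS."""
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        component = []
        queue = deque([start])
        seen.add(start)
        while queue:
            v = queue.popleft()
            component.append(v)
            for w in adjacency[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        components.append(component)
    return components

# A graph with two components: {1, 2, 3} and {4, 5}.
print(connected_components({1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4]}))
```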

In computer science, a selection algorithm is an algorithm for finding the kth smallest value in a collection of ordered values, such as numbers. The value that it finds is called the kth order statistic. Selection includes as special cases the problems of finding the minimum, median, and maximum element in the collection. Selection algorithms include quickselect and the median of medians algorithm. When applied to a collection of n values, these algorithms take linear time, O(n), as expressed using big O notation. For data that is already structured, faster algorithms may be possible; as an extreme case, selection in an already-sorted array takes constant time.
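
To illustrate the expected linear-time behaviour described above, here is a minimal quickselect sketch in Python; the function name and the zero-based index convention are choices made for this example rather than anything from the cited material.

```python
import random

def quickselect(values, k):
    """Return the k-th smallest element (k is 0-based) in expected linear time."""
    values = list(values)
    while True:
        pivot = random.choice(values)
        lows = [v for v in values if v < pivot]
        highs = [v for v in values if v > pivot]
        pivots = [v for v in values if v == pivot]
        if k < len(lows):
            values = lows                      # answer lies among the smaller values
        elif k < len(lows) + len(pivots):
            return pivot                       # answer equals the pivot
        else:
            k -= len(lows) + len(pivots)       # answer lies among the larger values
            values = highs

print(quickselect([7, 1, 5, 3, 9], 2))  # 5, the median of the five values
```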

The Gödel Prize is an annual prize for outstanding papers in the area of theoretical computer science, given jointly by the European Association for Theoretical Computer Science (EATCS) and the Association for Computing Machinery Special Interest Group on Algorithms and Computational Theory. The award is named in honor of Kurt Gödel. Gödel's connection to theoretical computer science is that he was the first to mention the "P versus NP" question, in a 1956 letter to John von Neumann in which Gödel asked whether a certain NP-complete problem could be solved in quadratic or linear time.

In computer science and mathematical optimization, a metaheuristic is a higher-level procedure or heuristic designed to find, generate, tune, or select a heuristic that may provide a sufficiently good solution to an optimization problem or a machine learning problem, especially with incomplete or imperfect information or limited computation capacity. Metaheuristics sample a subset of solutions which is otherwise too large to be completely enumerated or otherwise explored. Metaheuristics may make relatively few assumptions about the optimization problem being solved and so may be usable for a variety of problems.

The Fulkerson Prize for outstanding papers in the area of discrete mathematics is sponsored jointly by the Mathematical Optimization Society (MOS) and the American Mathematical Society (AMS). Up to three awards of $1,500 each are presented at each (triennial) International Symposium of the MOS. Originally, the prizes were paid out of a memorial fund administered by the AMS that was established by friends of the late Delbert Ray Fulkerson to encourage mathematical excellence in the fields of research exemplified by his work. The prizes are now funded by an endowment administered by the MOS.

The Valiant–Vazirani theorem is a theorem in computational complexity theory stating that if there is a polynomial time algorithm for Unambiguous-SAT, then NP = RP. It was proven by Leslie Valiant and Vijay Vazirani in their paper titled NP is as easy as detecting unique solutions published in 1986. The proof is based on the Mulmuley–Vazirani–Vazirani isolation lemma, which was subsequently used for a number of important applications in theoretical computer science.

Shlomi Dolev

Shlomi Dolev is a Rita Altura Trust Chair Professor in Computer Science at Ben-Gurion University of the Negev (BGU) and the head of the BGU Negev Hi-Tech Faculty Startup Accelerator.

Leslie Valiant: British American computer scientist

Leslie Gabriel Valiant is a British American computer scientist and computational theorist. He was born to a chemical engineer father and a translator mother. He is currently the T. Jefferson Coolidge Professor of Computer Science and Applied Mathematics at Harvard University. Valiant was awarded the Turing Award in 2010, having been described by the A.C.M. as a heroic figure in theoretical computer science and a role model for his courage and creativity in addressing some of the deepest unsolved problems in science; in particular for his "striking combination of depth and breadth".

Vijay Vazirani: Indian American professor of computer science

Vijay Virkumar Vazirani is an Indian American distinguished professor of computer science in the Donald Bren School of Information and Computer Sciences at the University of California, Irvine.

Umesh Virkumar Vazirani is an Indian–American academic who is the Roger A. Strauch Professor of Electrical Engineering and Computer Science at the University of California, Berkeley, and the director of the Berkeley Quantum Computation Center. His research interests lie primarily in quantum computing. He is also a co-author of a textbook on algorithms.

Structural complexity theory

In the computational complexity theory of computer science, structural complexity theory, or simply structural complexity, is the study of complexity classes rather than the computational complexity of individual problems and algorithms. It involves research into both the internal structure of various complexity classes and the relations between different complexity classes.

In linear algebra, the computation of the permanent of a matrix is a problem that is thought to be more difficult than the computation of the determinant of a matrix despite the apparent similarity of the definitions.
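
For comparison, the standard definitions for an n × n matrix A = (a_ij) differ only by the sign factor sgn(σ) that appears in the determinant; this is the similarity referred to above (standard formulas, reproduced here for context):

```latex
\operatorname{perm}(A) = \sum_{\sigma \in S_n} \prod_{i=1}^{n} a_{i,\sigma(i)}
\qquad\text{versus}\qquad
\det(A) = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma) \prod_{i=1}^{n} a_{i,\sigma(i)}
```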

In computer science, data stream clustering is defined as the clustering of data that arrive continuously such as telephone records, multimedia data, financial transactions etc. Data stream clustering is usually studied as a streaming algorithm and the objective is, given a sequence of points, to construct a good clustering of the stream, using a small amount of memory and time.

In theoretical computer science, the term isolation lemma refers to randomized algorithms that reduce the number of solutions to a problem to one, should a solution exist. This is achieved by constructing random constraints such that, with non-negligible probability, exactly one solution satisfies these additional constraints if the solution space is not empty. Isolation lemmas have important applications in computer science, such as the Valiant–Vazirani theorem and Toda's theorem in computational complexity theory.

In graph theory, a bipolar orientation or st-orientation of an undirected graph is an assignment of a direction to each edge that causes the graph to become a directed acyclic graph with a single source s and a single sink t, and an st-numbering of the graph is a topological ordering of the resulting directed acyclic graph.

The vertex k-center problem is a classical NP-hard problem in computer science. It has applications in facility location and clustering. The vertex k-center problem models the following real-world problem: given a city with n facilities, find the k best facilities at which to build fire stations. Since firemen must attend any emergency as quickly as possible, the distance from the farthest facility to its nearest fire station has to be as small as possible. In other words, the fire stations must be positioned so that every possible fire is attended as quickly as possible.
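
One classical heuristic for this kind of problem is the farthest-first traversal (Gonzalez's greedy algorithm), a 2-approximation for the metric k-center problem. The sketch below illustrates the idea on a toy point set with an assumed distance function; it is not tied to any particular source.

```python
def greedy_k_centers(points, k, dist):
    """Farthest-first traversal: repeatedly add the point that is farthest
    from its nearest already-chosen center (a 2-approximation for k-center)."""
    centers = [points[0]]                     # start from an arbitrary point
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(dist(p, c) for c in centers))
        centers.append(farthest)
    return centers

# Hypothetical example: place 2 "fire stations" among points on a line.
points = [0, 1, 2, 10, 11, 12]
print(greedy_k_centers(points, 2, dist=lambda a, b: abs(a - b)))  # [0, 12]
```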

Kirkpatrick–Reisch sorting is a fast sorting algorithm for items with limited-size integer keys. It is notable for having an asymptotic time complexity better than that of radix sort.

Algorand is a cryptocurrency protocol providing proof-of-stake on a blockchain. Algorand's native cryptocurrency is called ALGO.

Alexei Semenov (mathematician): Russian mathematician

Alexei L. Semenov is a Russian mathematician and educationalist, an Academician of the Russian Academy of Sciences, an Academician of the Russian Academy of Education, and head of the Department of Mathematical Logic and Theory of Algorithms at Lomonosov Moscow State University, where he is a professor and holds the Dr. Sc. degree.

References

  1. Kogler & Traxler 2017, p. 378.
  2. Kogler & Traxler 2017, p. 380.
  3. Jerrum, Valiant & Vazirani 1986, p. 182, Lemma 6.1.
  4. Wang & Han 2015, p. 11.
  5. Wang & Han 2015, pp. 17–18, Median Trick in Boosting Confidence.

Sources