Yao's principle

Last updated

In computational complexity theory, Yao's principle (also called Yao's minimax principle or Yao's lemma) relates the performance of randomized algorithms to deterministic (non-random) algorithms. It states that, for certain classes of algorithms, and certain measures of the performance of the algorithms, the following two quantities are equal:

Contents

Yao's principle is often used to prove limitations on the performance of randomized algorithms, by finding a probability distribution on inputs that is difficult for deterministic algorithms, and inferring that randomized algorithms have the same limitation on their worst case performance. [1]

This principle is named after Andrew Yao, who first proposed it in a 1977 paper. [2] It is closely related to the minimax theorem in the theory of zero-sum games, and to the duality theory of linear programs.

Formulation

Consider any cost measure of an algorithm on an input , such as its running time, for which we want to study the expected value over randomized algorithms and random inputs. Consider, also, any finite set of deterministic algorithms (made finite, for instance, by limiting the algorithms to a specific input size), and a finite set of inputs to these algorithms. Let denote the class of randomized algorithms obtained from probability distributions over the deterministic behaviors in , and let denote the class of probability distributions on inputs in . Then, Yao's principle states that: [1]

Here, finiteness of and allows and to be interpreted as simplices of probability vectors, [3] whose compactness implies that the minima and maxima in these formulas exist. [4]

A weaker form of the same principle, as an inequality rather than an equality, converts any hard input distribution for deterministic algorithms into a lower bound on the cost of all randomized algorithms. If is any specific choice of a hard input distribution, and is any specific randomized algorithm in , then [1] That is, the best possible deterministic performance against distribution is a lower bound for the performance of any randomized algorithm against its worst-case input. One may also observe this weaker version of Yao's principle directly, as by linearity of expectation and the principle that for any distribution. By avoiding maximization and minimization over and , this version of Yao's principle can apply in some cases where or are not finite. [5]

Applications and examples

Time complexity

When the cost denotes the running time of an algorithm, Yao's principle states that the best possible running time of a deterministic algorithm, on a hard input distribution, gives a lower bound for the expected time of any Las Vegas algorithm on its worst-case input. Here, a Las Vegas algorithm is a randomized algorithm whose runtime may vary, but for which the result is always correct. [6] [7] For example, this form of Yao's principle has been used to prove the optimality of certain Monte Carlo tree search algorithms for the exact evaluation of game trees. [7]

Comparisons

The time complexity of comparison-based sorting and selection algorithms is often studied using the number of comparisons between pairs of data elements as a proxy for the total time. For these problems, for any fixed number of elements, an input can be expressed as a permutation and a deterministic algorithm can be expressed as a decision tree, both finitely numbered as Yao's principle requires. An averaging argument identifies the hardest input distributions: they are the random permutations, the distributions on distinct elements for which all permutations are equally likely. This is because, if any other distribution were hardest, averaging it with all permutations of the same hard distribution would be equally hard, and would produce a random permutation. Yao's principle extends lower bounds for the average case number of comparisons made by deterministic algorithms, for random permutations, to the worst case analysis of randomized comparison algorithms. [2]

An example given by Yao is the analysis of algorithms for finding the th largest of a given set of values, the selection problem. [2] Subsequently to Yao's work, Walter Cunto and Ian Munro showed that, for random permutations, any deterministic algorithm must perform at least expected comparisons. [8] By Yao's principle, the same number of comparisons must be made by randomized algorithms on their worst-case input. [9] The Floyd–Rivest algorithm comes within comparisons of this bound. [10]

Evasiveness of graph properties

Another of the original applications by Yao of his principle was to the evasiveness of graph properties, the number of tests of the adjacency of pairs of vertices needed to determine whether a graph has a given property, when the only access to the graph is through such tests. [2] Richard M. Karp conjectured that for every nontrivial monotone graph property (true for every subgraph of a graph with the property), a randomized algorithm requires a quadratic number of tests, but only weaker bounds are known. [11]

As Yao stated, for graph properties that are true of the empty graph but false for some other graph on vertices with only a bounded number of edges, a randomized algorithm must probe a quadratic number of pairs of vertices. For instance, for the property of being a planar graph, because the 9-edge utility graph is non-planar. More precisely, Yao states that for these properties, at least tests are needed, for every , for a randomized algorithm to have probability at most of making a mistake. Yao also used this method to show that quadratically many queries are needed for the properties of containing a given tree or clique as a subgraph, of containing a perfect matching, and of containing a Hamiltonian cycle, for small enough constant error probabilities. [2]

Black-box optimization

In black-box optimization, the problem is to determine the minimum or maximum value of a function, from a given class of functions, accessible only through calls to the function on arguments from some finite domain. In this case, the cost to be optimized is the number of calls. Yao's principle has been described as "the only method available for proving lower bounds for all randomized search heuristics for selected classes of problems". [12] Results that can be proven in this way include the following:

Communication complexity

In communication complexity, an algorithm describes a communication protocol between two or more parties, and its cost may be the number of bits or messages transmitted between the parties. In this case, Yao's principle describes an equality between the average-case complexity of deterministic communication protocols, on an input distribution that is the worst case for the problem, and the expected communication complexity of randomized protocols on their worst-case inputs. [13] [14]

Online algorithms

Yao's principle has also been applied to the competitive ratio of online algorithms. An online algorithm must respond to a sequence of requests, without knowledge of future requests, incurring some cost or profit per request depending on its choices. The competitive ratio is the ratio of its cost or profit to the value that could be achieved achieved by an offline algorithm with access to knowledge of all future requests, for a worst-case request sequence that causes this ratio to be as far from one as possible. Here, one must be careful to formulate the ratio with the algorithm's performance in the numerator and the optimal performance of an offline algorithm in the denominator, so that the cost measure can be formulated as an expected value rather than as the reciprocal of an expected value. [5]

An example given by Borodin & El-Yaniv (2005) concerns page replacement algorithms, which respond to requests for pages of computer memory by using a cache of pages, for a given parameter . If a request matches a cached page, it costs nothing; otherwise one of the cached pages must be replaced by the requested page, at a cost of one page fault. A difficult distribution of request sequences for this model can be generated by choosing each request uniformly at random from a pool of pages. Any deterministic online algorith has expected page faults, over requests. Instead, an offline algorithm can divide the request sequence into phases within which only pages are used, incurring only one fault at the start of a phase to replace the one page that is unused within the phase. As an instance of the coupon collector's problem, the expected requests per phase is , where is the th harmonic number. By renewal theory, the offline algorithm incurs page faults with high probability, so the competitive ratio of any deterministic algorithm against this input distribution is at least . By Yao's principle, also lower bounds the competitive ratio of any randomized page replacement algorithm against a request sequence chosen by an oblivious adversary to be a worst case for the algorithm but without knowledge of the algorithm's random choices. [15]

Relation to game theory and linear programming

Yao's principle may be interpreted in game theoretic terms, via a two-player zero-sum game in which one player, Alice, selects a deterministic algorithm, the other player, Bob, selects an input, and the payoff is the cost of the selected algorithm on the selected input. Any randomized algorithm may be interpreted as a randomized choice among deterministic algorithms, and thus as a mixed strategy for Alice. Similarly, a non-random algorithm may be thought of as a pure strategy for Alice. In any two-player zero-sum game, if one player chooses a mixed strategy, then the other player has an optimal pure strategy against it. By the minimax theorem of John von Neumann, there exists a game value , and mixed strategies for each player, such that the players can guarantee expected value or better by playing those strategies, and such that the optimal pure strategy against either mixed strategy produces expected value exactly . Thus, the minimax mixed strategy for Alice, set against the best opposing pure strategy for Bob, produces the same expected game value as the minimax mixed strategy for Bob, set against the best opposing pure strategy for Alice. This equality of expected game values, for the game described above, is Yao's principle in its form as an equality. [5] Yao's 1977 paper, originally formulating Yao's principle, proved it in this way. [2]

The optimal mixed strategy for Alice (a randomized algorithm) and the optimal mixed strategy for Bob (a hard input distribution) may each be computed using a linear program that has one player's probabilities as its variables, with a constraint on the game value for each choice of the other player. The two linear programs obtained in this way for each player are dual linear programs, whose equality is an instance of linear programming duality. [3] However, although linear programs may be solved in polynomial time, the numbers of variables and constraints in these linear programs (numbers of possible algorithms and inputs) are typically too large to list explicitly. Therefore, formulating and solving these programs to find these optimal strategies is often impractical. [13] [12]

Related Research Articles

In theoretical computer science, communication complexity studies the amount of communication required to solve a problem when the input to the problem is distributed among two or more parties. The study of communication complexity was first introduced by Andrew Yao in 1979, while studying the problem of computation distributed among several machines. The problem is usually stated as follows: two parties each receive a -bit string and . The goal is for Alice to compute the value of a certain function, , that depends on both and , with the least amount of communication between them.

The Miller–Rabin primality test or Rabin–Miller primality test is a probabilistic primality test: an algorithm which determines whether a given number is likely to be prime, similar to the Fermat primality test and the Solovay–Strassen primality test.

<span class="mw-page-title-main">Algorithmic probability</span>

In algorithmic information theory, algorithmic probability, also known as Solomonoff probability, is a mathematical method of assigning a prior probability to a given observation. It was invented by Ray Solomonoff in the 1960s. It is used in inductive inference theory and analyses of algorithms. In his general theory of inductive inference, Solomonoff uses the method together with Bayes' rule to obtain probabilities of prediction for an algorithm's future outputs.

A randomized algorithm is an algorithm that employs a degree of randomness as part of its logic or procedure. The algorithm typically uses uniformly random bits as an auxiliary input to guide its behavior, in the hope of achieving good performance in the "average case" over all possible choices of random determined by the random bits; thus either the running time, or the output are random variables.

In computing, a Las Vegas algorithm is a randomized algorithm that always gives correct results; that is, it always produces the correct result or it informs about the failure. However, the runtime of a Las Vegas algorithm differs depending on the input. The usual definition of a Las Vegas algorithm includes the restriction that the expected runtime be finite, where the expectation is carried out over the space of random information, or entropy, used in the algorithm. An alternative definition requires that a Las Vegas algorithm always terminates, but may output a symbol not part of the solution space to indicate failure in finding a solution. The nature of Las Vegas algorithms makes them suitable in situations where the number of possible solutions is limited, and where verifying the correctness of a candidate solution is relatively easy while finding a solution is complex.

In computer science, a selection algorithm is an algorithm for finding the th smallest value in a collection of ordered values, such as numbers. The value that it finds is called the th order statistic. Selection includes as special cases the problems of finding the minimum, median, and maximum element in the collection. Selection algorithms include quickselect, and the median of medians algorithm. When applied to a collection of values, these algorithms take linear time, as expressed using big O notation. For data that is already structured, faster algorithms may be possible; as an extreme case, selection in an already-sorted array takes time .

Empirical risk minimization is a principle in statistical learning theory which defines a family of learning algorithms based on evaluating performance over a known and fixed dataset. The core idea is based on an application of the law of large numbers; more specifically, we cannot know exactly how well a predictive algorithm will work in practice because we do not know the true distribution of the data, but we can instead estimate and optimize the performance of the algorithm on a known set of training data. The performance over the known set of training data is referred to as the "empirical risk".

In statistics, the k-nearest neighbors algorithm (k-NN) is a non-parametric supervised learning method first developed by Evelyn Fix and Joseph Hodges in 1951, and later expanded by Thomas Cover. It is used for classification and regression. In both cases, the input consists of the k closest training examples in a data set. The output depends on whether k-NN is used for classification or regression:

In computer science, locality-sensitive hashing (LSH) is a fuzzy hashing technique that hashes similar input items into the same "buckets" with high probability. Since similar items end up in the same buckets, this technique can be used for data clustering and nearest neighbor search. It differs from conventional hashing techniques in that hash collisions are maximized, not minimized. Alternatively, the technique can be seen as a way to reduce the dimensionality of high-dimensional data; high-dimensional input items can be reduced to low-dimensional versions while preserving relative distances between items.

In computational complexity theory, the average-case complexity of an algorithm is the amount of some computational resource used by the algorithm, averaged over all possible inputs. It is frequently contrasted with worst-case complexity which considers the maximal complexity of the algorithm over all possible inputs.

In statistical decision theory, where we are faced with the problem of estimating a deterministic parameter (vector) from observations an estimator is called minimax if its maximal risk is minimal among all estimators of . In a sense this means that is an estimator which performs best in the worst possible case allowed in the problem.

In computer science and graph theory, the Canadian traveller problem (CTP) is a generalization of the shortest path problem to graphs that are partially observable. In other words, a "traveller" on a given point on the graph cannot see the full graph, rather only adjacent nodes or a certain "realization restriction."

In theoretical computer science, the Aanderaa–Karp–Rosenberg conjecture is a group of related conjectures about the number of questions of the form "Is there an edge between vertex and vertex ?" that have to be answered to determine whether or not an undirected graph has a particular property such as planarity or bipartiteness. They are named after Stål Aanderaa, Richard M. Karp, and Arnold L. Rosenberg. According to the conjecture, for a wide class of properties, no algorithm can guarantee that it will be able to skip any questions: any algorithm for determining whether the graph has the property, no matter how clever, might need to examine every pair of vertices before it can give its answer. A property satisfying this conjecture is called evasive.

<span class="mw-page-title-main">Decision tree model</span> Model of computational complexity

In computational complexity theory, the decision tree model is the model of computation in which an algorithm can be considered to be a decision tree, i.e. a sequence of queries or tests that are done adaptively, so the outcome of previous tests can influence the tests performed next.

In graph theory, the metric k-center problem or vertex k-center problem is a classical combinatorial optimization problem studied in theoretical computer science that is NP-hard. Given n cities with specified distances, one wants to build k warehouses in different cities and minimize the maximum distance of a city to a warehouse. In graph theory, this means finding a set of k vertices for which the largest distance of any point to its closest vertex in the k-set is minimum. The vertices must be in a metric space, providing a complete graph that satisfies the triangle inequality. It has application in facility location and clustering.

In computer science and operations research, randomized rounding is a widely used approach for designing and analyzing approximation algorithms.

In theoretical computer science, the term isolation lemma refers to randomized algorithms that reduce the number of solutions to a problem to one, should a solution exist. This is achieved by constructing random constraints such that, with non-negligible probability, exactly one solution satisfies these additional constraints if the solution space is not empty. Isolation lemmas have important applications in computer science, such as the Valiant–Vazirani theorem and Toda's theorem in computational complexity theory.

Quantum refereed game in quantum information processing is a class of games in the general theory of quantum games. It is played between two players, Alice and Bob, and arbitrated by a referee. The referee outputs the pay-off for the players after interacting with them for a fixed number of rounds, while exchanging quantum information.

The sample complexity of a machine learning algorithm represents the number of training-samples that it needs in order to successfully learn a target function.

In statistical decision theory, a randomised decision rule or mixed decision rule is a decision rule that associates probabilities with deterministic decision rules. In finite decision problems, randomised decision rules define a risk set which is the convex hull of the risk points of the nonrandomised decision rules.

References

  1. 1 2 3 Arora, Sanjeev; Barak, Boaz (2009), "Note 12.8: Yao's Min-Max Lemma", Computational Complexity: A Modern Approach, Cambridge University Press, p.  265, ISBN   9780511530753
  2. 1 2 3 4 5 6 Yao, Andrew (1977), "Probabilistic computations: Toward a unified measure of complexity", Proceedings of the 18th IEEE Symposium on Foundations of Computer Science (FOCS), pp. 222–227, doi:10.1109/SFCS.1977.24
  3. 1 2 Laraki, Rida; Renault, Jérôme; Sorin, Sylvain (2019), "2.3 The Minmax Theorem", Mathematical Foundations of Game Theory, Universitext, Springer, pp. 16–18, doi:10.1007/978-3-030-26646-2, ISBN   978-3-030-26646-2
  4. Bohnenblust, H. F.; Karlin, S.; Shapley, L. S. (1950), "Solutions of discrete, two-person games", in Kuhn, Harold W.; Tucker, Albert William (eds.), Contributions to the Theory of Games, Annals of Mathematics Studies, vol. 24, Princeton University Press, pp. 51–72, doi:10.1515/9781400881727-006, MR   0039218
  5. 1 2 3 Borodin, Allan; El-Yaniv, Ran (2005), "8.3 Yao's principle: A technique for obtaining lower bounds", Online Computation and Competitive Analysis, Cambridge University Press, pp. 115–120, ISBN   9780521619462
  6. Moore, Cristopher; Mertens, Stephan (2011), "Theorem 10.1 (Yao's principle)", The Nature of Computation, Oxford University Press, p. 471, ISBN   9780199233212
  7. 1 2 Motwani, Rajeev; Raghavan, Prabhakar (2010), "Chapter 12: Randomized Algorithms", in Atallah, Mikhail J.; Blanton, Marina (eds.), Algorithms and Theory of Computation Handbook: General Concepts and Techniques (2nd ed.), CRC Press, pp. 12-1 12-24; see in particular Section 12.5: The minimax principle and lower bounds, pp. 12-8 12-10
  8. Cunto, Walter; Munro, J. Ian (1989), "Average case selection", Journal of the ACM , 36 (2): 270–279, doi: 10.1145/62044.62047 , MR   1072421, S2CID   10947879
  9. Chan, Timothy M. (2010), "Comparison-based time-space lower bounds for selection", ACM Transactions on Algorithms , 6 (2): A26:1–A26:16, doi:10.1145/1721837.1721842, MR   2675693, S2CID   11742607
  10. Knuth, Donald E. (1998), "Section 5.3.3: Minimum-comparison selection", The Art of Computer Programming, Volume 3: Sorting and Searching (2nd ed.), Addison-Wesley, pp. 207–219, ISBN   0-201-89685-0
  11. Chakrabarti, Amit; Khot, Subhash (2007), "Improved lower bounds on the randomized complexity of graph properties", Random Structures & Algorithms, 30 (3): 427–440, doi:10.1002/rsa.20164, MR   2309625, S2CID   8384071
  12. 1 2 3 4 Wegener, Ingo (2005), "9.2 Yao's minimax principle", Complexity Theory: Exploring the Limits of Efficient Algorithms, Springer-Verlag, pp. 118–120, doi:10.1007/3-540-27477-4, ISBN   978-3-540-21045-0, MR   2146155
  13. 1 2 Fortnow, Lance (October 16, 2006), "Favorite theorems: Yao principle", Computational Complexity
  14. Wigderson, Avi (2019), Mathematics and Computation: A Theory Revolutionizing Technology and Science, Princeton University Press, p. 210, ISBN   9780691189130
  15. Borodin & El-Yaniv (2005), pp. 120–122, 8.4 Paging revisited.

Further reading