Decision tree model

In computational complexity theory, the decision tree model is the model of computation in which an algorithm is considered to be, in essence, a decision tree: a sequence of queries or tests that are done adaptively, so the outcome of previous tests can influence the tests performed next.

Typically, these tests have a small number of outcomes (such as a yes–no question) and can be performed quickly (say, with unit computational cost), so the worst-case time complexity of an algorithm in the decision tree model corresponds to the depth of the corresponding decision tree. This notion of computational complexity of a problem or an algorithm in the decision tree model is called its decision tree complexity or query complexity.

Decision tree models are instrumental in establishing lower bounds in complexity theory for certain classes of computational problems and algorithms. Several variants of decision tree models have been introduced, depending on the computational model and the types of queries algorithms are allowed to perform.

For example, a decision tree argument is used to show that a comparison sort of $n$ items must take $\Omega(n \log n)$ comparisons. For comparison sorts, a query is a comparison of two items $x_i, x_j$, with two outcomes (assuming no items are equal): either $x_i < x_j$ or $x_i > x_j$. Comparison sorts can be expressed as decision trees in this model, since such sorting algorithms only perform these types of queries.

Comparison trees and lower bounds for sorting

Decision trees are often employed to understand algorithms for sorting and other similar problems; this was first done by Ford and Johnson. [1]

For example, many sorting algorithms are comparison sorts, which means that they only gain information about an input sequence $x_1, \ldots, x_n$ via local comparisons: testing whether $x_i < x_j$, $x_i = x_j$, or $x_i > x_j$. Assuming that the items to be sorted are all distinct and comparable, this can be rephrased as a yes-or-no question: is $x_i < x_j$?

These algorithms can be modeled as binary decision trees, where the queries are comparisons: an internal node corresponds to a query, and the node's children correspond to the next query when the answer to the question is yes or no. For leaf nodes, the output corresponds to a permutation $\pi$ that describes how the input sequence was scrambled from the fully ordered list of items. (The inverse of this permutation, $\pi^{-1}$, re-orders the input sequence.)

One can show that comparison sorts must use $\Omega(n \log n)$ comparisons through a simple argument: for an algorithm to be correct, it must be able to output every possible permutation of $n$ elements; otherwise, the algorithm would fail for that particular permutation as input. So, its corresponding decision tree must have at least as many leaves as permutations: $n!$ leaves. Any binary tree with at least $n!$ leaves has depth at least $\log_2(n!) = \Omega(n \log n)$, so this is a lower bound on the run time of a comparison sorting algorithm. In this case, the existence of numerous comparison-sorting algorithms having this time complexity, such as mergesort and heapsort, demonstrates that the bound is tight. [2] :91
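
The counting in this argument is easy to check numerically. A minimal sketch (the function name is ours, not standard):

```python
import math

def comparison_lower_bound(n):
    """Information-theoretic lower bound on worst-case comparisons for
    sorting n distinct items: a binary decision tree that sorts must
    have at least n! leaves, hence depth at least ceil(log2(n!))."""
    return math.ceil(math.log2(math.factorial(n)))

# Sorting 4 items needs at least ceil(log2(24)) = 5 comparisons,
# while mergesort's n*ceil(log2 n) comparisons stay within a constant
# factor of this bound.
print(comparison_lower_bound(4))  # prints 5
```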

This argument does not use anything about the type of query, so it in fact proves a lower bound for any sorting algorithm that can be modeled as a binary decision tree. In essence, this is a rephrasing of the information-theoretic argument that a correct sorting algorithm must learn at least $\log_2(n!) = \Omega(n \log n)$ bits of information about the input sequence. As a result, it works for randomized decision trees as well.

Other decision tree lower bounds do use that the query is a comparison. For example, consider the task of only using comparisons to find the smallest number among $n$ numbers. Before the smallest number can be determined, every number except the smallest must "lose" (compare greater) in at least one comparison. So, it takes at least $n - 1$ comparisons to find the minimum. (The information-theoretic argument here only gives a lower bound of $\log_2 n$.) A similar argument works for general lower bounds for computing order statistics. [2] :214
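
The $n - 1$ bound is met by the obvious scanning algorithm, which can be instrumented to count its comparisons (a small illustrative sketch; the function name is ours):

```python
def find_min(items):
    """Find the minimum of a nonempty list using exactly
    len(items) - 1 comparisons: every element other than the
    eventual winner loses exactly one comparison, matching the
    n - 1 lower bound for comparison-based minimum finding."""
    comparisons = 0
    best = items[0]
    for x in items[1:]:
        comparisons += 1  # one comparison per remaining element
        if x < best:
            best = x
    return best, comparisons

# 5 numbers, 4 comparisons
print(find_min([5, 2, 9, 1, 7]))  # prints (1, 4)
```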

Linear and algebraic decision trees

Linear decision trees generalize the above comparison decision trees to computing functions that take real vectors $(x_1, \ldots, x_n) \in \mathbb{R}^n$ as input. The tests in linear decision trees are linear functions: for a particular choice of real numbers $a_0, \ldots, a_n$, output the sign of $a_0 + a_1 x_1 + \cdots + a_n x_n$. (Algorithms in this model can only depend on the sign of the output.) Comparison trees are linear decision trees, because the comparison between $x_i$ and $x_j$ corresponds to the linear function $x_i - x_j$. From its definition, linear decision trees can only specify functions $f$ whose fibers $f^{-1}(y)$ can be constructed by taking unions and intersections of half-spaces.
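
A single test in this model can be sketched as follows (the function name and calling convention are ours, for illustration only):

```python
def sign_query(a, x):
    """One test in a linear decision tree: the sign of
    a[0] + a[1]*x[0] + ... + a[n]*x[n-1].
    The algorithm may branch only on the returned value -1, 0, or +1."""
    v = a[0] + sum(ai * xi for ai, xi in zip(a[1:], x))
    return (v > 0) - (v < 0)

# The comparison x_0 < x_1 is the linear test with coefficients
# (0, 1, -1), i.e. the sign of x_0 - x_1:
print(sign_query([0.0, 1.0, -1.0], [3.0, 5.0]))  # prints -1, so x_0 < x_1
```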

Algebraic decision trees are a generalization of linear decision trees that allow the test functions to be polynomials of degree $d$. Geometrically, the space is divided into semi-algebraic sets (a generalization of hyperplanes).

These decision tree models, defined by Rabin [3] and Reingold, [4] are often used for proving lower bounds in computational geometry. [5] For example, Ben-Or showed that element uniqueness (the task of computing $f : \mathbb{R}^n \to \{0,1\}$, where $f(x)$ is 0 if and only if there exist distinct coordinates $i, j$ such that $x_i = x_j$) requires an algebraic decision tree of depth $\Omega(n \log n)$. [6] This was first shown for linear decision models by Dobkin and Lipton. [7] They also showed a lower bound for linear decision trees on the knapsack problem, generalized to algebraic decision trees by Steele and Yao. [8]

Boolean decision tree complexities

For Boolean decision trees, the task is to compute the value of an $n$-bit Boolean function $f : \{0,1\}^n \to \{0,1\}$ for an input $x \in \{0,1\}^n$. The queries correspond to reading a single bit of the input, $x_i$, and the output is $f(x)$. Each query may depend on the answers to previous queries. There are many types of computational models using decision trees that could be considered, admitting multiple notions of complexity, called complexity measures.

Deterministic decision tree

If the output of a decision tree is $f(x)$ for all $x \in \{0,1\}^n$, the decision tree is said to "compute" $f$. The depth of a tree is the maximum number of queries that can happen before a leaf is reached and a result obtained. $D(f)$, the deterministic decision tree complexity of $f$, is the smallest depth among all deterministic decision trees that compute $f$.
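
For tiny $n$, $D(f)$ can be computed exactly by brute force over all adaptive query strategies: the algorithm picks the best bit to query next, and an adversary picks the worst answer. A minimal sketch (function names are ours; exponential time, illustration only):

```python
from itertools import product

def decision_tree_depth(f, n):
    """Deterministic query complexity D(f) of f: {0,1}^n -> {0,1},
    by brute force over adaptive strategies. f takes a tuple of n bits."""
    def consistent_outputs(fixed):
        # All values f can still take, given the queried bits in `fixed`.
        outs = set()
        free = [i for i in range(n) if i not in fixed]
        for bits in product((0, 1), repeat=len(free)):
            x = dict(fixed)
            x.update(zip(free, bits))
            outs.add(f(tuple(x[i] for i in range(n))))
        return outs
    def solve(fixed):
        if len(consistent_outputs(fixed)) == 1:
            return 0  # f is already determined; no more queries needed
        free = [i for i in range(n) if i not in fixed]
        # Algorithm minimizes over the next query; the adversary
        # maximizes over the answer bit.
        return 1 + min(max(solve({**fixed, i: b}) for b in (0, 1))
                       for i in free)
    return solve({})

# Parity of 2 bits requires querying both bits: D = 2.
print(decision_tree_depth(lambda x: x[0] ^ x[1], 2))  # prints 2
```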

Randomized decision tree

One way to define a randomized decision tree is to add additional nodes to the tree, each controlled by a probability $p_r$. Another, equivalent definition is as a probability distribution over deterministic decision trees. Based on this second definition, the complexity of a randomized tree is defined as the largest depth among all the trees in the support of the underlying distribution. $R_2(f)$ is defined as the complexity of the lowest-depth randomized decision tree whose result is $f(x)$ with probability at least $2/3$ for all $x \in \{0,1\}^n$ (i.e., with bounded two-sided error).

$R_2(f)$ is known as the Monte Carlo randomized decision-tree complexity, because the result is allowed to be incorrect with bounded two-sided error. The Las Vegas decision-tree complexity $R_0(f)$ measures the expected depth of a decision tree that must be correct (i.e., has zero error). There is also a one-sided bounded-error version, denoted $R_1(f)$.

Nondeterministic decision tree

The nondeterministic decision tree complexity of a function is known more commonly as the certificate complexity of that function. It measures the number of input bits that a nondeterministic algorithm would need to look at in order to evaluate the function with certainty.

Formally, the certificate complexity of $f$ at $x$ is the size of the smallest subset $S \subseteq \{1, \ldots, n\}$ of indices such that, for all $y \in \{0,1\}^n$, if $y_i = x_i$ for all $i \in S$, then $f(y) = f(x)$. The certificate complexity of $f$ is the maximum certificate complexity over all $x$. The analogous notion where one only requires the verifier to be correct with probability 2/3 is denoted $RC(f)$.
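
The definition translates directly into a brute-force search over subsets, feasible for tiny $n$ (function names are ours; illustration only):

```python
from itertools import combinations, product

def certificate_complexity_at(f, n, x):
    """Size of the smallest set S of indices such that fixing x on S
    forces the value f(x) on every input y agreeing with x on S."""
    for k in range(n + 1):
        for S in combinations(range(n), k):
            if all(f(y) == f(x)
                   for y in product((0, 1), repeat=n)
                   if all(y[i] == x[i] for i in S)):
                return k
    return n

def certificate_complexity(f, n):
    """C(f): the maximum certificate complexity over all inputs x."""
    return max(certificate_complexity_at(f, n, x)
               for x in product((0, 1), repeat=n))

# OR on 2 bits: the all-zero input needs both bits certified, so C = 2.
print(certificate_complexity(lambda x: x[0] | x[1], 2))  # prints 2
```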

Quantum decision tree

The quantum decision tree complexity $Q_2(f)$ is the depth of the lowest-depth quantum decision tree that gives the result $f(x)$ with probability at least $2/3$ for all $x \in \{0,1\}^n$. Another quantity, $Q_E(f)$, is defined as the depth of the lowest-depth quantum decision tree that gives the result $f(x)$ with probability 1 in all cases (i.e., computes $f$ exactly). $Q_2(f)$ and $Q_E(f)$ are more commonly known as quantum query complexities, because the direct definition of a quantum decision tree is more complicated than in the classical case. Similar to the randomized case, we define $Q_0(f)$ and $Q_1(f)$.

These notions are typically bounded by the notions of degree and approximate degree. The degree of $f$, denoted $\deg(f)$, is the smallest degree of any polynomial $p$ satisfying $p(x) = f(x)$ for all $x \in \{0,1\}^n$. The approximate degree of $f$, denoted $\widetilde{\deg}(f)$, is the smallest degree of any polynomial $p$ satisfying $p(x) \ge 2/3$ whenever $f(x) = 1$ and $p(x) \le 1/3$ whenever $f(x) = 0$.
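
Every Boolean function has a unique multilinear polynomial representing it over $\{0,1\}^n$, whose coefficients are given by the Möbius (inclusion-exclusion) transform; the degree can therefore be computed directly for tiny $n$. A sketch (function name is ours; illustration only):

```python
from itertools import product

def boolean_degree(f, n):
    """Degree of the unique multilinear polynomial representing
    f: {0,1}^n -> {0,1}. The coefficient of prod_{i in S} x_i is
    sum over subsets T of S of (-1)^(|S|-|T|) * f(indicator of T)."""
    deg = 0
    for S in product((0, 1), repeat=n):
        support = [i for i in range(n) if S[i]]
        coeff = 0
        for T in product((0, 1), repeat=len(support)):
            y = [0] * n
            for idx, bit in zip(support, T):
                y[idx] = bit
            coeff += (-1) ** (len(support) - sum(T)) * f(tuple(y))
        if coeff != 0:
            deg = max(deg, len(support))
    return deg

# Parity has full degree n; AND also has degree n (it equals x1*x2*...*xn).
print(boolean_degree(lambda x: x[0] ^ x[1], 2))  # prints 2
```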

Beals et al. established that $Q_E(f) \ge \deg(f)/2$ and $Q_2(f) \ge \widetilde{\deg}(f)/2$. [9]

Relationships between Boolean function complexity measures

It follows immediately from the definitions that, for all $n$-bit Boolean functions $f$, $D(f) \ge R_0(f) \ge R_1(f) \ge R_2(f) \ge Q_2(f)$ and $D(f) \ge Q_E(f) \ge Q_2(f)$. Finding the best upper bounds in the converse direction is a major goal in the field of query complexity.

All of these types of query complexity are polynomially related. Blum and Impagliazzo, [10] Hartmanis and Hemachandra, [11] and Tardos [12] independently discovered that $D(f) \le C(f)^2$, where $C(f)$ is the certificate complexity. Noam Nisan found that the Monte Carlo randomized decision tree complexity is also polynomially related to deterministic decision tree complexity: $D(f) = O(R_2(f)^3)$. [13] (Nisan also showed that $D(f) = O(R_0(f)^2)$.) A tighter relationship is known between the Monte Carlo and Las Vegas models: $R_0(f) = O(R_2(f)^2 \log R_2(f))$. [14] This relationship is optimal up to polylogarithmic factors. [15] As for quantum decision tree complexities, $D(f) = O(Q_2(f)^4)$, and this bound is tight. [16] [15] Midrijanis showed that $D(f) = O(Q_E(f)^3)$, [17] [18] improving a quartic bound due to Beals et al. [9]

It is important to note that these polynomial relationships are valid only for total Boolean functions. For partial Boolean functions, whose domain is a subset of $\{0,1\}^n$, an exponential separation between $Q_E(f)$ and $D(f)$ is possible; the first example of such a problem was discovered by Deutsch and Jozsa.

Sensitivity conjecture

For a Boolean function $f : \{0,1\}^n \to \{0,1\}$, the sensitivity $s(f)$ of $f$ is defined to be the maximum sensitivity of $f$ over all $x$, where the sensitivity of $f$ at $x$ is the number of single-bit changes in $x$ that change the value of $f(x)$. Sensitivity is related to the notion of total influence from the analysis of Boolean functions, which is equal to average sensitivity over all $x$.
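
The definition is a direct double maximum/count over inputs and bit positions, easy to evaluate for tiny $n$ (function name is ours; illustration only):

```python
from itertools import product

def sensitivity(f, n):
    """s(f): the maximum, over inputs x, of the number of single-bit
    flips of x that change the value f(x)."""
    def flip(x, i):
        return x[:i] + (1 - x[i],) + x[i + 1:]
    return max(sum(f(flip(x, i)) != f(x) for i in range(n))
               for x in product((0, 1), repeat=n))

# Parity is fully sensitive at every input, so s = n.
print(sensitivity(lambda x: x[0] ^ x[1] ^ x[2], 3))  # prints 3
```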

The sensitivity conjecture is the conjecture that sensitivity is polynomially related to query complexity; that is, there exists an exponent $c$ such that, for all $f$, $D(f) = O(s(f)^c)$ and $s(f) = O(D(f)^c)$. One can show through a simple argument that $s(f) \le D(f)$, so the conjecture is specifically concerned with lower-bounding sensitivity in terms of the other complexity measures. Since all of the previously discussed complexity measures are polynomially related, the precise type of complexity measure is not relevant. However, this is typically phrased as the question of relating sensitivity with block sensitivity.

The block sensitivity of $f$, denoted $bs(f)$, is defined to be the maximum block sensitivity of $f$ over all $x$. The block sensitivity of $f$ at $x$ is the maximum number $k$ of disjoint subsets $B_1, \ldots, B_k \subseteq \{1, \ldots, n\}$ such that, for any of the subsets $B_i$, flipping the bits of $x$ corresponding to $B_i$ changes the value of $f$. [13]
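
For tiny $n$, block sensitivity can be computed exactly by enumerating all "sensitive" blocks (subsets whose flip changes $f$) and then finding the largest family of pairwise disjoint ones (function names are ours; exponential time, illustration only):

```python
from itertools import product

def block_sensitivity_at(f, n, x):
    """bs(f, x): the maximum number of pairwise disjoint blocks B such
    that flipping x on any single block changes f(x). Blocks are
    encoded as bitmasks; disjointness is a zero AND of masks."""
    def flip(x, mask):
        return tuple(b ^ ((mask >> i) & 1) for i, b in enumerate(x))
    sensitive = [m for m in range(1, 2 ** n) if f(flip(x, m)) != f(x)]
    def pack(used, blocks):
        # Largest set of mutually disjoint masks drawn from `blocks`.
        best = 0
        for j, m in enumerate(blocks):
            if m & used == 0:
                best = max(best, 1 + pack(used | m, blocks[j + 1:]))
        return best
    return pack(0, sensitive)

def block_sensitivity(f, n):
    return max(block_sensitivity_at(f, n, x)
               for x in product((0, 1), repeat=n))

# OR on 3 bits: at x = 000 the three singleton blocks are disjoint
# and each flip changes the value, so bs = 3.
print(block_sensitivity(lambda x: x[0] | x[1] | x[2], 3))  # prints 3
```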

Since block sensitivity takes a maximum over more choices of subsets, $s(f) \le bs(f)$. Further, block sensitivity is polynomially related to the previously discussed complexity measures; for example, Nisan's paper introducing block sensitivity showed that $bs(f) \le D(f) = O(bs(f)^3)$. [13] So, one could rephrase the sensitivity conjecture as showing that, for some $c$, $bs(f) = O(s(f)^c)$. In 1992, Nisan and Szegedy conjectured that $c = 2$ suffices. [19] This would be tight, as Rubinstein in 1995 showed a quadratic separation between sensitivity and block sensitivity. [20]

In July 2019, 27 years after the conjecture was initially posed, Hao Huang of Emory University proved the sensitivity conjecture, showing that $bs(f) = O(s(f)^4)$. [21] The proof is notably succinct, establishing the statement in two pages, when prior progress towards the sensitivity conjecture had been limited. [22] [23]

Summary of known results

Best-known separations for complexity measures as of October 2020 [16]
[Table: each cell gives the best-known lower and upper bounds on the separation exponent between a pair of the complexity measures listed below; see Aaronson, Ben-David, Kothari, Rao, and Tal (2020) [16] for the full table.]

This table summarizes results on separations between Boolean function complexity measures. The complexity measures are, in order: deterministic ($D$), zero-error randomized ($R_0$), two-sided-error randomized ($R_2$), certificate ($C$), randomized certificate ($RC$), block sensitivity ($bs$), sensitivity ($s$), exact quantum ($Q_E$), degree ($\deg$), quantum ($Q_2$), and approximate degree ($\widetilde{\deg}$) complexities.

The number in the row for measure $A$ and the column for measure $B$ denotes bounds on the exponent $c$, which is the infimum of all $k$ satisfying $A(f) = O(B(f)^k)$ for all Boolean functions $f$. For example, the entry in the $D$ row and $s$ column is "3, 6", so $D(f) = O(s(f)^6)$ for all $f$, and there exists a function $f$ such that $D(f) = \Omega(s(f)^3)$.


References

  1. Ford, Lester R. Jr.; Johnson, Selmer M. (1959-05-01). "A Tournament Problem". The American Mathematical Monthly. 66 (5): 387–389. doi:10.1080/00029890.1959.11989306. ISSN   0002-9890.
  2. Cormen, Thomas H.; et al. (2009). Introduction to Algorithms (Third ed.). Cambridge, Mass.: MIT Press. ISBN   978-0-262-27083-0. OCLC   676697295.
  3. Rabin, Michael O. (1972-12-01). "Proving simultaneous positivity of linear forms". Journal of Computer and System Sciences. 6 (6): 639–650. doi: 10.1016/S0022-0000(72)80034-5 . ISSN   0022-0000.
  4. Reingold, Edward M. (1972-10-01). "On the Optimality of Some Set Algorithms". Journal of the ACM. 19 (4): 649–659. doi: 10.1145/321724.321730 . ISSN   0004-5411. S2CID   18605212.
  5. Preparata, Franco P. (1985). Computational geometry : an introduction. Shamos, Michael Ian. New York: Springer-Verlag. ISBN   0-387-96131-3. OCLC   11970840.
  6. Ben-Or, Michael (1983-12-01). "Lower bounds for algebraic computation trees". Proceedings of the fifteenth annual ACM symposium on Theory of computing - STOC '83. STOC '83. New York, NY, USA: Association for Computing Machinery. pp. 80–86. doi: 10.1145/800061.808735 . ISBN   978-0-89791-099-6. S2CID   1499957.
  7. Dobkin, David; Lipton, Richard J. (1976-06-01). "Multidimensional Searching Problems". SIAM Journal on Computing. 5 (2): 181–186. doi:10.1137/0205015. ISSN   0097-5397.
  8. Michael Steele, J; Yao, Andrew C (1982-03-01). "Lower bounds for algebraic decision trees". Journal of Algorithms. 3 (1): 1–8. doi:10.1016/0196-6774(82)90002-5. ISSN   0196-6774.
  9. Beals, R.; Buhrman, H.; Cleve, R.; Mosca, M.; de Wolf, R. (2001). "Quantum lower bounds by polynomials". Journal of the ACM. 48 (4): 778–797. arXiv: quant-ph/9802049 . doi:10.1145/502090.502097. S2CID   1078168.
  10. Blum, M.; Impagliazzo, R. (1987). "Generic oracles and oracle classes". Proceedings of 18th IEEE FOCS. pp. 118–126.
  11. Hartmanis, J.; Hemachandra, L. (1987), "One-way functions, robustness, and non-isomorphism of NP-complete sets", Technical Report DCS TR86-796, Cornell University
  12. Tardos, G. (1989). "Query complexity, or why is it difficult to separate NP^A ∩ coNP^A from P^A by random oracles A?". Combinatorica. 9 (4): 385–392. doi:10.1007/BF02125350. S2CID   45372592.
  13. Nisan, N. (1989). "CREW PRAMs and decision trees". Proceedings of 21st ACM STOC. pp. 327–335.
  14. Kulkarni, R. and Tal, A. On Fractional Block Sensitivity. Electronic Colloquium on Computational Complexity (ECCC). Vol. 20. 2013.
  15. Ambainis, Andris; Balodis, Kaspars; Belovs, Aleksandrs; Lee, Troy; Santha, Miklos; Smotrovs, Juris (2017-09-04). "Separations in Query Complexity Based on Pointer Functions". Journal of the ACM. 64 (5): 32:1–32:24. arXiv: 1506.04719 . doi:10.1145/3106234. ISSN   0004-5411. S2CID   10214557.
  16. Aaronson, Scott; Ben-David, Shalev; Kothari, Robin; Rao, Shravas; Tal, Avishay (2020-10-23). "Degree vs. Approximate Degree and Quantum Implications of Huang's Sensitivity Theorem". arXiv: 2010.12629 [quant-ph].
  17. Midrijanis, Gatis (2004), "Exact quantum query complexity for total Boolean functions", arXiv: quant-ph/0403168
  18. Midrijanis, Gatis (2005), "On Randomized and Quantum Query Complexities", arXiv: quant-ph/0501142
  19. Nisan, Noam; Szegedy, Mario (1992-07-01). "On the degree of Boolean functions as real polynomials". Proceedings of the twenty-fourth annual ACM symposium on Theory of computing - STOC '92. STOC '92. Victoria, British Columbia, Canada: Association for Computing Machinery. pp. 462–467. doi: 10.1145/129712.129757 . ISBN   978-0-89791-511-3. S2CID   6919144.
  20. Rubinstein, David (1995-06-01). "Sensitivity vs. block sensitivity of Boolean functions". Combinatorica. 15 (2): 297–299. doi:10.1007/BF01200762. ISSN   1439-6912. S2CID   41010711.
  21. Huang, Hao (2019). "Induced subgraphs of hypercubes and a proof of the Sensitivity Conjecture". Annals of Mathematics. 190 (3): 949–955. arXiv: 1907.00847 . doi:10.4007/annals.2019.190.3.6. ISSN   0003-486X. JSTOR   10.4007/annals.2019.190.3.6. S2CID   195767594.
  22. Klarreich, Erica. "Decades-Old Computer Science Conjecture Solved in Two Pages". Quanta Magazine. Retrieved 2019-07-26.
  23. Hatami, Pooya; Kulkarni, Raghav; Pankratov, Denis (2011-06-22). "Variations on the Sensitivity Conjecture". Theory of Computing. 1: 1–27. doi: 10.4086/toc.gs.2011.004 . ISSN   1557-2862. S2CID   6918061.

Surveys