Query evaluation

Last updated

In database theory, the query evaluation problem is the problem [ verification needed ] of determining the answers to a query on a database. Research in database theory aims at determining the computational complexity of answering different kinds of queries over databases, in particular over relational databases.

Contents

Formal definition

The query evaluation problem takes two inputs: the query to be answered, and the database on which to answer it. The output of the problem is the set of answers to the query on the database. If the queries are Boolean queries, i.e., queries have a yes or no answer (for example, Boolean conjunctive queries) then the query evaluation problem is a decision problem.

The query evaluation problem is usually posed for a specific class of queries and databases. For instance, one example of the query evaluation problem would be the problem of evaluating a conjunctive query on a relational database.

The computational complexity of the problem can be measured in different ways, [1] to account for the fact that the two inputs of the problem are different:

Query classes

The complexity of query evaluation can be studied for several query classes, for instance acyclic queries, conjunctive queries, unions of conjunctive queries, Datalog, regular path queries, etc., up to logical formalisms like first-order logic or monadic second-order logic.

For instance, for Boolean conjunctive queries, the complexity of query evaluation is polynomial in data complexity: it even falls in the class AC0. By contrast, the query complexity and combined complexity are NP-complete [3] by a reduction from 3-colorability.[ citation needed ]

Boolean versus non-Boolean queries

The complexity of query evaluation can be studied for queries that return answers, or for Boolean queries (yes/no queries). However, we can often reduce to the case of Boolean queries. More specifically, if the number of answers to the query is always polynomial in the database size, and if we can rewrite the query to a Boolean query for each answer, then we can reduce query evaluation for a non-Boolean query to polynomially many Boolean query evaluation problems.[ citation needed ]

Related Research Articles

In computational complexity theory, a branch of computer science, bounded-error probabilistic polynomial time (BPP) is the class of decision problems solvable by a probabilistic Turing machine in polynomial time with an error probability bounded by 1/3 for all instances. BPP is one of the largest practical classes of problems, meaning most problems of interest in BPP have efficient probabilistic algorithms that can be run quickly on real modern machines. BPP also contains P, the class of problems solvable in polynomial time with a deterministic machine, since a deterministic machine is a special case of a probabilistic machine.

In logic and computer science, the Boolean satisfiability problem (sometimes called propositional satisfiability problem and abbreviated SATISFIABILITY, SAT or B-SAT) is the problem of determining if there exists an interpretation that satisfies a given Boolean formula. In other words, it asks whether the variables of a given Boolean formula can be consistently replaced by the values TRUE or FALSE in such a way that the formula evaluates to TRUE. If this is the case, the formula is called satisfiable. On the other hand, if no such assignment exists, the function expressed by the formula is FALSE for all possible variable assignments and the formula is unsatisfiable. For example, the formula "a AND NOT b" is satisfiable because one can find the values a = TRUE and b = FALSE, which make (a AND NOT b) = TRUE. In contrast, "a AND NOT a" is unsatisfiable.

The P versus NP problem is a major unsolved problem in theoretical computer science. Informally, it asks whether every problem whose solution can be quickly verified can also be quickly solved.

<span class="mw-page-title-main">Stephen Cook</span> American-Canadian computer scientist, contributor to complexity theory

Stephen Arthur Cook is an American-Canadian computer scientist and mathematician who has made significant contributions to the fields of complexity theory and proof complexity. He is a university professor emeritus at the University of Toronto, Department of Computer Science and Department of Mathematics.

Constraint satisfaction problems (CSPs) are mathematical questions defined as a set of objects whose state must satisfy a number of constraints or limitations. CSPs represent the entities in a problem as a homogeneous collection of finite constraints over variables, which is solved by constraint satisfaction methods. CSPs are the subject of research in both artificial intelligence and operations research, since the regularity in their formulation provides a common basis to analyze and solve problems of many seemingly unrelated families. CSPs often exhibit high complexity, requiring a combination of heuristics and combinatorial search methods to be solved in a reasonable time. Constraint programming (CP) is the field of research that specifically focuses on tackling these kinds of problems. Additionally, the Boolean satisfiability problem (SAT), satisfiability modulo theories (SMT), mixed integer programming (MIP) and answer set programming (ASP) are all fields of research focusing on the resolution of particular forms of the constraint satisfaction problem.

In computational complexity theory, P, also known as PTIME or DTIME(nO(1)), is a fundamental complexity class. It contains all decision problems that can be solved by a deterministic Turing machine using a polynomial amount of computation time, or polynomial time.

In computational complexity theory, the Cook–Levin theorem, also known as Cook's theorem, states that the Boolean satisfiability problem is NP-complete. That is, it is in NP, and any problem in NP can be reduced in polynomial time by a deterministic Turing machine to the Boolean satisfiability problem.

Datalog is a declarative logic programming language. While it is syntactically a subset of Prolog, Datalog generally uses a bottom-up rather than top-down evaluation model. This difference yields significantly different behavior and properties from Prolog. It is often used as a query language for deductive databases. Datalog has been applied to problems in data integration, networking, program analysis, and more.

In computational complexity theory, L is the complexity class containing decision problems that can be solved by a deterministic Turing machine using a logarithmic amount of writable memory space. Formally, the Turing machine has two tapes, one of which encodes the input and can only be read, whereas the other tape has logarithmic size but can be read as well as written. Logarithmic space is sufficient to hold a constant number of pointers into the input and a logarithmic number of boolean flags, and many basic logspace algorithms use the memory in this way.

Descriptive complexity is a branch of computational complexity theory and of finite model theory that characterizes complexity classes by the type of logic needed to express the languages in them. For example, PH, the union of all complexity classes in the polynomial hierarchy, is precisely the class of languages expressible by statements of second-order logic. This connection between complexity and the logic of finite structures allows results to be transferred easily from one area to the other, facilitating new proof methods and providing additional evidence that the main complexity classes are somehow "natural" and not tied to the specific abstract machines used to define them.

MAX-3SAT is a problem in the computational complexity subfield of computer science. It generalises the Boolean satisfiability problem (SAT) which is a decision problem considered in complexity theory. It is defined as:

In computational complexity theory, LOGCFL is the complexity class that contains all decision problems that can be reduced in logarithmic space to a context-free language. This class is closed under complementation. It is situated between NL and AC1, in the sense that it contains the former and is contained in the latter. Problems that are complete for LOGCFL include many problems that can be characterized by acyclic hypergraphs:

<span class="mw-page-title-main">Circuit complexity</span> Model of computational complexity

In theoretical computer science, circuit complexity is a branch of computational complexity theory in which Boolean functions are classified according to the size or depth of the Boolean circuits that compute them. A related notion is the circuit complexity of a recursive language that is decided by a uniform family of circuits .

ACC<sup>0</sup>

ACC0, sometimes called ACC, is a class of computational models and problems defined in circuit complexity, a field of theoretical computer science. The class is defined by augmenting the class AC0 of constant-depth "alternating circuits" with the ability to count; the acronym ACC stands for "AC with counters". Specifically, a problem belongs to ACC0 if it can be solved by polynomial-size, constant-depth circuits of unbounded fan-in gates, including gates that count modulo a fixed integer. ACC0 corresponds to computation in any solvable monoid. The class is very well studied in theoretical computer science because of the algebraic connections and because it is one of the largest concrete computational models for which computational impossibility results, so-called circuit lower bounds, can be proved.

In database theory, a conjunctive query is a restricted form of first-order queries using the logical conjunction operator. Many first-order queries can be written as conjunctive queries. In particular, a large part of queries issued on relational databases can be expressed in this way. Conjunctive queries also have a number of desirable theoretical properties that larger classes of queries do not share.

In mathematical logic, monadic second-order logic (MSO) is the fragment of second-order logic where the second-order quantification is limited to quantification over sets. It is particularly important in the logic of graphs, because of Courcelle's theorem, which provides algorithms for evaluating monadic second-order formulas over graphs of bounded treewidth. It is also of fundamental importance in automata theory, where the Büchi–Elgot–Trakhtenbrot theorem gives a logical characterization of the regular languages.

<span class="mw-page-title-main">Georg Gottlob</span> Austrian computer scientist

Georg Gottlob FRS is an Austrian-Italian computer scientist who works in the areas of database theory, logic, and artificial intelligence and is Professor of Informatics at the University of Calabria. He was Professor at the University of Oxford.

In computer science, an enumeration algorithm is an algorithm that enumerates the answers to a computational problem. Formally, such an algorithm applies to problems that take an input and produce a list of solutions, similarly to function problems. For each input, the enumeration algorithm must produce the list of all solutions, without duplicates, and then halt. The performance of an enumeration algorithm is measured in terms of the time required to produce the solutions, either in terms of the total time required to produce all solutions, or in terms of the maximal delay between two consecutive solutions and in terms of a preprocessing time, counted as the time before outputting the first solution. This complexity can be expressed in terms of the size of the input, the size of each individual output, or the total size of the set of all outputs, similarly to what is done with output-sensitive algorithms.

The Hidden Matching Problem is a computation complexity problem that can be solved using quantum protocols: Let be a positive even integer. In the Hidden Matching Problem, Alice is given and Bob is given ( denotes the family of all possible perfect matchings on nodes). Their goal is to output a tuple such that the edge belongs to the matching and .

The Yannakakis algorithm is an algorithm in database theory for computing the output of an (alpha-)acyclic conjunctive query. The algorithm is named after Mihalis Yannakakis.

References

  1. Vardi, Moshe Y. (1982-05-05). "The complexity of relational query languages (Extended Abstract)". Proceedings of the fourteenth annual ACM symposium on Theory of computing - STOC '82. New York, NY, USA: Association for Computing Machinery. pp. 137–146. doi:10.1145/800070.802186. ISBN   978-0-89791-070-5.
  2. Abiteboul, Serge; Hull, Richard; Vianu, Victor (1994-12-02). Foundations of Databases: The Logical Level (1st ed.). Reading, Mass.: Pearson. ISBN   978-0-201-53771-0.
  3. Greco, Gianluigi; Scarcello, Francesco (2014-06-18). "Counting solutions to conjunctive queries: Structural and hybrid tractability". Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems. PODS '14. New York, NY, USA: Association for Computing Machinery. pp. 132–143. doi:10.1145/2594538.2594559. ISBN   978-1-4503-2375-8.