Regular path query

Last updated

In databases and specifically in graph databases, a regular path query [1] or RPQ is a query asking for pairs of endpoints in the database that are connected by a path satisfying a certain regular expression. A similar feature exists in the SPARQL query language as "property paths".

Contents

Definition

A graph database consists of a directed graph whose edges carry a label. A regular path query is just a regular expression over the set of labels. For instance, in a graph database where vertices represent users and there is an edge label "parent" for edges from a parent to a child, the regular path query would select pairs of a node x and a descendant y of x, with a path from x to y of "parent" edges having length 1 or more.

Semantics

The answers to RPQs can consist of endpoint pairs, i.e., pairs of nodes x and y that are connected by some path satisfying the regular expression; or it can consist of the list of all paths satisfying the regular expression. However, this set of paths is generally infinite.

To ensure that the number of results is not infinite, the semantics of RPQs is sometimes defined to return only the simple paths, i.e., the paths that do not go twice via the same vertex; or the trails, i.e., the paths that do not go twice through the same edge. [2]

Complexity

The evaluation of regular path queries (RPQ), in the sense of returning all endpoint pairs, can be performed in polynomial time. To do this, for every endpoint pair, we can see the graph database as a finite automaton, also represent the regular path query as a finite automaton, and check if a suitable path exists by checking that the intersection of both languages is nonempty (i.e., solving the emptiness problem), for instance via the product automaton construction.

Other problems

Several classical problems about queries have been studied for regular path queries, such as query containment [3] and query rewriting. [4]

Extensions

Database theory research has investigated more expressive variants of RPQs:

Related Research Articles

<span class="mw-page-title-main">Minimum spanning tree</span> Least-weight tree connecting graph vertices

A minimum spanning tree (MST) or minimum weight spanning tree is a subset of the edges of a connected, edge-weighted undirected graph that connects all the vertices together, without any cycles and with the minimum possible total edge weight. That is, it is a spanning tree whose sum of edge weights is as small as possible. More generally, any edge-weighted undirected graph has a minimum spanning forest, which is a union of the minimum spanning trees for its connected components.

In computer science, the Floyd–Warshall algorithm is an algorithm for finding shortest paths in a directed weighted graph with positive or negative edge weights. A single execution of the algorithm will find the lengths of shortest paths between all pairs of vertices. Although it does not return details of the paths themselves, it is possible to reconstruct the paths with simple modifications to the algorithm. Versions of the algorithm can also be used for finding the transitive closure of a relation , or widest paths between all pairs of vertices in a weighted graph.

In mathematics, the transitive closureR+ of a homogeneous binary relation R on a set X is the smallest relation on X that contains R and is transitive. For finite sets, "smallest" can be taken in its usual sense, of having the fewest related pairs; for infinite sets R+ is the unique minimal transitive superset of R.

This is a glossary of graph theory. Graph theory is the study of graphs, systems of nodes or vertices connected in pairs by lines or edges.

<span class="mw-page-title-main">Graph (discrete mathematics)</span> Vertices connected in pairs by edges

In discrete mathematics, and more specifically in graph theory, a graph is a structure amounting to a set of objects in which some pairs of the objects are in some sense "related". The objects are represented by abstractions called vertices and each of the related pairs of vertices is called an edge. Typically, a graph is depicted in diagrammatic form as a set of dots or circles for the vertices, joined by lines or curves for the edges.

In computer science, a topological sort or topological ordering of a directed graph is a linear ordering of its vertices such that for every directed edge (u,v) from vertex u to vertex v, u comes before v in the ordering. For instance, the vertices of the graph may represent tasks to be performed, and the edges may represent constraints that one task must be performed before another; in this application, a topological ordering is just a valid sequence for the tasks. Precisely, a topological sort is a graph traversal in which each node v is visited only after all its dependencies are visited. A topological ordering is possible if and only if the graph has no directed cycles, that is, if it is a directed acyclic graph (DAG). Any DAG has at least one topological ordering, and algorithms are known for constructing a topological ordering of any DAG in linear time. Topological sorting has many applications, especially in ranking problems such as feedback arc set. Topological sorting is possible even when the DAG has disconnected components.

Datalog is a declarative logic programming language. While it is syntactically a subset of Prolog, Datalog generally uses a bottom-up rather than top-down evaluation model. This difference yields significantly different behavior and properties from Prolog. It is often used as a query language for deductive databases. Datalog has been applied to problems in data integration, networking, program analysis, and more.

Query optimization is a feature of many relational database management systems and other databases such as NoSQL and graph databases. The query optimizer attempts to determine the most efficient way to execute a given query by considering the possible query plans.

String diagrams are a formal graphical language for representing morphisms in monoidal categories, or more generally 2-cells in 2-categories. They are a prominent tool in applied category theory. When interpreted in the monoidal category of vector spaces and linear maps with the tensor product, string diagrams are called tensor networks or Penrose graphical notation. This has led to the development of categorical quantum mechanics where the axioms of quantum theory are expressed in the language of monoidal categories.

In the theory of relational databases, a Boolean conjunctive query is a conjunctive query without distinguished predicates, i.e., a query in the form , where each is a relation symbol and each is a tuple of variables and constants; the number of elements in is equal to the arity of . Such a query evaluates to either true or false depending on whether the relations in the database contain the appropriate tuples of values, i.e. the conjunction is valid according to the facts in the database.

In database theory, a conjunctive query is a restricted form of first-order queries using the logical conjunction operator. Many first-order queries can be written as conjunctive queries. In particular, a large part of queries issued on relational databases can be expressed in this way. Conjunctive queries also have a number of desirable theoretical properties that larger classes of queries do not share.

In mathematical logic, monadic second-order logic (MSO) is the fragment of second-order logic where the second-order quantification is limited to quantification over sets. It is particularly important in the logic of graphs, because of Courcelle's theorem, which provides algorithms for evaluating monadic second-order formulas over graphs of bounded treewidth. It is also of fundamental importance in automata theory, where the Büchi–Elgot–Trakhtenbrot theorem gives a logical characterization of the regular languages.

In graph theory, a branch of mathematics, a skew-symmetric graph is a directed graph that is isomorphic to its own transpose graph, the graph formed by reversing all of its edges, under an isomorphism that is an involution without any fixed points. Skew-symmetric graphs are identical to the double covering graphs of bidirected graphs.

<span class="mw-page-title-main">Georg Gottlob</span> Austrian computer scientist

Georg Gottlob FRS is an Austrian-Italian computer scientist who works in the areas of database theory, logic, and artificial intelligence and is Professor of Informatics at the University of Calabria. He was Professor at the University of Oxford.

Clustering is the problem of partitioning data points into groups based on their similarity. Correlation clustering provides a method for clustering a set of objects into the optimum number of clusters without specifying that number in advance.

Phokion G. KolaitisACM is a computer scientist who is currently a Distinguished Research Professor at UC Santa Cruz and a Principal Research Staff Member at the IBM Almaden Research Center. His research interests include principles of database systems, logic in computer science, and computational complexity.

GQL is a standard graph query language published by ISO in April 2024.

In database theory and knowledge representation, the one of the certain answers is the set of answers to a given query consisting of the intersection of all the complete databases that are consistent with a given knowledge base. The notion of certain answer, investigated in database theory since the 1970s, is indeed defined in the context of open world assumption, where the given knowledge base is assumed to be incomplete.

In database theory, the query evaluation problem is the problem of determining the answers to a query on a database. Research in database theory aims at determining the computational complexity of answering different kinds of queries over databases, in particular over relational databases.

The Yannakakis algorithm is an algorithm in database theory for computing the output of an (alpha-)acyclic conjunctive query. The algorithm is named after Mihalis Yannakakis.

References

  1. Calvanese, D.; De Giacomo, G.; Lenzerini, M.; Vardi, M.Y. (2000). Answering regular path queries using views. pp. 389–398. doi:10.1109/ICDE.2000.839439. ISBN   0-7695-0506-6 . Retrieved 2024-01-18.
  2. Martens, Wim; Trautner, Tina (2019-10-15). "Dichotomies for Evaluating Simple Regular Path Queries". ACM Transactions on Database Systems. 44 (4): 16:1–16:46. doi:10.1145/3331446. ISSN   0362-5915. S2CID   204728561.
  3. Calvanese, D.; De Giacomo, G.; Lenzerini, M.; Vardi, M. Y. (2003-12-01). "Reasoning on regular path queries". ACM SIGMOD Record. 32 (4): 83–92. doi:10.1145/959060.959076. ISSN   0163-5808. S2CID   1803399.
  4. Calvanese, Diego; De Giacomo, Giuseppe; Lenzerini, Maurizio; Vardi, Moshe Y. (1999-05-01). "Rewriting of regular expressions and regular path queries". Proceedings of the eighteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems. PODS '99. New York, NY, USA: Association for Computing Machinery. pp. 194–204. doi:10.1145/303976.303996. ISBN   978-1-58113-062-1.
  5. Figueira, Diego; Morvan, Rémi (November 2023). Semantic Tree-Width and Path-Width of Conjunctive Regular Path Queries.