Database theory

Last updated

Database theory encapsulates a broad range of topics related to the study and research of the theoretical realm of databases and database management systems.

Contents

Theoretical aspects of data management include, among other areas, the foundations of query languages, computational complexity and expressive power of queries, finite model theory, database design theory, dependency theory, foundations of concurrency control and database recovery, deductive databases, temporal and spatial databases, real-time databases, managing uncertain data and probabilistic databases, and Web data.

Most research work has traditionally been based on the relational model, since this model is usually considered the simplest and most foundational model of interest. Corresponding results for other data models, such as object-oriented or semi-structured models, or, more recently, graph data models and XML, are often derivable from those for the relational model. [1]

Database theory helps one to understand the complexity and power of query languages and their connection to logic. Starting from relational algebra and first-order logic (which are equivalent by Codd's theorem) and the insight that important queries such as graph reachability are not expressible in this language, [2] more powerful language based on logic programming and fixpoint logic such as Datalog were studied. [3] The theory also explores foundations of query optimization and data integration. Here most work studied conjunctive queries, which admit query optimization even under constraints using the chase algorithm.

The main research conferences in the area are the ACM Symposium on Principles of Database Systems (PODS) and the International Conference on Database Theory (ICDT).

See also

Related Research Articles

Computer science is the study of the theoretical foundations of information and computation and their implementation and application in computer systems. One well known subject classification system for computer science is the ACM Computing Classification System devised by the Association for Computing Machinery.

<span class="mw-page-title-main">Theoretical computer science</span> Subfield of computer science and mathematics

Theoretical computer science (TCS) is a subset of general computer science and mathematics that focuses on mathematical aspects of computer science such as the theory of computation, formal language theory, the lambda calculus and type theory.

A query language, also known as data query language or database query language (DQL), is a computer language used to make queries in databases and information systems. In database systems, query languages rely on strict theory to retrieve information. A well known example is the Structured Query Language (SQL).

Datalog is a declarative logic programming language. While it is syntactically a subset of Prolog, Datalog generally uses a bottom-up rather than top-down evaluation model. This difference yields significantly different behavior and properties from Prolog. It is often used as a query language for deductive databases. Datalog has been applied to problems in data integration, networking, program analysis, and more.

A deductive database is a database system that can make deductions based on rules and facts stored in its database. Datalog is the language typically used to specify facts, rules and queries in deductive databases. Deductive databases have grown out of the desire to combine logic programming with relational databases to construct systems that support a powerful formalism and are still fast and able to deal with very large datasets. Deductive databases are more expressive than relational databases but less expressive than logic programming systems such as Prolog. In recent years, deductive databases have found new application in data integration, information extraction, networking, program analysis, security, and cloud computing.

In computer science, the expressive power of a language is the breadth of ideas that can be represented and communicated in that language. The more expressive a language is, the greater the variety and quantity of ideas it can be used to represent.

Data integration involves combining data residing in different sources and providing users with a unified view of them. This process becomes significant in a variety of situations, which include both commercial and scientific domains. Data integration appears with increasing frequency as the volume, complexity and the need to share existing data explodes. It has become the focus of extensive theoretical work, and numerous open problems remain unsolved. Data integration encourages collaboration between internal as well as external users. The data being integrated must be received from a heterogeneous database system and transformed to a single coherent data store that provides synchronous data across a network of files for clients. A common use of data integration is in data mining when analyzing and extracting information from existing databases that can be useful for Business information.

In database theory, a conjunctive query is a restricted form of first-order queries using the logical conjunction operator. Many first-order queries can be written as conjunctive queries. In particular, a large part of queries issued on relational databases can be expressed in this way. Conjunctive queries also have a number of desirable theoretical properties that larger classes of queries do not share.

<span class="mw-page-title-main">Mihalis Yannakakis</span> Greek-American computer scientist

Mihalis Yannakakis is professor of computer science at Columbia University. He is noted for his work in computational complexity, databases, and other related fields. He won the Donald E. Knuth Prize in 2005.

<span class="mw-page-title-main">Ronald Fagin</span> American mathematician and computer scientist

Ronald Fagin is an American mathematician and computer scientist, and IBM Fellow at the IBM Almaden Research Center. He is known for his work in database theory, finite model theory, and reasoning about knowledge.

Codd's theorem states that relational algebra and the domain-independent relational calculus queries, two well-known foundational query languages for the relational model, are precisely equivalent in expressive power. That is, a database query can be formulated in one language if and only if it can be expressed in the other.

<span class="mw-page-title-main">Georg Gottlob</span> Austrian computer scientist

Georg Gottlob FRS is an Austrian-Italian computer scientist who works in the areas of database theory, logic, and artificial intelligence and is Professor of Informatics at the University of Calabria. He was Professor at the University of Oxford.

Ashok K. Chandra was a computer scientist at Microsoft Research in Mountain View, California, United States, where he was a general manager at the Internet Services Research Center. Chandra received his PhD in Computer Science from Stanford University, an MS from University of California, Berkeley, and a BTech from IIT Kanpur. He was previously Director of Database and Distributed Systems at IBM Almaden Research Center.

In relational database theory, a tuple-generating dependency (TGD) is a certain kind of constraint on a relational database. It is a subclass of the class of embedded dependencies (EDs).

<span class="mw-page-title-main">Nicola Leone</span> Italian computer scientist

Nicola Leone is an Italian computer scientist who works in the areas of artificial intelligence, knowledge representation and reasoning, and database theory. Leone is currently the rector of the University of Calabria and a professor of Computer Science. Previously, he was a professor of Database Systems at the TU Wien.

The Vadalog system is a Knowledge Graph Management System (KGMS) that offers a language for performing complex logic reasoning tasks over knowledge graphs. At the same time, Vadalog delivers a platform to support the entire spectrum of data science tasks: data integration, pre-processing, statistical analysis, machine learning, algorithmic modeling, probabilistic reasoning and temporal reasoning. Its language is based on an extension of the rule-based language Datalog, Warded Datalog±, a high-performance language using an aggressive termination control strategy. Vadalog can support the entire spectrum of data science activities and tools. The system can read from and connect to multiple sources, from relational databases, such as PostgreSQL and MySQL, to graph databases, such as Neo4j, as well as make use of machine learning tools, and a web data extraction tool, OXPath. Additional Python libraries and extensions can also be easily integrated into the system.

DatalogZ is an extension of Datalog with integer arithmetic and comparisons. The decision problem of whether or not a given ground atom (fact) is entailed by a DatalogZ program is RE-complete, which can be shown by a reduction to diophantine equations.

In database theory, the query evaluation problem is the problem of determining the answers to a query on a database. Research in database theory aims at determining the computational complexity of answering different kinds of queries over databases, in particular over relational databases.

References

  1. "A website on the Theoretical Foundations of Data Management".
  2. Aho, A.V. and Ullman, J.D., 1979, January. Universality of data retrieval languages. In Proceedings of the 6th ACM SIGACT-SIGPLAN symposium on Principles of programming languages (pp. 110-119).
  3. Maier, D., Tekle, K.T., Kifer, M. and Warren, D.S., 2018. Datalog: concepts, history, and outlook. In Declarative Logic Programming: Theory, Systems, and Applications (pp. 3-100).

General references