Codd's theorem

Last updated

Codd's theorem states that relational algebra and the domain-independent relational calculus queries, two well-known foundational query languages for the relational model, are precisely equivalent in expressive power. That is, a database query can be formulated in one language if and only if it can be expressed in the other.

Contents

The theorem is named after Edgar F. Codd, the father of the relational model for database management.

The domain independent relational calculus queries are precisely those relational calculus queries that are invariant under choosing domains of values beyond those appearing in the database itself. That is, queries that may return different results for different domains are excluded. An example of such a forbidden query is the query "select all tuples other than those occurring in relation R", where R is a relation in the database. Assuming different domains, i.e., sets of atomic data items from which tuples can be constructed, this query returns different results and thus is clearly not domain independent.

Codd's Theorem is notable since it establishes the equivalence of two syntactically quite dissimilar languages: relational algebra is a variable-free language, while relational calculus is a logical language with variables and quantification.

Relational calculus is essentially equivalent to first-order logic, [1] and indeed, Codd's Theorem had been known to logicians since the late 1940s. [2] [3]

Query languages that are equivalent in expressive power to relational algebra were called relationally complete by Codd. By Codd's Theorem, this includes relational calculus. Relational completeness clearly does not imply that any interesting database query can be expressed in relationally complete languages. Well-known examples of inexpressible queries include simple aggregations (counting tuples, or summing up values occurring in tuples, which are operations expressible in SQL but not in relational algebra) and computing the transitive closure of a graph given by its binary edge relation (see also expressive power). Codd's theorem also doesn't consider SQL nulls and the three-valued logic they entail; the logical treatment of nulls remains mired in controversy. [4] Additionally, SQL has multiset semantics as allowing duplicate rows. Nevertheless, relational completeness constitutes an important yardstick by which the expressive power of query languages can be compared.

Notes

  1. Abiteboul, Serge; Hull, Richard B.; Vianu, Victor (1995). Foundations of Databases . Addison-Wesley. ISBN   0-201-53771-0.
  2. Chin, L. H.; Tarski, A. (1948). "Remarks on Projective Algebras". Bulletin of the American Mathematical Society . 54 (1): 80–81. doi: 10.1090/S0002-9904-1948-08948-0 .
  3. Tarski, A.; Thompson, F. B. (1952). "Some General Properties of Cylindric Algebras". Bulletin of the American Mathematical Society. 58 (1): 65. doi: 10.1090/S0002-9904-1952-09549-5 .
  4. For recent work extending Codd's theorem in this direction see Franconi, Enrico; Tessaris, Sergio (2012). "On the Logic of SQL Nulls" (PDF). Proceedings of the 6th Alberto Mendelzon International Workshop on Foundations of Data Management, Ouro Preto, Brazil, June 27–30, 2012: 114–128.

Related Research Articles

In mathematics, a finitary relation over a sequence of sets X1, ..., Xn is a subset of the Cartesian product X1 × ... × Xn; that is, it is a set of n-tuples (x1, ..., xn), each being a sequence of elements xi in the corresponding Xi. Typically, the relation describes a possible connection between the elements of an n-tuple. For example, the relation "x is divisible by y and z" consists of the set of 3-tuples such that when substituted to x, y and z, respectively, make the sentence true.

A relational database (RDB) is a database based on the relational model of data, as proposed by E. F. Codd in 1970. A database management system used to maintain relational databases is a relational database management system (RDBMS). Many relational database systems are equipped with the option of using SQL for querying and updating the database.

The relational model (RM) is an approach to managing data using a structure and language consistent with first-order predicate logic, first described in 1969 by English computer scientist Edgar F. Codd, where all data is represented in terms of tuples, grouped into relations. A database organized in terms of the relational model is a relational database.

Structured Query Language (SQL) is a domain-specific language used to manage data, especially in a relational database management system (RDBMS). It is particularly useful in handling structured data, i.e., data incorporating relations among entities and variables.

In database theory, relational algebra is a theory that uses algebraic structures for modeling data, and defining queries on it with a well founded semantics. The theory was introduced by Edgar F. Codd.

Tuple calculus is a calculus that was created and introduced by Edgar F. Codd as part of the relational model, in order to provide a declarative database-query language for data manipulation in this data model. It formed the inspiration for the database-query languages QUEL and SQL, of which the latter, although far less faithful to the original relational model and calculus, is now the de facto standard database-query language; a dialect of SQL is used by nearly every relational-database-management system. Michel Lacroix and Alain Pirotte proposed domain calculus, which is closer to first-order logic and together with Codd showed that both of these calculi are equivalent in expressive power. Subsequently, query languages for the relational model were called relationally complete if they could express at least all of these queries.

The relational calculus consists of two calculi, the tuple relational calculus and the domain relational calculus, that is part of the relational model for databases and provide a declarative way to specify database queries. The raison d'être of relational calculus is the formalization of query optimization, which is finding more efficient manners to execute the same query in a database.

In computer science, domain relational calculus (DRC) is a calculus that was introduced by Michel Lacroix and Alain Pirotte as a declarative database query language for the relational data model.

<span class="mw-page-title-main">Referential integrity</span> Where all data references are valid

Referential integrity is a property of data stating that all its references are valid. In the context of relational databases, it requires that if a value of one attribute (column) of a relation (table) references a value of another attribute, then the referenced value must exist.

Database theory encapsulates a broad range of topics related to the study and research of the theoretical realm of databases and database management systems.

Finite model theory is a subarea of model theory. Model theory is the branch of logic which deals with the relation between a formal language (syntax) and its interpretations (semantics). Finite model theory is a restriction of model theory to interpretations on finite structures, which have a finite universe.

<span class="mw-page-title-main">Null (SQL)</span> Marker used in SQL databases to indicate a value does not exist

In SQL, null or NULL is a special marker used to indicate that a data value does not exist in the database. Introduced by the creator of the relational database model, E. F. Codd, SQL null serves to fulfil the requirement that all true relational database management systems (RDBMS) support a representation of "missing information and inapplicable information". Codd also introduced the use of the lowercase Greek omega (ω) symbol to represent null in database theory. In SQL, NULL is a reserved word used to identify this marker.

Object–relational impedance mismatch is a set of difficulties going between data in relational data stores and data in domain-driven object models. Relational Database Management Systems (RDBMS) is the standard method for storing data in a dedicated database, while object-orientated (OO) programming is the default method for business-centric design in programming languages. The problem lies in neither relational databases nor OO programming, but in the conceptual difficulty mapping between the two logic models. Both logical models are differently implementable using database servers, programming languages, design patterns, or other technologies. Issues range from application to enterprise scale, whenever stored relational data is used in domain-driven object models, and vice versa. Object-oriented data stores can trade this problem for other implementation difficulties.

In database theory, a conjunctive query is a restricted form of first-order queries using the logical conjunction operator. Many first-order queries can be written as conjunctive queries. In particular, a large part of queries issued on relational databases can be expressed in this way. Conjunctive queries also have a number of desirable theoretical properties that larger classes of queries do not share.

<span class="mw-page-title-main">Relation (database)</span> Set of tuples consisting of values indexed by attributes

In database theory, a relation, as originally defined by E. F. Codd, is a set of tuples (d1,d2,...,dn), where each element dj is a member of Dj, a data domain. Codd's original definition notwithstanding, and contrary to the usual definition in mathematics, there is no ordering to the elements of the tuples of a relation. Instead, each element is termed an attribute value. An attribute is a name paired with a domain. An attribute value is an attribute name paired with an element of that attribute's domain, and a tuple is a set of attribute values in which no two distinct elements have the same name. Thus, in some accounts, a tuple is described as a function, mapping names to values.

QUEL is a relational database query language, based on tuple relational calculus, with some similarities to SQL. It was created as a part of the Ingres DBMS effort at University of California, Berkeley, based on Codd's earlier suggested but not implemented Data Sub-Language ALPHA. QUEL was used for a short time in most products based on the freely available Ingres source code, most notably in an implementation called POSTQUEL supported by POSTGRES. As Oracle and DB2 gained market share in the early 1980s, most companies then supporting QUEL moved to SQL instead. QUEL continues to be available as a part of the Ingres DBMS, although no QUEL-specific language enhancements have been added for many years.

An uncertain database is a kind of database studied in database theory. The goal of uncertain databases is to manage information on which there is some uncertainty. Uncertain databases make it possible to explicitly represent and manage uncertainty on the data, usually in a succinct way.

In relational database theory, a tuple-generating dependency (TGD) is a certain kind of constraint on a relational database. It is a subclass of the class of embedded dependencies (EDs).

In relational database theory, an embedded dependency (ED) is a certain kind of constraint on a relational database. It is the most general type of constraint used in practice, including both tuple-generating dependencies and equality-generating dependencies. Embedded dependencies can express functional dependencies, join dependencies, multivalued dependencies, inclusion dependencies, foreign key dependencies, and many more besides.

In database theory, Imieliński–Lipski algebra is an extension of relational algebra onto tables with different types of null values. It is used to operate on relations with incomplete information.

References