Tuple relational calculus

Last updated

Tuple calculus is a calculus that was created and introduced by Edgar F. Codd as part of the relational model, in order to provide a declarative database-query language for data manipulation in this data model. It formed the inspiration for the database-query languages QUEL and SQL, of which the latter, although far less faithful to the original relational model and calculus, is now the de facto standard database-query language; a dialect of SQL is used by nearly every relational-database-management system. Michel Lacroix and Alain Pirotte proposed domain calculus, which is closer to first-order logic and together with Codd showed that both of these calculi (as well as relational algebra) are equivalent in expressive power. Subsequently, query languages for the relational model were called relationally complete if they could express at least all of these queries.

Contents

Definition of the calculus

Relational database

Since the calculus is a query language for relational databases we first have to define a relational database. The basic relational building block is the domain (somewhat similar, but not equal to, a data type). A tuple is a finite sequence of attributes, which are ordered pairs of domains and values. A relation is a set of (compatible) tuples. Although these relational concepts are mathematically defined, those definitions map loosely to traditional database concepts. A table is an accepted visual representation of a relation; a tuple is similar to the concept of a row.

We first assume the existence of a set C of column names, examples of which are "name", "author", "address", etcetera. We define headers as finite subsets of C. A relational database schema is defined as a tuple S = (D, R, h) where D is the domain of atomic values (see relational model for more on the notions of domain and atomic value), R is a finite set of relation names, and

h : R → 2C

a function that associates a header with each relation name in R. (Note that this is a simplification from the full relational model where there is more than one domain and a header is not just a set of column names but also maps these column names to a domain.) Given a domain D we define a tuple over D as a partial function that maps some column names to an atomic value in D. An example would be (name : "Harry", age : 25).

t : CD

The set of all tuples over D is denoted as TD. The subset of C for which a tuple t is defined is called the domain of t (not to be confused with the domain in the schema) and denoted as dom(t).

Finally we define a relational database given a schema S = (D, R, h) as a function

db : R → 2TD

that maps the relation names in R to finite subsets of TD, such that for every relation name r in R and tuple t in db(r) it holds that

dom(t) = h(r).

The latter requirement simply says that all the tuples in a relation should contain the same column names, namely those defined for it in the schema.

Atoms

For the construction of the formulas we will assume an infinite set V of tuple variables. The formulas are defined given a database schema S = (D, R, h) and a partial function type : V ⇸ 2C, called at type assignment, that assigns headers to some tuple variables. We then define the set of atomic formulas A[S,type] with the following rules:

  1. if v and w in V, a in type(v) and b in type(w) then the formula v.a = w.b is in A[S,type],
  2. if v in V, a in type(v) and k denotes a value in D then the formula v.a = k is in A[S,type], and
  3. if v in V, r in R and type(v) = h(r) then the formula r(v) is in A[S,type].

Examples of atoms are:

The formal semantics of such atoms is defined given a database db over S and a tuple variable binding val : VTD that maps tuple variables to tuples over the domain in S:

  1. v.a = w.b is true if and only if val(v)(a) = val(w)(b)
  2. v.a = k is true if and only if val(v)(a) = k
  3. r(v) is true if and only if val(v) is in db(r)

Formulas

The atoms can be combined into formulas, as is usual in first-order logic, with the logical operators ∧ (and), ∨ (or) and ¬ (not), and we can use the existential quantifier (∃) and the universal quantifier (∀) to bind the variables. We define the set of formulasF[S,type] inductively with the following rules:

  1. every atom in A[S,type] is also in F[S,type]
  2. if f1 and f2 are in F[S,type] then the formula f1f2 is also in F[S,type]
  3. if f1 and f2 are in F[S,type] then the formula f1f2 is also in F[S,type]
  4. if f is in F[S,type] then the formula ¬ f is also in F[S,type]
  5. if v in V, H a header and f a formula in F[S,type[v->H]] then the formula ∃ v : H ( f ) is also in F[S,type], where type[v->H] denotes the function that is equal to type except that it maps v to H,
  6. if v in V, H a header and f a formula in F[S,type[v->H]] then the formula ∀ v : H ( f ) is also in F[S,type]

Examples of formulas:

Note that the last formula states that all books that are written by C. J. Date have as their subject the relational model. As usual we omit brackets if this causes no ambiguity about the semantics of the formula.

We will assume that the quantifiers quantify over the universe of all tuples over the domain in the schema. This leads to the following formal semantics for formulas given a database db over S and a tuple variable binding val : V -> TD:

  1. f1f2 is true if and only if f1 is true and f2 is true,
  2. f1f2 is true if and only if f1 is true or f2 is true or both are true,
  3. ¬ f is true if and only if f is not true,
  4. v : H ( f ) is true if and only if there is a tuple t over D such that dom(t) = H and the formula f is true for val[v->t], and
  5. v : H ( f ) is true if and only if for all tuples t over D such that dom(t) = H the formula f is true for val[v->t].

Queries

Finally we define what a query expression looks like given a schema S = (D, R, h):

{ v : H | f(v) }

where v is a tuple variable, H a header and f(v) a formula in F[S,type] where type = { (v, H) } and with v as its only free variable. The result of such a query for a given database db over S is the set of all tuples t over D with dom(t) = H such that f is true for db and val = { (v, t) }.

Examples of query expressions are:

Semantic and syntactic restriction of the calculus

Domain-independent queries

Because the semantics of the quantifiers is such that they quantify over all the tuples over the domain in the schema it can be that a query may return a different result for a certain database if another schema is presumed. For example, consider the two schemas S1 = ( D1, R, h ) and S2 = ( D2, R, h ) with domains D1 = { 1 }, D2 = { 1, 2 }, relation names R = { "r1" } and headers h = { ("r1", {"a"}) }. Both schemas have a common instance:

db = { ( "r1", { ("a", 1) } ) }

If we consider the following query expression

{ t : {a} | t.a = t.a }

then its result on db is either { (a : 1) } under S1 or { (a : 1), (a : 2) } under S2. It will also be clear that if we take the domain to be an infinite set, then the result of the query will also be infinite. To solve these problems we will restrict our attention to those queries that are domain independent, i.e., the queries that return the same result for a database under all of its schemas.

An interesting property of these queries is that if we assume that the tuple variables range over tuples over the so-called active domain of the database, which is the subset of the domain that occurs in at least one tuple in the database or in the query expression, then the semantics of the query expressions does not change. In fact, in many definitions of the tuple calculus this is how the semantics of the quantifiers is defined, which makes all queries by definition domain independent.

Safe queries

In order to limit the query expressions such that they express only domain-independent queries a syntactical notion of safe query is usually introduced. To determine whether a query expression is safe we will derive two types of information from a query. The first is whether a variable-column pair t.a is bound to the column of a relation or a constant, and the second is whether two variable-column pairs are directly or indirectly equated (denoted t.v == s.w).

For deriving boundedness we introduce the following reasoning rules:

  1. in " v.a = w.b " no variable-column pair is bound,
  2. in " v.a = k " the variable-column pair v.a is bound,
  3. in " r(v) " all pairs v.a are bound for a in type(v),
  4. in " f1f2 " all pairs are bound that are bound either in f1 or in f2,
  5. in " f1f2 " all pairs are bound that are bound both in f1 and in f2,
  6. in " ¬ f " no pairs are bound,
  7. in " ∃ v : H ( f ) " a pair w.a is bound if it is bound in f and w <> v, and
  8. in " ∀ v : H ( f ) " a pair w.a is bound if it is bound in f and w <> v.

For deriving equatedness we introduce the following reasoning rules (next to the usual reasoning rules for equivalence relations: reflexivity, symmetry and transitivity):

  1. in " v.a = w.b " it holds that v.a == w.b,
  2. in " v.a = k " no pairs are equated,
  3. in " r(v) " no pairs are equated,
  4. in " f1f2 " it holds that v.a == w.b if it holds either in f1 or in f2,
  5. in " f1f2 " it holds that v.a == w.b if it holds both in f1 and in f2,
  6. in " ¬ f " no pairs are equated,
  7. in " ∃ v : H ( f ) " it holds that w.a == x.b if it holds in f and w<>v and x<>v, and
  8. in " ∀ v : H ( f ) " it holds that w.a == x.b if it holds in f and w<>v and x<>v.

We then say that a query expression { v : H | f(v) } is safe if

The restriction to safe query expressions does not limit the expressiveness since all domain-independent queries that could be expressed can also be expressed by a safe query expression. This can be proven by showing that for a schema S = (D, R, h), a given set K of constants in the query expression, a tuple variable v and a header H we can construct a safe formula for every pair v.a with a in H that states that its value is in the active domain. For example, assume that K={1,2}, R={"r"} and h = { ("r", {"a, "b"}) } then the corresponding safe formula for v.b is:

v.b = 1 ∨ v.b = 2 ∨ ∃ w ( r(w) ∧ ( v.b = w.a ∨ v.b = w.b ) )

This formula, then, can be used to rewrite any unsafe query expression to an equivalent safe query expression by adding such a formula for every variable v and column name a in its type where it is used in the expression. Effectively this means that we let all variables range over the active domain, which, as was already explained, does not change the semantics if the expressed query is domain independent.

Systems

See also

Related Research Articles

First-order logic—also known as predicate logic, quantificational logic, and first-order predicate calculus—is a collection of formal systems used in mathematics, philosophy, linguistics, and computer science. First-order logic uses quantified variables over non-logical objects, and allows the use of sentences that contain variables, so that rather than propositions such as "Socrates is a man", one can have expressions in the form "there exists x such that x is Socrates and x is a man", where "there exists" is a quantifier, while x is a variable. This distinguishes it from propositional logic, which does not use quantifiers or relations; in this sense, propositional logic is the foundation of first-order logic.

Lambda calculus is a formal system in mathematical logic for expressing computation based on function abstraction and application using variable binding and substitution. It is a universal model of computation that can be used to simulate any Turing machine. It was introduced by the mathematician Alonzo Church in the 1930s as part of his research into the foundations of mathematics.

A relational database is a database based on the relational model of data, as proposed by E. F. Codd in 1970. A system used to maintain relational databases is a relational database management system (RDBMS). Many relational database systems are equipped with the option of using the SQL for querying and maintaining the database.

The relational model (RM) is an approach to managing data using a structure and language consistent with first-order predicate logic, first described in 1969 by English computer scientist Edgar F. Codd, where all data is represented in terms of tuples, grouped into relations. A database organized in terms of the relational model is a relational database.

Structured Query Language, abbreviated as SQL,( "sequel", S-Q-L; ) is a domain-specific language used in programming and designed for managing data held in a relational database management system (RDBMS), or for stream processing in a relational data stream management system (RDSMS). It is particularly useful in handling structured data, i.e. data incorporating relations among entities and variables.

In database theory, relational algebra is a theory that uses algebraic structures with a well-founded semantics for modeling data, and defining queries on it. The theory was introduced by Edgar F. Codd.

The relational calculus consists of two calculi, the tuple relational calculus and the domain relational calculus, that are part of the relational model for databases and provide a declarative way to specify database queries. The raison d'être of the relational calculus is the formalization of query optimization, which is finding more efficient manners to execute the same query.

In computer science, domain relational calculus (DRC) is a calculus that was introduced by Michel Lacroix and Alain Pirotte as a declarative database query language for the relational data model.

<span class="mw-page-title-main">Null (SQL)</span> Marker used in SQL databases to indicate a value does not exist

In SQL, null or NULL is a special marker used to indicate that a data value does not exist in the database. Introduced by the creator of the relational database model, E. F. Codd, SQL null serves to fulfil the requirement that all true relational database management systems (RDBMS) support a representation of "missing information and inapplicable information". Codd also introduced the use of the lowercase Greek omega (ω) symbol to represent null in database theory. In SQL, NULL is a reserved word used to identify this marker.

The object–relational impedance mismatch is a set of conceptual and technical difficulties that are often encountered when a relational database management system (RDBMS) is being served by an application program written in an object-oriented programming language or style, particularly because objects or class definitions must be mapped to database tables defined by a relational schema.

SPARQL is an RDF query language—that is, a semantic query language for databases—able to retrieve and manipulate data stored in Resource Description Framework (RDF) format. It was made a standard by the RDF Data Access Working Group (DAWG) of the World Wide Web Consortium, and is recognized as one of the key technologies of the semantic web. On 15 January 2008, SPARQL 1.0 was acknowledged by W3C as an official recommendation, and SPARQL 1.1 in March, 2013.

The complexity of constraint satisfaction is the application of computational complexity theory on constraint satisfaction. It has mainly been studied for discriminating between tractable and intractable classes of constraint satisfaction problems on finite domains.

Entity–attribute–value model (EAV) is a data model to encode, in a space-efficient manner, entities where the number of attributes that can be used to describe them is potentially vast, but the number that will actually apply to a given entity is relatively modest. Such entities correspond to the mathematical notion of a sparse matrix.

A database model is a type of data model that determines the logical structure of a database. It fundamentally determines in which manner data can be stored, organized and manipulated. The most popular example of a database model is the relational model, which uses a table-based format.

In the theory of relational databases, a Boolean conjunctive query is a conjunctive query without distinguished predicates, i.e., a query in the form , where each is a relation symbol and each is a tuple of variables and constants; the number of elements in is equal to the arity of . Such a query evaluates to either true or false depending on whether the relations in the database contain the appropriate tuples of values, i.e. the conjunction is valid according to the facts in the database.

In database theory, a conjunctive query is a restricted form of first-order queries using the logical conjunction operator. Many first-order queries can be written as conjunctive queries. In particular, a large part of queries issued on relational databases can be expressed in this way. Conjunctive queries also have a number of desirable theoretical properties that larger classes of queries do not share.

Language Integrated Query is a Microsoft .NET Framework component that adds native data querying capabilities to .NET languages, originally released as a major part of .NET Framework 3.5 in 2007.

Codd's theorem states that relational algebra and the domain-independent relational calculus queries, two well-known foundational query languages for the relational model, are precisely equivalent in expressive power. That is, a database query can be formulated in one language if and only if it can be expressed in the other.

<span class="mw-page-title-main">Relation (database)</span>

In database theory, a relation, as originally defined by E. F. Codd, is a set of tuples (d1, d2, ..., dn), where each element dj is a member of Dj, a data domain. Codd's original definition notwithstanding, and contrary to the usual definition in mathematics, there is no ordering to the elements of the tuples of a relation. Instead, each element is termed an attribute value. An attribute is a name paired with a domain. An attribute value is an attribute name paired with an element of that attribute's domain, and a tuple is a set of attribute values in which no two distinct elements have the same name. Thus, in some accounts, a tuple is described as a function, mapping names to values.

The following is provided as an overview of and topical guide to databases:

References