In database theory, a relation, as originally defined by E. F. Codd, [1] is a set of tuples (d1,d2,...,dn), where each element dj is a member of Dj, a data domain. Codd's original definition notwithstanding, and contrary to the usual definition in mathematics, there is no ordering to the elements of the tuples of a relation. [2] [3] Instead, each element is termed an attribute value. An attribute is a name paired with a domain (nowadays more commonly referred to as a type or data type ). An attribute value is an attribute name paired with an element of that attribute's domain, and a tuple is a set of attribute values in which no two distinct elements have the same name. Thus, in some accounts, a tuple is described as a function, mapping names to values.
A set of attributes in which no two distinct elements have the same name is called a heading. It follows from the above definitions that to every tuple there corresponds a unique heading, being the set of names from the tuple, paired with the domains from which the tuple elements' domains are taken. A set of tuples that all correspond to the same heading is called a body. A relation is thus a heading paired with a body, the heading of the relation being also the heading of each tuple in its body. The number of attributes constituting a heading is called the degree, which term also applies to tuples and relations. The term n-tuple refers to a tuple of degree n (n ≥ 0).
E. F. Codd used the term "relation" in its mathematical sense of a finitary relation, a set of tuples on some set of n sets S1,S2,....,Sn. [4] Thus, an n-ary relation is interpreted, under the Closed-World Assumption, as the extension of some n-adic predicate: all and only those n-tuples whose values, substituted for corresponding free variables in the predicate, yield propositions that hold true, appear in the relation.
A heading paired with a set of constraints defined in terms of that heading is called a relation schema. A relation can thus be seen as an instantiation of a relation schema if it has the heading of that schema and it satisfies the applicable constraints.
Sometimes a relation schema is taken to include a name. [5] [6] A relational database definition (database schema, sometimes referred to as a relational schema) can thus be thought of as a collection of named relation schemas. [7] [8]
In implementations, the domain of each attribute is effectively a data type [9] and a named relation schema is effectively a relation variable (relvar for short).
In SQL, a database language for relational databases, relations are represented by tables, where each row of a table represents a single tuple, and where the values of each attribute form a column.
Below is an example of a relation having three named attributes: 'ID' from the domain of integers, and 'Name' and 'Address' from the domain of strings:
ID (Integer) | Name (String) | Address (String) |
---|---|---|
102 | Yonezawa Akinori | Naha, Okinawa |
202 | Nilay Patel | Sendai, Miyagi |
104 | Murata Makoto | Kumamoto, Kumamoto |
152 | Matsumoto Yukihiro | Okinawa, Okinawa |
A predicate for this relation, using the attribute names to denote free variables, might be "Employee number ID is known as Name and lives at Address". Examination of the relation tells us that there are just four tuples for which the predicate holds true. So, for example, employee 102 is known only by that name, Yonezawa Akinori, and does not live anywhere else but in Naha, Okinawa. Also, apart from the four employees shown, there is no other employee who has both a name and an address.
Under the definition of body, the tuples of a body do not appear in any particular order - one cannot say "The tuple of 'Murata Makoto' is above the tuple of 'Matsumoto Yukihiro'", nor can one say "The tuple of 'Yonezawa Akinori' is the first tuple." A similar comment applies to the rows of an SQL table.
Under the definition of heading, the attributes of an element do not appear in any particular order either, nor, therefore do the elements of a tuple. A similar comment does not apply here to SQL, which does define an ordering to the columns of a table.
A relational database consists of named relation variables (relvars) for the purposes of updating the database in response to changes in the real world. An update to a single relvar causes the body of the relation assigned to that variable to be replaced by a different set of tuples. Relvars are classified into two classes: base relation variables and derived relation variables, the latter also known as virtual relvars but usually referred to by the short term view.
A base relation variable is a relation variable which is not derived from any other relation variables. In SQL the term base table equates approximately to base relation variable.
A view can be defined by an expression using the operators of the relational algebra or the relational calculus. Such an expression operates on one or more relations and when evaluated yields another relation. The result is sometimes referred to as a "derived" relation when the operands are relations assigned to database variables. A view is defined by giving a name to such an expression, such that the name can subsequently be used as a variable name. (Note that the expression must then mention at least one base relation variable.)
By using a Data Definition Language (DDL), it is able to define base relation variables. In SQL, CREATE TABLE
syntax is used to define base tables. The following is an example.
CREATETABLEList_of_people(IDINTEGER,NameCHAR(40),AddressCHAR(200),PRIMARYKEY(ID))
The Data Definition Language (DDL) is also used to define derived relation variables. In SQL, CREATE VIEW
syntax is used to define a derived relation variable. The following is an example.
CREATEVIEWList_of_Okinawa_peopleAS(SELECTID,Name,AddressFROMList_of_peopleWHEREAddressLIKE'%, Okinawa')
Database normalization is the process of structuring a relational database accordance with a series of so-called normal forms in order to reduce data redundancy and improve data integrity. It was first proposed by British computer scientist Edgar F. Codd as part of his relational model.
In mathematics, a finitary relation over a sequence of sets X1, ..., Xn is a subset of the Cartesian product X1 × ... × Xn; that is, it is a set of n-tuples (x1, ..., xn), each being a sequence of elements xi in the corresponding Xi. Typically, the relation describes a possible connection between the elements of an n-tuple. For example, the relation "x is divisible by y and z" consists of the set of 3-tuples such that when substituted to x, y and z, respectively, make the sentence true.
A relational database (RDB) is a database based on the relational model of data, as proposed by E. F. Codd in 1970.
The relational model (RM) is an approach to managing data using a structure and language consistent with first-order predicate logic, first described in 1969 by English computer scientist Edgar F. Codd, where all data is represented in terms of tuples, grouped into relations. A database organized in terms of the relational model is a relational database.
Structured Query Language (SQL) is a domain-specific language used to manage data, especially in a relational database management system (RDBMS). It is particularly useful in handling structured data, i.e., data incorporating relations among entities and variables.
In database theory, relational algebra is a theory that uses algebraic structures for modeling data and defining queries on it with well founded semantics. The theory was introduced by Edgar F. Codd.
Tuple calculus is a calculus that was created and introduced by Edgar F. Codd as part of the relational model, in order to provide a declarative database-query language for data manipulation in this data model. It formed the inspiration for the database-query languages QUEL and SQL, of which the latter, although far less faithful to the original relational model and calculus, is now the de facto standard database-query language; a dialect of SQL is used by nearly every relational-database-management system. Michel Lacroix and Alain Pirotte proposed domain calculus, which is closer to first-order logic and together with Codd showed that both of these calculi are equivalent in expressive power. Subsequently, query languages for the relational model were called relationally complete if they could express at least all of these queries.
First normal form (1NF) is a property of a relation in a relational database. A relation is in first normal form if and only if no attribute domain has relations as elements. Or more informally, that no table column can have tables as values. Database normalization is the process of representing a database in terms of relations in standard normal forms, where first normal is a minimal requirement. SQL-92 does not support creating or using table-valued columns, which means that using only the "traditional relational database features" most relational databases will be in first normal form by necessity. Database systems which do not require first normal form are often called NoSQL systems. Newer SQL standards like SQL:1999 have started to allow so called non-atomic types, which include composite types. Even newer versions like SQL:2016 allow JSON.
A candidate key, or simply a key, of a relational database is any set of columns that have a unique combination of values in each row, with the additional constraint that removing any column could produce duplicate combinations of values.
Referential integrity is a property of data stating that all its references are valid. In the context of relational databases, it requires that if a value of one attribute (column) of a relation (table) references a value of another attribute, then the referenced value must exist.
In the relational data model a superkey is any set of attributes that uniquely identifies each tuple of a relation. Because superkey values are unique, tuples with the same superkey value must also have the same non-key attribute values. That is, non-key attributes are functionally dependent on the superkey.
In relational databases, relvar is a term introduced by C. J. Date and Hugh Darwen as an abbreviation for relation variable in their 1995 paper The Third Manifesto, to avoid the confusion sometimes arising from the use of the term "relation", by the inventor of the relational model, E. F. Codd, for a variable to which a relation is assigned as well as for the relation itself. The term is used in Date's well-known database textbook An Introduction to Database Systems and in various other books authored or coauthored by him.
In a database, a table is a collection of related data organized in table format; consisting of columns and rows.
Object–relational impedance mismatch is a set of difficulties going between data in relational data stores and data in domain-driven object models. Relational Database Management Systems (RDBMS) is the standard method for storing data in a dedicated database, while object-oriented (OO) programming is the default method for business-centric design in programming languages. The problem lies in neither relational databases nor OO programming, but in the conceptual difficulty mapping between the two logic models. Both logical models are differently implementable using database servers, programming languages, design patterns, or other technologies. Issues range from application to enterprise scale, whenever stored relational data is used in domain-driven object models, and vice versa. Object-oriented data stores can trade this problem for other implementation difficulties.
In relational algebra, a projection is a unary operation written as , where is a relation and are attribute names. Its result is defined as the set obtained when the components of the tuples in are restricted to the set – it discards the other attributes.
Sixth normal form (6NF) is a normal form used in relational database normalization which extends the relational algebra and generalizes relational operators to support interval data, which can be useful in temporal databases.
An entity–attribute–value model (EAV) is a data model optimized for the space-efficient storage of sparse—or ad-hoc—property or data values, intended for situations where runtime usage patterns are arbitrary, subject to user variation, or otherwise unforeseeable using a fixed design. The use-case targets applications which offer a large or rich system of defined property types, which are in turn appropriate to a wide set of entities, but where typically only a small, specific selection of these are instantiated for a given entity. Therefore, this type of data model relates to the mathematical notion of a sparse matrix. EAV is also known as object–attribute–value model, vertical database model, and open schema.
A database model is a type of data model that determines the logical structure of a database. It fundamentally determines in which manner data can be stored, organized and manipulated. The most popular example of a database model is the relational model, which uses a table-based format.
QUEL is a relational database query language, based on tuple relational calculus, with some similarities to SQL. It was created as a part of the Ingres DBMS effort at University of California, Berkeley, based on Codd's earlier suggested but not implemented Data Sub-Language ALPHA. QUEL was used for a short time in most products based on the freely available Ingres source code, most notably in an implementation called POSTQUEL supported by POSTGRES. As Oracle and DB2 gained market share in the early 1980s, most companies then supporting QUEL moved to SQL instead. QUEL continues to be available as a part of the Ingres DBMS, although no QUEL-specific language enhancements have been added for many years.
The following is provided as an overview of and topical guide to databases:
R is a relation on these n domains if it is a set of elements of the form (d1, d2, ..., dn) where dj ∈ Dj for each j=1,2,...,n.
... tuples have no left-to-right ordering to their attributes ...
One reason for abandoning positional concepts altogether in the relations of the relational model is that it is not at all unusual to find database relations, each of which has as many as 50, 100, or even 150 columns.
The term relation is used here in its accepted mathematical sense