Referential integrity

Last updated
An example of a database that has not enforced referential integrity. In this example, there is a foreign key (artist_id) value in the album table that references a non-existent artist -- in other words there is a foreign key value with no corresponding primary key value in the referenced table. What happened here was that there was an artist called "Aerosmith", with an artist_id of 4, which was deleted from the artist table. However, the album "Eat the Rich" referred to this artist. With referential integrity enforced, this would not have been possible. Referential integrity broken.png
An example of a database that has not enforced referential integrity. In this example, there is a foreign key (artist_id) value in the album table that references a non-existent artist — in other words there is a foreign key value with no corresponding primary key value in the referenced table. What happened here was that there was an artist called "Aerosmith", with an artist_id of 4, which was deleted from the artist table. However, the album "Eat the Rich" referred to this artist. With referential integrity enforced, this would not have been possible.

Referential integrity is a property of data stating that all its references are valid. In the context of relational databases, it requires that if a value of one attribute (column) of a relation (table) references a value of another attribute (either in the same or a different relation), then the referenced value must exist. [1]

Contents

For referential integrity to hold in a relational database, any column in a base table that is declared a foreign key can only contain either null values or values from a parent table's primary key or a candidate key. [2] In other words, when a foreign key value is used it must reference a valid, existing primary key in the parent table. For instance, deleting a record that contains a value referred to by a foreign key in another table would break referential integrity. Some relational database management systems (RDBMS) can enforce referential integrity, normally either by deleting the foreign key rows as well to maintain integrity, or by returning an error and not performing the delete. Which method is used may be determined by a referential integrity constraint defined in a data dictionary.

The adjective 'referential' describes the action that a foreign key performs, 'referring' to a linked column in another table. In simple terms, 'referential integrity' guarantees that the target 'referred' to will be found. A lack of referential integrity in a database can lead relational databases to return incomplete data, usually with no indication of an error.

Formalization

An inclusion dependency over two (possibly identical) predicates and from a schema is written , where the , are distinct attributes (column names) of and . It implies that the tuples of values appearing in columns for facts of must also appear as a tuple of values in columns for some fact of .

Such constraint is a particular form of tuple-generating dependency (TGD) where in both the sides of the rule there is only one relational atom. [3] In first-order logic it is expressible as , where is the vector (whose size is ) of variables shared by and , and no variable appears multiple times neither in the TGD's body nor in its head.

Logical implication between inclusion dependencies can be axiomatized by inference rules [4] :193 and can be decided by a PSPACE algorithm. The problem can be shown to be PSPACE-complete by reduction from the acceptance problem for a linear bounded automaton. [4] :196 However, logical implication between dependencies that can be inclusion dependencies or functional dependencies is undecidable by reduction from the word problem for monoids. [4] :199

Declarative referential integrity

Declarative Referential Integrity (DRI) is one of the techniques in the SQL database programming language to ensure data integrity.

Meaning in SQL

A table (called the referencing table) can refer to a column (or a group of columns) in another table (the referenced table) by using a foreign key. The referenced column(s) in the referenced table must be under a unique constraint, such as a primary key. Also, self-references are possible (not fully implemented in MS SQL Server though [5] ). On inserting a new row into the referencing table, the relational database management system (RDBMS) checks if the entered key value exists in the referenced table. If not, no insert is possible. It is also possible to specify DRI actions on UPDATE and DELETE, such as CASCADE (forwards a change/delete in the referenced table to the referencing tables), NO ACTION (if the specific row is referenced, changing the key is not allowed) or SET NULL / SET DEFAULT (a changed/deleted key in the referenced table results in setting the referencing values to NULL or to the DEFAULT value if one is specified). [6]

Product-specific meaning

In Microsoft SQL Server the term DRI also applies to the assigning of permissions to users on a database object. Giving DRI permission to a database user allows them to add foreign key constraints on a table. [7]

See also

Related Research Articles

Database normalization or database normalisation is the process of structuring a relational database in accordance with a series of so-called normal forms in order to reduce data redundancy and improve data integrity. It was first proposed by British computer scientist Edgar F. Codd as part of his relational model.

A relational database is a database based on the relational model of data, as proposed by E. F. Codd in 1970. A system used to maintain relational databases is a relational database management system (RDBMS). Many relational database systems are equipped with the option of using SQL for querying and updating the database.

The relational model (RM) is an approach to managing data using a structure and language consistent with first-order predicate logic, first described in 1969 by English computer scientist Edgar F. Codd, where all data is represented in terms of tuples, grouped into relations. A database organized in terms of the relational model is a relational database.

In database theory, relational algebra is a theory that uses algebraic structures for modeling data, and defining queries on it with a well founded semantics. The theory was introduced by Edgar F. Codd.

First normal form (1NF) is a property of a relation in a relational database. A relation is in first normal form if and only if no attribute domain has relations as elements. Or more informally, that no table column can have tables as values. Database normalization is the process of representing a database in terms of relations in standard normal forms, where first normal is a minimal requirement. SQL-92 does not support creating or using table-valued columns, which means that using only the "traditional relational database features" most relational databases will be in first normal form by necessity. Database systems which do not require first normal form are often called NoSQL systems. Newer SQL standards like SQL:1999 have started to allow so called non-atomic types, which include composite types. Even newer versions like SQL:2016 allow JSON.

In relational database theory, a functional dependency is a constraint between two sets of attributes in a relation from a database. In other words, a functional dependency is a constraint between two attributes in a relation. Given a relation R and sets of attributes , X is said to functionally determineY if and only if each X value in R is associated with precisely one Y value in R; R is then said to satisfy the functional dependency XY. Equivalently, the projection is a function, i.e. Y is a function of X. In simple words, if the values for the X attributes are known, then the values for the Y attributes corresponding to x can be determined by looking them up in any tuple of R containing x. Customarily X is called the determinant set and Y the dependent set. A functional dependency FD: XY is called trivial if Y is a subset of X.

In the relational model of databases, a primary key is a specific choice of a minimal set of attributes (columns) that uniquely specify a tuple (row) in a relation (table). Informally, a primary key is "which attributes identify a record," and in simple cases constitute a single attribute: a unique ID. More formally, a primary key is a choice of candidate key ; any other candidate key is an alternate key.

A foreign key is a set of attributes in a table that refers to the primary key of another table. The foreign key links these two tables. Another way to put it: In the context of relational databases, a foreign key is a set of attributes subject to a certain kind of inclusion dependency constraints, specifically a constraint that the tuples consisting of the foreign key attributes in one relation, R, must also exist in some other relation, S, and furthermore that those attributes must also be a candidate key in S. In simpler words, a foreign key is a set of attributes that references a candidate key. For example, a table called TEAM may have an attribute, MEMBER_NAME, which is a foreign key referencing a candidate key, PERSON_NAME, in the PERSON table. Since MEMBER_NAME is a foreign key, any value existing as the name of a member in TEAM must also exist as a person's name in the PERSON table; in other words, every member of a TEAM is also a PERSON.

A candidate key, or simply a key, of a relational database is a minimal superkey. In other words, it is any set of columns that have a unique combination of values in each row, with the additional constraint that removing any column could produce duplicate combinations of values. Because a candidate key is a superkey that doesn't contain a smaller one, a relation can have multiple candidate keys, each with a different number of attributes.

In the context of SQL, data definition or data description language (DDL) is a syntax for creating and modifying database objects such as tables, indices, and users. DDL statements are similar to a computer programming language for defining data structures, especially database schemas. Common examples of DDL statements include CREATE, ALTER, and DROP.

A surrogate key in a database is a unique identifier for either an entity in the modeled world or an object in the database. The surrogate key is not derived from application data, unlike a natural key.

<span class="mw-page-title-main">Join (SQL)</span> SQL clause

A join clause in the Structured Query Language (SQL) combines columns from one or more tables into a new table. The operation corresponds to a join operation in relational algebra. Informally, a join stitches two tables and puts on the same row records with matching fields : INNER, LEFT OUTER, RIGHT OUTER, FULL OUTER and CROSS.

In database systems, a propagation constraint "details what should happen to a related table when we update a row or rows of a target table". Tables are linked using primary key to foreign key relationships. It is possible for users to update one table in a relationship in such a way that the relationship is no longer consistent and this is known as breaking referential integrity. An example of breaking referential integrity: if a table of employees includes a department number for 'Housewares' which is a foreign key to a table of departments and a user deletes that department from the department table then Housewares employees records would refer to a non-existent department number.

<span class="mw-page-title-main">Null (SQL)</span> Marker used in SQL databases to indicate a value does not exist

In SQL, null or NULL is a special marker used to indicate that a data value does not exist in the database. Introduced by the creator of the relational database model, E. F. Codd, SQL null serves to fulfil the requirement that all true relational database management systems (RDBMS) support a representation of "missing information and inapplicable information". Codd also introduced the use of the lowercase Greek omega (ω) symbol to represent null in database theory. In SQL, NULL is a reserved word used to identify this marker.

<span class="mw-page-title-main">Virtuoso Universal Server</span> Computer software

Virtuoso Universal Server is a middleware and database engine hybrid that combines the functionality of a traditional relational database management system (RDBMS), object–relational database (ORDBMS), virtual database, RDF, XML, free-text, web application server and file server functionality in a single system. Rather than have dedicated servers for each of the aforementioned functionality realms, Virtuoso is a "universal server"; it enables a single multithreaded server process that implements multiple protocols. The free and open source edition of Virtuoso Universal Server is also known as OpenLink Virtuoso. The software has been developed by OpenLink Software with Kingsley Uyi Idehen and Orri Erling as the chief software architects.

In relational database management systems, a unique key is a candidate key. All the candidate keys of a relation can uniquely identify the records of the relation, but only one of them is used as the primary key of the relation. The remaining candidate keys are called unique keys because they can uniquely identify a record in a relation. Unique keys can consist of multiple columns. Unique keys are also called alternate keys. Unique keys are an alternative to the primary key of the relation. In SQL, the unique keys have a UNIQUE constraint assigned to them in order to prevent duplicates. Alternate keys may be used like the primary key when doing a single-table select or when filtering in a where clause, but are not typically used to join multiple tables.

In relational databases, the log trigger or history trigger is a mechanism for automatic recording of information about changes inserting or/and updating or/and deleting rows in a database table.

In relational database theory, a tuple-generating dependency (TGD) is a certain kind of constraint on a relational database. It is a subclass of the class of embedded dependencies (EDs).

In relational database theory, an embedded dependency (ED) is a certain kind of constraint on a relational database. It is the most general type of constraint used in practice, including both tuple-generating dependencies and equality-generating dependencies. Embedded dependencies can express functional dependencies, join dependencies, multivalued dependencies, inclusion dependencies, foreign key dependencies, and many more besides.

The syntax of the SQL programming language is defined and maintained by ISO/IEC SC 32 as part of ISO/IEC 9075. This standard is not freely available. Despite the existence of the standard, SQL code is not completely portable among different database systems without adjustments.

References

  1. Chapple, Mike. "Referential Integrity". About.com. Retrieved 2011-03-20. Definition: Referential integrity is a database concept that ensures that relationships between tables remain consistent. When one table has a foreign key to another table, the concept of referential integrity states that you may not add a record to the table that contains the foreign key unless there is a corresponding record in the linked table.
  2. Coronel et al. (2013). Database Systems 10th ed. Cengage Learning, ISBN   978-1-111-96960-8
  3. Kolaitis, Phokion G. "A Tutorial on Database Dependencies" (PDF). University of California Santa Cruz & IBM Research - Almaden. Retrieved 2021-12-10.
  4. 1 2 3 Abiteboul, Serge; Hull, Richard B.; Vianu, Victor (1994). "9. Inclusion Dependency". Foundations of Databases. Addison-Wesley. pp. 192–199.
  5. Microsoft Support (2007-02-11). "Error message 1785 occurs when you create a FOREIGN KEY constraint that may cause multiple cascade paths". microsoft.com. Retrieved 2009-01-24.
  6. ANSI/ISO/IEC 9075-1:2003, Information technology—Database languages—SQL
    • Part 1: Framework (SQL/Framework)
    • Part 2: Foundation (SQL/Foundation)
  7. Chigrik, Alexander (2003-08-13). "Managing Users Permissions on SQL Server". Database Journal. Retrieved 2006-12-17.