Superkey

Last updated

In the relational data model a superkey is any set of attributes that uniquely identifies each tuple of a relation. [1] [2] Because superkey values are unique, tuples with the same superkey value must also have the same non-key attribute values. That is, non-key attributes are functionally dependent on the superkey.

Contents

The set of all attributes is always a superkey (the trivial superkey). Tuples in a relation are by definition unique, with duplicates removed after each operation, so the set of all attributes is always uniquely valued for every tuple. A candidate key (or minimal superkey) is a superkey that can't be reduced to a simpler superkey by removing an attribute. [3]

For example, in an employee schema with attributes employeeID, name, job, and departmentID, if employeeID values are unique then employeeID combined with any or all of the other attributes can uniquely identify tuples in the table. Each combination, {employeeID}, {employeeID, name}, {employeeID, name, job}, and so on is a superkey. {employeeID} is a candidate key, since no subset of its attributes is also a superkey. {employeeID, name, job, departmentID} is the trivial superkey.

If attribute set K is a superkey of relation R, then at all times it is the case that the projection of R over K has the same cardinality as R itself.

Example

English Monarchs
Monarch NameMonarch NumberRoyal House
EdwardIIPlantagenet
EdwardIIIPlantagenet
RichardIIIPlantagenet
HenryIVLancaster

First, list out all the sets of attributes:

• {}
• {Monarch Name}  
• {Monarch Number}  
• {Royal House}
• {Monarch Name, Monarch Number}
• {Monarch Name, Royal House}
• {Monarch Number, Royal House}
• {Monarch Name, Monarch Number, Royal House}

Second, eliminate all the sets which do not meet superkey's requirement. For example, {Monarch Name, Royal House} cannot be a superkey because for the same attribute values (Edward, Plantagenet), there are two distinct tuples:

Finally, after elimination, the remaining sets of attributes are the only possible superkeys in this example:

In reality, superkeys cannot be determined simply by examining one set of tuples in a relation. A superkey defines a functional dependency constraint of a relation schema which must hold for all possible instance relations of that relation schema.

See also

Related Research Articles

Database normalization is the process of structuring a relational database accordance with a series of so-called normal forms in order to reduce data redundancy and improve data integrity. It was first proposed by British computer scientist Edgar F. Codd as part of his relational model.

A relational database (RDB) is a database based on the relational model of data, as proposed by E. F. Codd in 1970.

The relational model (RM) is an approach to managing data using a structure and language consistent with first-order predicate logic, first described in 1969 by English computer scientist Edgar F. Codd, where all data is represented in terms of tuples, grouped into relations. A database organized in terms of the relational model is a relational database.

In database theory, relational algebra is a theory that uses algebraic structures for modeling data and defining queries on it with well founded semantics. The theory was introduced by Edgar F. Codd.

Tuple calculus is a calculus that was created and introduced by Edgar F. Codd as part of the relational model, in order to provide a declarative database-query language for data manipulation in this data model. It formed the inspiration for the database-query languages QUEL and SQL, of which the latter, although far less faithful to the original relational model and calculus, is now the de facto standard database-query language; a dialect of SQL is used by nearly every relational-database-management system. Michel Lacroix and Alain Pirotte proposed domain calculus, which is closer to first-order logic and together with Codd showed that both of these calculi are equivalent in expressive power. Subsequently, query languages for the relational model were called relationally complete if they could express at least all of these queries.

Third normal form (3NF) is a database schema design approach for relational databases which uses normalizing principles to reduce the duplication of data, avoid data anomalies, ensure referential integrity, and simplify data management. It was defined in 1971 by Edgar F. Codd, an English computer scientist who invented the relational model for database management.

Fourth normal form (4NF) is a normal form used in database normalization. Introduced by Ronald Fagin in 1977, 4NF is the next level of normalization after Boyce–Codd normal form (BCNF). Whereas the second, third, and Boyce–Codd normal forms are concerned with functional dependencies, 4NF is concerned with a more general type of dependency known as a multivalued dependency. A table is in 4NF if and only if, for every one of its non-trivial multivalued dependencies XY, {X, Y} is a superkey—that is, the combination of all attributes in X and Y is either a candidate key or a superset thereof.

In relational database theory, a functional dependency is the following constraint between two attribute sets in a relation: Given a relation R and attribute sets , X is said to functionally determineY if each X value is associated with precisely one Y value. R is then said to satisfy the functional dependency XY. Equivalently, the projection is a function, that is, Y is a function of X. In simple words, if the values for the X attributes are known, then the values for the Y attributes corresponding to x can be determined by looking them up in any tuple of R containing x. Customarily X is called the determinant set and Y the dependent set. A functional dependency FD: XY is called trivial if Y is a subset of X.

In the relational model of databases, a primary key is a designated attribute (column) that can reliably identify and distinguish between each individual record in a table. The database creator can choose an existing unique attribute or combination of attributes from the table to act as its primary key, or create a new attribute containing a unique ID that exists solely for this purpose.

A foreign key is a set of attributes in a table that refers to the primary key of another table, linking these two tables. In the context of relational databases, a foreign key is subject to an inclusion dependency constraint that the tuples consisting of the foreign key attributes in one relation, R, must also exist in some other relation, S; furthermore that those attributes must also be a candidate key in S.

A candidate key, or simply a key, of a relational database is any set of columns that have a unique combination of values in each row, with the additional constraint that removing any column could produce duplicate combinations of values.

A natural key is a type of unique key in a database formed of attributes that exist and are used in the external world outside the database. In the relational model of data, a natural key is a superkey and is therefore a functional determinant for all attributes in a relation.

In a database, a table is a collection of related data organized in table format; consisting of columns and rows.

Boyce–Codd normal form is a normal form used in database normalization. It is a slightly stricter version of the third normal form (3NF). By using BCNF, a database will remove all redundancies based on functional dependencies.

Fifth normal form (5NF), also known as projection–join normal form (PJ/NF), is a level of database normalization designed to remove redundancy in relational databases recording multi-valued facts by isolating semantically related multiple relationships. A table is said to be in the 5NF if and only if every non-trivial join dependency in that table is implied by the candidate keys. It is the final normal form as far as removing redundancy is concerned.

In database theory, a multivalued dependency is a full constraint between two sets of attributes in a relation.

In relational database management systems, a unique key is a candidate key. All the candidate keys of a relation can uniquely identify the records of the relation, but only one of them is used as the primary key of the relation. The remaining candidate keys are called unique keys because they can uniquely identify a record in a relation. Unique keys can consist of multiple columns. Unique keys are also called alternate keys. Unique keys are an alternative to the primary key of the relation. In SQL, the unique keys have a UNIQUE constraint assigned to them in order to prevent duplicates. Alternate keys may be used like the primary key when doing a single-table select or when filtering in a where clause, but are not typically used to join multiple tables.

<span class="mw-page-title-main">Database model</span> Type of data model

A database model is a type of data model that determines the logical structure of a database. It fundamentally determines in which manner data can be stored, organized and manipulated. The most popular example of a database model is the relational model, which uses a table-based format.

<span class="mw-page-title-main">Relation (database)</span> Set of tuples consisting of values indexed by attributes

In database theory, a relation, as originally defined by E. F. Codd, is a set of tuples (d1,d2,...,dn), where each element dj is a member of Dj, a data domain. Codd's original definition notwithstanding, and contrary to the usual definition in mathematics, there is no ordering to the elements of the tuples of a relation. Instead, each element is termed an attribute value. An attribute is a name paired with a domain. An attribute value is an attribute name paired with an element of that attribute's domain, and a tuple is a set of attribute values in which no two distinct elements have the same name. Thus, in some accounts, a tuple is described as a function, mapping names to values.

The following is provided as an overview of and topical guide to databases:

References

  1. Date, Christopher (2015). "Codd's First Relational Papers: A Critical Analysis" (PDF). warwick.ac.uk. Retrieved 2020-01-04. Note that the extract allows a "relation" to have any number of primary keys, and moreover that such keys are allowed to be "redundant" (better: reducible). In other words, what the paper calls a primary key is what later (and better) became known as a superkey, and what the paper calls a nonredundant (better: irreducible) primary key is what later became known as a candidate key or (better) just a "key".
  2. Introduction to Database Management Systems. Tata McGraw-Hill. 2005. p. 77. ISBN   9780070591196. no two tuples in any legal relation
  3. Saiedian, H. (1996-02-01). "An Efficient Algorithm to Compute the Candidate Keys of a Relational Database Schema". The Computer Journal. 39 (2): 124–132. doi:10.1093/comjnl/39.2.124. ISSN   0010-4620.

Further reading