Domain-key normal form

Last updated March 28, 2024

Domain-key normal form (DK/NF or DKNF) is a normal form used in database normalization which requires that the database contains no constraints other than domain constraints and key constraints.

The reason to use domain/key normal form is to avoid having general constraints in the database that are not clear domain or key constraints. Most databases can easily test domain and key constraints on attributes. General constraints however would normally require special database programming in the form of stored procedures (often of the trigger variety) that are expensive to maintain and expensive for the database to execute. Therefore, general constraints are split into domain and key constraints.

It's much easier to build a new database in domain/key normal form than it is to convert from databases on lesser normal forms which may contain numerous anomalies. However, successfully building a domain/key normal form database remains a difficult task, even for experienced database programmers. Thus, while the domain/key normal form eliminates the problems found in most databases, it tends to be the most costly normal form to achieve. However, failing to achieve the domain/key normal form may carry long-term, hidden costs due to anomalies which appear in databases adhering only to lower normal forms over time.

The third normal form, Boyce–Codd normal form, fourth normal form and fifth normal form are special cases of the domain/key normal form. All have either functional, multi-valued or join dependencies that can be converted into superkeys. The domains on those normal forms were unconstrained so all domain constraints are satisfied. However, transforming a higher normal form into domain/key normal form is not always a dependency-preserving transformation and therefore not always possible.

Example

A violation of DKNF occurs in the following table:

Wealthy Person
Wealthy Person	Wealthy Person Type	Net Worth in Dollars
Steve	Millionaire	124,543,621
Roderick	Billionaire	6,553,228,893
Katrina	Billionaire	8,829,462,998
Gary	Millionaire	495,565,211

(Assume that the domain for Wealthy Person consists of the names of all wealthy people in a pre-defined sample of wealthy people; the domain for Wealthy Person Type consists of the values 'Millionaire' and 'Billionaire'; and the domain for Net Worth in Dollars consists of all integers greater than or equal to 1,000,000.)

There is a constraint linking Wealthy Person Type to Net Worth in Dollars, even though we cannot deduce one from the other. The constraint dictates that a Millionaire will have a net worth of 1,000,000 to 999,999,999 inclusive, whilst a Billionaire will have a net worth of 1,000,000,000 or higher. This constraint is neither a domain constraint nor a key constraint; therefore we cannot rely on domain constraints and key constraints to guarantee that an inconsistent Wealthy Person Type / Net Worth in Dollars combination does not make its way into the database.

The DKNF violation could be eliminated by removing the Wealthy Person Type column. The wealthy person's status as a millionaire or billionaire is determined by their Net Worth in Dollars, as defined in the Wealthiness Status table, so no useful information is lost.

Wealthy Person
Wealthy Person	Net Worth in Dollars
Steve	124,543,621
Roderick	6,553,228,893
Katrina	8,829,462,998
Gary	495,565,211

Wealthiness Status
Status	Minimum	Maximum
Millionaire	1,000,000	999,999,999
Billionaire	1,000,000,000	999,999,999,999

Foreign keys

Relationships that are impossible to express as foreign keys are obvious violations of DKNF. For example, a "Parent ID" attribute that points to one of several referenced tables, depending on a second "Parent Type" attribute, violates DKNF.

Related Research Articles

Denormalization is a strategy used on a previously-normalized database to increase performance. In computing, denormalization is the process of trying to improve the read performance of a database, at the expense of losing some write performance, by adding redundant copies of data or by grouping data. It is often motivated by performance or scalability in relational database software needing to carry out very large numbers of read operations. Denormalization differs from the unnormalized form in that denormalization benefits can only be fully realized on a data model that is otherwise normalized.

Database normalization is the process of structuring a relational database in accordance with a series of so-called normal forms in order to reduce data redundancy and improve data integrity. It was first proposed by British computer scientist Edgar F. Codd as part of his relational model.

A relational database is a database based on the relational model of data, as proposed by E. F. Codd in 1970. A database management system used to maintain relational databases is a relational database management system (RDBMS). Many relational database systems are equipped with the option of using SQL for querying and updating the database.

The relational model (RM) is an approach to managing data using a structure and language consistent with first-order predicate logic, first described in 1969 by English computer scientist Edgar F. Codd, where all data is represented in terms of tuples, grouped into relations. A database organized in terms of the relational model is a relational database.

First normal form (1NF) is a property of a relation in a relational database. A relation is in first normal form if and only if no attribute domain has relations as elements. Or more informally, that no table column can have tables as values. Database normalization is the process of representing a database in terms of relations in standard normal forms, where first normal is a minimal requirement. SQL-92 does not support creating or using table-valued columns, which means that using only the "traditional relational database features" most relational databases will be in first normal form by necessity. Database systems which do not require first normal form are often called NoSQL systems. Newer SQL standards like SQL:1999 have started to allow so called non-atomic types, which include composite types. Even newer versions like SQL:2016 allow JSON.

Second normal form (2NF) is a normal form used in database normalization. A relation is in the second normal form if it fulfills the following two requirements:

It is in first normal form.
It does not have any non-prime attribute that is functionally dependent on any proper subset of any candidate key of the relation. A non-prime attribute of a relation is an attribute that is not a part of any candidate key of the relation.

Third normal form (3NF) is a database schema design approach for relational databases which uses normalizing principles to reduce the duplication of data, avoid data anomalies, ensure referential integrity, and simplify data management. It was defined in 1971 by Edgar F. Codd, an English computer scientist who invented the relational model for database management.

Fourth normal form (4NF) is a normal form used in database normalization. Introduced by Ronald Fagin in 1977, 4NF is the next level of normalization after Boyce–Codd normal form (BCNF). Whereas the second, third, and Boyce–Codd normal forms are concerned with functional dependencies, 4NF is concerned with a more general type of dependency known as a multivalued dependency. A table is in 4NF if and only if, for every one of its non-trivial multivalued dependencies X $Y, X is a superkey—that is, X is either a candidate key or a superset thereof.$

In relational database theory, a functional dependency is a constraint between two sets of attributes in a relation from a database. In other words, a functional dependency is a constraint between two attributes in a relation. Given a relation R and sets of attributes $, X is said to functionally determine Y if and only if each X value in R is associated with precisely one Y value in R; R is then said to satisfy the functional dependency X \to Y . Equivalently, the projection is a function, i.e. Y is a function of X . In simple words, if the values for the X attributes are known, then the values for the Y attributes corresponding to x can be determined by looking them up in any tuple of R containing x . Customarily X is called the determinant set and Y the dependent set. A functional dependency FD: X \to Y is called trivial if Y is a subset of X .$

A foreign key is a set of attributes in a table that refers to the primary key of another table, linking these two tables. In the context of relational databases, a foreign key is subject to a inclusion dependency constraint that the tuples consisting of the foreign key attributes in one relation, R, must also exist in some other relation, S; furthermore that those attributes must also be a candidate key in S.

A millionaire is an individual whose net worth or wealth is equal to or exceeds one million units of currency. Depending on the currency, a certain level of prestige is associated with being a millionaire. Many national currencies have, or have had at various times, a low unit value, in many cases due to past inflation. It is obviously much easier and less significant to be a millionaire in those currencies, thus a millionaire in Hong Kong or Taiwan, for example, may be merely averagely wealthy, or perhaps less wealthy than average. A millionaire in Zimbabwe in 2007 could have been extremely poor. Because of this, the term 'millionaire' generally refers to those whose assets total at least one million units of a high-value currency, such as the United States dollar, euro, or pound sterling.

Referential integrity is a property of data stating that all its references are valid. In the context of relational databases, it requires that if a value of one attribute (column) of a relation (table) references a value of another attribute, then the referenced value must exist.

A natural key is a type of unique key in a database formed of attributes that exist and are used in the external world outside the database. In the relational model of data, a natural key is a superkey and is therefore a functional determinant for all attributes in a relation.

Database design is the organization of data according to a database model. The designer determines what data must be stored and how the data elements interrelate. With this information, they can begin to fit the data to the database model. A database management system manages the data accordingly.

Object–relational impedance mismatch is a set of difficulties going between data in relational data stores and data in domain-driven object models. Relational Database Management Systems (RDBMS) is the standard method for storing data in a dedicated database, while object-orientated (OO) programming is the default method for business-centric design in programming languages. The problem lies in neither relational databases nor OO programming, but in the conceptual difficulty mapping between the two logic models. Both logical models are differently implementable using database servers, programming languages, design patterns, or other technologies. Issues range from application to enterprise scale, whenever stored relational data is used in domain-driven object models, and vice versa. Object-oriented data stores can trade this problem for other implementation difficulties.

Boyce–Codd normal form is a normal form used in database normalization. It is a slightly stricter version of the third normal form (3NF). By using BCNF, a database will remove all redundancies based on functional dependencies.

Fifth normal form (5NF), also known as projection–join normal form (PJ/NF), is a level of database normalization designed to remove redundancy in relational databases recording multi-valued facts by isolating semantically related multiple relationships. A table is said to be in the 5NF if and only if every non-trivial join dependency in that table is implied by the candidate keys. It is the final normal form as far as removing redundancy is concerned.

Sixth normal form (6NF) is a normal form used in relational database normalization which extends the relational algebra and generalizes relational operators to support interval data, which can be useful in temporal databases.

An entity–attribute–value model (EAV) is a data model optimized for the space-efficient storage of sparse—or ad-hoc—property or data values, intended for situations where runtime usage patterns are arbitrary, subject to user variation, or otherwise unforeseeable using a fixed design. The use-case targets applications which offer a large or rich system of defined property types, which are in turn appropriate to a wide set of entities, but where typically only a small, specific selection of these are instantiated for a given entity. Therefore, this type of data model relates to the mathematical notion of a sparse matrix. EAV is also known as object–attribute–value model, vertical database model, and open schema.

In database normalization, unnormalized form (UNF or 0NF), also known as an unnormalized relation or non-first normal form (N1NF or NF²), is a database data model (organization of data in a database) which does not meet any of the conditions of database normalization defined by the relational model. Database systems which support unnormalized data are sometimes called non-relational or NoSQL databases. In the relational model, unnormalized relations can be considered the starting point for a process of normalization.

References

Fagin, Ronald (1981). "A Normal Form for Relational Databases That Is Based on Domains and Keys" (PDF). ACM Transactions on Database Systems. 6 (3): 387–415. CiteSeerX 10.1.1.73.373 . doi:10.1145/319587.319592. S2CID 14664427.

External links

Database Normalization Basics Archived 2007-02-05 at the Wayback Machine by Mike Chapple (About.com)
An Introduction to Database Normalization by Mike Hillyer.
Normalization by ITS, University of Texas.
A tutorial on the first 3 normal forms by Fred Coulson
Description of the database normalization basics by Microsoft

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.