Sixth normal form

Last updated

Sixth normal form (6NF) is a normal form used in relational database normalization which extends the relational algebra and generalizes relational operators (such as join) to support interval data, which can be useful in temporal databases.

Contents

The term 6NF has historically also been used to refer to another normalization degree, which today is more commonly known as domain-key normal form (DKNF) (see Other meanings).

Definition

Christopher J. Date and others have defined sixth normal form as a normal form, based on an extension of the relational algebra. [1] [2] [3]

Relational operators, such as join, are generalized to support a natural treatment of interval data, such as sequences of dates or moments in time, for instance in temporal databases. [4] [2] [3] Sixth normal form is then based on this generalized join, as follows:

A relvar R [table] is in sixth normal form (abbreviated 6NF) if and only if it satisfies no nontrivial join dependencies at all — where, as before, a join dependency is trivial if and only if at least one of the projections (possibly U_projections) involved is taken over the set of all attributes of the relvar [table] concerned. [5]

Date et al. have also given the following definition:

Relvar R is in sixth normal form (6NF) if and only if every JD [Join Dependency] of R is trivial — where a JD is trivial if and only if one of its components is equal to the pertinent heading in its entirety. [6]

Any relation in 6NF is also in 5NF.

Sixth normal form is intended to decompose relation variables to irreducible components. Though this may be relatively unimportant for non-temporal relation variables, it can be important when dealing with temporal variables or other interval data. For instance, if a relation comprises a supplier's name, status, and city, we may also want to add temporal data, such as the time during which these values are, or were, valid (e.g., for historical data) but the three values may vary independently of each other and at different rates. We may, for instance, wish to trace the history of changes to Status; a review of production costs may reveal that a change was caused by a supplier changing city and hence what they charged for delivery.

For further discussion on Temporal Aggregation in SQL, see also Zimanyi. [7] For a different approach, see TSQL2. [8]

Usage

The sixth normal form is currently as of 2009 being used in some data warehouses where the benefits outweigh the drawbacks, [9] for example using anchor modeling. Although using 6NF leads to an explosion of tables, modern databases can prune the tables from select queries (using a process called 'table elimination' - so that a query can be solved without even reading some of the tables that the query refers to [10] ) where they are not required and thus speed up queries that only access several attributes.

Examples

In order for a table to be in sixth normal form, it has to be in fifth normal form first and then it requires that each table satisfies only trivial join dependencies. Let's take a simple example [11] with a table already in 5NF: Here, in the users table, every attribute is non null and the primary key is the username:

Users_table
UsernameDepartmentStatus

This table is in 5NF because each join dependency is implied by the unique candidate key of the table (Username). More specifically, the only possible join dependencies are: {username, status}, {username, department}.

The 6NF version would look like this:

Users
UsernameStatus
Users_dept
UsernameDepartment

So, from one table in 5NF, 6NF produces two tables.

Following is another example:

TABLE 1
Medic IDMedic NameOccupationTypePractice in years
1Smith JamesOrthopedicSpecialist23
2Miller MichaelOrthopedicProbationer4
3Thomas LindaNeurologistProbationer5
4Scott NancyOrthopedicResident1
5Allen BrianNeurologistSpecialist12
6Turner StevenOphthalmologistProbationer3
7Collins KevinOphthalmologistSpecialist7
8King DonaldNeurologistResident1
9Harris SarahOphthalmologistResident2

The join dependencies of the table are {medic name, occupation}, {medic name, practice in years} and {medic name, type}. Hence we could see that such table is 2NF (due to the appearance of transitive dependency). The following tables try to bring it to 6NF:

TABLE 2.1
Medic IDMedic Name
1Smith James
2Miller Michael
3Thomas Linda
4Scott Nancy
5Allen Brian
6Turner Steven
7Collins Kevin
8King Donald
9Harris Sarah
TABLE 2.2
Medic IDOccupation
1Orthopedic
2Orthopedic
3Neurologist
4Orthopedic
5Neurologist
6Ophthalmologist
7Ophthalmologist
8Neurologist
9Ophthalmologist
TABLE 2.3
Medic IDPractice in years
123
24
35
41
512
63
77
81
92
TABLE 2.4
Medic IDType
1Specialist
2Probationer
3Probationer
4Resident
5Specialist
6Probationer
7Specialist
8Resident
9Resident

Other meanings

Sixth normal form (6NF) is sometimes instead used as a synonym for domain-key normal form (DKNF). This usage predates Date et al.'s work. [12]

See also

Related Research Articles

Denormalization is a strategy used on a previously-normalized database to increase performance. In computing, denormalization is the process of trying to improve the read performance of a database, at the expense of losing some write performance, by adding redundant copies of data or by grouping data. It is often motivated by performance or scalability in relational database software needing to carry out very large numbers of read operations. Denormalization differs from the unnormalized form in that denormalization benefits can only be fully realized on a data model that is otherwise normalized.

Database normalization is the process of structuring a relational database in accordance with a series of so-called normal forms in order to reduce data redundancy and improve data integrity. It was first proposed by British computer scientist Edgar F. Codd as part of his relational model.

A relational database (RDB) is a database based on the relational model of data, as proposed by E. F. Codd in 1970. A database management system used to maintain relational databases is a relational database management system (RDBMS). Many relational database systems are equipped with the option of using SQL for querying and updating the database.

The relational model (RM) is an approach to managing data using a structure and language consistent with first-order predicate logic, first described in 1969 by English computer scientist Edgar F. Codd, where all data is represented in terms of tuples, grouped into relations. A database organized in terms of the relational model is a relational database.

<span class="mw-page-title-main">Christopher J. Date</span> British database researcher

Chris John Date is an independent author, lecturer, researcher, and consultant, specializing in relational database theory.

<span class="mw-page-title-main">Hugh Darwen</span> English academic and writer about computers

Hugh Darwen is a computer scientist who was an employee of IBM United Kingdom from 1967 to 2004, and has been involved in the development of the relational model.

First normal form (1NF) is a property of a relation in a relational database. A relation is in first normal form if and only if no attribute domain has relations as elements. Or more informally, that no table column can have tables as values. Database normalization is the process of representing a database in terms of relations in standard normal forms, where first normal is a minimal requirement. SQL-92 does not support creating or using table-valued columns, which means that using only the "traditional relational database features" most relational databases will be in first normal form by necessity. Database systems which do not require first normal form are often called NoSQL systems. Newer SQL standards like SQL:1999 have started to allow so called non-atomic types, which include composite types. Even newer versions like SQL:2016 allow JSON.

Second normal form (2NF) is a normal form used in database normalization. A relation is in the second normal form if it fulfills the following two requirements:

  1. It is in first normal form.
  2. It does not have any non-prime attribute that is functionally dependent on any proper subset of any candidate key of the relation. A non-prime attribute of a relation is an attribute that is not a part of any candidate key of the relation.

Third normal form (3NF) is a database schema design approach for relational databases which uses normalizing principles to reduce the duplication of data, avoid data anomalies, ensure referential integrity, and simplify data management. It was defined in 1971 by Edgar F. Codd, an English computer scientist who invented the relational model for database management.

Fourth normal form (4NF) is a normal form used in database normalization. Introduced by Ronald Fagin in 1977, 4NF is the next level of normalization after Boyce–Codd normal form (BCNF). Whereas the second, third, and Boyce–Codd normal forms are concerned with functional dependencies, 4NF is concerned with a more general type of dependency known as a multivalued dependency. A table is in 4NF if and only if, for every one of its non-trivial multivalued dependencies XY, X is a superkey—that is, X is either a candidate key or a superset thereof.

A temporal database stores data relating to time instances. It offers temporal data types and stores information relating to past, present and future time. Temporal databases can be uni-temporal, bi-temporal or tri-temporal.

In relational databases, relvar is a term introduced by C. J. Date and Hugh Darwen as an abbreviation for relation variable in their 1995 paper The Third Manifesto, to avoid the confusion sometimes arising from the use of the term "relation", by the inventor of the relational model, E. F. Codd, for a variable to which a relation is assigned as well as for the relation itself. The term is used in Date's well-known database textbook An Introduction to Database Systems and in various other books authored or coauthored by him.

Boyce–Codd normal form is a normal form used in database normalization. It is a slightly stricter version of the third normal form (3NF). By using BCNF, a database will remove all redundancies based on functional dependencies.

Fifth normal form (5NF), also known as projection–join normal form (PJ/NF), is a level of database normalization designed to remove redundancy in relational databases recording multi-valued facts by isolating semantically related multiple relationships. A table is said to be in the 5NF if and only if every non-trivial join dependency in that table is implied by the candidate keys. It is the final normal form as far as removing redundancy is concerned.

In computing, a materialized view is a database object that contains the results of a query. For example, it may be a local copy of data located remotely, or may be a subset of the rows and/or columns of a table or join result, or may be a summary using an aggregate function.

Nikos Lorentzos is a Greek professor of Informatics. He is a specialist on the Relational Model of Database Management, having made contributions in the field of temporal databases, where he has co-authored a book with Hugh Darwen and Christopher J Date.

The nested set model is a technique for representing nested set collections in relational databases.

Richard Thomas Snodgrass is an American computer scientist and writer and is professor emeritus at the University of Arizona. He is best known for his work on temporal databases, query language design, query optimization and evaluation, storage structures, database design, and ergalics.

The following is provided as an overview of and topical guide to databases:

In database normalization, unnormalized form (UNF or 0NF), also known as an unnormalized relation or non-first normal form (N1NF or NF2), is a database data model (organization of data in a database) which does not meet any of the conditions of database normalization defined by the relational model. Database systems which support unnormalized data are sometimes called non-relational or NoSQL databases. In the relational model, unnormalized relations can be considered the starting point for a process of normalization.

References

  1. Date, Darwen & Lorentzos 2003.
  2. 1 2 Date, Darwen & Lorentzos 2014.
  3. 1 2 Harrington 2009, pp. 125–126.
  4. Date, Darwen & Lorentzos 2003, pp. 141–160.
  5. Date, Darwen & Lorentzos 2003, p. 176.
  6. Date, Darwen & Lorentzos 2014, p. 213.
  7. Zimanyi 2006.
  8. Snodgrass.
  9. See the Anchor Modeling website for a website that describes a data warehouse modelling method based on the sixth normal form
  10. What is Table Elimination? - MariaDB Knowledge Base
  11. Example provided by: http://www.anattatechnologies.com/q/2011/07/normalization-6nf/
  12. dbdebunk.

Bibliography

Further reading