Second normal form

Last updated

Second normal form (2NF), in database normalization, is a normal form. A relation is in the second normal form if it fulfills the following two requirements:

Contents

  1. It is in first normal form.
  2. It does not have any non-prime attribute that is functionally dependent on any proper subset of any candidate key of the relation (i.e. it lacks partial dependencies). A non-prime attribute of a relation is an attribute that is not a part of any candidate key of the relation.

Put simply, a relation (or table) is in 2NF if:

  1. It is in 1NF and has a single attribute unique identifier (UID) (in which case every non key attribute is dependent on the entire UID), or
  2. It is in 1NF and has a multi-attribute unique identifier, and every regular attribute (not part of the UID) is dependent on all attributes in the multi-attribute UID, not just one attribute (or part) of the UID.

If any regular (non-prime) attributes are predictable (dependent) on another (non-prime) attribute, that is addressed in third normal form.

History

The second normal form was originally defined by E. F. Codd in 1971. [1]

Decomposition of 1NF into 2NF

To make a 1NF relation a 2NF relation, remove the functionally dependent attributes in the partial dependencies of the first normal form relation, and place those partial dependency dependent attributes in a relation where their corresponding determinant attributes are an entire candidate key.

Example

The following relation does not satisfy 2NF because:


In other words, since {Manufacturer country} is a non-prime attribute functionally dependent on a proper subset of a candidate key, the relation is in violation of 2NF.

Electric toothbrush models
ManufacturerModelManufacturer country
ForteX-PrimeItaly
ForteUltracleanItaly
Dent-o-FreshEZbrushUSA
BrushmasterSuperBrushUSA
KobayashiST-60Japan
HochToothmasterGermany
HochX-PrimeGermany

To make the design conform to 2NF, it is necessary to have two relations. To create these relations:

  1. Remove the functionally dependent attributes in the partial dependencies of the first normal form relation. In this example, {Manufacturer country} is the functionally dependent attribute which will be removed.
  2. Place those partial dependency-dependent attributes (i.e. {Manufacturer country}) in a relation where their corresponding determinant attributes are a candidate key (i.e. {Manufacturer}).

As seen below, {Manufacturer country} is removed from the original table:

Electric toothbrush models
ManufacturerModel
ForteX-Prime
ForteUltraclean
Dent-o-FreshEZbrush
BrushmasterSuperBrush
KobayashiST-60
HochToothmaster
HochX-Prime

As seen below, the partial dependency is put into a new relation where the dependency can exist without being a partial dependency:

Electric toothbrush manufacturers
ManufacturerManufacturer country
ForteItaly
Dent-o-FreshUSA
BrushmasterUSA
KobayashiJapan
HochGermany

It may now be desirable to add a numeric identifier to each row in the "Electric toothbrush models" table to ensure it remains in 1NF, or, the table may be left as is, allowing each unique combination of values in each row to be usable as a primary key, thus allowing the columns {Manufacturer, Model} in this table to be used as the candidate key of the table.

See also

Related Research Articles

Database normalization is the process of structuring a relational database accordance with a series of so-called normal forms in order to reduce data redundancy and improve data integrity. It was first proposed by British computer scientist Edgar F. Codd as part of his relational model.

A relational database (RDB) is a database based on the relational model of data, as proposed by E. F. Codd in 1970. A database management system used to maintain relational databases is a relational database management system (RDBMS). Many relational database systems are equipped with the option of using SQL for querying and updating the database.

The relational model (RM) is an approach to managing data using a structure and language consistent with first-order predicate logic, first described in 1969 by English computer scientist Edgar F. Codd, where all data is represented in terms of tuples, grouped into relations. A database organized in terms of the relational model is a relational database.

First normal form (1NF) is a property of a relation in a relational database. A relation is in first normal form if and only if no attribute domain has relations as elements. Or more informally, that no table column can have tables as values. Database normalization is the process of representing a database in terms of relations in standard normal forms, where first normal is a minimal requirement. SQL-92 does not support creating or using table-valued columns, which means that using only the "traditional relational database features" most relational databases will be in first normal form by necessity. Database systems which do not require first normal form are often called NoSQL systems. Newer SQL standards like SQL:1999 have started to allow so called non-atomic types, which include composite types. Even newer versions like SQL:2016 allow JSON.

Third normal form (3NF) is a database schema design approach for relational databases which uses normalizing principles to reduce the duplication of data, avoid data anomalies, ensure referential integrity, and simplify data management. It was defined in 1971 by Edgar F. Codd, an English computer scientist who invented the relational model for database management.

Fourth normal form (4NF) is a normal form used in database normalization. Introduced by Ronald Fagin in 1977, 4NF is the next level of normalization after Boyce–Codd normal form (BCNF). Whereas the second, third, and Boyce–Codd normal forms are concerned with functional dependencies, 4NF is concerned with a more general type of dependency known as a multivalued dependency. A table is in 4NF if and only if, for every one of its non-trivial multivalued dependencies XY, {X, Y} is a superkey—that is, the combination of all attributes in X and Y is either a candidate key or a superset thereof.

In relational database theory, a functional dependency is the following constraint between two attribute sets in a relation: Given a relation R and attribute sets , X is said to functionally determineY if each X value is associated with precisely one Y value. R is then said to satisfy the functional dependency XY. Equivalently, the projection is a function, that is, Y is a function of X. In simple words, if the values for the X attributes are known, then the values for the Y attributes corresponding to x can be determined by looking them up in any tuple of R containing x. Customarily X is called the determinant set and Y the dependent set. A functional dependency FD: XY is called trivial if Y is a subset of X.

In the relational model of databases, a primary key is a designated attribute (column) that can reliably identify and distinguish between each individual record in a table. The database creator can choose an existing unique attribute or combination of attributes from the table to act as its primary key, or create a new attribute containing a unique ID that exists solely for this purpose.

A candidate key, or simply a key, of a relational database is any set of columns that have a unique combination of values in each row, with the additional constraint that removing any column could produce duplicate combinations of values.

Database design is the organization of data according to a database model. The designer determines what data must be stored and how the data elements interrelate. With this information, they can begin to fit the data to the database model. A database management system manages the data accordingly.

In the relational data model a superkey is any set of attributes that uniquely identifies each tuple of a relation. Because superkey values are unique, tuples with the same superkey value must also have the same non-key attribute values. That is, non-key attributes are functionally dependent on the superkey.

Boyce–Codd normal form is a normal form used in database normalization. It is a slightly stricter version of the third normal form (3NF). By using BCNF, a database will remove all redundancies based on functional dependencies.

Fifth normal form (5NF), also known as projection–join normal form (PJ/NF), is a level of database normalization designed to remove redundancy in relational databases recording multi-valued facts by isolating semantically related multiple relationships. A table is said to be in the 5NF if and only if every non-trivial join dependency in that table is implied by the candidate keys. It is the final normal form as far as removing redundancy is concerned.

Sixth normal form (6NF) is a normal form used in relational database normalization which extends the relational algebra and generalizes relational operators to support interval data, which can be useful in temporal databases.

<span class="mw-page-title-main">Transitive dependency</span>

A transitive dependency is an indirect dependency relationship between software components. This kind of dependency is held by virtue of a transitive relation from a component that the software depends on directly.

Domain-key normal form is a normal form used in database normalization which requires that the database contains no constraints other than domain constraints and key constraints.

Relational Model/Tasmania (RM/T) was published by Edgar F. Codd in 1979 and is the name given to a number of extensions to his original relational model (RM) published in 1970. The overall goal of the RM/T was to define some fundamental semantic units, at "atomic" and "molecular" levels, for data modelling. Codd writes: "the result is a model with a richer variety of objects than the original relational model, additional insert-update-delete rules and some additional operators that make the algebra more powerful."

<span class="mw-page-title-main">Database model</span> Type of data model

A database model is a type of data model that determines the logical structure of a database. It fundamentally determines in which manner data can be stored, organized and manipulated. The most popular example of a database model is the relational model, which uses a table-based format.

Elementary key normal form (EKNF) is a subtle enhancement on third normal form, thus EKNF tables are in 3NF by definition. This happens when there is more than one unique compound key and they overlap. Such cases can cause redundant information in the overlapping column(s).

In database normalization, unnormalized form (UNF or 0NF), also known as an unnormalized relation or non-first normal form (N1NF or NF2), is a database data model (organization of data in a database) which does not meet any of the conditions of database normalization defined by the relational model. Database systems which support unnormalized data are sometimes called non-relational or NoSQL databases. In the relational model, unnormalized relations can be considered the starting point for a process of normalization.

References

  1. Codd, E. F. "Further Normalization of the Data Base Relational Model". (Presented at Courant Computer Science Symposia Series 6, "Data Base Systems", New York City, May 24–25, 1971.) IBM Research Report RJ909 (August 31, 1971). Republished in Randall J. Rustin (ed.), Data Base Systems: Courant Computer Science Symposia Series 6. Prentice-Hall, 1972.

Further reading