Second normal form

Last updated

Second normal form (2NF) is a normal form used in database normalization. A relation is in the second normal form if it fulfills the following two requirements:

Contents

  1. It is in first normal form.
  2. It does not have any non-prime attribute that is functionally dependent on any proper subset of any candidate key of the relation (i.e. it lacks partial dependencies). A non-prime attribute of a relation is an attribute that is not a part of any candidate key of the relation.

Put simply, a relation (or table) is in 2NF if:

  1. It is in 1NF and has a single attribute unique identifier (UID)(in which case every non key attribute is dependent on the entire UID), or
  2. It is in 1NF and has a multi-attribute unique identifier, and every regular attribute (not part of the UID) is dependent on all attributes in the multi-attribute UID, not just one attribute (or part) of the UID.

If any regular (non-prime) attributes are predictable (dependent) on another (non-prime) attribute, that is addressed in third normal form.

History

The second normal form was originally defined by E. F. Codd in 1971. [1]

2NF and candidate keys

A functional dependency on a proper subset of any candidate key (UID) is a violation of 2NF. In addition to the primary key, the relation may contain other candidate keys; it is necessary to establish that no non-prime attributes (regular/non-UID attributes) have part-key dependencies (they're on any of these candidate keys/UIDs). Or put simply, if any regular attributes (non-UID attributes) are predictable from one attribute of the UID (or part of the UID), then it is not in 2NF.

Decomposition of 1NF into 2NF

To make a 1NF relation a 2NF relation, remove the functionally dependent attributes in the partial dependencies of the first normal form relation, and place those partial dependency dependent attributes in a relation where their corresponding determinant attributes are an entire candidate key.

Example

The following relation does not satisfy 2NF because:

In other words, since {Manufacturer country} is a non-prime attribute functionally dependent on a proper subset of a candidate key, the relation is in violation of 2NF.

Electric toothbrush models
ManufacturerModelManufacturer country
ForteX-PrimeItaly
ForteUltracleanItaly
Dent-o-FreshEZbrushUSA
BrushmasterSuperBrushUSA
KobayashiST-60Japan
HochToothmasterGermany
HochX-PrimeGermany

To make the design conform to 2NF, it is necessary to have two relations. To create these relations:

  1. Remove the functionally dependent attributes in the partial dependencies of the first normal form relation. In this example, {Manufacturer country} is the functionally dependent attribute which will be removed.
  2. Place those partial dependency-dependent attributes (i.e. {Manufacturer country}) in a relation where their corresponding determinant attributes are a candidate key (i.e. {Manufacturer}).

As seen below, {Manufacturer country} is removed from the original table:

Electric toothbrush models
ManufacturerModel
ForteX-Prime
ForteUltraclean
Dent-o-FreshEZbrush
BrushmasterSuperBrush
KobayashiST-60
HochToothmaster
HochX-Prime

As seen below, the partial dependency is put into a new relation where the dependency can exist without being a partial dependency:

Electric toothbrush manufacturers
ManufacturerManufacturer country
ForteItaly
Dent-o-FreshUSA
BrushmasterUSA
KobayashiJapan
HochGermany

See also

Related Research Articles

Denormalization is a strategy used on a previously-normalized database to increase performance. In computing, denormalization is the process of trying to improve the read performance of a database, at the expense of losing some write performance, by adding redundant copies of data or by grouping data. It is often motivated by performance or scalability in relational database software needing to carry out very large numbers of read operations. Denormalization differs from the unnormalized form in that denormalization benefits can only be fully realized on a data model that is otherwise normalized.

Database normalization is the process of structuring a relational database in accordance with a series of so-called normal forms in order to reduce data redundancy and improve data integrity. It was first proposed by British computer scientist Edgar F. Codd as part of his relational model.

In mathematics, a finitary relation over a sequence of sets X1, ..., Xn is a subset of the Cartesian product X1 × ... × Xn; that is, it is a set of n-tuples (x1, ..., xn), each being a sequence of elements xi in the corresponding Xi. Typically, the relation describes a possible connection between the elements of an n-tuple. For example, the relation "x is divisible by y and z" consists of the set of 3-tuples such that when substituted to x, y and z, respectively, make the sentence true.

A relational database is a database based on the relational model of data, as proposed by E. F. Codd in 1970. A database management system used to maintain relational databases is a relational database management system (RDBMS). Many relational database systems are equipped with the option of using SQL for querying and updating the database.

The relational model (RM) is an approach to managing data using a structure and language consistent with first-order predicate logic, first described in 1969 by English computer scientist Edgar F. Codd, where all data is represented in terms of tuples, grouped into relations. A database organized in terms of the relational model is a relational database.

First normal form (1NF) is a property of a relation in a relational database. A relation is in first normal form if and only if no attribute domain has relations as elements. Or more informally, that no table column can have tables as values. Database normalization is the process of representing a database in terms of relations in standard normal forms, where first normal is a minimal requirement. SQL-92 does not support creating or using table-valued columns, which means that using only the "traditional relational database features" most relational databases will be in first normal form by necessity. Database systems which do not require first normal form are often called NoSQL systems. Newer SQL standards like SQL:1999 have started to allow so called non-atomic types, which include composite types. Even newer versions like SQL:2016 allow JSON.

Third normal form (3NF) is a database schema design approach for relational databases which uses normalizing principles to reduce the duplication of data, avoid data anomalies, ensure referential integrity, and simplify data management. It was defined in 1971 by Edgar F. Codd, an English computer scientist who invented the relational model for database management.

Fourth normal form (4NF) is a normal form used in database normalization. Introduced by Ronald Fagin in 1977, 4NF is the next level of normalization after Boyce–Codd normal form (BCNF). Whereas the second, third, and Boyce–Codd normal forms are concerned with functional dependencies, 4NF is concerned with a more general type of dependency known as a multivalued dependency. A table is in 4NF if and only if, for every one of its non-trivial multivalued dependencies XY, X is a superkey—that is, X is either a candidate key or a superset thereof.

In relational database theory, a functional dependency is a constraint between two sets of attributes in a relation from a database. In other words, a functional dependency is a constraint between two attributes in a relation. Given a relation R and sets of attributes , X is said to functionally determineY if and only if each X value in R is associated with precisely one Y value in R; R is then said to satisfy the functional dependency XY. Equivalently, the projection is a function, i.e. Y is a function of X. In simple words, if the values for the X attributes are known, then the values for the Y attributes corresponding to x can be determined by looking them up in any tuple of R containing x. Customarily X is called the determinant set and Y the dependent set. A functional dependency FD: XY is called trivial if Y is a subset of X.

A candidate key, or simply a key, of a relational database is any set of columns that have a unique combination of values in each row, with the additional constraint that removing any column could produce duplicate combinations of values.

Database design is the organization of data according to a database model. The designer determines what data must be stored and how the data elements interrelate. With this information, they can begin to fit the data to the database model. A database management system manages the data accordingly.

In the relational data model a superkey is a set of attributes that uniquely identifies each tuple of a relation. Because superkey values are unique, tuples with the same superkey value must also have the same non-key attribute values. That is, non-key attributes are functionally dependent on the superkey.

Boyce–Codd normal form is a normal form used in database normalization. It is a slightly stronger version of the third normal form (3NF). BCNF was developed in 1974 by Raymond F. Boyce and Edgar F. Codd to address certain types of anomalies not dealt with by 3NF as originally defined.

Fifth normal form (5NF), also known as projection–join normal form (PJ/NF), is a level of database normalization designed to remove redundancy in relational databases recording multi-valued facts by isolating semantically related multiple relationships. A table is said to be in the 5NF if and only if every non-trivial join dependency in that table is implied by the candidate keys. It is the final normal form as far as removing redundancy is concerned.

Sixth normal form (6NF) is a term in relational database theory, used in two different ways.

A transitive dependency is an indirect dependency relationship between software components. This kind of dependency is held by virtue of a transitive relation from a component that the software depends on directly.

<span class="mw-page-title-main">Raymond F. Boyce</span> American computer scientist (1946–1974)

Raymond F. Boyce (1946–1974) was an American computer scientist who was known for his research in relational databases. He is best known for his work co-developing the SQL database language and Boyce-Codd normal form.

<span class="mw-page-title-main">Database model</span> Type of data model

A database model is a type of data model that determines the logical structure of a database. It fundamentally determines in which manner data can be stored, organized and manipulated. The most popular example of a database model is the relational model, which uses a table-based format.

Elementary key normal form (EKNF) is a subtle enhancement on third normal form, thus EKNF tables are in 3NF by definition. This happens when there is more than one unique compound key and they overlap. Such cases can cause redundant information in the overlapping column(s).

In database normalization, unnormalized form (UNF or 0NF), also known as an unnormalized relation or non-first normal form (N1NF or NF2), is a database data model (organization of data in a database) which does not meet any of the conditions of database normalization defined by the relational model. Database systems which support unnormalized data are sometimes called non-relational or NoSQL databases. In the relational model, unnormalized relations can be considered the starting point for a process of normalization.

References

  1. Codd, E. F. "Further Normalization of the Data Base Relational Model". (Presented at Courant Computer Science Symposia Series 6, "Data Base Systems", New York City, May 24–25, 1971.) IBM Research Report RJ909 (August 31, 1971). Republished in Randall J. Rustin (ed.), Data Base Systems: Courant Computer Science Symposia Series 6. Prentice-Hall, 1972.

Further reading