Unnormalized form

Last updated

In database normalization, unnormalized form (UNF or 0NF), also known as an unnormalized relation or non-first normal form (N1NF or NF2), [1] is a database data model (organization of data in a database) which does not meet any of the conditions of database normalization defined by the relational model. Database systems which support unnormalized data are sometimes called non-relational or NoSQL databases. In the relational model, unnormalized relations can be considered the starting point for a process of normalization.

Contents

"Unnormalized form" should not be confused with denormalization, where normalization is deliberately compromised for selected tables in a relational database.

History

In 1970, E. F. Codd proposed the relational data model, now[ when? ] widely accepted as the standard data model. [2] At that time, office automation was the major use of data storage systems, which resulted in the proposal of many UNF/NF2 data models like the Schek model, Jaeschke models (non-recursive and recursive algebra), and the nested table data model (NTD). [1] IBM organized the first international workshop exclusively on this topic in 1987 which was held in Darmstadt, Germany. [1] Moreover, a lot of research has been done and journals have been published to address the shortcomings of the relational model. Since the turn of the millennium, NoSQL databases have become popular owing to the demands of Web 2.0.

Relational form

Normalization to first normal form requires the initial data to be viewed as relations. [3] In database systems relations are represented as tables. The relation view implies some constraints on the tables:

This definition does not preclude columns having sets or relations as values, e.g. nested tables. This is the major difference to first normal form.

NoSQL databases like document databases typically does not conform to the relational view. For example, an JSON or XML database might support duplicate records and intrinsic ordering. Such database can be described as non-relational. But there are also database models which support the relational view, but does not embrace first normal form. [4] Such models are called non-first normal form relations (abbreviated NFR, N1NF or NF2).

Example with a table valued column

CustomerCust_IDTransactions
Abraham1
Tr. IDDateAmount
128902003-10-1487
129042003-10-1550
Isaac2
Tr_IDDateAmount
128982003-10-1421
Jacob3
Tr_IDDateAmount
129072003-10-1518
149202003-11-2070
150032003-11-2760

This table represent a relation where one of the columns (Transactions) is itself relation-valued. This is a valid relation but does not conform to first normal form which does not allow nested relations. The table is therefore unnormalized.

Modern applications

As of 2016, companies like Google, Amazon and Facebook deal with large amounts of data that are difficult to store efficiently. They use NoSQL databases, which are based on the principles of the unnormalized relational model, to deal with the storage issue. [5] Some examples of NoSQL databases are MongoDB, Apache Cassandra and Redis. These databases are more scalable and easier to query with as they do not involve expensive operations like JOIN.[ citation needed ]

See also

Related Research Articles

<span class="mw-page-title-main">Database</span> Organized collection of data in computing

In computing, a database is an organized collection of data or a type of data store based on the use of a database management system (DBMS), the software that interacts with end users, applications, and the database itself to capture and analyze the data. The DBMS additionally encompasses the core facilities provided to administer the database. The sum total of the database, the DBMS and the associated applications can be referred to as a database system. Often the term "database" is also used loosely to refer to any of the DBMS, the database system or an application associated with the database.

Denormalization is a strategy used on a previously-normalized database to increase performance. In computing, denormalization is the process of trying to improve the read performance of a database, at the expense of losing some write performance, by adding redundant copies of data or by grouping data. It is often motivated by performance or scalability in relational database software needing to carry out very large numbers of read operations. Denormalization differs from the unnormalized form in that denormalization benefits can only be fully realized on a data model that is otherwise normalized.

Database normalization is the process of structuring a relational database in accordance with a series of so-called normal forms in order to reduce data redundancy and improve data integrity. It was first proposed by British computer scientist Edgar F. Codd as part of his relational model.

A relational database is a database based on the relational model of data, as proposed by E. F. Codd in 1970. A database management system used to maintain relational databases is a relational database management system (RDBMS). Many relational database systems are equipped with the option of using SQL for querying and updating the database.

The relational model (RM) is an approach to managing data using a structure and language consistent with first-order predicate logic, first described in 1969 by English computer scientist Edgar F. Codd, where all data is represented in terms of tuples, grouped into relations. A database organized in terms of the relational model is a relational database.

Structured Query Language (SQL) is a domain-specific language used to manage data, especially in a relational database management system (RDBMS). It is particularly useful in handling structured data, i.e., data incorporating relations among entities and variables.

<span class="mw-page-title-main">IBM Db2</span> Relational model database server

Db2 is a family of data management products, including database servers, developed by IBM. It initially supported the relational model, but was extended to support object–relational features and non-relational structures like JSON and XML. The brand name was originally styled as DB2 until 2017, when it changed to its present form.

In database theory, relational algebra is a theory that uses algebraic structures for modeling data, and defining queries on it with a well founded semantics. The theory was introduced by Edgar F. Codd.

First normal form (1NF) is a property of a relation in a relational database. A relation is in first normal form if and only if no attribute domain has relations as elements. Or more informally, that no table column can have tables as values. Database normalization is the process of representing a database in terms of relations in standard normal forms, where first normal is a minimal requirement. SQL-92 does not support creating or using table-valued columns, which means that using only the "traditional relational database features" most relational databases will be in first normal form by necessity. Database systems which do not require first normal form are often called NoSQL systems. Newer SQL standards like SQL:1999 have started to allow so called non-atomic types, which include composite types. Even newer versions like SQL:2016 allow JSON.

Second normal form (2NF) is a normal form used in database normalization. 2NF was originally defined by E. F. Codd in 1971.

Third normal form (3NF) is a database schema design approach for relational databases which uses normalizing principles to reduce the duplication of data, avoid data anomalies, ensure referential integrity, and simplify data management. It was defined in 1971 by Edgar F. Codd, an English computer scientist who invented the relational model for database management.

A hierarchical database model is a data model in which the data are organized into a tree-like structure. The data are stored as records which are connected to one another through links. A record is a collection of fields, with each field containing only one value. The type of a record defines which fields the record contains.

A table is a collection of related data held in a table format within a database. It consists of columns and rows.

<span class="mw-page-title-main">Null (SQL)</span> Marker used in SQL databases to indicate a value does not exist

In SQL, null or NULL is a special marker used to indicate that a data value does not exist in the database. Introduced by the creator of the relational database model, E. F. Codd, SQL null serves to fulfil the requirement that all true relational database management systems (RDBMS) support a representation of "missing information and inapplicable information". Codd also introduced the use of the lowercase Greek omega (ω) symbol to represent null in database theory. In SQL, NULL is a reserved word used to identify this marker.

Boyce–Codd normal form is a normal form used in database normalization. It is a slightly stronger version of the third normal form (3NF). BCNF was developed in 1974 by Raymond F. Boyce and Edgar F. Codd to address certain types of anomalies not dealt with by 3NF as originally defined.

An entity–attribute–value model (EAV) is a data model optimized for the space-efficient storage of sparse—or ad-hoc—property or data values, intended for situations where runtime usage patterns are arbitrary, subject to user variation, or otherwise unforeseeable using a fixed design. The use-case targets applications which offer a large or rich system of defined property types, which are in turn appropriate to a wide set of entities, but where typically only a small, specific selection of these are instantiated for a given entity. Therefore, this type of data model relates to the mathematical notion of a sparse matrix. EAV is also known as object–attribute–value model, vertical database model, and open schema.

<span class="mw-page-title-main">Raymond F. Boyce</span> American computer scientist (1946–1974)

Raymond F. Boyce (1946–1974) was an American computer scientist who was known for his research in relational databases. He is best known for his work co-developing the SQL database language and Boyce-Codd normal form.

<span class="mw-page-title-main">Database model</span> Type of data model

A database model is a type of data model that determines the logical structure of a database. It fundamentally determines in which manner data can be stored, organized and manipulated. The most popular example of a database model is the relational model, which uses a table-based format.

The nested set model is a technique for representing nested set collections in relational databases.

<span class="mw-page-title-main">Relation (database)</span> Set of tuples consisting of values indexed by attributes

In database theory, a relation, as originally defined by E. F. Codd, is a set of tuples (d1, d2, ..., dn), where each element dj is a member of Dj, a data domain. Codd's original definition notwithstanding, and contrary to the usual definition in mathematics, there is no ordering to the elements of the tuples of a relation. Instead, each element is termed an attribute value. An attribute is a name paired with a domain. An attribute value is an attribute name paired with an element of that attribute's domain, and a tuple is a set of attribute values in which no two distinct elements have the same name. Thus, in some accounts, a tuple is described as a function, mapping names to values.

References

  1. 1 2 3 Kitagawa, Hiroyuki; Kunii, Tosiyasu L. (1990-02-06). The Unnormalized Relational Data Model. Springer. pp. 1, 5, 7, 10. ISBN   978-4-431-70049-4.
  2. "IBM Archives: Edgar F. Codd". April 23, 2003.
  3. Codd, E. F. (1970). A Relational Model of Data for. Large Shared Data Banks. IBM Research Laboratory, San Jose, California.
  4. Operations and the Properties on Non-First-Normal-Form Relational Databases H. Arisawa, K. Moriya, T. Miura Published in VLDB 1983
  5. Moniruzzaman, A. B. M.; Hossain, Syed Akhter (2013). "NoSQL Database: New Era of Databases for Big data Analytics - Classification, Characteristics and Comparison". International Journal of Database Theory and Application. 6. arXiv: 1307.0191 .