Relational database

Last updated

A relational database (RDB [1] ) is a database based on the relational model of data, as proposed by E. F. Codd in 1970. [2]

Contents

A database management system used to maintain relational databases is a relational database management system (RDBMS).

Many relational database systems are equipped with the option of using SQL (Structured Query Language) for querying and updating the database. [3]

History

The concept of relational database was defined by E. F. Codd at IBM in 1970. Codd introduced the term relational in his research paper "A Relational Model of Data for Large Shared Data Banks". [2] In this paper and later papers, he defined what he meant by relation. One well-known definition of what constitutes a relational database system is composed of Codd's 12 rules.

However, no commercial implementations of the relational model conform to all of Codd's rules, [4] so the term has gradually come to describe a broader class of database systems, which at a minimum:

  1. Present the data to the user as relations (a presentation in tabular form, i.e. as a collection of tables with each table consisting of a set of rows and columns);
  2. Provide relational operators to manipulate the data in tabular form.

In 1974, IBM began developing System R, a research project to develop a prototype RDBMS. [5] [6] The first system sold as an RDBMS was Multics Relational Data Store (June 1976).[ citation needed ] Oracle was released in 1979 by Relational Software, now Oracle Corporation. [7] Ingres and IBM BS12 followed. Other examples of an RDBMS include IBM Db2, SAP Sybase ASE, and Informix. In 1984, the first RDBMS for Macintosh began being developed, code-named Silver Surfer, and was released in 1987 as 4th Dimension and known today as 4D. [8]

The first systems that were relatively faithful implementations of the relational model were from:

The most common definition of an RDBMS is a product that presents a view of data as a collection of rows and columns, even if it is not based strictly upon relational theory. By this definition, RDBMS products typically implement some but not all of Codd's 12 rules.

A second school of thought argues that if a database does not implement all of Codd's rules (or the current understanding on the relational model, as expressed by Christopher J. Date, Hugh Darwen and others), it is not relational. This view, shared by many theorists and other strict adherents to Codd's principles, would disqualify most DBMSs as not relational. For clarification, they often refer to some RDBMSs as truly-relational database management systems (TRDBMS), naming others pseudo-relational database management systems (PRDBMS).[ citation needed ]

As of 2009, most commercial relational DBMSs employ SQL as their query language. [13]

Alternative query languages have been proposed and implemented, notably the pre-1996 implementation of Ingres QUEL.

Relational model

A relational model organizes data into one or more tables (or "relations") of columns and rows, with a unique key identifying each row. Rows are also called records or tuples. [14] Columns are also called attributes. Generally, each table/relation represents one "entity type" (such as customer or product). The rows represent instances of that type of entity (such as "Lee" or "chair") and the columns represent values attributed to that instance (such as address or price).

For example, each row of a class table corresponds to a class, and a class corresponds to multiple students, so the relationship between the class table and the student table is "one to many" [15]

Keys

Each row in a table has its own unique key. Rows in a table can be linked to rows in other tables by adding a column for the unique key of the linked row (such columns are known as foreign keys). Codd showed that data relationships of arbitrary complexity can be represented by a simple set of concepts. [2]

Part of this processing involves consistently being able to select or modify one and only one row in a table. Therefore, most physical implementations have a unique primary key (PK) for each row in a table. When a new row is written to the table, a new unique value for the primary key is generated; this is the key that the system uses primarily for accessing the table. System performance is optimized for PKs. Other, more natural keys may also be identified and defined as alternate keys (AK). Often several columns are needed to form an AK (this is one reason why a single integer column is usually made the PK). Both PKs and AKs have the ability to uniquely identify a row within a table. Additional technology may be applied to ensure a unique ID across the world, a globally unique identifier, when there are broader system requirements.

The primary keys within a database are used to define the relationships among the tables. When a PK migrates to another table, it becomes a foreign key (FK) in the other table. When each cell can contain only one value and the PK migrates into a regular entity table, this design pattern can represent either a one-to-one or one-to-many relationship. Most relational database designs resolve many-to-many relationships by creating an additional table that contains the PKs from both of the other entity tables  the relationship becomes an entity; the resolution table is then named appropriately and the two FKs are combined to form a PK. The migration of PKs to other tables is the second major reason why system-assigned integers are used normally as PKs; there is usually neither efficiency nor clarity in migrating a bunch of other types of columns.

Relationships

Relationships are a logical connection between different tables (entities), established on the basis of interaction among these tables. These relationships can be modelled as an entity-relationship model.

Transactions

In order for a database management system (DBMS) to operate efficiently and accurately, it must use ACID transactions. [16] [17] [18]

Stored procedures

Part of the programming within a RDBMS is accomplished using stored procedures (SPs). Often procedures can be used to greatly reduce the amount of information transferred within and outside of a system. For increased security, the system design may grant access to only the stored procedures and not directly to the tables. Fundamental stored procedures contain the logic needed to insert new and update existing data. More complex procedures may be written to implement additional rules and logic related to processing or selecting the data.

Terminology

Relational database terminology Relational database terms.svg
Relational database terminology

The relational database was first defined in June 1970 by Edgar Codd, of IBM's San Jose Research Laboratory. [2] Codd's view of what qualifies as an RDBMS is summarized in Codd's 12 rules. A relational database has become the predominant type of database. Other models besides the relational model include the hierarchical database model and the network model.

The table below summarizes some of the most important relational database terms and the corresponding SQL term:

SQL termRelational database termDescription
Row Tuple or record A data set representing a single item
Column Attribute or fieldA labeled element of a tuple, e.g. "Address" or "Date of birth"
Table Relation or Base relvar A set of tuples sharing the same attributes; a set of columns and rows
View or result set Derived relvarAny set of tuples; a data report from the RDBMS in response to a query

Relations or tables

In a relational database, a relation is a set of tuples that have the same attributes. A tuple usually represents an object and information about that object. Objects are typically physical objects or concepts. A relation is usually described as a table, which is organized into rows and columns. All the data referenced by an attribute are in the same domain and conform to the same constraints.

The relational model specifies that the tuples of a relation have no specific order and that the tuples, in turn, impose no order on the attributes. Applications access data by specifying queries, which use operations such as select to identify tuples, project to identify attributes, and join to combine relations. Relations can be modified using the insert, delete, and update operators. New tuples can supply explicit values or be derived from a query. Similarly, queries identify tuples for updating or deleting.

Tuples by definition are unique. If the tuple contains a candidate or primary key then obviously it is unique; however, a primary key need not be defined for a row or record to be a tuple. The definition of a tuple requires that it be unique, but does not require a primary key to be defined. Because a tuple is unique, its attributes by definition constitute a superkey.

Base and derived relations

All data are stored and accessed via relations. Relations that store data are called "base relations", and in implementations are called "tables". Other relations do not store data, but are computed by applying relational operations to other relations. These relations are sometimes called "derived relations". In implementations these are called "views" or "queries". Derived relations are convenient in that they act as a single relation, even though they may grab information from several relations. Also, derived relations can be used as an abstraction layer.

Domain

A domain describes the set of possible values for a given attribute, and can be considered a constraint on the value of the attribute. Mathematically, attaching a domain to an attribute means that any value for the attribute must be an element of the specified set. The character string "ABC", for instance, is not in the integer domain, but the integer value 123 is. Another example of domain describes the possible values for the field "CoinFace" as ("Heads","Tails"). So, the field "CoinFace" will not accept input values like (0,1) or (H,T).

Constraints

Constraints are often used to make it possible to further restrict the domain of an attribute. For instance, a constraint can restrict a given integer attribute to values between 1 and 10. Constraints provide one method of implementing business rules in the database and support subsequent data use within the application layer. SQL implements constraint functionality in the form of check constraints. Constraints restrict the data that can be stored in relations. These are usually defined using expressions that result in a Boolean value, indicating whether or not the data satisfies the constraint. Constraints can apply to single attributes, to a tuple (restricting combinations of attributes) or to an entire relation. Since every attribute has an associated domain, there are constraints (domain constraints). The two principal rules for the relational model are known as entity integrity and referential integrity .

Primary key

Every relation/table has a primary key, this being a consequence of a relation being a set. [19] A primary key uniquely specifies a tuple within a table. While natural attributes (attributes used to describe the data being entered) are sometimes good primary keys, surrogate keys are often used instead. A surrogate key is an artificial attribute assigned to an object which uniquely identifies it (for instance, in a table of information about students at a school they might all be assigned a student ID in order to differentiate them). The surrogate key has no intrinsic (inherent) meaning, but rather is useful through its ability to uniquely identify a tuple. Another common occurrence, especially in regard to N:M cardinality is the composite key. A composite key is a key made up of two or more attributes within a table that (together) uniquely identify a record. [20]

Foreign key

Foreign key refers to a field in a relational table that matches the primary key column of another table. It relates the two keys. Foreign keys need not have unique values in the referencing relation. A foreign key can be used to cross-reference tables, and it effectively uses the values of attributes in the referenced relation to restrict the domain of one or more attributes in the referencing relation. The concept is described formally as: "For all tuples in the referencing relation projected over the referencing attributes, there must exist a tuple in the referenced relation projected over those same attributes such that the values in each of the referencing attributes match the corresponding values in the referenced attributes."

Stored procedures

A stored procedure is executable code that is associated with, and generally stored in, the database. Stored procedures usually collect and customize common operations, like inserting a tuple into a relation, gathering statistical information about usage patterns, or encapsulating complex business logic and calculations. Frequently they are used as an application programming interface (API) for security or simplicity. Implementations of stored procedures on SQL RDBMS's often allow developers to take advantage of procedural extensions (often vendor-specific) to the standard declarative SQL syntax. Stored procedures are not part of the relational database model, but all commercial implementations include them.

Index

An index is one way of providing quicker access to data. Indices can be created on any combination of attributes on a relation. Queries that filter using those attributes can find matching tuples directly using the index (similar to Hash table lookup), without having to check each tuple in turn. This is analogous to using the index of a book to go directly to the page on which the information you are looking for is found, so that you do not have to read the entire book to find what you are looking for. Relational databases typically supply multiple indexing techniques, each of which is optimal for some combination of data distribution, relation size, and typical access pattern. Indices are usually implemented via B+ trees, R-trees, and bitmaps. Indices are usually not considered part of the database, as they are considered an implementation detail, though indices are usually maintained by the same group that maintains the other parts of the database. The use of efficient indexes on both primary and foreign keys can dramatically improve query performance. This is because B-tree indexes result in query times proportional to log(n) where n is the number of rows in a table and hash indexes result in constant time queries (no size dependency as long as the relevant part of the index fits into memory).

Relational operations

Queries made against the relational database, and the derived relvars in the database are expressed in a relational calculus or a relational algebra. In his original relational algebra, Codd introduced eight relational operators in two groups of four operators each. The first four operators were based on the traditional mathematical set operations:

The remaining operators proposed by Codd involve special operations specific to relational databases:

Other operators have been introduced or proposed since Codd's introduction of the original eight including relational comparison operators and extensions that offer support for nesting and hierarchical data, among others.

Normalization

Normalization was first proposed by Codd as an integral part of the relational model. It encompasses a set of procedures designed to eliminate non-simple domains (non-atomic values) and the redundancy (duplication) of data, which in turn prevents data manipulation anomalies and loss of data integrity. The most common forms of normalization applied to databases are called the normal forms.

RDBMS

The general structure of a relational database RDBMS structure.png
The general structure of a relational database

Connolly and Begg define database management system (DBMS) as a "software system that enables users to define, create, maintain and control access to the database". [21] RDBMS is an extension of that initialism that is sometimes used when the underlying database is relational.

An alternative definition for a relational database management system is a database management system (DBMS) based on the relational model. Most databases in widespread use today are based on this model. [22]

RDBMSs have been a common option for the storage of information in databases used for financial records, manufacturing and logistical information, personnel data, and other applications since the 1980s. Relational databases have often replaced legacy hierarchical databases and network databases, because RDBMS were easier to implement and administer. Nonetheless, relational stored data received continued, unsuccessful challenges by object database management systems in the 1980s and 1990s, (which were introduced in an attempt to address the so-called object–relational impedance mismatch between relational databases and object-oriented application programs), as well as by XML database management systems in the 1990s. [23] However, due to the expanse of technologies, such as horizontal scaling of computer clusters, NoSQL databases have recently become popular as an alternative to RDBMS databases. [24]

Distributed relational databases

Distributed Relational Database Architecture (DRDA) was designed by a workgroup within IBM in the period 1988 to 1994. DRDA enables network connected relational databases to cooperate to fulfill SQL requests. [25] [26] The messages, protocols, and structural components of DRDA are defined by the Distributed Data Management Architecture.

List of database engines

According to DB-Engines, in December 2024 the most popular systems on the db-engines.com web site were: [27]

  1. Oracle Database
  2. MySQL
  3. Microsoft SQL Server
  4. PostgreSQL
  5. Snowflake
  6. IBM Db2
  7. SQLite
  8. Microsoft Access
  9. Databricks
  10. MariaDB

According to research company Gartner, in 2011, the five leading proprietary software relational database vendors by revenue were Oracle (48.8%), IBM (20.2%), Microsoft (17.0%), SAP including Sybase (4.6%), and Teradata (3.7%). [28]

See also

Related Research Articles

<span class="mw-page-title-main">Database</span> Organized collection of data in computing

In computing, a database is an organized collection of data or a type of data store based on the use of a database management system (DBMS), the software that interacts with end users, applications, and the database itself to capture and analyze the data. The DBMS additionally encompasses the core facilities provided to administer the database. The sum total of the database, the DBMS and the associated applications can be referred to as a database system. Often the term "database" is also used loosely to refer to any of the DBMS, the database system or an application associated with the database.

Database normalization is the process of structuring a relational database accordance with a series of so-called normal forms in order to reduce data redundancy and improve data integrity. It was first proposed by British computer scientist Edgar F. Codd as part of his relational model.

The relational model (RM) is an approach to managing data using a structure and language consistent with first-order predicate logic, first described in 1969 by English computer scientist Edgar F. Codd, where all data is represented in terms of tuples, grouped into relations. A database organized in terms of the relational model is a relational database.

Structured Query Language (SQL) is a domain-specific language used to manage data, especially in a relational database management system (RDBMS). It is particularly useful in handling structured data, i.e., data incorporating relations among entities and variables.

In database theory, relational algebra is a theory that uses algebraic structures for modeling data and defining queries on it with well founded semantics. The theory was introduced by Edgar F. Codd.

First normal form (1NF) is a property of a relation in a relational database. A relation is in first normal form if and only if no attribute domain has relations as elements. Or more informally, that no table column can have tables as values. Database normalization is the process of representing a database in terms of relations in standard normal forms, where first normal is a minimal requirement. SQL-92 does not support creating or using table-valued columns, which means that using only the "traditional relational database features" most relational databases will be in first normal form by necessity. Database systems which do not require first normal form are often called NoSQL systems. Newer SQL standards like SQL:1999 have started to allow so called non-atomic types, which include composite types. Even newer versions like SQL:2016 allow JSON.

In the relational model of databases, a primary key is a designated attribute (column) that can reliably identify and distinguish between each individual record in a table. The database creator can choose an existing unique attribute or combination of attributes from the table to act as its primary key, or create a new attribute containing a unique ID that exists solely for this purpose.

A foreign key is a set of attributes in a table that refers to the primary key of another table, linking these two tables. In the context of relational databases, a foreign key is subject to an inclusion dependency constraint that the tuples consisting of the foreign key attributes in one relation, R, must also exist in some other relation, S; furthermore that those attributes must also be a candidate key in S.

<span class="mw-page-title-main">Referential integrity</span> Where all data references are valid

Referential integrity is a property of data stating that all its references are valid. In the context of relational databases, it requires that if a value of one attribute (column) of a relation (table) references a value of another attribute, then the referenced value must exist.

In the relational data model a superkey is any set of attributes that uniquely identifies each tuple of a relation. Because superkey values are unique, tuples with the same superkey value must also have the same non-key attribute values. That is, non-key attributes are functionally dependent on the superkey.

In a database, a table is a collection of related data organized in table format; consisting of columns and rows.

<span class="mw-page-title-main">Null (SQL)</span> Marker used in SQL databases to indicate a value does not exist

In SQL, null or NULL is a special marker used to indicate that a data value does not exist in the database. Introduced by the creator of the relational database model, E. F. Codd, SQL null serves to fulfill the requirement that all true relational database management systems (RDBMS) support a representation of "missing information and inapplicable information". Codd also introduced the use of the lowercase Greek omega (ω) symbol to represent null in database theory. In SQL, NULL is a reserved word used to identify this marker.

Object–relational impedance mismatch is a set of difficulties going between data in relational data stores and data in domain-driven object models. Relational Database Management Systems (RDBMS) is the standard method for storing data in a dedicated database, while object-oriented (OO) programming is the default method for business-centric design in programming languages. The problem lies in neither relational databases nor OO programming, but in the conceptual difficulty mapping between the two logic models. Both logical models are differently implementable using database servers, programming languages, design patterns, or other technologies. Issues range from application to enterprise scale, whenever stored relational data is used in domain-driven object models, and vice versa. Object-oriented data stores can trade this problem for other implementation difficulties.

In relational database management systems, a unique key is a candidate key. All the candidate keys of a relation can uniquely identify the records of the relation, but only one of them is used as the primary key of the relation. The remaining candidate keys are called unique keys because they can uniquely identify a record in a relation. Unique keys can consist of multiple columns. Unique keys are also called alternate keys. Unique keys are an alternative to the primary key of the relation. In SQL, the unique keys have a UNIQUE constraint assigned to them in order to prevent duplicates. Alternate keys may be used like the primary key when doing a single-table select or when filtering in a where clause, but are not typically used to join multiple tables.

<span class="mw-page-title-main">Database model</span> Type of data model

A database model is a type of data model that determines the logical structure of a database. It fundamentally determines in which manner data can be stored, organized and manipulated. The most popular example of a database model is the relational model, which uses a table-based format.

<span class="mw-page-title-main">Relation (database)</span> Set of tuples consisting of values indexed by attributes

In database theory, a relation, as originally defined by E. F. Codd, is a set of tuples (d1,d2,...,dn), where each element dj is a member of Dj, a data domain. Codd's original definition notwithstanding, and contrary to the usual definition in mathematics, there is no ordering to the elements of the tuples of a relation. Instead, each element is termed an attribute value. An attribute is a name paired with a domain. An attribute value is an attribute name paired with an element of that attribute's domain, and a tuple is a set of attribute values in which no two distinct elements have the same name. Thus, in some accounts, a tuple is described as a function, mapping names to values.

QUEL is a relational database query language, based on tuple relational calculus, with some similarities to SQL. It was created as a part of the Ingres DBMS effort at University of California, Berkeley, based on Codd's earlier suggested but not implemented Data Sub-Language ALPHA. QUEL was used for a short time in most products based on the freely available Ingres source code, most notably in an implementation called POSTQUEL supported by POSTGRES. As Oracle and DB2 gained market share in the early 1980s, most companies then supporting QUEL moved to SQL instead. QUEL continues to be available as a part of the Ingres DBMS, although no QUEL-specific language enhancements have been added for many years.

The following is provided as an overview of and topical guide to databases:

In database normalization, unnormalized form (UNF or 0NF), also known as an unnormalized relation or non-first normal form (N1NF or NF2), is a database data model (organization of data in a database) which does not meet any of the conditions of database normalization defined by the relational model. Database systems which support unnormalized data are sometimes called non-relational or NoSQL databases. In the relational model, unnormalized relations can be considered the starting point for a process of normalization.

A database object is a structure for storing, managing and presenting application- or user-specific data in a database. Depending on the database management system (DBMS), many different types of database objects can exist. The following is a list of the most common types of database objects found in most relational databases (RDBMS):

References

  1. Hastings, Jordan (2003). Portable Software Tools for Managing and Referencing Taxonomies. Digital Mapping Techniques '03 Workshop Proceedings. Vol. U.S. Geological Survey Open-File Report 03–471. 2. Relational Database Technology and Taxonomic Representation. Archived from the original on 2014-10-21. Retrieved 2024-04-06 via United States Geological Survey.
  2. 1 2 3 4 Codd, E. F. (1970). "A Relational Model of Data for Large Shared Data Banks". Communications of the ACM . 13 (6): 377–387. doi: 10.1145/362384.362685 . S2CID   207549016.
  3. Ambler, Scott (21 March 2023). "Relational Databases 101: Looking at the Whole Picture".[ better source needed ]
  4. Date, Chris (5 May 2005). Database in depth: relational theory for practitioners. O'Reilly. ISBN   0-596-10012-4.
  5. Funding a Revolution: Government Support for Computing Research. National Academies Press. 8 Jan 1999. ISBN   0309062780.
  6. Sumathi, S.; Esakkirajan, S. (13 Feb 2008). Fundamentals of Relational Database Management Systems. Springer. ISBN   978-3540483977. The product was called SQL/DS (Structured Query Language/Data Store) and ran under the DOS/VSE operating system environment
  7. "Oracle Timeline" (PDF). Profit Magazine. 12 (2). Oracle: 26. May 2007. Retrieved 2013-05-16.
  8. "New Database Software Program Moves Macintosh Into The Big Leagues". tribunedigital-chicagotribune. 28 June 1987. Retrieved 2016-03-17.
  9. Hershey, W.R.; Easthope, C.H. (1 December 1972). "A set theoretic data structure and retrieval language". ACM SIGIR Forum. 7 (4). Association for Computing Machinery: 45–55. doi:10.1145/1095495.1095500 . Retrieved 4 January 2024.
  10. SIGFIDET '74: Proceedings of the 1974 ACM SIGFIDET (Now SIGMOD) Workshop on Data Description, Access and Control: Data Models: Data-Structure-Set versus Relational. Association for Computing Machinery. 1 January 1975. doi:10.1145/800297. ISBN   978-1-4503-7418-7 . Retrieved 4 January 2024.
  11. Notley, M.G. (1972). The Peterlee IS/1 System. IBM United Kingdom Scientific Centre. Retrieved 4 January 2024.
  12. Todd, Stephen (1976). "The Peterlee Relational Test Vehicle - A System Overview". IBM Systems Journal. 15 (4): 285–308. doi:10.1147/sj.154.0285.
  13. Ramakrishnan, Raghu; Donjerkovic, Donko; Ranganathan, Arvind; Beyer, Kevin S.; Krishnaprasad, Muralidhar (1998). "SRQL: Sorted Relational Query Language" (PDF). E Proceedings of SSDBM.
  14. "A Relational Database Overview". oracle.com.
  15. "A universal relation model for a nested database", The Nested Universal Relation Database Model, Lecture Notes in Computer Science, vol. 595, Berlin, Heidelberg: Springer Berlin Heidelberg, pp. 109–135, 1992, doi:10.1007/3-540-55493-9_5, ISBN   978-3-540-55493-6 , retrieved 2020-11-01
  16. "Gray to be Honored With A. M. Turing Award This Spring". Microsoft PressPass. 1998-11-23. Archived from the original on 6 February 2009. Retrieved 2009-01-16.
  17. Gray, Jim (September 1981). "The Transaction Concept: Virtues and Limitations" (PDF). Proceedings of the 7th International Conference on Very Large Databases. Cupertino, CA: Tandem Computers. pp. 144–154. Retrieved 2006-11-09.
  18. Gray, Jim, and Reuter, Andreas, Distributed Transaction Processing: Concepts and Techniques. Morgan Kaufmann, 1993. ISBN   1-55860-190-2.
  19. Date (1984), p. 268.
  20. Connolly, Thomas M; Begg, Carolyn E (2015). Database systems: a practical approach to design, implementation, and management (global ed.). Boston Columbus Indianapolis: Pearson. p. 416. ISBN   978-1-292-06118-4.
  21. Connolly, Thomas M.; Begg, Carolyn E. (2014). Database Systems – A Practical Approach to Design Implementation and Management (6th ed.). Pearson. p. 64. ISBN   978-1292061184.
  22. Pratt, Philip J.; Last, Mary Z. (2014-09-08). Concepts of Database Management (8 ed.). Course Technology. p. 29. ISBN   9781285427102.
  23. Feuerlich, George (21 April 2010). Dateso 10; Database Trends and Directions: Current Challenges and Opportunities (1st ed.). Prague, Sokolovsk: MATFYZPRESS. pp. 163–174. ISBN   978-80-7378-116-3.
  24. "NoSQL databases eat into the relational database market". 4 March 2015. Retrieved 2018-03-14.
  25. Reinsch, R. (1988). "Distributed database for SAA". IBM Systems Journal. 27 (3): 362–389. doi:10.1147/sj.273.0362.
  26. Distributed Relational Database Architecture Reference. IBM Corp. SC26-4651-0. 1990.
  27. "DB-Engines Ranking of Relational DBMS". DB-Engines. Retrieved 2024-12-01.
  28. "Oracle the clear leader in $24 billion RDBMS market". Eye on Oracle. 2012-04-12. Retrieved 2013-03-01.

Sources