Relational Model/Tasmania

Relational Model/Tasmania (RM/T) was published by Edgar F. Codd in 1979 and is the name given to a number of extensions to his original relational model (RM) published in 1970. The overall goal of the RM/T was to define some fundamental semantic units, at "atomic" and "molecular" levels, for data modelling. Codd writes: "the result is a model with a richer variety of objects than the original relational model, additional insert-update-delete rules and some additional operators that make the algebra more powerful." [1]

RM history

Between 1968 and 1988 Codd published over 30 papers on the relational model (RM), the most famous of which is his 1970 paper. Up to 1978 the papers describe RM Version 1 (RM/V1). In early 1979 Codd first presented some new ideas, called RM/T ('T' for Tasmania), in an invited talk at the Australian Computer Science Conference in Hobart, Tasmania. Later that year ACM Transactions on Database Systems published a paper on RM/T, in which Codd acknowledges the influence of Schmid & Swenson (1975) and Wiederhold (1977).

A later version of RM/T (called "RM/D" here) was described by Chris Date in Date (1983), in which Date and Codd improved and refined RM/T, adding an entity type called designative. Although Codd writes nothing about this new type, Date offers a rationale in Date (1983, page 262). Date revised this 1983 article in Date (1995), which additionally compares the RM/T model with the E/R model.

Following a disappointing uptake of RM/T by the database industry, Codd decided to introduce the RM/T model more gradually. He planned to release a sequence of RM versions (RM/V2, RM/V3 and so on), each progressively including some of the ideas of the original RM/T. Perhaps this explains why there is no obvious mapping of concepts between RM/T and RM/V2. For example, there is no reference to associative or designative entity types in Codd's 1990 book that defines RM/V2. On the other hand, the book extends and builds on the existing body of query language issues, many of which were addressed by Codd in several papers throughout the 1980s.

Summary of RM/T

RM/T introduces the following new concepts:

Surrogates
A surrogate is a unique value assigned to each entity. If two relations use the same surrogate value then they represent the same entity in the modelled universe. The surrogate value can be any unique string or number but cannot be assigned or changed by the database user. For example, a SQL SEQUENCE is often used to generate numerical surrogate values. This use of surrogates was first introduced by Hall, Owlett and Todd in 1976. [2]
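The idea of system-assigned surrogates can be sketched in a few lines of Python. This is only an illustration under assumptions: the class `SurrogateGenerator` and the sample names are hypothetical and do not come from Codd's paper. A simple counter stands in for something like a SQL SEQUENCE.

```python
import itertools


class SurrogateGenerator:
    """Hypothetical helper: hands out unique, system-controlled surrogates."""

    def __init__(self):
        # The counter is private; users never pick or change a surrogate.
        self._counter = itertools.count(1)

    def new(self):
        return next(self._counter)


gen = SurrogateGenerator()
alice = gen.new()
bob = gen.new()

# Two relations that use the same surrogate value describe the same entity.
employee = {alice, bob}                        # which entities exist
employee_name = {alice: "Alice", bob: "Bob"}   # a property of those entities
```

Because both structures are keyed by the same surrogate values, a row in `employee_name` can only be about an entity recorded in `employee`.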
Entities and Nonentities
An entity is something in the modelled universe and is typically identified by a surrogate. A nonentity is something that is not an entity and does not have its own identifying surrogate. An independent entity has its own surrogate. A dependent entity has a surrogate but it belongs to another entity, i.e. the surrogate is a foreign key.
Atomic Semantics
The RM/T addresses atomic semantics by describing how the original RM relation can be used to describe entities with attributes. An entity is represented as an Entity-relation or E-relation and its attributes (or immediate properties) are stored in separate Property-relations or P-relations. Each E-relation shares its surrogate with the associated P-relations.
E-relations
Mark the existence of an entity. An E-relation is a relation (table) storing only the surrogates for a particular entity type. A surrogate value entered into the E-relation table implies the corresponding existence of an entity of that type in the modelled world. For example, the E-relation "Employee" is a table containing the surrogates of all entities of type Employee.
P-relations
Store the attribute values of an entity. A P-relation is a relation (table) storing the surrogate and one or more attributes of an entity. The surrogate value of a P-relation is that of the corresponding E-relation; it plays the role (K-role) of the primary key for that P-relation. For example, the P-relation "Employee_Number" is a table with two columns: one containing the surrogate value of the "Employee" E-relation, the other containing the employee number.
Note that by performing an OUTER NATURAL JOIN on the RM/T "Employee" E-relation and "Employee_Person" P-relation we can construct the RM/V1 "Employee" relation. This illustrates why the E-relation and P-relation concepts of RM/T are more atomic than the relation concept of RM/V1.
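A minimal Python sketch of this reconstruction follows, assuming an "Employee" E-relation and the "Employee_Number" P-relation described above. The function name `outer_natural_join`, the dictionary layout and the sample values are assumptions made for illustration, not part of RM/T.

```python
# E-relation: only the surrogates of entities of type Employee.
employee_e = {101, 102, 103}

# P-relation: surrogate -> employee number (hypothetical sample values).
employee_number_p = {101: "E-0007", 102: "E-0012"}


def outer_natural_join(e_relation, p_relations):
    """Join an E-relation with its P-relations on the shared surrogate.

    p_relations maps an attribute name to a {surrogate: value} mapping.
    An entity with no entry in a P-relation gets None for that attribute,
    which is what makes the join an outer join.
    """
    rows = []
    for surrogate in sorted(e_relation):
        row = {"surrogate": surrogate}
        for attribute, p_relation in p_relations.items():
            row[attribute] = p_relation.get(surrogate)
        rows.append(row)
    return rows


employee_v1 = outer_natural_join(
    employee_e, {"employee_number": employee_number_p}
)
```

Entity 103 exists in the E-relation but has no recorded employee number, so it still appears in the result with an empty property.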
Molecular Semantics
The RM/T addresses molecular semantics by taking the original RM and categorising the relations into several entity types, increasing the information captured by the semantic data model. However, Codd does not define a notation for diagramming his new semantics. Each entity may play several roles at once and thus belong to one or more of the following entity types:
  • Characteristic – subordinate entities that describe kernel entities.
  • Associative – superordinate entities that interrelate kernel entities.
  • Kernel – entities that are neither characteristic nor associative.
Codd goes on to introduce subtyping of entities, giving yet another qualifier for entities:
  • Inner – entities that are not subtypes of another entity.
Hence Codd speaks of inner kernel and inner associative entities.
The following definition is based on the RM/D model in Date (1983); it does not appear in Codd (1979):
  • Designative – entities that contain a designation. A designative entity is at the many end of a one-to-many relationship between two independent entities. For example, a writer may write many books, hence a one-to-many relationship between writer and book entities; the book is the designative entity because it contains a designation (or designative reference) to the writer - namely the primary key of the writer entity. Note that an associative entity contains at least two designations. For example, we can regard a booking as either an entity that associates a person with a flight, or as an entity that designates a person and designates a flight. Hence a designative entity must contain at least one designation whereas an associative entity must contain at least two designations.
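The counting rule at the end of this definition can be illustrated with a small Python sketch. The function `possible_readings` and the attribute names are hypothetical. The sketch only counts designations and ignores every other criterion of the RM/D classification.

```python
def possible_readings(designations):
    """Return the readings that the number of designations alone permits.

    designations is a list of the entity types this entity refers to.
    """
    readings = []
    if len(designations) >= 1:
        readings.append("designative")
    if len(designations) >= 2:
        readings.append("associative")
    return readings


book = ["writer"]               # a book designates its writer
booking = ["person", "flight"]  # a booking designates a person and a flight

book_readings = possible_readings(book)
booking_readings = possible_readings(booking)
```

The booking admits both readings, which matches the observation above that it can be regarded either as associating a person with a flight or as designating each of them.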
Associations
These are what we might otherwise call relationships between entities or nonentities.
The value E-null is used when deleting entities from the RM/T model; all associations that have surrogates referring to a non-existing entity are assigned the value E-null, meaning the entity is unknown.
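The deletion rule can be sketched as follows. This is an assumption-laden illustration: the sentinel `E_NULL`, the function `delete_entity` and the sample relations are hypothetical, and the sketch covers only the replacement of dangling surrogates, not the full set of RM/T insert-update-delete rules.

```python
E_NULL = object()  # stand-in for the "entity unknown" surrogate

employee_e = {1, 2}      # E-relation for employees
department_e = {10}      # E-relation for departments

# An association whose attributes hold surrogates of other entities.
works_in = [
    {"employee": 1, "department": 10},
    {"employee": 2, "department": 10},
]


def delete_entity(e_relation, surrogate, association, e_attributes):
    """Remove an entity and mark every reference to it as E-null."""
    e_relation.discard(surrogate)
    for row in association:
        for attribute in e_attributes:
            if row[attribute] == surrogate:
                row[attribute] = E_NULL


delete_entity(department_e, 10, works_in, ("employee", "department"))
```

After the call the two `works_in` rows survive, but each now records that its department is unknown rather than pointing at a surrogate that no longer exists.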
Associative Entity and Nonentity Association
An associative entity is an entity that represents an association between two independent entities; the associative entity is an entity in itself because it has a surrogate. A nonentity association is similar to an associative entity; however, it has no surrogate. This lack of a surrogate stops the nonentity association from having, for example, any descriptive characteristic entities.
Directed Graph Relations
Several directed graph relations are defined to capture further semantic features of the RM/T model. These graphs are named as follows:
  • PG-relation (Property Graph) stores property relationships
  • CG-relation (Characteristic Graph) stores characteristic relationships
  • AG-relation (Association Graph) stores association relationships
  • UGI-relation (Unconditional Generalisation by Inclusion) stores generalisation by inclusion relationships
  • AGI-relation (Alternative Generalisation by Inclusion) stores generalisation by alternative relationships
  • US-relation (Unconditional Successor) stores unconditional successor relationships
  • AS-relation (Alternative Successor) stores alternative successor relationships
  • KG-relation (Cover Membership) stores cover membership relationships
  • UP-relation (Unconditional Precedence) stores unconditional succession of event relationships
  • AP-relation (Alternative Precedence) stores alternative succession of event relationships
RM/T Catalog
The Catalog is a meta-model storing the descriptions of the relations themselves. The RM/T Catalog comprises the following relations:
  • CATR (R-surrogate, relname, RelType) describes relations
  • CATRA (RA-surrogate, R-surrogate, A-surrogate) relates relations and attributes
  • CATA (A-surrogate, attname, UserKey) describes attributes
  • CATAD (AD-surrogate, A-surrogate, D-surrogate) relates attributes and domains
  • CATD (D-surrogate, domname, VType, Ordering) describes domains
  • CATC (C-surrogate, pername) describes categories
  • CATRC (RC-surrogate, R-surrogate, C-surrogate) relates relations and categories
where
  • relname is the textual name of a relation, e.g. "Address"
  • attname is the textual name of an attribute, e.g. "Street"
  • domname is the textual name of a domain, e.g. "Salary"
  • pername is the category label (from the PER-domain)
  • RN-domain is the domain of all relnames in the database
  • PER-domain is the domain of all category labels
  • E-domain is the domain of all surrogates in the database
  • E-attribute is any attribute that plays the role of a surrogate (from the E-domain)
  • E-null is the "entity unknown" surrogate (from the E-domain)
  • R-surrogate is the relation surrogate (from the E-domain)
  • A-surrogate is the attribute surrogate (from the E-domain)
  • D-surrogate is the domain surrogate (from the E-domain)
  • C-surrogate is the category label surrogate (from the E-domain)
  • RA-surrogate is the relation-attribute surrogate (from the E-domain)
  • AD-surrogate is the attribute-domain surrogate (from the E-domain)
  • RC-surrogate is the relation-category-label surrogate (from the E-domain)
  • RelType is the type of object represented by the relation
  • UserKey shows whether the attribute participates in a user-defined key
  • VType is the syntactic type of the value
  • Ordering shows whether the operator > is applicable between values of the domain
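As a rough illustration of how the catalog relations fit together, the following Python sketch models CATR, CATA and CATRA as lists of named tuples and looks up the attributes of a relation by joining them on their surrogates. The class names, the function `attributes_of`, the surrogate numbers, the "City" attribute and the RelType placeholder are all assumptions made for this example.

```python
from typing import NamedTuple


class CatR(NamedTuple):      # CATR: describes relations
    r_surrogate: int
    relname: str
    reltype: str


class CatA(NamedTuple):      # CATA: describes attributes
    a_surrogate: int
    attname: str
    userkey: bool


class CatRA(NamedTuple):     # CATRA: relates relations and attributes
    ra_surrogate: int
    r_surrogate: int
    a_surrogate: int


# Hypothetical catalog contents; every surrogate is unique database-wide.
catr = [CatR(1, "Address", "placeholder-type")]
cata = [CatA(2, "Street", False), CatA(3, "City", False)]
catra = [CatRA(4, 1, 2), CatRA(5, 1, 3)]


def attributes_of(relname):
    """Return the attribute names of a relation via CATR, CATRA and CATA."""
    r_ids = {r.r_surrogate for r in catr if r.relname == relname}
    a_ids = {ra.a_surrogate for ra in catra if ra.r_surrogate in r_ids}
    return sorted(a.attname for a in cata if a.a_surrogate in a_ids)


address_attributes = attributes_of("Address")
```

CATAD, CATD, CATC and CATRC could be modelled in the same way, each linking two surrogates from the E-domain.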
Operators
Numerous operators are defined on names, sets and graphs. See Codd's 1979 paper for details.

RM/T today

There is little mention of RM/T today and no articles have appeared recently. Peckham and Maryanski (1988) wrote about RM/T in their study of semantic data models. Codd published his book in 1990 but wrote nothing more about RM/T. RM/V1 and RM/V2 have a chapter each in Date and Darwen (1992). The Date (1983) article was updated in Date (1995) and now contains a long overdue comparison of the E/R model and RM/T. Date's most recent reflections can be found on the Web at Date (1999), The Database Relational Model (2001) and Date on RM/T (2003).

RM/T contributed to the body of knowledge called semantic data modeling and semantic object modeling and continues to influence new data modellers. See the paper by Hammer and McLeod (1981), the book by Kroenke (2001) and the implementation by Grabczewski et al. (2004).

References

  1. Codd, Edgar F. (1979). "Extending the database relational model to capture more meaning". ACM Transactions on Database Systems. 4 (4): 397–434. doi:10.1145/320107.320109.
  2. Hall, P. A. V.; Owlett, J.; Todd, S. J. P. (1976). "Relations and Entities". In Nijssen, G. M. (ed.). Modelling in Data Base Management Systems. North Holland.

Further reading