Temporal database

Last updated

A temporal database stores data relating to time instances. It offers temporal data types and stores information relating to past, present and future time. Temporal databases can be uni-temporal, bi-temporal or tri-temporal.

Contents

More specifically the temporal aspects usually include valid time, transaction time and/or decision time.

Types

Uni-temporal

A uni-temporal database has one axis of time, either the validity range or the system time range.

Bi-temporal

A bi-temporal database has two axes of time:

Tri-temporal

A tri-temporal database has three axes of time:

This approach introduces additional complexities.

Temporal databases are in contrast to current databases (not to be confused with currently available databases), which store only facts which are believed to be true at the current time.

Features

Temporal databases support managing and accessing temporal data by providing one or more of the following features: [1] [2]

History

With the development of SQL and its attendant use in real-life applications, database users realized that when they added date columns to key fields, some issues arose. For example, if a table has a primary key and some attributes, adding a date to the primary key to track historical changes can lead to creation of more rows than intended. Deletes must also be handled differently when rows are tracked in this way. In 1992, this issue was recognized but standard database theory was not yet up to resolving this issue, and neither was the then-newly formalized SQL-92 standard.

Richard Snodgrass proposed in 1992 that temporal extensions to SQL be developed by the temporal database community. In response to this proposal, a committee was formed to design extensions to the 1992 edition of the SQL standard (ANSI X3.135.-1992 and ISO/IEC 9075:1992); those extensions, known as TSQL2, were developed during 1993 by this committee. [3] In late 1993, Snodgrass presented this work to the group responsible for the American National Standard for Database Language SQL, ANSI Technical Committee X3H2 (now known as NCITS H2). The preliminary language specification appeared in the March 1994 ACM SIGMOD Record. Based on responses to that specification, changes were made to the language, and the definitive version of the TSQL2 Language Specification was published in September, 1994 [4]

An attempt was made to incorporate parts of TSQL2 into the new SQL standard SQL:1999, called SQL3. Parts of TSQL2 were included in a new substandard of SQL3, ISO/IEC 9075-7, called SQL/Temporal. [3] The TSQL2 approach was heavily criticized by Chris Date and Hugh Darwen. [5] The ISO project responsible for temporal support was canceled near the end of 2001.

As of December 2011, ISO/IEC 9075, Database Language SQL:2011 Part 2: SQL/Foundation included clauses in table definitions to define "application-time period tables" (valid time tables), "system-versioned tables" (transaction time tables) and "system-versioned application-time period tables" (bitemporal tables). A substantive difference between the TSQL2 proposal and what was adopted in SQL:2011 is that there are no hidden columns in the SQL:2011 treatment, nor does it have a new data type for intervals; instead two columns with datestamps (DS) or date-timestamps (DTS) can be bound together using a PERIOD FOR declaration. Another difference is replacement of the controversial (prefix) statement modifiers from TSQL2 with a set of temporal predicates. [1]

Other features of SQL:2011 standard related to temporal databases are automatic time period splitting, temporal primary keys, temporal referential integrity, temporal predicates with Allen's interval algebra and time-sliced and sequenced queries.

Example

For illustration, consider the following short biography of a fictional man, John Doe:

John Doe was born on 1975-04-03 in the Kids Hospital of Medicine County, as son of Jack Doe and Jane Doe who lived in Smallville. Jack Doe proudly registered the birth of his first-born on April 4, 1975 at the Smallville City Hall. John grew up as a joyful boy, turned out to be a brilliant student and graduated with honors in 1993. After graduation, he went to live on his own in Bigtown. Although he moved out on 1994-08-26, he forgot to register the change of address officially. It was only at the turn of the seasons that his mother reminded him that he had to register, which he did a few days later on 1994-12-27. Although John had a promising future, his story ends tragically. John Doe was accidentally hit by a truck on 2001-04-01. The coroner reported his date of death on the very same day.

Using a non-temporal database

To store the life of John Doe in a current (non-temporal) database we use a table person (name, address). (In order to simplify, name is defined as the primary key of person.)

John's father officially reported his birth on 1975-04-04. On this date a Smallville official inserted the following entry in the database: Person(John Doe, Smallville). Note that the date itself is not stored in the database.

After graduation, John moves out, but forgets to register his new address. John's entry in the database is not changed until 1994-12-27, when he finally reports it. A Bigtown official updates his address in the database. The person table now contains Person(John Doe, Bigtown). Note that the information of John living in Smallville has been overwritten, so it is no longer possible to retrieve that information from the database. An official accessing the database on 1994-12-28, would be told that John lives in Bigtown. More technically: if a database administrator ran the query SELECTADDRESSFROMPERSONWHERENAME='John Doe' on 1994-12-26, the result would be Smallville. Running the same query 2 days later would result in Bigtown.

Until his death, the database would state that he lived in Bigtown. On 2001-04-01, the coroner deletes the John Doe entry from the database. After this, running the above query would return no result at all.

DateReal world eventDatabase actionWhat the database shows
1975-04-03John is bornNothingThere is no person called John Doe
1975-04-04John's father officially reports John's birthInserted:Person(John Doe, Smallville)John Doe lives in Smallville
1994-08-26After graduation, John moves to Bigtown, but forgets to register his new addressNothingJohn Doe lives in Smallville
1994-12-26NothingNothingJohn Doe lives in Smallville
1994-12-27John registers his new addressUpdated:Person(John Doe, Bigtown)John Doe lives in Bigtown
2001-04-01John diesDeleted:Person(John Doe)There is no person called John Doe

Using a single axis: valid time or transaction time

Valid time is the time for which a fact is true in the real world. A valid time period may be in the past, span the current time, or occur in the future.

For the example above, to record valid time, the person table has two fields added, valid_from and valid_to. These specify the period when a person's address is valid in the real world. On 1975-04-04, John's father registered his son's birth. An official then inserts a new entry into the database stating that John lives in Smallville from April 3. Note that although the data was inserted on the fourth, the database states that the information is valid since the third. The official does not yet know if or when John will move to another place, so the valid_to field is set to infinity (∞). The entry in the database is:

Person(John Doe, Smallville, 1975-04-03, ∞)

On 1994-12-27, John reports his new address in Bigtown where he has been living since 1994-08-26. A new database entry is made to record this fact:

Person(John Doe, Bigtown, 1994-08-26, ∞)

The original entry Person (John Doe, Smallville, 1975-04-03, ∞) is not deleted, but has the valid_to attribute updated to reflect that it is now known that John stopped living in Smallville on 1994-08-26. The database now contains two entries for John Doe:

Person(John Doe, Smallville, 1975-04-03, 1994-08-26) Person(John Doe, Bigtown, 1994-08-26, ∞)

When John dies his current entry in the database is updated stating that John does not live in Bigtown any longer. The database now looks like this

Person(John Doe, Smallville, 1975-04-03, 1994-08-26) Person(John Doe, Bigtown, 1994-08-26, 2001-04-01)

Using two axes: valid time and transaction time

Transaction time records the time period during which a database entry is accepted as correct. This enables queries that show the state of the database at a given time. Transaction time periods can only occur in the past or up to the current time. In a transaction time table, records are never deleted. Only new records can be inserted, and existing ones updated by setting their transaction end time to show that they are no longer current.

To enable transaction time in the example above, two more fields are added to the Person table: transaction_from and transaction_to. Here, transaction_from is the time a transaction was made, and transaction_to is the time that the transaction was superseded (which may be infinity if it has not yet been superseded). This makes the table into a bitemporal table.

What happens if the person's address as stored in the database is incorrect? Suppose an official accidentally entered the wrong address or date? Or, suppose the person lied about their address for some reason. Upon discovery of the error, the officials update the database to correct the information recorded.

For example, from 1995-06-01 to 2000-09-03, John Doe moved to Beachy. But to avoid paying Beachy's exorbitant residence tax, he never reported it to the authorities. Later during a tax investigation, it is discovered on 2-Feb-2001 that he was in fact in Beachy during those dates. To record this fact, the existing entry about John living in Bigtown must be split into two separate records, and a new record inserted recording his residence in Beachy. The database would then appear as follows:

Person(John Doe, Smallville, 1975-04-03, 1994-08-26) Person(John Doe, Bigtown, 1994-08-26, 1995-06-01) Person(John Doe, Beachy, 1995-06-01, 2000-09-03) Person(John Doe, Bigtown, 2000-09-03, 2001-04-01)

However, this leaves no record that the database ever claimed that he lived in Bigtown during 1995-06-01 to 2000-09-03.This might be important to know for auditing reasons, or to use as evidence in the official's tax investigation. Transaction time allows capturing this changing knowledge in the database, since entries are never directly modified or deleted. Instead, each entry records when it was entered and when it was superseded (or logically deleted). The database contents then look like this:

Name, City, Valid from, Valid till, Entered, Superseded
Person(John Doe, Smallville, 1975-04-03,  ∞,           1975-04-04,  1994-12-27) Person(John Doe, Smallville, 1975-04-03,  1994-08-26,  1994-12-27,  ∞         ) Person(John Doe, Bigtown,    1994-08-26,  ∞,           1994-12-27,  2001-02-02) Person(John Doe, Bigtown,    1994-08-26,  1995-06-01,  2001-02-02,  ∞         ) Person(John Doe, Beachy,     1995-06-01,  2000-09-03,  2001-02-02,  ∞         ) Person(John Doe, Bigtown,    2000-09-03,  ∞,           2001-02-02,  2001-04-01) Person(John Doe, Bigtown,    2000-09-03,  2001-04-01,  2001-04-01,  ∞         )

The database records not only what happened in the real world, but also what was officially recorded at different times.

Using three axes: valid time, decision time, and transaction time

Decision time is an alternative to the transaction time period for recording the time at which a database entry may be accepted as correct. This enables queries that show the officially recognized facts at a given time, even if there was a delay in committing those facts to the database. Support for decision time preserves the entire history and prevents the loss of information during updates. [6]

Decision time periods can only occur in the past or up to the transaction time. As in a transaction time table, records are never deleted. Only new records can be inserted, and existing ones updated by setting their decision end time to show that they are no longer current.

To enable decision time, two more fields are added to a database table: decision_from and decision_to. Here, decision_from is the time a decision was made, and decision_to is the time that the decision was superseded (which may be infinity if it has not yet been superseded). When combined with transaction time, this makes the table into a tritemporal table.

The following is a list of real-world events that occurred between the United States presidential elections of 1964 and 1976:

DateDecision makerReal world event
1964-11-03 Electoral College Election of 1964
1968-11-05Electoral College Election of 1968
1972-11-07Electoral College Election of 1972
1973-08-10 Spiro Agnew Agnew resigns
1973-08-12 Richard Nixon Nixon nominates Ford
1973-12-06 Congress Congress confirms Ford
1974-08-09Richard Nixon Nixon resigns
1974-08-20 Gerald Ford Ford nominates Rockefeller
1974-12-19CongressCongress confirms Rockefeller
1976-11-02Electoral College Election of 1976

Suppose there is a constant 7-day delay between the decision time and the transaction time committed to the database. Then following the election of 1976 the database contents would be:

<nowiki/>               President, Vice president, Valid from, Valid till, Decision from, Decision to, Transaction from, Transaction to ----------------------------------------------------------------------------------------------------------------------------------  Administration(Lyndon Johnson,    Hubert Humphrey, 1965-01-20, 1969-01-20,  1964-11-03,           ∞, 1964-11-10,           ∞) Administration( Richard Nixon,        Spiro Agnew, 1969-01-20, 1973-01-20,  1968-11-05,           ∞, 1968-11-12,           ∞) Administration( Richard Nixon,        Spiro Agnew, 1973-01-20, 1977-01-20,  1972-11-07,           ∞, 1972-11-14, 1973-12-17) Administration( Richard Nixon,        Spiro Agnew, 1973-01-20, 1977-01-20,  1972-11-07,  1973-10-10, 1973-10-17,           ∞) Administration( Richard Nixon,        Spiro Agnew, 1973-01-20, 1973-10-10,  1973-10-10,           ∞, 1973-08-17,           ∞) Administration( Richard Nixon,           (Vacant), 1973-10-10, 1977-01-20,  1973-10-10,           ∞, 1973-08-17, 1973-12-13) Administration( Richard Nixon,        Gerald Ford,          ∞, 1977-01-20,  1973-12-10,           ∞, 1973-08-19, 1973-12-13) Administration( Richard Nixon,           (Vacant), 1973-10-10, 1977-01-20,  1973-10-10,  1973-12-06, 1973-12-13,           ∞) Administration( Richard Nixon,           (Vacant), 1973-10-10, 1973-12-06,  1973-12-06,           ∞, 1973-12-13,           ∞) Administration( Richard Nixon,        Gerald Ford,          ∞, 1977-01-20,  1973-10-12,  1973-12-06, 1973-12-13,           ∞) Administration( Richard Nixon,        Gerald Ford, 1973-12-06, 1977-01-20,  1973-12-06,           ∞, 1973-12-13, 1974-08-15) Administration( Richard Nixon,        Gerald Ford, 1973-12-06, 1977-01-20,  1973-12-06,  1974-08-08, 1974-08-15,           ∞) Administration( Richard Nixon,        Gerald Ford, 1973-12-06, 1974-08-09,  1974-08-08,           ∞, 1974-08-15,           ∞) Administration(   Gerald Ford,           (Vacant), 1974-08-09, 1977-01-20,  1974-08-08,           ∞, 1974-08-15, 1974-12-26) Administration(   Gerald Ford, Nelson Rockefeller,          ∞, 1977-01-20,  1974-08-20,           ∞, 1974-08-27, 1974-12-26) Administration(   Gerald Ford,           (Vacant), 1974-08-09, 1977-01-20,  1974-08-08,  1974-12-19, 1974-12-26,           ∞) Administration(   Gerald Ford,           (Vacant), 1974-08-09, 1974-12-19,  1974-12-19,           ∞, 1974-12-26,           ∞) Administration(   Gerald Ford, Nelson Rockefeller,          ∞, 1977-01-20,  1974-08-20,  1974-12-19, 1974-12-26,           ∞) Administration(   Gerald Ford, Nelson Rockefeller, 1974-12-19, 1977-01-20,  1974-12-19,           ∞, 1974-12-26,           ∞) Administration(  Jimmy Carter,     Walter Mondale, 1977-01-20, 1981-01-20,  1976-11-02,           ∞, 1976-11-09,           ∞)

Consider the question of who would be president and vice president for a valid time of 1977-01-01:

Bitemporal modelling

A bitemporal model contains both valid and transaction time. This provides both historical and rollback information. Historical information (e.g.: "Where did John live in 1992?") is provided by the valid time. Rollback (e.g.: "In 1992, where did the database believe John lived?") is provided by the transaction time. The answers to these example questions may not be the same  the database may have been altered since 1992, causing the queries to produce different results.

The valid time and transaction time do not have to be the same for a single fact. For example, consider a temporal database storing data about the 18th century. The valid time of these facts is somewhere between 1701 and 1800. The transaction time would show when the facts were inserted into the database (for example 1998-01-21).

Schema evolution

A challenging issue is the support of temporal queries in a transaction time database under evolving schema. In order to achieve perfect archival quality it is of key importance to store the data under the schema version under which they first appeared. However, even the most simple temporal query rewriting the history of an attribute value would be required to be manually rewritten under each of the schema versions, potentially hundreds as in the case of MediaWiki. [7] This process would be particularly taxing for users. A proposed solution is to provide automatic query rewriting, [8] [9] although this is not part of SQL or similar standards.

Approaches to minimize the complexities of schema evolution are to:

Implementations in notable products

The following implementations provide temporal features in a relational database management system (RDBMS).

Non-relational, NoSQL database management systems that provide temporal features including the following:

Temporal databases were one of the earliest forms of data version control, and influenced the development of modern data versioning systems. [18]

Alternatives

Example of slowly changing dimension (SCD) model Scd model.png
Example of slowly changing dimension (SCD) model

Slowly changing dimensions can be used to model temporal relations.

Further reading

See also

Related Research Articles

<span class="mw-page-title-main">Database</span> Organized collection of data in computing

In computing, a database is an organized collection of data or a type of data store based on the use of a database management system (DBMS), the software that interacts with end users, applications, and the database itself to capture and analyze the data. The DBMS additionally encompasses the core facilities provided to administer the database. The sum total of the database, the DBMS and the associated applications can be referred to as a database system. Often the term "database" is also used loosely to refer to any of the DBMS, the database system or an application associated with the database.

<span class="mw-page-title-main">PostgreSQL</span> Free and open-source object relational database management system

PostgreSQL, also known as Postgres, is a free and open-source relational database management system (RDBMS) emphasizing extensibility and SQL compliance. PostgreSQL features transactions with atomicity, consistency, isolation, durability (ACID) properties, automatically updatable views, materialized views, triggers, foreign keys, and stored procedures. It is supported on all major operating systems, including Linux, FreeBSD, OpenBSD, macOS, and Windows, and handles a range of workloads from single machines to data warehouses or web services with many concurrent users.

A relational database is a database based on the relational model of data, as proposed by E. F. Codd in 1970. A database management system used to maintain relational databases is a relational database management system (RDBMS). Many relational database systems are equipped with the option of using SQL for querying and updating the database.

<span class="mw-page-title-main">IBM Db2</span> Relational model database server

Db2 is a family of data management products, including database servers, developed by IBM. It initially supported the relational model, but was extended to support object–relational features and non-relational structures like JSON and XML. The brand name was originally styled as DB2 until 2017, when it changed to its present form.

In the context of SQL, data definition or data description language (DDL) is a syntax for creating and modifying database objects such as tables, indices, and users. DDL statements are similar to a computer programming language for defining data structures, especially database schemas. Common examples of DDL statements include CREATE, ALTER, and DROP.

An SQL INSERT statement adds one or more records to any single table in a relational database.

In temporal databases, valid-time (VT),like for example an event datetime, is the time period during which a database fact is valid in the modeled reality.

The following tables compare general and technical information for a number of relational database management systems. Please see the individual products' articles for further information. Unless otherwise specified in footnotes, comparisons are based on the stable versions without any add-ons, extensions or external programs.

Object–relational impedance mismatch is a set of difficulties going between data in relational data stores and data in domain-driven object models. Relational Database Management Systems (RDBMS) is the standard method for storing data in a dedicated database, while object-orientated (OO) programming is the default method for business-centric design in programming languages. The problem lies in neither relational databases nor OO programming, but in the conceptual difficulty mapping between the two logic models. Both logical models are differently implementable using database servers, programming languages, design patterns, or other technologies. Issues range from application to enterprise scale, whenever stored relational data is used in domain-driven object models, and vice versa. Object-oriented data stores can trade this problem for other implementation difficulties.

SQL-92 was the third revision of the SQL database query language. Unlike SQL-89, it was a major revision of the standard. Aside from a few minor incompatibilities, the SQL-89 standard is forward-compatible with SQL-92.

Sixth normal form (6NF) is a normal form used in relational database normalization which extends the relational algebra and generalizes relational operators to support interval data, which can be useful in temporal databases.

Microsoft SQL Server is a proprietary relational database management system developed by Microsoft. As a database server, it is a software product with the primary function of storing and retrieving data as requested by other software applications—which may run either on the same computer or on another computer across a network. Microsoft markets at least a dozen different editions of Microsoft SQL Server, aimed at different audiences and for workloads ranging from small single-machine applications to large Internet-facing applications with many concurrent users.

<span class="mw-page-title-main">Apache Cassandra</span> Free and open-source database management system

Cassandra is a free and open-source, distributed, wide-column store, NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. Cassandra offers support for clusters spanning multiple data centers, with asynchronous masterless replication allowing low latency operations for all clients. Cassandra was designed to implement a combination of Amazon's Dynamo distributed storage and replication techniques combined with Google's Bigtable data and storage engine model.

pureXML is the native XML storage feature in the IBM Db2 data server. pureXML provides query languages, storage technologies, indexing technologies, and other features to support XML data. The word pure in pureXML was chosen to indicate that Db2 natively stores and natively processes XML data in its inherent hierarchical structure, as opposed to treating XML data as plain text or converting it into a relational format.

Richard Thomas Snodgrass is an American computer scientist and writer and is professor emeritus at the University of Arizona. He is best known for his work on temporal databases, query language design, query optimization and evaluation, storage structures, database design, and ergalics.

SQL:2011 or ISO/IEC 9075:2011 is the seventh revision of the ISO (1987) and ANSI (1986) standard for the SQL database query language. It was formally adopted in December 2011. The standard consists of 9 parts which are described in detail in SQL. The next version is SQL:2016.

The following is provided as an overview of and topical guide to databases:

<span class="mw-page-title-main">Oracle NoSQL Database</span> Distributed database

Oracle NoSQL Database is a NoSQL-type distributed key-value database from Oracle Corporation. It provides transactional semantics for data manipulation, horizontal scalability, and simple administration and monitoring.

GQL is a proposed standard graph query language. In September 2019 a proposal for a project to create a new standard graph query language was approved by a vote of national standards bodies which are members of ISO/IEC Joint Technical Committee 1(ISO/IEC JTC 1). JTC 1 is responsible for international Information Technology standards. GQL is intended to be a declarative database query language, like SQL.

In temporal databases, decision time is the time when a decision was made about a fact stored in a database. It is used to keep a history of decisions about valid times.

References

  1. 1 2 3 Kulkarni, Krishna, and Jan-Eike Michels. "Temporal features in SQL: 2011". ACM SIGMOD Record 41.3 (2012): 34-43.
  2. 1 2 Saracco, Cynthia M.; Nicola, Matthias; Gandhi, Lenisha (3 April 2012). "A matter of time: Temporal data management in DB2 10". IBM . Archived from the original on 2012-10-25. Retrieved 2020-10-27.
  3. 1 2 Snodgrass, 1999, p. 9
  4. Richard T. Snodgrass. "TSQL2 Temporal Query Language". www.cs.arizona.edu. Computer Science Department of the University of Arizona. Retrieved 14 July 2009.
  5. Hugh Darwen, C.J. Date, “An overview and Analysis of Proposals Based on the TSQL2 Approach”, In Date on Database: Writings 2000-2006, C.J. Date, Apress, 2006, pp. 481-514
  6. Mario A. Nascimento, Margaret H. Eich, “Decision Time in Temporal Databases”, In Proceedings of the Second International Workshop on Temporal Representation and Reasoning, 1995, pp. 157-162
  7. Schema Evolution Benchmark - Schema Evolution
  8. Hyun J. Moon; Carlo A. Curino; Alin Deutsch; C.-Y. Hou & Carlo Zaniolo (2008). Managing and querying transaction-time databases under schema evolution. Very Large Data Base VLDB.
  9. Hyun J. Moon; Carlo A. Curino & Carlo Zaniolo (2010). Scalable Architecture and Query Optimization for Transaction-time DBs with Evolving Schemas. SIGMOD.
  10. Anthony B. Coates (2015). Why Banks Care About Bitemporality. MarkLogic World 2015.
  11. "System-Versioned Tables".
  12. Paquier, Michael (1 November 2012). "Postgres 9.2 highlight: range types". Michael Paquier - Open source developer based in Japan. Archived from the original on 2016-04-23.
  13. Katz, Jonathan S. "Range Types: Your Life Will Never Be The Same" (PDF). Retrieved 14 July 2014.
  14. Al-Kateb, Mohammed et al. "Temporal Query Processing in Teradata". EDBT/ICDT ’13 March 18–22, 2013, Genoa, Italy
  15. Temporal in SQL Server 2016 , retrieved 2019-07-19
  16. "terminusdb/terminusdb-server". GitHub. Retrieved 2020-09-04.
  17. Bridgwater, Adrian (24 November 2014). "Data Is Good, 'Bidirectionalized Bitemporal' Data Is Better". Forbes .
  18. Bhardwaj, Anant; Bhattacherjee, Souvik; Chavan, Amit; Deshpande, Amol; Elmore, Aaron J.; Madden, Samuel; Parameswaran, Aditya G. (2014-09-02). "DataHub: Collaborative Data Science & Dataset Version Management at Scale". arXiv: 1409.0798 [cs.DB].