Valid time

Last updated

In temporal databases, valid-time is the time period when an event happened or something was true in the real world, or more formally when a fact was valid in the modeled reality.

Contents

The valid-time period is an interval based on event times, which are referred to as event datetime in data vault. [1] [2] Other names are application-time period [1] or real-world timeline. [1] SQL:2011 supports valid time through so-called application time-period tables. [3] [4] [5] [6] In a database table, valid-time is often represented by two extra table-columns, such as start_validtime and end_validtime. The time interval is closed at its lower bound (denoted by [) and open at its upper bound (denoted by )).

In integration layers (for example a data warehouse), the valid time is controlled by the source system which delivers data to the data warehouse. [7] For many reasons, the valid timeline is different from the transaction timeline (which is when data arrives in the warehouse), and it is important that the data warehouse is capable of unambiguously reporting what actually happened in the past by combining these two timelines. [7] In bitemporal data models, valid time and transaction time can be represented two-dimensionally in a Cartesian coordinate system. When data are delivered from the integration layer and is to be represented in a presentation layer (often in a dimensional model or wide table) it is often desirable to have the data on only one timeline.

History

The term valid time was coined by Richard T. Snodgrass and his doctoral student (1986). [8]

As of December 2011, ISO/IEC 9075, Database Language SQL:2011 Part 2: SQL/Foundation included clauses in table definitions to define "application-time period tables" (that is, valid-time tables).

Example

[ Needs an additional row: "John's death registered". ]

DateWhat happened in the real worldDatabase actionWhat the database shows
1975-04-03John is bornNothingThere is no person called John Doe
1975-04-04John's father officially reports John's birthInserted:Person(John Doe, Smallville)John Doe lives in Smallville
1994-08-26After graduation, John moves to Bigtown, but forgets to register his new addressNothingJohn Doe lives in Smallville
1994-12-26NothingNothingJohn Doe lives in Smallville
1994-12-27John registers his new addressUpdated:Person(John Doe, Bigtown)John Doe lives in Bigtown
2001-04-01John diesDeleted:Person(John Doe)There is no person called John Doe

Valid time is the time for which a fact is true in the real world. In the example above, the Person table gets two extra fields, valid_from and valid_to, specifying when a person's address was valid in the real world. On 1975-04-04, John's father proudly registered his son's birth. An official will then insert a new entry to the database stating that John lives in Smallville from April, 3rd. Notice that although the data was inserted on the 4th, the database states that the information is valid since the 3rd. The official does not yet know if or when John will ever move to another place so in the database the valid_to is filled with infinity (∞) or a very late date (like for example 2300-01-01). Resulting in this entry in the database:

Person(John Doe, Smallville, 1975-04-03, ∞)

On 1994-12-27 John reports his new address in Bigtown where he has been living since 1994-08-26. The Bigtown official does not change the address of the current entry of John Doe in the database. He adds a new one:

Person (John Doe, Big Town, 1994-08-26, ∞)

The original entry Person (John Doe, Smallville, 1975-04-03, ∞) is then updated (not removed!). Since it is now known that John stopped living in Smallville on 1994-08-26, the valid_to entry can be filled in. The database now contains two entries for John Doe

Person(John Doe, Smallville, 1975-04-03, 1994-08-26)
Person(John Doe, Bigtown, 1994-08-26, ∞)

When John dies the database is once more updated. The current entry will be updated stating the date of death as the last valid_to for Bigtown, as John does not live in Bigtown any longer. No new entry is being added. The database now looks like this:

1975-04-03-, 1994-08-26)
Person(John Doe, Bigtown, 1994-08-26, 2001-04-01)

See also

Related Research Articles

<span class="mw-page-title-main">Database</span> Organized collection of data in computing

In computing, a database is an organized collection of data or a type of data store based on the use of a database management system (DBMS), the software that interacts with end users, applications, and the database itself to capture and analyze the data. The DBMS additionally encompasses the core facilities provided to administer the database. The sum total of the database, the DBMS and the associated applications can be referred to as a database system. Often the term "database" is also used loosely to refer to any of the DBMS, the database system or an application associated with the database.

<span class="mw-page-title-main">PostgreSQL</span> Free and open-source object relational database management system

PostgreSQL, also known as Postgres, is a free and open-source relational database management system (RDBMS) emphasizing extensibility and SQL compliance. PostgreSQL features transactions with atomicity, consistency, isolation, durability (ACID) properties, automatically updatable views, materialized views, triggers, foreign keys, and stored procedures. It is supported on all major operating systems, including Linux, FreeBSD, OpenBSD, macOS, and Windows, and handles a range of workloads from single machines to data warehouses or web services with many concurrent users.

<span class="mw-page-title-main">IBM Db2</span> Relational model database server

Db2 is a family of data management products, including database servers, developed by IBM. It initially supported the relational model, but was extended to support object–relational features and non-relational structures like JSON and XML. The brand name was originally styled as DB2 until 2017, when it changed to its present form.

In computing, online analytical processing, or OLAP, is an approach to quickly answer multi-dimensional analytical (MDA) queries. OLAP is part of the broader category of business intelligence, which also encompasses relational databases, report writing and data mining. Typical applications of OLAP include business reporting for sales, marketing, management reporting, business process management (BPM), budgeting and forecasting, financial reporting and similar areas, with new applications emerging, such as agriculture.

<span class="mw-page-title-main">Extract, transform, load</span> Procedure in computing

In computing, extract, transform, load (ETL) is a three-phase process where data is extracted from an input source, transformed, and loaded into an output data container. The data can be collated from one or more sources and it can also be output to one or more destinations. ETL processing is typically executed using software applications but it can also be done manually by system operators. ETL software typically automates the entire process and can be run manually or on recurring schedules either as single jobs or aggregated into a batch of jobs.

A temporal database stores data relating to time instances. It offers temporal data types and stores information relating to past, present and future time. Temporal databases can be uni-temporal, bi-temporal or tri-temporal.

In temporal databases, transaction time is the time when some data has been loaded into a database. The time when a transaction is valid can be called the transaction time-period. It is a technical timeline controlled by a integration layer. More formally, it is the point-in-time during which a fact stored in the database is considered to be true.

Btrieve is a database developed by Pervasive Software. The architecture of Btrieve has been designed with record management in mind. This means that Btrieve only deals with the underlying record creation, data retrieval, record updating and data deletion primitives. Together with the MicroKernel Database Engine it uses ISAM, Indexed Sequential Access Method, as its underlying storage mechanism.

<span class="mw-page-title-main">Fact table</span> Business data structure

In data warehousing, a fact table consists of the measurements, metrics or facts of a business process. It is located at the center of a star schema or a snowflake schema surrounded by dimension tables. Where multiple fact tables are used, these are arranged as a fact constellation schema. A fact table typically has two types of columns: those that contain facts and those that are a foreign key to dimension tables. The primary key of a fact table is usually a composite key that is made up of all of its foreign keys. Fact tables contain the content of the data warehouse and store different types of measures like additive, non-additive, and semi-additive measures.

In databases, change data capture (CDC) is a set of software design patterns used to determine and track the data that has changed so that action can be taken using the changed data. The result is a delta-driven dataset.

Sixth normal form (6NF) is a normal form used in relational database normalization which extends the relational algebra and generalizes relational operators to support interval data, which can be useful in temporal databases.

Microsoft SQL Server is a proprietary relational database management system developed by Microsoft. As a database server, it is a software product with the primary function of storing and retrieving data as requested by other software applications—which may run either on the same computer or on another computer across a network. Microsoft markets at least a dozen different editions of Microsoft SQL Server, aimed at different audiences and for workloads ranging from small single-machine applications to large Internet-facing applications with many concurrent users.

In the data warehouse practice of extract, transform, load (ETL), an early fact or early-arriving fact, also known as late-arriving dimension or late-arriving data, denotes the detection of a dimensional natural key during fact table source loading, prior to the assignment of a corresponding primary key or surrogate key in the dimension table. Hence, the fact which cites the dimension arrives early, relative to the definition of the dimension value. An example could be backdating or making corrections to data.

<span class="mw-page-title-main">Data vault modeling</span> Database modeling method

Datavault or data vault modeling is a database modeling method that is designed to provide long-term historical storage of data coming in from multiple operational systems. It is also a method of looking at historical data that deals with issues such as auditing, tracing of data, loading speed and resilience to change as well as emphasizing the need to trace where all the data in the database came from. This means that every row in a data vault must be accompanied by record source and load date attributes, enabling an auditor to trace values back to the source. The concept was published in 2000 by Dan Linstedt.

Richard Thomas Snodgrass is an American computer scientist and writer and is professor emeritus at the University of Arizona. He is best known for his work on temporal databases, query language design, query optimization and evaluation, storage structures, database design, and ergalics.

Bitemporal modeling is a specific case of temporal database information modeling technique designed to handle historical data along two different timelines. This makes it possible to rewind the information to "as it actually was" in combination with "as it was recorded" at some point in time. In order to be able to do so, information cannot be discarded even if it is erroneous. Within, for example, financial reporting it is often desirable to be able to recreate an old report both as it actually looked at the time of creation and as it should have looked given corrections made to the data after its creation.

SQL:2011 or ISO/IEC 9075:2011 is the seventh revision of the ISO (1987) and ANSI (1986) standard for the SQL database query language. It was formally adopted in December 2011. The standard consists of 9 parts which are described in detail in SQL. The next version is SQL:2016.

The following is provided as an overview of and topical guide to databases:

<span class="mw-page-title-main">Oracle NoSQL Database</span> Distributed database

Oracle NoSQL Database is a NoSQL-type distributed key-value database from Oracle Corporation. It provides transactional semantics for data manipulation, horizontal scalability, and simple administration and monitoring.

In temporal databases, decision time is the time when a decision was made about a fact stored in a database. It is used to keep a history of decisions about valid times.

References

  1. 1 2 3 "A not-so-gentle follow-up on bitemporal data challenges - Roelant Vos".
  2. "The Events API basics | Akeneo APIs". api.akeneo.com. Retrieved 2024-02-10.
  3. "Illuminated Computing | Survey of SQL:2011 Temporal Features". illuminatedcomputing.com. Retrieved 2024-06-18.
  4. "Application-period temporal tables". www.ibm.com. Retrieved 2024-06-18.
  5. "Application-Time Periods". MariaDB KnowledgeBase. Retrieved 2024-06-18.
  6. "SAP Help Portal". help.sap.com. Retrieved 2024-06-18.
  7. 1 2 "A gentle introduction to bitemporal data challenges - Roelant Vos".
  8. Richard T. Snodgrass and Ilsoo Ahn, "Temporal Databases," IEEE Computer 19(9), September, 1986, pp. 35-42.