Log trigger

Last updated

In relational databases, the log trigger or history trigger is a mechanism for automatic recording of information about changes inserting or/and updating or/and deleting rows in a database table.

Contents

It is a particular technique for change data capturing, and in data warehousing for dealing with slowly changing dimensions.

Definition

Suppose there is a table which we want to audit. This table contains the following columns:

Column1, Column2, ..., Columnn

The column Column1 is assumed to be the primary key.

These columns are defined to have the following types:

Type1, Type2, ..., Typen

The Log Trigger works writing the changes (INSERT, UPDATE and DELETE operations) on the table in another, history table, defined as following:

CREATETABLEHistoryTable(Column1Type1,Column2Type2,::ColumnnTypen,StartDateDATETIME,EndDateDATETIME)

As shown above, this new table contains the same columns as the original table, and additionally two new columns of type DATETIME: StartDate and EndDate. This is known as tuple versioning. These two additional columns define a period of time of "validity" of the data associated with a specified entity (the entity of the primary key), or in other words, it stores how the data were in the period of time between the StartDate (included) and EndDate (not included).

For each entity (distinct primary key) on the original table, the following structure is created in the history table. Data is shown as example.

example Example log trigger.png
example

Notice that if they are shown chronologically the EndDate column of any row is exactly the StartDate of its successor (if any). It does not mean that both rows are common to that point in time, since -by definition- the value of EndDate is not included.

There are two variants of the Log trigger, depending how the old values (DELETE, UPDATE) and new values (INSERT, UPDATE) are exposed to the trigger (it is RDBMS dependent):

Old and new values as fields of a record data structure

CREATETRIGGERHistoryTableONOriginalTableFORINSERT,DELETE,UPDATEASDECLARE@NowDATETIMESET@Now=GETDATE()/* deleting section */UPDATEHistoryTableSETEndDate=@NowWHEREEndDateISNULLANDColumn1=OLD.Column1/* inserting section */INSERTINTOHistoryTable(Column1,Column2,...,Columnn,StartDate,EndDate)VALUES(NEW.Column1,NEW.Column2,...,NEW.Columnn,@Now,NULL)

Old and new values as rows of virtual tables

CREATETRIGGERHistoryTableONOriginalTableFORINSERT,DELETE,UPDATEASDECLARE@NowDATETIMESET@Now=GETDATE()/* deleting section */UPDATEHistoryTableSETEndDate=@NowFROMHistoryTable,DELETEDWHEREHistoryTable.Column1=DELETED.Column1ANDHistoryTable.EndDateISNULL/* inserting section */INSERTINTOHistoryTable(Column1,Column2,...,Columnn,StartDate,EndDate)SELECT(Column1,Column2,...,Columnn,@Now,NULL)FROMINSERTED

Compatibility notes

Data warehousing

According with the slowly changing dimension management methodologies, The log trigger falls into the following:

Implementation in common RDBMS

IBM Db2

Source: [1]

-- Trigger for INSERTCREATETRIGGERDatabase.TableInsertAFTERINSERTONDatabase.OriginalTableREFERENCINGNEWASNFOREACHROWMODEDB2SQLBEGINDECLARENowTIMESTAMP;SETNOW=CURRENTTIMESTAMP;INSERTINTODatabase.HistoryTable(Column1,Column2,...,Columnn,StartDate,EndDate)VALUES(N.Column1,N.Column2,...,N.Columnn,Now,NULL);END;-- Trigger for DELETECREATETRIGGERDatabase.TableDeleteAFTERDELETEONDatabase.OriginalTableREFERENCINGOLDASOFOREACHROWMODEDB2SQLBEGINDECLARENowTIMESTAMP;SETNOW=CURRENTTIMESTAMP;UPDATEDatabase.HistoryTableSETEndDate=NowWHEREColumn1=O.Column1ANDEndDateISNULL;END;-- Trigger for UPDATECREATETRIGGERDatabase.TableUpdateAFTERUPDATEONDatabase.OriginalTableREFERENCINGNEWASNOLDASOFOREACHROWMODEDB2SQLBEGINDECLARENowTIMESTAMP;SETNOW=CURRENTTIMESTAMP;UPDATEDatabase.HistoryTableSETEndDate=NowWHEREColumn1=O.Column1ANDEndDateISNULL;INSERTINTODatabase.HistoryTable(Column1,Column2,...,Columnn,StartDate,EndDate)VALUES(N.Column1,N.Column2,...,N.Columnn,Now,NULL);END;

Microsoft SQL Server

Source: [2]

CREATETRIGGERTableTriggerONOriginalTableFORDELETE,INSERT,UPDATEASDECLARE@NOWDATETIMESET@NOW=CURRENT_TIMESTAMPUPDATEHistoryTableSETEndDate=@nowFROMHistoryTable,DELETEDWHEREHistoryTable.ColumnID=DELETED.ColumnIDANDHistoryTable.EndDateISNULLINSERTINTOHistoryTable(ColumnID,Column2,...,Columnn,StartDate,EndDate)SELECTColumnID,Column2,...,Columnn,@NOW,NULLFROMINSERTED

MySQL

DELIMITER$$/* Trigger  for INSERT */CREATETRIGGERHistoryTableInsertAFTERINSERTONOriginalTableFOREACHROWBEGINDECLARENDATETIME;SETN=now();INSERTINTOHistoryTable(Column1,Column2,...,Columnn,StartDate,EndDate)VALUES(New.Column1,New.Column2,...,New.Columnn,N,NULL);END;/* Trigger for DELETE */CREATETRIGGERHistoryTableDeleteAFTERDELETEONOriginalTableFOREACHROWBEGINDECLARENDATETIME;SETN=now();UPDATEHistoryTableSETEndDate=NWHEREColumn1=OLD.Column1ANDEndDateISNULL;END;/* Trigger for UPDATE */CREATETRIGGERHistoryTableUpdateAFTERUPDATEONOriginalTableFOREACHROWBEGINDECLARENDATETIME;SETN=now();UPDATEHistoryTableSETEndDate=NWHEREColumn1=OLD.Column1ANDEndDateISNULL;INSERTINTOHistoryTable(Column1,Column2,...,Columnn,StartDate,EndDate)VALUES(New.Column1,New.Column2,...,New.Columnn,N,NULL);END;

Oracle

CREATEORREPLACETRIGGERTableTriggerAFTERINSERTORUPDATEORDELETEONOriginalTableFOREACHROWDECLARENowTIMESTAMP;BEGINSELECTCURRENT_TIMESTAMPINTONowFROMDual;UPDATEHistoryTableSETEndDate=NowWHEREEndDateISNULLANDColumn1=:OLD.Column1;IF:NEW.Column1ISNOTNULLTHENINSERTINTOHistoryTable(Column1,Column2,...,Columnn,StartDate,EndDate)VALUES(:NEW.Column1,:NEW.Column2,...,:NEW.Columnn,Now,NULL);ENDIF;END;

Historic information

Typically, database backups are used to store and retrieve historic information. A database backup is a security mechanism, more than an effective way to retrieve ready-to-use historic information.

A (full) database backup is only a snapshot of the data in specific points of time, so we could know the information of each snapshot, but we can know nothing between them. Information in database backups is discrete in time.

Using the log trigger the information we can know is not discrete but continuous, we can know the exact state of the information in any point of time, only limited to the granularity of time provided with the DATETIME data type of the RDBMS used.

Advantages

Disadvantages

Examples of use

Getting the current version of a table

SELECTColumn1,Column2,...,ColumnnFROMHistoryTableWHEREEndDateISNULL

It should return the same resultset of the whole original table.

Getting the version of a table in a certain point of time

Suppose the @DATE variable contains the point or time of interest.

SELECTColumn1,Column2,...,ColumnnFROMHistoryTableWHERE@Date>=StartDateAND(@Date<EndDateOREndDateISNULL)

Getting the information of an entity in a certain point of time

Suppose the @DATE variable contains the point or time of interest, and the @KEY variable contains the primary key of the entity of interest.

SELECTColumn1,Column2,...,ColumnnFROMHistoryTableWHEREColumn1=@KeyAND@Date>=StartDateAND(@Date<EndDateOREndDateISNULL)

Getting the history of an entity

Suppose the @KEY variable contains the primary key of the entity of interest.

SELECTColumn1,Column2,...,Columnn,StartDate,EndDateFROMHistoryTableWHEREColumn1=@KeyORDERBYStartDate

Getting when and how an entity was created

Suppose the @KEY variable contains the primary key of the entity of interest.

SELECTH2.Column1,H2.Column2,...,H2.Columnn,H2.StartDateFROMHistoryTableASH2LEFTOUTERJOINHistoryTableASH1ONH2.Column1=H1.Column1ANDH2.Column1=@KeyANDH2.StartDate=H1.EndDateWHEREH2.EndDateISNULL

Immutability of primary keys

Since the trigger requires that primary key being the same throughout time, it is desirable to either ensure or maximize its immutability, if a primary key changed its value, the entity it represents would break its own history.

There are several options to achieve or maximize the primary key immutability:

Alternatives

Sometimes the Slowly changing dimension is used as a method, this diagram is an example:

Scd model Scd model.png
Scd model

See also

Notes

The Log trigger was written by Laurence R. Ugalde [3] to automatically generate history of transactional databases.

Log trigger on GitHub

Related Research Articles

Database normalization is the process of structuring a relational database accordance with a series of so-called normal forms in order to reduce data redundancy and improve data integrity. It was first proposed by British computer scientist Edgar F. Codd as part of his relational model.

A relational database (RDB) is a database based on the relational model of data, as proposed by E. F. Codd in 1970.

Structured Query Language (SQL) is a domain-specific language used to manage data, especially in a relational database management system (RDBMS). It is particularly useful in handling structured data, i.e., data incorporating relations among entities and variables.

First normal form (1NF) is a property of a relation in a relational database. A relation is in first normal form if and only if no attribute domain has relations as elements. Or more informally, that no table column can have tables as values. Database normalization is the process of representing a database in terms of relations in standard normal forms, where first normal is a minimal requirement. SQL-92 does not support creating or using table-valued columns, which means that using only the "traditional relational database features" most relational databases will be in first normal form by necessity. Database systems which do not require first normal form are often called NoSQL systems. Newer SQL standards like SQL:1999 have started to allow so called non-atomic types, which include composite types. Even newer versions like SQL:2016 allow JSON.

In the relational model of databases, a primary key is a designated attribute (column) that can reliably identify and distinguish between each individual record in a table. The database creator can choose an existing unique attribute or combination of attributes from the table to act as its primary key, or create a new attribute containing a unique ID that exists solely for this purpose.

A foreign key is a set of attributes in a table that refers to the primary key of another table, linking these two tables. In the context of relational databases, a foreign key is subject to an inclusion dependency constraint that the tuples consisting of the foreign key attributes in one relation, R, must also exist in some other relation, S; furthermore that those attributes must also be a candidate key in S.

<span class="mw-page-title-main">Referential integrity</span> Where all data references are valid

Referential integrity is a property of data stating that all its references are valid. In the context of relational databases, it requires that if a value of one attribute (column) of a relation (table) references a value of another attribute, then the referenced value must exist.

<span class="mw-page-title-main">Data definition language</span> Syntax for defining data structures

In the context of SQL, data definition or data description language (DDL) is a syntax for creating and modifying database objects such as tables, indices, and users. DDL statements are similar to a computer programming language for defining data structures, especially database schemas. Common examples of DDL statements include CREATE, ALTER, and DROP. If you see a .ddl file, that means the file contains a statement to create a table. Oracle SQL Developer contains the ability to export from an ERD generated with Data Modeler to either a .sql file or a .ddl file.

An SQL INSERT statement adds one or more records to any single table in a relational database.

A temporal database stores data relating to time instances. It offers temporal data types and stores information relating to past, present and future time. Temporal databases can be uni-temporal, bi-temporal or tri-temporal.

A database trigger is procedural code that is automatically executed in response to certain events on a particular table or view in a database. The trigger is mostly used for maintaining the integrity of the information on the database. For example, when a new record is added to the employees table, new records should also be created in the tables of the taxes, vacations and salaries. Triggers can also be used to log historical data, for example to keep track of employees' previous salaries.

A database index is a data structure that improves the speed of data retrieval operations on a database table at the cost of additional writes and storage space to maintain the index data structure. Indexes are used to quickly locate data without having to search every row in a database table every time said table is accessed. Indexes can be created using one or more columns of a database table, providing the basis for both rapid random lookups and efficient access of ordered records.

Extensible Storage Engine (ESE), also known as JET Blue, is an ISAM data storage technology from Microsoft. ESE is the core of Microsoft Exchange Server, Active Directory, and Windows Search. It is also used by a number of Windows components including Windows Update client and Help and Support Center. Its purpose is to allow applications to store and retrieve data via indexed and sequential access.

A relational database management system uses SQL MERGE statements to INSERT new records or UPDATE or DELETE existing records depending on whether condition matches. It was officially introduced in the SQL:2003 standard, and expanded in the SQL:2008 standard.

In data management and data warehousing, a slowly changing dimension (SCD) is a dimension that stores data which, while generally stable, may change over time, often in an unpredictable manner. This contrasts with a rapidly changing dimension, such as transactional parameters like customer ID, product ID, quantity, and price, which undergo frequent updates. Common examples of SCDs include geographical locations, customer details, or product attributes.

<span class="mw-page-title-main">Virtuoso Universal Server</span> Computer software

Virtuoso Universal Server is a middleware and database engine hybrid that combines the functionality of a traditional relational database management system (RDBMS), object–relational database (ORDBMS), virtual database, RDF, XML, free-text, web application server and file server functionality in a single system. Rather than have dedicated servers for each of the aforementioned functionality realms, Virtuoso is a "universal server"; it enables a single multithreaded server process that implements multiple protocols. The free and open source edition of Virtuoso Universal Server is also known as OpenLink Virtuoso. The software has been developed by OpenLink Software with Kingsley Uyi Idehen and Orri Erling as the chief software architects.

In relational database management systems, a unique key is a candidate key. All the candidate keys of a relation can uniquely identify the records of the relation, but only one of them is used as the primary key of the relation. The remaining candidate keys are called unique keys because they can uniquely identify a record in a relation. Unique keys can consist of multiple columns. Unique keys are also called alternate keys. Unique keys are an alternative to the primary key of the relation. In SQL, the unique keys have a UNIQUE constraint assigned to them in order to prevent duplicates. Alternate keys may be used like the primary key when doing a single-table select or when filtering in a where clause, but are not typically used to join multiple tables.

<span class="mw-page-title-main">Apache Cassandra</span> Free and open-source database management system

Apache Cassandra is a free and open-source database management system designed to handle large volumes of data across multiple commodity servers. The system prioritizes availability and scalability over consistency, making it particularly suited for systems with high write throughput requirements due to its LSM tree indexing storage layer. As a wide-column database, Cassandra supports flexible schemas and efficiently handles data models with numerous sparse columns. The system is optimized for applications with well-defined data access patterns that can be incorporated into the schema design. Cassandra supports computer clusters which may span multiple data centers, featuring asynchronous and masterless replication. It enables low-latency operations for all clients and incorporates Amazon's Dynamo distributed storage and replication techniques, combined with Google's Bigtable data storage engine model.

PL/SQL is Oracle Corporation's procedural extension for SQL and the Oracle relational database. PL/SQL is available in Oracle Database, TimesTen in-memory database, and IBM Db2. Oracle Corporation usually extends PL/SQL functionality with each successive release of the Oracle Database.

The syntax of the SQL programming language is defined and maintained by ISO/IEC SC 32 as part of ISO/IEC 9075. This standard is not freely available. Despite the existence of the standard, SQL code is not completely portable among different database systems without adjustments.

References

  1. "Database Fundamentals" by Nareej Sharma et al. (First Edition, Copyright IBM Corp. 2010)
  2. "Microsoft SQL Server 2008 - Database Development" by Thobias Thernström et al. (Microsoft Press, 2009)
  3. "R. Ugalde, Laurence; Log trigger". GitHub. Retrieved 2022-06-26.