Log trigger

In relational databases, the log trigger or history trigger is a mechanism for automatically recording information about changes to rows in a database table, that is, about INSERT, UPDATE and DELETE operations.

It is a particular technique of change data capture, and, in data warehousing, a way of dealing with slowly changing dimensions.

Definition

Suppose there is a table that we want to audit, containing the following columns:

Column1, Column2, ..., ColumnN

The column Column1 is assumed to be the primary key.

These columns are defined to have the following types:

Type1, Type2, ..., TypeN

The log trigger works by writing the changes (INSERT, UPDATE and DELETE operations) on the table to another table, the history table, defined as follows:

CREATE TABLE HistoryTable (
    Column1   Type1,
    Column2   Type2,
    ...
    ColumnN   TypeN,
    StartDate DATETIME,
    EndDate   DATETIME
)

As shown above, this new table contains the same columns as the original table, plus two new columns of type DATETIME: StartDate and EndDate. This is known as tuple versioning. The two additional columns define a period of "validity" of the data associated with a given entity (the entity identified by the primary key); in other words, each row stores the state the data had during the period of time from StartDate (included) to EndDate (not included).

For each entity (distinct primary key) in the original table, the following structure is created in the history table. Data is shown as an example.

[Figure: Example log trigger]

Notice that, when the rows are shown chronologically, the EndDate of any row is exactly the StartDate of its successor (if any). This does not mean that both rows are valid at that point in time, since, by definition, the value of EndDate is not included.
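The half-open interval convention can be seen with a query like the following sketch (the key value and date literal are illustrative; suppose one version of entity 1 ends at '2004-06-01' and its successor starts at that same instant):

```sql
-- Querying exactly the boundary instant returns only the successor row,
-- because EndDate is compared with a strict '<':
SELECT Column1, Column2, ..., ColumnN
  FROM HistoryTable
 WHERE Column1 = 1
   AND '2004-06-01' >= StartDate
   AND ('2004-06-01' < EndDate OR EndDate IS NULL)
```

Because every instant falls into at most one validity interval per entity, such a query never returns two versions of the same entity.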

There are two variants of the log trigger, depending on how the old values (DELETE, UPDATE) and the new values (INSERT, UPDATE) are exposed to the trigger (this is RDBMS dependent):

Old and new values as fields of a record data structure

CREATE TRIGGER HistoryTable ON OriginalTable FOR INSERT, DELETE, UPDATE
AS
    DECLARE @Now DATETIME
    SET @Now = GETDATE()

    /* deleting section */
    UPDATE HistoryTable
       SET EndDate = @Now
     WHERE EndDate IS NULL
       AND Column1 = OLD.Column1

    /* inserting section */
    INSERT INTO HistoryTable (Column1, Column2, ..., ColumnN, StartDate, EndDate)
    VALUES (NEW.Column1, NEW.Column2, ..., NEW.ColumnN, @Now, NULL)

Old and new values as rows of virtual tables

CREATE TRIGGER HistoryTable ON OriginalTable FOR INSERT, DELETE, UPDATE
AS
    DECLARE @Now DATETIME
    SET @Now = GETDATE()

    /* deleting section */
    UPDATE HistoryTable
       SET EndDate = @Now
      FROM HistoryTable, DELETED
     WHERE HistoryTable.Column1 = DELETED.Column1
       AND HistoryTable.EndDate IS NULL

    /* inserting section */
    INSERT INTO HistoryTable (Column1, Column2, ..., ColumnN, StartDate, EndDate)
    SELECT Column1, Column2, ..., ColumnN, @Now, NULL
      FROM INSERTED

Compatibility notes

Data warehousing

According to the slowly changing dimension management methodologies, the log trigger falls into the following:

Implementation in common RDBMSs

IBM Db2 [1]

-- Trigger for INSERT
CREATE TRIGGER Database.TableInsert
AFTER INSERT ON Database.OriginalTable
REFERENCING NEW AS N
FOR EACH ROW MODE DB2SQL
BEGIN
    DECLARE Now TIMESTAMP;
    SET Now = CURRENT TIMESTAMP;
    INSERT INTO Database.HistoryTable (Column1, Column2, ..., ColumnN, StartDate, EndDate)
    VALUES (N.Column1, N.Column2, ..., N.ColumnN, Now, NULL);
END;

-- Trigger for DELETE
CREATE TRIGGER Database.TableDelete
AFTER DELETE ON Database.OriginalTable
REFERENCING OLD AS O
FOR EACH ROW MODE DB2SQL
BEGIN
    DECLARE Now TIMESTAMP;
    SET Now = CURRENT TIMESTAMP;
    UPDATE Database.HistoryTable
       SET EndDate = Now
     WHERE Column1 = O.Column1
       AND EndDate IS NULL;
END;

-- Trigger for UPDATE
CREATE TRIGGER Database.TableUpdate
AFTER UPDATE ON Database.OriginalTable
REFERENCING NEW AS N OLD AS O
FOR EACH ROW MODE DB2SQL
BEGIN
    DECLARE Now TIMESTAMP;
    SET Now = CURRENT TIMESTAMP;
    UPDATE Database.HistoryTable
       SET EndDate = Now
     WHERE Column1 = O.Column1
       AND EndDate IS NULL;
    INSERT INTO Database.HistoryTable (Column1, Column2, ..., ColumnN, StartDate, EndDate)
    VALUES (N.Column1, N.Column2, ..., N.ColumnN, Now, NULL);
END;

Microsoft SQL Server [2]

CREATE TRIGGER TableTrigger ON OriginalTable FOR DELETE, INSERT, UPDATE
AS
    DECLARE @Now DATETIME
    SET @Now = CURRENT_TIMESTAMP

    UPDATE HistoryTable
       SET EndDate = @Now
      FROM HistoryTable, DELETED
     WHERE HistoryTable.ColumnID = DELETED.ColumnID
       AND HistoryTable.EndDate IS NULL

    INSERT INTO HistoryTable (ColumnID, Column2, ..., ColumnN, StartDate, EndDate)
    SELECT ColumnID, Column2, ..., ColumnN, @Now, NULL
      FROM INSERTED

MySQL

DELIMITER $$

/* Trigger for INSERT */
CREATE TRIGGER HistoryTableInsert AFTER INSERT ON OriginalTable
FOR EACH ROW
BEGIN
    DECLARE N DATETIME;
    SET N = NOW();
    INSERT INTO HistoryTable (Column1, Column2, ..., ColumnN, StartDate, EndDate)
    VALUES (NEW.Column1, NEW.Column2, ..., NEW.ColumnN, N, NULL);
END $$

/* Trigger for DELETE */
CREATE TRIGGER HistoryTableDelete AFTER DELETE ON OriginalTable
FOR EACH ROW
BEGIN
    DECLARE N DATETIME;
    SET N = NOW();
    UPDATE HistoryTable
       SET EndDate = N
     WHERE Column1 = OLD.Column1
       AND EndDate IS NULL;
END $$

/* Trigger for UPDATE */
CREATE TRIGGER HistoryTableUpdate AFTER UPDATE ON OriginalTable
FOR EACH ROW
BEGIN
    DECLARE N DATETIME;
    SET N = NOW();
    UPDATE HistoryTable
       SET EndDate = N
     WHERE Column1 = OLD.Column1
       AND EndDate IS NULL;
    INSERT INTO HistoryTable (Column1, Column2, ..., ColumnN, StartDate, EndDate)
    VALUES (NEW.Column1, NEW.Column2, ..., NEW.ColumnN, N, NULL);
END $$

DELIMITER ;

Oracle

CREATE OR REPLACE TRIGGER TableTrigger
AFTER INSERT OR UPDATE OR DELETE ON OriginalTable
FOR EACH ROW
DECLARE
    Now TIMESTAMP;
BEGIN
    SELECT CURRENT_TIMESTAMP INTO Now FROM Dual;
    UPDATE HistoryTable
       SET EndDate = Now
     WHERE EndDate IS NULL
       AND Column1 = :OLD.Column1;
    IF :NEW.Column1 IS NOT NULL THEN
        INSERT INTO HistoryTable (Column1, Column2, ..., ColumnN, StartDate, EndDate)
        VALUES (:NEW.Column1, :NEW.Column2, ..., :NEW.ColumnN, Now, NULL);
    END IF;
END;

Historic information

Typically, database backups are used to store and retrieve historic information. However, a database backup is a disaster-recovery mechanism rather than an effective way to retrieve ready-to-use historic information.

A (full) database backup is only a snapshot of the data at specific points in time, so we can know the state of the data in each snapshot, but nothing about what happened between them. Information in database backups is discrete in time.

With the log trigger, the information we can know is not discrete but continuous: we can determine the exact state of the information at any point in time, limited only by the granularity of the DATETIME data type of the RDBMS used.

Advantages

Disadvantages

Examples of use

Getting the current version of a table

SELECT Column1, Column2, ..., ColumnN
  FROM HistoryTable
 WHERE EndDate IS NULL

It should return the same result set as the whole original table.
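As a sanity check, this equivalence can be verified with a sketch like the following (assuming an RDBMS that supports EXCEPT; run it in both directions to catch rows missing on either side). An empty result indicates the two sets match:

```sql
-- Rows present in the original table but absent from the current version
SELECT Column1, Column2, ..., ColumnN FROM OriginalTable
EXCEPT
SELECT Column1, Column2, ..., ColumnN
  FROM HistoryTable
 WHERE EndDate IS NULL
```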

Getting the version of a table at a certain point in time

Suppose the @Date variable contains the point in time of interest.

SELECT Column1, Column2, ..., ColumnN
  FROM HistoryTable
 WHERE @Date >= StartDate
   AND (@Date < EndDate OR EndDate IS NULL)

Getting the information of an entity at a certain point in time

Suppose the @Date variable contains the point in time of interest, and the @Key variable contains the primary key of the entity of interest.

SELECT Column1, Column2, ..., ColumnN
  FROM HistoryTable
 WHERE Column1 = @Key
   AND @Date >= StartDate
   AND (@Date < EndDate OR EndDate IS NULL)

Getting the history of an entity

Suppose the @Key variable contains the primary key of the entity of interest.

SELECT Column1, Column2, ..., ColumnN, StartDate, EndDate
  FROM HistoryTable
 WHERE Column1 = @Key
 ORDER BY StartDate
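A possible variation (sketched here in SQL Server syntax, where DATEDIFF and GETDATE are assumed to be available) also derives how long each version was in effect, treating a NULL EndDate as "still current":

```sql
SELECT Column1, StartDate, EndDate,
       -- a NULL EndDate means the version is still in effect
       DATEDIFF(day, StartDate, COALESCE(EndDate, GETDATE())) AS DaysInEffect
  FROM HistoryTable
 WHERE Column1 = @Key
 ORDER BY StartDate
```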

Getting when and how an entity was created

Suppose the @Key variable contains the primary key of the entity of interest.

SELECT H2.Column1, H2.Column2, ..., H2.ColumnN, H2.StartDate
  FROM HistoryTable AS H2
  LEFT OUTER JOIN HistoryTable AS H1
    ON H2.Column1 = H1.Column1
   AND H2.StartDate = H1.EndDate
 WHERE H2.Column1 = @Key
   AND H1.EndDate IS NULL  -- no predecessor version exists: this is the creation row

Immutability of primary keys

Since the trigger requires the primary key to remain the same over time, it is desirable to ensure, or at least maximize, its immutability: if a primary key changed its value, the entity it represents would be cut off from its own history.

There are several options to achieve or maximize the primary key immutability:
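One widely used approach (a sketch, not necessarily one of the options the article lists; SQL Server IDENTITY syntax and all names are illustrative) is a surrogate key with no business meaning, so the audited key never has a reason to change:

```sql
CREATE TABLE OriginalTable (
    ID      INT IDENTITY(1,1) PRIMARY KEY,  -- surrogate key, never updated
    Code    VARCHAR(20) NOT NULL UNIQUE,    -- mutable business identifier
    Column2 Type2,
    ...
)
```

Because the business identifier is demoted to an ordinary (unique) column, renaming it becomes just another audited UPDATE rather than a break in the entity's history.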

Alternatives

Sometimes a slowly changing dimension is used as an alternative method; the following diagram shows an example:

[Figure: SCD model]

See also

Notes

The log trigger was written by Laurence R. Ugalde [3] to automatically generate the history of transactional databases.

Log trigger on GitHub

References

  1. Neeraj Sharma et al.: "Database Fundamentals" (First Edition, IBM Corp., 2010)
  2. Tobias Thernström et al.: "Microsoft SQL Server 2008 - Database Development" (Microsoft Press, 2009)
  3. R. Ugalde, Laurence: "Log trigger". GitHub. Retrieved 2022-06-26.