Log trigger

In relational databases, the Log trigger is a mechanism for automatic recording of information about changes -inserting, updating and deleting rows- in a database table.

It is a particular technique for change data capturing, and -in data warehousing- for dealing with slowly changing dimensions.

Definition

Suppose there is a table named TableX, this is the table which we want to audit. This table contains the columns:

Column1, Column2, ..., Columnn

The column Column1 is assumed to be the primary key.

These columns are defined to have the following types:

Type1, Type2, ..., Typen

The Log Trigger works writing the changes (INSERT, UPDATE and DELETE operations) on the TableX table in another, "parallel" table, named here as LogTable and defined as following:

CREATE TABLE LogTable (
   Column1   Type1,
   Column2   Type2,
      :        :
   Columnn   Typen,

   StartDate DATETIME,
   EndDate   DATETIME
)

As shown above, this new table contains the same columns than the original table, and additionally two new columns of type DATETIME: StartDate and EndDate. This is known as tuple versioning. These additional columns define a period of time of "validity" of the data associated with an specified entity (the entity of the primary key), or in another words, it stores how the data were in the period of time between the StartDate (included) and EndDate (not included).

For each entity (distinct primary key) on the original table, the following structure is created in the parallel table. Data is shown as example.

Notice that if they are shown chronologically the EndDate column of any row is exactly the StartDate of its sucesor (if any). It does not mean that both rows are common to that point in time, since -by definition- the value of EndDate is not included.

The Log trigger is:

CREATE TRIGGER LoggingTableX ON TableX FOR INSERT, DELETE, UPDATE AS
DECLARE @Now DATETIME
SET @Now = GETDATE()

/* deleting section */

UPDATE LogTable SET EndDate = @Now
  FROM LogTable, DELETED
 WHERE LogTable.Column1 = DELETED.Column1
   AND LogTable.EndDate IS NULL

/* inserting section */

INSERT INTO LogTable
       (Column1, Column2, ..., Columnn, StartDate, EndDate)
SELECT (Column1, Column2, ..., Columnn, @Now, NULL)
  FROM INSERTED

Compatibility notes

The function GetDate() is used to get the system date and time, an specific RDBMS could either use another function name, or get this information by another way.
Several RDBMS (DB2, MySQL) do not support that the same trigger can be attached to more than one operation (INSERT, DELETE, UPDATE). In such a case a trigger must be created for each operation; For an INSERT operation only the inserting section must be specified, for a DELETE operation only the deleting section must be specified, and for an UPDATE operation both sections must be present, just as it is shown above (the deleting section first, then the inserting section), because an UPDATE operation is logically represented as a DELETE operation followed by an INSERT operation.
In the code shown, the logical tables are called DELETED and INSERTED. On an specific RDBMS they could have different names. Another RDBMS (DB2) even let the name of these logical tables be specified.
In the code shown, comments are in C/C++ style, they could not be supported by an specific RDBMS, or a different syntax should be used.
Several RDBMS require that the body of the trigger is enclosed between BEGIN and END keywords.

Data warehousing

According with the slowly changing dimension management methodologies, The log trigger falls into the following:

Type 2 (tuple versioning variant)
Type 4 (use of history tables)

Historic information

Typically, database backups are used to store and retrieve historic information. A database backup is a security mechanism, more than an effective way to retrieve ready-to-use historic information.

A (full) database backup is only an snapshot of the data in specific points of time, so we could know the information of each snapshot, but we can know nothing between them. Information in database backups is discrete in time.

Using the log trigger the information we can know is not discrete but continuous, we can know the exact state of the information in any point of time, only limited to the granularity of time provided with the DATETIME data type of the RDBMS used.

Advantages

It is simple.
It is not a commercial product, it works with available features in common [[Relational database management system|RDBMS].
It is automatic, once it is created, it works with no further human intervention.
It is not required to have good knowledge about the tables of the database, or the data model.
Changes in current programming are not required.
Changes in the current tables are not required, because log data of ant table is stored in a differen table.
It works for both programming queries and ad-hoc queries.
Only changes (INSERT, UPDATE and DELETE operations) are registered, so the growing rate of the log tables are proportional to the changes.
It is not necessary apply the trigger to all the tables on database, it can be applied to certain tables, or certain columns of a table.

Disadvantages

It does not automatically store information about the user producing the changes (information system user, not database user). This information might be provided explicitly. It could be enforced in information systems, but not in ad-hoc queries.

Examples of use

Getting the current version of a table

SELECT Column1, Column2, ..., Columnn
  FROM LogTable
 WHERE EndDate IS NULL

It should return the same resultset of the whole original table.

Getting the version of a table in a certain point of time

Suppose the @DATE variable contains the point or time of interest.

SELECT  Column1, Column2, ..., Columnn
  FROM  LogTable
 WHERE  @Date >= StartDate
   AND (@Date < EndDate OR EndDate IS NULL)

Getting the information of an entity in a certain point of time

Suppose the @DATE variable contains the point or time of interest, and the @KEY variable contains the primary key of the entity of interest.

SELECT  Column1, Column2, ..., Columnn
  FROM  LogTable
 WHERE  Column1 = @Key
   AND  @Date >= StartDate
   AND (@Date <  EndDate OR EndDate IS NULL)

Getting the history of an entity

Suppose the @KEY variable contains the primary key of the entity of interest.

SELECT Column1, Column2, ..., Columnn, StartDate, EndDate
  FROM LogTable
 WHERE Column1 = @Key
 ORDER BY StartDate

Getting when and how an entity was created