This is the second part of a multi-part post series I’m going to write about Code Generation using C#. In the previous part I’ve shown how to invoke C# Scripts (and CSX files) from PowerShell and Visual Studio. In this part, I’ll create a simple extractor to read the physical schema of a SQL Server Database and save it as a JSON file.

Creating the Project

The first step is creating a .NET project.

Running CSX scripts do not require csproj (or Visual Studio), but it’s very helpful to have a csproj for many reasons:

  • You get intellisense (mostly on CS, not so much on CSX - so try to use CSX as little as possible)
  • You can check for compilation-errors
  • You can configure a Console App (or any other entry point) to launch and debug your scripts
  • csproj is required for automatically restoring NuGet packages (although if you already have the NuGet packages you may just use them without csproj)

All libraries which I’ll use in my generator (Dapper and Newtonsoft) are compatible with netstandard2.0.
This means that the project type can be either .NET Framework (net472, which used to be called “full framework”) or .NET Core (netcore21 or netcore31) - both should work.

NuGet Packages

I’ll use two third-party libraries: Dapper (micro-ORM, for loading objects from SQL Queries) and Newtonsoft.Json (JSON library, just for serialization). All we need is their dlls, but the easiest way to getting those is by installing their NuGet packages to our csproj (which will automatically install those packages to anyone who tries to build our csproj, so it’s hassle-free collaboration among multiple developers).
If you’re not using Visual Studio IDE you can install those packages by running
dotnet add package Dapper and dotnet add package Newtonsoft.Json
in the same folder of your csproj.
If you’re using Visual Studio you can use the Package Manager Console and run this:

Install-Package Dapper
Install-Package Newtonsoft.Json

Sql Server Schema objects

Next, I’ll create my objects that will represent my physical SQL database. One important thing to remember is that C# scripting engines do NOT allow the use of namespaces.


using System;
using System.Collections.Generic;

public class SqlServerTable
    public string Database { get; set; }
    public string TableSchema { get; set; }
    public string TableName { get; set; }

    /// <summary>
    /// Can be "TABLE" or "VIEW"
    /// </summary>
    public string TableType { get; set; }

    public string TableDescription { get; set; }

    public List<SqlServerColumn> Columns { get; set; } = new List<SqlServerColumn>();

    /// <summary>
    /// FKs which point from THIS (Child) table to the primary key of OTHER (Parent) tables
    /// </summary>
    public List<SqlServerForeignKey> ForeignKeys { get; set; } = new List<SqlServerForeignKey>();

    /// <summary>
    /// FKs which point from OTHER (Child) tables to the primary key of THIS (Parent) table
    /// </summary>
    public List<SqlServerForeignKey> ChildForeignKeys { get; set; } = new List<SqlServerForeignKey>();


I’ll omit other classes for brevity, but you can refer to all code here (classes SqlServerColumn.cs, SqlServerForeignKey.cs, SqlServerForeignKeyMember.cs ).

Schema Reader

Finally, I’ll create the class which will read SQL metadata - SqlServerSchemaReader.cs:

using Dapper;
using System;
using System.Data;
using System.IO;
using System.Linq;

public class SqlServerSchemaReader
  public Func<IDbConnection> CreateDbConnection { get; set; }

  public SqlServerSchemaReader(Func<IDbConnection> createDbConnection)
    CreateDbConnection = createDbConnection;

  public void ExportSchemaToJSON(string outputJsonSchema)
    Console.WriteLine("Reading Database...");

    using (var cn = CreateDbConnection())
      var tables = cn.Query<SqlServerTable>(@"
          t.TABLE_CATALOG as [Database], 
          t.TABLE_SCHEMA as [TableSchema], 
          t.TABLE_NAME as [TableName], 
          ep.value as [TableDescription]
		    INNER JOIN sys.schemas sc ON t.TABLE_SCHEMA = sc.[name]
          ... full code omitted for brevity - please refer to: 

      var allColumns = cn.Query<SqlServerColumn>(@"
          ... full code omitted for brevity... 

      var fks = cn.Query<SqlServerForeignKey>(@"
          ... full code omitted for brevity... 

      var fkCols = cn.Query<SqlServerForeignKeyMember>(@"
          ... full code omitted for brevity... 
      foreach (var fk in fks)
        fk.Columns = fkCols.Where(c => 
            c.ForeignKeyConstraintName == fk.ForeignKeyConstraintName && 
            c.FKTableSchema == fk.FKTableSchema
        ).OrderBy(c => c.PKColumnOrdinalPosition).ToList();

      foreach (var table in tables)
        table.Columns = allColumns.Where(c => c.TableSchema == table.TableSchema && c.TableName == table.TableName).ToList();
        foreach(var column in table.Columns)
          column.ClrType = GetClrType(table, column);
        table.Columns.ForEach(c => { c.Database = null; c.TableSchema = null; c.TableName = null; });

        // We copy FKs and remove redundant properties of the parent object (table) which we're attaching this FK into
        table.ForeignKeys = Clone(fks.Where(fk => fk.FKTableSchema == table.TableSchema && fk.FKTableName == table.TableName).ToList());
        table.ForeignKeys.ForEach(fk => { fk.FKTableSchema = null; fk.FKTableName = null; });

        // We copy FKs and remove redundant properties of the parent object (table) which we're attaching this FK into
        table.ChildForeignKeys = Clone(fks.Where(fk => fk.PKTableSchema == table.TableSchema && fk.PKTableName == table.TableName).ToList());
        table.ChildForeignKeys.ForEach(fk => { fk.PKTableSchema = null; fk.PKTableName = null; });


      SqlServerDatabaseSchema schema = new SqlServerDatabaseSchema()
        LastRefreshed = DateTimeOffset.Now,
        Tables = tables,

      Console.WriteLine($"Saving into {outputJsonSchema}...");
      File.WriteAllText(outputJsonSchema, Newtonsoft.Json.JsonConvert.SerializeObject(schema, Newtonsoft.Json.Formatting.Indented));


  string GetClrType(SqlServerTable table, SqlServerColumn column)
    string sqlDataType = column.SqlDataType;
    switch (sqlDataType)
      case "bigint":
        return typeof(long).FullName;
      case "smallint":
        return typeof(short).FullName;
      case "int":
        return typeof(int).FullName;
      case "uniqueidentifier":
        return typeof(Guid).FullName;
      case "smalldatetime":
      case "datetime":
      case "datetime2":
      case "date":
      case "time":
        return typeof(DateTime).FullName;
      case "datetimeoffset":
        return typeof(DateTimeOffset).FullName;
      case "float":
        return typeof(double).FullName;
      case "real":
        return typeof(float).FullName;
      case "numeric":
      case "smallmoney":
      case "decimal":
      case "money":
        return typeof(decimal).FullName;
      case "tinyint":
        return typeof(byte).FullName;
      case "bit":
        return typeof(bool).FullName;
      case "image":
      case "binary":
      case "varbinary":
      case "timestamp":
        return typeof(byte[]).FullName;
      case "nvarchar":
      case "varchar":
      case "nchar":
      case "char":
      case "text":
      case "ntext":
      case "xml":
        return typeof(string).FullName;
        Console.WriteLine($"Unknown sqlDataType for {table.TableName}.{column.ColumnName}: {sqlDataType}");
        return null;

      // Vendor-specific types
      case "hierarchyid":
        return "Microsoft.SqlServer.Types.SqlHierarchyId"; // requires Microsoft.SqlServer.Types.dll (EF or Dapper 1.34+)
      case "geography":
        return "Microsoft.SqlServer.Types.SqlGeography";  // requires Microsoft.SqlServer.Types.dll (EF or Dapper 1.32+)
      case "geometry":
        return "Microsoft.SqlServer.Types.SqlGeometry";  // requires Microsoft.SqlServer.Types.dll (EF or Dapper 1.33)+

  public static T Clone<T>(T source)
    var serialized = JsonConvert.SerializeObject(source);
    return JsonConvert.DeserializeObject<T>(serialized);



The idea is that we should put as much as possible into .cs files (and as little as possible in csx script), since intellisense and compile-time checks work better than in the CSX. The script file RefreshDatabaseSchema.csx will basically load libraries, define connection strings and paths, and invoke SqlServerSchemaReader.cs:

/// <summary>
/// This CSX Script will invoke SqlServerSchemaReader, which extracts the schema of SQL database and saves into a JSON file.
/// The easiest way to launch csi.exe (which is shipped with Visual Studio) to run this script is by using PowerShell script RefreshDatabaseSchema.ps1
/// You can do that from Visual Studio (see instructions in RefreshDatabaseSchema.ps1) or you can just execute "Powershell RefreshDatabaseSchema.ps1"
/// </summary>

// System libraries
#r "System.Data.dll"

// Load 3rd-party libraries by their relative paths, relative to "$Env:userprofile\.nuget\packages\"
#r "dapper\2.0.35\lib\netstandard2.0\Dapper.dll"
#r "newtonsoft.json\12.0.3\lib\netstandard2.0\Newtonsoft.Json.dll"

// CS files are better than CSX because Intellisense and Compile-time checks works better. 
#load "SqlServerTable.cs"
#load "SqlServerColumn.cs"
#load "SqlServerForeignKey.cs"
#load "SqlServerForeignKeyMember.cs"
#load "SqlServerDatabaseSchema.cs"
#load "SqlServerSchemaReader.cs"

using System;
using System.IO;
using System.Runtime.CompilerServices;
using System.Data;
using System.Data.SqlClient;

// Helpers to get the location of the current CSX script
public static string GetScriptPath([CallerFilePath] string path = null) => path;
public static string GetScriptFolder([CallerFilePath] string path = null) => Path.GetDirectoryName(path);

// location relative to the CSX script
string outputJsonSchema = Path.GetFullPath(Path.Combine(GetScriptFolder(), "AdventureWorksSchema.json")); 
string connectionString = @"Data Source=MYWORKSTATION\SQLEXPRESS;
                            Initial Catalog=AdventureWorks;
                            Integrated Security=True;";

Func<IDbConnection> connectionFactory = () => new SqlConnection(connectionString);
var reader = new SqlServerSchemaReader(connectionFactory);


Last, I’ll create a PowerShell to invoke the CSX file, which is useful because it can locate the csi.exe in multiple locations and because it can provide to csi.exe the location of per-user NuGet packages, so that CSX can load libraries by their relative-locations, without having to hard-code user-specific folders. RefreshDatabaseSchema.ps1:

# To Execute Powershell Scripts from Visual Studio:
# 1) Right-button PS1 file - "Open With...""
# 2) Configure:
#      Program: Powershell.exe
#      Arguments: -noexit -File %1
#      Friendly Name: Execute PowerShell Script

# To execute CSX scripts you'll need CSI.EXE (C# REPL) which is shipped with Visual Studio
# but can also be installed by using the NuGet package Microsoft.Net.Compilers.Toolset -

# For more info about launching CSX scripts from PowerShell or from Visual Studio, check

$dir = Split-Path $MyInvocation.MyCommand.Path 
$script = Join-Path $dir ".\RefreshDatabaseSchema.csx"

# Locate CSI.EXE by searching common paths
$csi = ( 
    "$Env:programfiles (x86)\Microsoft Visual Studio\2019\Enterprise\MSBuild\Current\Bin\Roslyn\csi.exe",
    "$Env:programfiles (x86)\Microsoft Visual Studio\2019\Professional\MSBuild\Current\Bin\Roslyn\csi.exe",
    "$Env:programfiles (x86)\Microsoft Visual Studio\2019\Community\MSBuild\Current\Bin\Roslyn\csi.exe",
    "$Env:programfiles (x86)\Microsoft Visual Studio\2017\Enterprise\MSBuild\15.0\Bin\Roslyn\csi.exe",
    "$Env:programfiles (x86)\Microsoft Visual Studio\2017\Professional\MSBuild\15.0\Bin\Roslyn\csi.exe",
    "$Env:programfiles (x86)\Microsoft Visual Studio\2017\Community\MSBuild\15.0\Bin\Roslyn\csi.exe"
) | Where-Object { Test-Path $_ } | Select-Object -first 1

if (!$csi)
    Write-Host "---------------------------------------" -for red
    Write-Host "Can't find csi.exe" -for red
    Write-Host "Please fix search paths above, or install NuGet Microsoft.Net.Compilers.Toolset" -for red
    Write-Host "---------------------------------------" -for red
    Exit 1

$stopwatch = [System.Diagnostics.Stopwatch]::StartNew()

Write-host "Starting csi.exe $script ..." -for yellow
& $csi /lib:"$Env:userprofile\.nuget\packages\" $script

Write-Host "Finished in $($stopwatch.Elapsed.TotalMilliSeconds) milliseconds"

# Since I configured "-noexit" parameter in Visual Studio I don't need this
#if ($host.Name -notmatch 'ISE') { Write-Host -NoNewLine "(Just press Enter to exit)" -for cyan; read-host; }  

Running Powershell:

Result file AdventureWorksSchema.json:

Full Source code available here

This is the second part of a multi-part post series I’m going to write about Code Generation using C#:

Many newcomers (and even experienced developers who never had to manage their own SQL Servers) usually don’t understand the difference between Full Recovery mode and Simple Recovery mode, and more than that they don’t know how to manage their SQL databases so that their LDF files won’t grow indefinitely.

Full Recovery vs Simple Recovery

Simple Recovery uses transaction log only for uncommited transactions, so this file is very small, and doesn’t grow forever (allocated space is reused), so there is no Log backup for Simple Recovery.

Full Recovery tracks all past transactions in the transaction log, so log grows forever unless we do maintenance .

The major advantage of Full Recovery is that it allows us to use point in time restore, which allows to restore to an exact time, reverting (or inspecting) pontual database changes.

I strongly recommend Full recovery for any serious production environment.

Ola Hallengren Maintenance Scripts

Ola Hallengren is an experienced DBA who has developed these amazing SQL Server Maintenance scripts, comprised of scripts for running backups, integrity checks, and index and statistics maintenance on all editions of Microsoft SQL Server.

Download the scripts here, run then on your server, and it will install many useful stored procedures into your master database.

I strongly recommend those scripts so that you don’t need to reinvent the wheel.

Backup Databases

The following script will backup all your user databases (applies both to Full Recovery or Simple Recovery)

-- Full backup (keeping 7 last days)
EXECUTE master.dbo.DatabaseBackup
@Databases = 'USER_DATABASES',
@Directory = 'C:\Backup',
@BackupType = 'FULL',
@Verify = 'Y',
@Compress = 'N',
@CheckSum = 'Y',
@CleanupTime = 168 -- 168hs =  7-day retention

OBS: “FULL” is for “FULL backup” (as opposed to DIFFERENTIAL backup), and has NOTHING to do with “Full Recovery”.
OBS: USER_DATABASES can be replaced by a single database

Backup Transaction Log

The following script will backup transaction log for your user databases (applies only to Full Recovery)

We should periodically backup the transaction log, so that it doesn’t grow indefinitely. Backing up the transaction log does NOT automatic shrink the transaction log, but ALLOWS us to shrink that log because SQL can only consider part of the transaction log as “unused space” if the respective transactions were backed up before.

-- Transaction Log Backup
EXECUTE master.dbo.DatabaseBackup
@Databases = 'USER_DATABASES',
@Directory = 'C:\Backup',
@BackupType = 'LOG',
@ChangeBackupType = 'Y'

OBS: Simple Recovery uses transaction log only for uncommited transactions, so there is no Log backup for Simple Recovery
OBS: USER_DATABASES can be replaced by a single database

Switching from Full to Simple Recovery


OBS: The transaction log won’t shrink automatically! See below.

Switching from Simple to Full Recovery:


Shrink the Transaction Log file (LDF)

If the database was in Full Recovery and didn’t had log maintenance, probably the log file became huge.

Shrink the Transaction Log for a database in Simple Recovery Mode

If the database is running in Simple Recovery, you can shrink easily:

-- Shrink the file
USE yourdb;
DBCC SHRINKFILE(yourdb_log, 10); -- 10MB should be enough for us

OBS: yourdb_log is the LOGICAL NAME of the log file, and is not necessarily the same name as the physical file in disk, specially if your database was renamed or restored from a backup with different name.
e.g. if we create a database “Northwind” and later rename or restore into a database “Northwind2012” , the internal name of your log file will be Northwind_log, even if the physical file is called Northwind2012_log.ldf or whatever.
To obtain the logical name of your log file, use this command:
SELECT name FROM sys.database_files where type_desc='LOG'

Shrink the Transaction Log for a database in Full Recovery Mode

If the database is running in Full Recovery, shrinking may be harder. First of all, note that the transaction log has a pointer that indicates which part of the log file was backed up and which part wasn’t backed up. Shrinking can only remove parts which were backup up, so first of all, make a transaction log backup (see above Backup Transaction Log).

To shrink the transaction log, you should use the same shrinking command as used for Simple Recovery (above), however, it’s common that we get an annoying error where the log doesn’t shrink and you get an error Cannot shrink log file 2 (My DB_Log) because the logical log file located at the end of the file is in use.

To prevent/avoid this problem, just backup the transaction log of your database and shrink the log file immediately after:

use master;
EXECUTE master.dbo.DatabaseBackup
@Databases = 'yourdb',
@Directory = 'C:\Backup',
@BackupType = 'LOG',
@ChangeBackupType = 'Y';

USE yourdb;
DBCC SHRINKFILE(yourdb_log, 200);

If the transaction log is already too large and you can’t even backup, just truncate the log file by temporarily changing to simple recovery mode:


USE yourdb;
DBCC SHRINKFILE(yourdb_log, 200);


Finally, after shrinking the file, configure it for auto-growth:

USE [master];

  (NAME = yourdb_log, SIZE = 300MB, FILEGROWTH = 50MB);

Maintenance Jobs

Based on the script’s author suggestion, this my suggested schedule for your servers:

User databases:

  • Integrity check (weekly)

    EXECUTE dbo.DatabaseIntegrityCheck
    @Databases = 'USER_DATABASES',
    @CheckCommands = 'CHECKDB',
    @PhysicalOnly = 'Y'
  • Index & Statistics maintenance (weekly)

    EXECUTE dbo.IndexOptimize @Databases = 'USER_DATABASES',
    @FragmentationLow = NULL,
    @FragmentationLevel1 = 5,
    @FragmentationLevel2 = 30,
    @UpdateStatistics = 'ALL',
    @OnlyModifiedStatistics = 'Y' 
  • Full backup (daily)

    EXECUTE dbo.DatabaseBackup
    @Databases = 'USER_DATABASES',
    @Directory = 'C:\Backup',
    @BackupType = 'FULL',
    @Verify = 'Y',
    @Compress = 'Y',
    @CheckSum = 'Y',
    @CleanupTime = 168 -- 168hs =  7-day retention
  • Transaction log (daily or hourly [or even more frequent] if we want to set up database mirroring)

    EXECUTE dbo.DatabaseBackup
    @Databases = 'USER_DATABASES',
    @Directory = 'C:\Backup',
    @BackupType = 'LOG',
    @ChangeBackupType = 'Y'

System databases:

  • Full backup (daily)
  • Integrity check (daily)


  • sp_delete_backuphistory (weekly)
  • sp_purge_jobhistory (weekly)
  • CommandLog cleanup (weekly)
  • Output file cleanup (weekly)

TO DO: use xp_cmdshell to zip backups, because SQL Express can’t compress backups

TO DO: modify DatabaseBackup script to automatically shrink log file immediately after a LOG backup.

On this post I just wanted to make a cheat sheet and scratch the surface. If you want to gain a deep understanding of Transaction Log backup and shrinking, please refer to this answer and the references linked there.

Auditing Tables

Auditing tables are used to track transactions against a particular table or tables, and sometimes can be used to audit even read-only queries (SELECTS) on your tables (but this is not the subject of this post). SQL Server has an out-of-the-box audit feature, and some other alternatives, however you may prefer a custom solution, where you can have more control and better understanding of the audit tables.

One popular trigger-based solution for this problem is described in this article (with SQL scripts for generating the audit tables (also called shadow tables) and triggers. This solution (from the article) creates one record for each operation (Insert, Update, Delete), obviously with some added columns like the date of the operation, and the user who made the operation. The problem with that design is that it’s difficult to find the state of a particular record at a given time - obviously you can use TOP 1 and ORDER BY to find the state of a single record at any point in time, but that gets difficult when you have to join versioned tables, or even finding a set of records that existed at that given time. So it’s not a good solution for versioning data.

Another problem is that unless your application uses Windows Authentication (or that you are still in 90’s when it was common that each user of client-server applications had dedicated database connections to the database), logging the database user that made the operation is useless - you probably want to know which application user made the operation.

There are other solutions that may (or may not) save some space by tracking only the modified columns, but they also face the same problems that I’ve mentioned earlier.

Personally I believe it’s much better to waste some disk space in favor of something that gives me a better performance and makes daily development easier, because developer’s time is usually much more expensive than disk space.

Data Versioning

When people think about versioning data, they usually think of storing the versioned records in the same table as your main table (where your active records exist). Please don’t. That would radically increase the complexity on your queries in order to make sure the latest version of each record is being used and that you are not duplicating results because of past records. That’s very error-prone. It will also hurt performance of your database. The most common error of versioning in database design is to keep past prices in the same table as current prices. The best place to store past versions of your data is in a separate table.

Similarly, using soft deletes (that famous IsDeleted flag) is a bad idea for the same reasons. The correct place to place your historical data is in a separate report database, and not inside your transactional application. If you understand this design and follow this rule, be careful on what you consider a deleted record: what should be deleted (and possibly moved into your audit/historical table) are records that shouldn’t exist and were created by mistake or similar acts. A duplicate customer is a good example of something that should be deleted, specially because it would force you to either merge the related entities into the correct record or cascade delete them. A batch of accounting entries (posts) that were incorrectly calculated (and were not yet used in your monthly balance or taxes) should be deleted (and should be probably logged into your audit table). A student that is inactive (because he is not enrolled into any courses) should not be deleted.
In general, if an entity can come back to life (like the student) it shouldn’t be deleted (and flagging as inactive is perfectly correct and should not be confused with a soft delete), but if the entity was just wrong and could be recreated somehow (like the accounting entries) or is already created (like versioned records, or like the duplicated customer) then it should be deleted (and not soft deleted, which will cause you more harm than good). A good smell that you are incorrectly using soft deletes is when your application button says “Delete” and you are not really deleting. If you just inactivating the record, probably the button should reflect that.

In summary, your transactional tables should keep only active data - not deleted records, and not past revisions. Don’t be lazy: create new tables (and new CRUDs) for historical data - It will take you a few minutes but will save you countless hours later, so I’m sure it’s a good investment.

Let’s kill two birds with a stone, and use audit tables also for versioning data

Since Versioning Tables and Audit Tables have much in common, I decided that I would use a single structure for both. And that led me to make a few changes in the triggers/audit tables from this article.

This is my proposed design for audit-tables:

  • Like other solutions, each audit table has the same columns as the audited table, but a new identity primary key.
  • For tracking the user who made an operation I keep both ID (int) and Username (varchar) columns. When I can identify the application user I have both his ID and his Name or Login. When I can’t identify the user who is doing the transaction (when something happens outside the application) I track the SQL user that was used, his hostname and IP.
  • Each audit row has both columns for tracking when that record revision started existing (it could be either a new record, or modified from a previous state), and also for tracking when that record revision stopped existing (it could be either a deleted record, or modified to a newer state).
  • AuditStartDate tracks the starting moment of the record revision, AuditEndDate tracks the ending moment for that revision.
  • AuditStartUserID, AuditStartUserName, AuditEndUserID and AuditEndUserName are the User ID and User Name that respectively put the record into that state and the one that removed the record from that state.
  • AuditStartOperation is I (INSERTED) if the tracked record is new (first revision) or U (UPDATED) if the tracked record already existed before and was just updated.
  • AuditEndOperation is D (DELETED) if the tracked record ceased existing because it was deleted, or U if the tracked record just was updated to a new state.
  • AuditStartTransactionGUID and AuditEndTransactionGUID are just unique identifiers that I use to know which operations happened in the same transaction, and mostly for connecting the previous state of a record to the next state. (more on that later).
  • As you may have noticed, I don’t have RevisionID for numbering the revisions of each record. That would force me to refer to the audit table itself, and maybe it could even generate some deadlocks. I just decided that I don’t need it. I can renumber my audit records whenever I need.

A visual example to make things clear

Richard Drizin created the product.
When some record is created, the auditing table will create a record which will track all the information that was inserted into the audited table (highlighted fields in screenshot), and will also add some tracking information (non-highlighted fields) that contain the operation (Insert), the date when it was inserted, and the user who inserted.

Mickey Mouse updated the product (changed the unit price).
When some record is updated, the auditing table will create a new record which will track the new state for the record, and should also mark that the previous revision is not valid anymore. The highlighted fields on the top-right are tracking information for the new revision, which is the same tracking information that is used for updating the end-of-life of the previous revision (highlighted on bottom left). Please note that the date and the transaction which were used on the new revision are exactly the same that were used for marking the end of the lifetime of the past revision - this gives you an easy and elegant way to link the previous state to the new state, and using the exact same datetime is important to have contiguous time periods. Also note that the EndOperation of revision 1 was marked as “U” since that revision was not deleted, but updated into a new state.

Donald Duck deleted the product.
When some record is deleted, no new revisions are created, however the previous active revision must be marked to inform that it’s not valid anymore. The highlighted fields are the ones which were updated in previous revision, and show the user who deleted, and the deletion date.

Queries will be as simple as this:

-- To find the ACTIVE version
SELECT * FROM [Audit_Products] WHERE GETDATE() BETWEEN AuditStartDate AND AuditEndDate

 -- To find the version that existed at any given time
SELECT * FROM [Audit_Products] WHERE @SomeDate BETWEEN AuditStartDate AND AuditEndDate.
-- AuditEndOperation would indicate if that version is still active (NULL), if it was DELETED ('D') or if it was somehow UPDATED ('U')

-- To find the first version
SELECT * FROM [Audit_Products] WHERE AuditStartOperation='I'

-- To find the last version (even if deleted)
SELECT * FROM [Audit_Products] WHERE AuditEndDate='9999-12-31' OR AuditEndOperation='D'

Please note that some of those queries suppose that you are using surrogate keys, which guarantee that under normal conditions each key will have only one insert and at most one delete. And afterall, using surrogate keys is almost always (if not always) a good choice.

Please also note that SQL BETWEEN is INCLUSIVE, so if you use BETWEEN it is possible (although very unlikely, because you would have to search for the exact moment that some record was updated) that you get 2 different revisions. I used BETWEEN to make the article easier to understand, but when searching for the state of records in a past point in time it’s safer to use half-open interval [AuditStartDate, AuditEndDate) : @SomeDate >= AuditStartDate AND @SomeDate < AuditEndDate.

Also, please note that if you run GETDATE() >= AuditStartDate AND GETDATE()< AuditEndDate you could still have problems, because each time you run GETDATE() (even in a single statement) could lead to a different value. But if you want to get current records instead of using GETDATE() you can also check for the AuditEndOperation.

So in summary these are fail-proof queries:

-- To find the ACTIVE version
SELECT * FROM [Audit_Products] WHERE AuditEndOperation IS NULL

 -- To find the version that existed at any given time
SELECT * FROM [Audit_Products] WHERE @SomeDate>=AuditStartDate AND @SomeDate<AuditEndDate.
-- AuditEndOperation would indicate if that version is still active (NULL), if it was DELETED ('D') or if it was somehow UPDATED ('U')

-- To find the first version
SELECT * FROM [Audit_Products] WHERE AuditStartOperation='I'

-- To find the last version (even if deleted)
SELECT * FROM [Audit_Products] WHERE AuditEndDate='9999-12-31' OR AuditEndOperation='D'
-- or even
SELECT * FROM [Audit_Products] WHERE AuditEndOperation IS NULL OR AuditEndOperation='D'

The implementation

For tracking in your tables which system user (and not database user) made an operation, you must somehow pass that information from your application to your database connection. At first I was using SQL Context Info for passing information about the current logged user, but then I decided to use temporary tables for that, to avoid the complexity of binary serialization. This is how I pass information to my triggers:

CREATE PROCEDURE [dbo].[sp_SetContextInfo]
 @UserID INT,
 @Username varchar(128) = NULL
	CREATE TABLE #session ([Username] varchar(128), [UserID] int NOT NULL)
	INSERT INTO #session VALUES (@Username, @UserID)

This is how I receive information from my triggers:

CREATE PROCEDURE [dbo].[sp_GetContextInfo]
 @Username varchar(128) OUTPUT,
	SET @UserID = 0
	SET @Username = NULL
	SET @TransactionGUID = NEWID()
	IF OBJECT_ID('tempdb..#session') IS NOT NULL BEGIN -- Get @Username and @UserID given by the application
		SELECT @Username = Username, @UserID = COALESCE(UserID, 0), @TransactionGUID = COALESCE(TransactionGUID, NEWID())
		FROM #session
	IF (@Username IS NULL) -- if no application user was given, get sql user, hostname and ip
		SELECT @Username = '[' + SYSTEM_USER   + '] ' + RTRIM(CAST(hostname AS VARCHAR))
                + ' (' + RTRIM(CAST(CONNECTIONPROPERTY('client_net_address') AS VARCHAR)) + ')'
		from master..sysprocesses where spid = @@spid

This is how I pass information about the current user to the database connections (using C# and Entity Framework 6), so that every change can be tracked down to the correct user:

namespace NorthwindAudit
    partial class NorthwindAuditDB

        /// <summary>
        /// Currently logged user that is using the connection. For auditing purposes.
        /// </summary>
        public string Username { get; set; }
        /// <summary>
        /// Currently logged user that is using the connection. For auditing purposes.
        /// </summary>
        public int UserID { get; set; }

        // modify your constructor to force developer to pass the username and userid.
        public NorthwindAuditDB(string Username, int UserID) : this()
            this.Username = Username;
            this.UserID = UserID;
            this.Configuration.LazyLoadingEnabled = true;

            // you may want to disable this if you have some batch jobs that dont run on users context... but I like to enforce that caller always provide some user
            if (this.UserID == 0 || this.Username == null)
                throw new ArgumentNullException("You must provide the application user, for auditing purposes");

            this.Database.Connection.StateChange += new System.Data.StateChangeEventHandler(Connection_StateChange);

        //pass the application user to the SQL when the connection opens (because the connection could already have been used by another DbContext)
        void Connection_StateChange(object sender, System.Data.StateChangeEventArgs e)
            // State changed to Open
            if (e.CurrentState == ConnectionState.Open && e.OriginalState != ConnectionState.Open)
                SetConnectionUser(this.UserID, this.Username);
        void SetConnectionUser(int userID, string username)
            // Create local temporary context table
            var cmd = this.Database.Connection.CreateCommand();
            cmd.CommandText = "IF OBJECT_ID('tempdb..#session') IS NOT NULL DROP TABLE #session";

            if (userID != 0 && username != null)
                cmd.CommandText = "CREATE TABLE #session ([Username] varchar(128), [UserID] int NOT NULL, [TransactionGUID] UNIQUEIDENTIFIER NOT NULL)";
                cmd.CommandText = "INSERT INTO #session ([Username], [UserID], [TransactionGUID]) VALUES (@Username, @UserID, NEWID())";
                cmd.Parameters.Add(new SqlParameter("@UserID", userID));
                cmd.Parameters.Add(new SqlParameter("@Username", username ?? ""));

        // This probably is not necessary, but I like to check that the session table matches the provided user. 
        // I haven't made stress testing for concurrency issues, so better safe than sorry.
        public override int SaveChanges()

            if (this.UserID == 0 || this.Username == null)
                throw new ArgumentNullException("You must provide a user for the connection, for auditing purposes");

            #region Just in case! Double checking that table #session was created and that it matches the user for the context
            bool wasClosed = false;
            if (this.Database.Connection.State == ConnectionState.Closed)
                wasClosed = true;
            var cmd = this.Database.Connection.CreateCommand();
            cmd.CommandText = "EXEC [dbo].[sp_GetContextInfo] @UserID OUTPUT, @Username OUTPUT, @TransactionGUID OUTPUT";
            var parm1 = new SqlParameter("@UserID", SqlDbType.Int); parm1.Direction = ParameterDirection.Output; cmd.Parameters.Add(parm1);
            var parm2 = new SqlParameter("@Username", SqlDbType.VarChar, 128); parm2.Direction = ParameterDirection.Output; cmd.Parameters.Add(parm2);
            var parm3 = new SqlParameter("@TransactionGUID", SqlDbType.UniqueIdentifier); parm3.Direction = ParameterDirection.Output; cmd.Parameters.Add(parm3);

            //Error: ExecuteNonQuery requires an open and available Connection

            if (wasClosed)

            if (parm1.Value == null || ((int)parm1.Value) == 0 || parm2.Value == null || string.IsNullOrEmpty((string)parm2.Value))
                throw new ArgumentNullException("You must provide a user for the connection, for auditing purposes");
            if (((int)parm1.Value) != this.UserID || ((string)parm2.Value) != this.Username)
                throw new ArgumentNullException("The user provided in #session table does not match the user provided on the connection (DbContext)");

            return base.SaveChanges();


This is a sample of Audit Table for Northwind Orders table:

CREATE TABLE [audit].[Audit_dboProducts](
	[Audit_dboProductsID] [int] IDENTITY(1,1) NOT NULL,
	[ProductID] [int] NOT NULL,
	[CategoryID] [int] NULL,
	[Discontinued] [bit] NOT NULL,
	[ProductName] [nvarchar](40) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
	[QuantityPerUnit] [nvarchar](20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
	[ReorderLevel] [smallint] NULL,
	[SupplierID] [int] NULL,
	[UnitPrice] [money] NULL,
	[UnitsInStock] [smallint] NULL,
	[UnitsOnOrder] [smallint] NULL,
	[AuditStartDate] [datetime] NOT NULL,
	[AuditEndDate] [datetime] NOT NULL,
	[AuditStartOperation] [char](1) COLLATE Latin1_General_CI_AS NOT NULL,
	[AuditEndOperation] [char](1) COLLATE Latin1_General_CI_AS NULL,
	[AuditStartUserID] [int] NOT NULL,
	[AuditStartUsername] [varchar](128) COLLATE Latin1_General_CI_AS NOT NULL,
	[AuditEndUserID] [int] NULL,
	[AuditEndUsername] [varchar](128) COLLATE Latin1_General_CI_AS NULL,
	[AuditStartTransactionGUID] [uniqueidentifier] NOT NULL,
	[AuditEndTransactionGUID] [uniqueidentifier] NULL,
	[Audit_dboProductsID] ASC

This is a sample of Audit Trigger for Northwind Orders table:

ALTER TRIGGER [dbo].[trAuditProducts] ON [dbo].[Products]
WITH EXECUTE AS 'audituser'
	SET NOCOUNT ON -- Trigger cannot affect the "rows affected" counter, or else it would break Entity Framework
	-- Logged User
	DECLARE @Username varchar(128)
	DECLARE @Now datetime
	EXEC [dbo].[sp_GetContextInfo] @UserID OUTPUT, @Username OUTPUT, @TransactionGUID OUTPUT
	SET @infinite = '9999-12-31'
	-- InsertUpdate
	DECLARE @Action varchar(1)
	SET @Action = 'D'

    -- Defining if it's an UPDATE (U), INSERT (I), or DELETE ('D')
    IF (SELECT COUNT(*) FROM inserted) > 0 BEGIN
		IF (SELECT COUNT(*) FROM deleted) > 0  
			SET @Action = 'U'
			SET @Action = 'I'

    -- Closing the lifetime of the current revisions (EndDate=infinite) for records which were updated or deleted
    IF (@Action='D' OR @Action='U')
		UPDATE [audit].[Audit_dboProducts]
		SET [AuditEndDate] = @Now, 
		[AuditEndUserID] = @UserID,
		[AuditEndUsername] = @Username,
		[AuditEndTransactionGUID] = @TransactionGUID,
		[AuditEndOperation] = @Action 
		FROM [audit].[Audit_dboProducts] aud
		INNER JOIN deleted tab
		ON [tab].[ProductID] = [aud].[ProductID]
		AND aud.[AuditEndDate] = @infinite

    -- Creating new revisions for records which were inserted or updated
    IF (@Action='I' OR @Action='U') BEGIN
		INSERT INTO [audit].[Audit_dboProducts] ([ProductID], [ProductName], [SupplierID], [CategoryID], [QuantityPerUnit], [UnitPrice], [UnitsInStock], [UnitsOnOrder], [ReorderLevel], [Discontinued],  [AuditStartDate], [AuditEndDate], [AuditStartOperation], [AuditStartUserID], [AuditStartUsername], [AuditStartTransactionGUID])
		SELECT 	[inserted].[ProductID], [inserted].[ProductName], [inserted].[SupplierID], [inserted].[CategoryID], [inserted].[QuantityPerUnit], [inserted].[UnitPrice], [inserted].[UnitsInStock], [inserted].[UnitsOnOrder], [inserted].[ReorderLevel], [inserted].[Discontinued],  
		FROM inserted


Let’s try another test on Northwind database:

        static void Main(string[] args)
            // creating product, order and orderitem
            var db = new NorthwindAuditDB("Richard Drizin", 27);
            var product = new Product()
                ProductName = "3/4 inches screw",
                UnitPrice = 9.99m,
                UnitsInStock = 23
            var order = new Order()
                CustomerID = "FRANK", // Customers PK is varchar in Northwind ... yeah I know
                EmployeeID = 1,
                OrderDate = DateTime.Now,
            order.Order_Details.Add(new Order_Detail()
                Product = product,
                UnitPrice = product.UnitPrice.Value,
                Quantity = 3,

            // updating quantity of items
            db = new NorthwindAuditDB("Mickey Mouse", 31);
            var lastOrder = db.Orders.Where(x => x.CustomerID == "FRANK").OrderByDescending(x=>x.OrderID).First();

            // deleting order and orderitem
            db = new NorthwindAuditDB("Donald Duck", 33);
            var lastOrder2 = db.Orders.Where(x => x.CustomerID == "FRANK").OrderByDescending(x => x.OrderID).First();



Product was created once, never modified or deleted. (I’ve hidden null columns so the screenshot could fit the article):

Order was inserted, and later deleted (it’s a single row, but I made it vertical so the screenshot could fit the article):

Order item was inserted, updated, and later deleted. (there are 2 rows, but I made it vertical so the screenshot could fit the article):

If I manually update (on SQL Management Studio) the table, it will track the SQL user, Hostname an IP:

Last comments and some advanced techniques:

  • You can create the AUDIT tables on a separate file group (because of growth?). However that will give you the freedom to restore audit tables individually, and I think it is more dangerous than helpful.
  • I created the AUDIT tables under a different schema, so the triggers must run “WITH EXECUTE AS” on some user which has permission on that schema. The regular database user for my application cannot access the auditing tables.
  • Instead of using “infinite” you could use NULL. I prefer to leave infinite so that my queries can use BETWEEN instead of checking for nulls or using COALESCE.
  • You cannot use text, ntext, or image columns in the ‘inserted’ tables. You can circumvent that by looking for the data in the real table, since the trigger runs after the insert/update happens. Just join the inserted with the real table, and refer to those columns on the real table.
  • For the updates I’m not checking if something really changed. I really don’t need it because Entity Framework only sends updates when something was really changed. If you need to check for modifications (with small performance penalty) you can also join the inserted table with the real table, and only insert when something was modified.
  • For updates you could also track only changed columns by keeping NULLs on every UPDATE that didn’t modify that column, but then for a nullable column you wouldn’t be able to tell when it’s a NULL or when it was a “not modified”. Even for non-nullable columns I still don’t think it’s worth - I prefer to have a simple and homogeneous design, in the sense that the audit tables reflect the exact same state as my transactional tables. If I need to make a human-readable log of what’s changed that’s responsibility of another algorithm, and not responsibility of the table.
  • I’m tracking modifications on every column. Again, I prefer to have a simple and homogeneous solution better than saving some disk space.