Code Generation in C# / CSX - Extracting SQL Server Schema

This is the second part of a multi-part post series I’m going to write about Code Generation using C#. In the previous part I showed how to invoke C# scripts (CSX files) from PowerShell and Visual Studio. In this part, I’ll create a simple extractor that reads the physical schema of a SQL Server database and saves it as a JSON file.

Creating the Project

The first step is creating a .NET project.

Running CSX scripts does not require a csproj (or Visual Studio), but having a csproj is very helpful for many reasons:

  • You get IntelliSense (mostly in .cs files, not so much in .csx files, so try to keep as little code as possible in CSX)
  • You can check for compilation errors
  • You can configure a Console App (or any other entry point) to launch and debug your scripts
  • A csproj is required for automatically restoring NuGet packages (although if you already have the NuGet packages locally, you can use them without a csproj)

All libraries which I’ll use in my generator (Dapper and Newtonsoft.Json) are compatible with netstandard2.0.
This means that the project can target either .NET Framework (net472, which used to be called “full framework”) or .NET Core (netcoreapp2.1 or netcoreapp3.1); both should work.

NuGet Packages

I’ll use two third-party libraries: Dapper (a micro-ORM, for loading objects from SQL queries) and Newtonsoft.Json (a JSON library, used here just for serialization). All we need is their DLLs, but the easiest way to get them is to add their NuGet packages to our csproj: the packages are then restored automatically for anyone who builds the csproj, which makes collaboration among multiple developers hassle-free.
If you’re not using the Visual Studio IDE, you can install those packages by running these commands in the folder that contains your csproj:

dotnet add package Dapper
dotnet add package Newtonsoft.Json

If you’re using Visual Studio, you can use the Package Manager Console and run this:

Install-Package Dapper
Install-Package Newtonsoft.Json

SQL Server Schema Objects

Next, I’ll create the classes that represent the physical schema of my SQL database. One important thing to remember is that C# scripts do NOT allow namespace declarations, so these classes must be declared without a namespace.

SqlServerTable.cs:

using System;
using System.Collections.Generic;

public class SqlServerTable
{
    public string Database { get; set; }
    public string TableSchema { get; set; }
    public string TableName { get; set; }

    /// <summary>
    /// Can be "TABLE" or "VIEW"
    /// </summary>
    public string TableType { get; set; }

    public string TableDescription { get; set; }

    public List<SqlServerColumn> Columns { get; set; } = new List<SqlServerColumn>();

    /// <summary>
    /// FKs which point from THIS (Child) table to the primary key of OTHER (Parent) tables
    /// </summary>
    public List<SqlServerForeignKey> ForeignKeys { get; set; } = new List<SqlServerForeignKey>();

    /// <summary>
    /// FKs which point from OTHER (Child) tables to the primary key of THIS (Parent) table
    /// </summary>
    public List<SqlServerForeignKey> ChildForeignKeys { get; set; } = new List<SqlServerForeignKey>();

}

I’ll omit the other classes for brevity, but you can refer to the full code here (classes SqlServerColumn.cs, SqlServerForeignKey.cs, SqlServerForeignKeyMember.cs and SqlServerDatabaseSchema.cs).
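To make the reader below easier to follow, here is a minimal sketch of the three omitted classes, reconstructed only from the properties that SqlServerSchemaReader reads or writes. This is an assumption-based reconstruction, not the author’s code: the real classes in the repository may contain more properties, and the `int` type of PKColumnOrdinalPosition is a guess.

```csharp
using System;
using System.Collections.Generic;

// HYPOTHETICAL minimal versions of the omitted classes, inferred from how
// SqlServerSchemaReader uses them. Declared without a namespace, since
// C# scripts do not allow namespace declarations.

public class SqlServerColumn
{
    public string Database { get; set; }      // nulled by the reader after grouping
    public string TableSchema { get; set; }   // nulled by the reader after grouping
    public string TableName { get; set; }     // nulled by the reader after grouping
    public string ColumnName { get; set; }
    public string SqlDataType { get; set; }   // e.g. "int", "nvarchar"
    public string ClrType { get; set; }       // filled later by GetClrType
}

public class SqlServerForeignKey
{
    public string ForeignKeyConstraintName { get; set; }
    public string FKTableSchema { get; set; } // child (FK) side
    public string FKTableName { get; set; }
    public string PKTableSchema { get; set; } // parent (PK) side
    public string PKTableName { get; set; }
    public List<SqlServerForeignKeyMember> Columns { get; set; } = new List<SqlServerForeignKeyMember>();
}

public class SqlServerForeignKeyMember
{
    public string ForeignKeyConstraintName { get; set; }
    public string FKTableSchema { get; set; }
    public int PKColumnOrdinalPosition { get; set; } // assumed int; used for ordering
}
```

Dapper maps query columns to these properties by name, which is why the SQL queries in the reader alias their columns (for example `t.TABLE_NAME as [TableName]`).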

Schema Reader

Finally, I’ll create the class which will read SQL metadata - SqlServerSchemaReader.cs:

using Dapper;
using System;
using System.Data;
using System.IO;
using System.Linq;

public class SqlServerSchemaReader
{
  public Func<IDbConnection> CreateDbConnection { get; set; }

  public SqlServerSchemaReader(Func<IDbConnection> createDbConnection)
  {
    CreateDbConnection = createDbConnection;
  }

  public void ExportSchemaToJSON(string outputJsonSchema)
  {
    Console.WriteLine("Reading Database...");

    using (var cn = CreateDbConnection())
    {
      var tables = cn.Query<SqlServerTable>(@"
        SELECT 
          t.TABLE_CATALOG as [Database], 
          t.TABLE_SCHEMA as [TableSchema], 
          t.TABLE_NAME as [TableName], 
          CASE WHEN t.TABLE_TYPE='VIEW' THEN 'VIEW' ELSE 'TABLE' END as [TableType],
          ep.value as [TableDescription]
        FROM  INFORMATION_SCHEMA.TABLES t
        INNER JOIN sys.schemas sc ON t.TABLE_SCHEMA = sc.[name]
          ... full code omitted for brevity - please refer to: 
          ... https://github.com/Drizin/CodegenCS/tree/master/src/CodegenCS.SqlServer
      ").AsList();

      var allColumns = cn.Query<SqlServerColumn>(@"
          ... full code omitted for brevity... 
      ").AsList();

      var fks = cn.Query<SqlServerForeignKey>(@"
          ... full code omitted for brevity... 
      ").AsList();

      var fkCols = cn.Query<SqlServerForeignKeyMember>(@"
          ... full code omitted for brevity... 
      ").AsList();
      foreach (var fk in fks)
      {
        fk.Columns = fkCols.Where(c => 
            c.ForeignKeyConstraintName == fk.ForeignKeyConstraintName && 
            c.FKTableSchema == fk.FKTableSchema
        ).OrderBy(c => c.PKColumnOrdinalPosition).ToList();
      }

      foreach (var table in tables)
      {
        table.Columns = allColumns.Where(c => c.TableSchema == table.TableSchema && c.TableName == table.TableName).ToList();
        foreach(var column in table.Columns)
          column.ClrType = GetClrType(table, column);
        table.Columns.ForEach(c => { c.Database = null; c.TableSchema = null; c.TableName = null; });

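        // Copy the FKs where this table is the child (FK side), and null out FKTableSchema/FKTableName,
        // which are redundant because they always refer to this table.
        // Clone() is needed because the same FK object is also attached to its parent table.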
        table.ForeignKeys = Clone(fks.Where(fk => fk.FKTableSchema == table.TableSchema && fk.FKTableName == table.TableName).ToList());
        table.ForeignKeys.ForEach(fk => { fk.FKTableSchema = null; fk.FKTableName = null; });

        // Copy the FKs where this table is the parent (PK side), and null out PKTableSchema/PKTableName,
        // which are redundant because they always refer to this table.
        table.ChildForeignKeys = Clone(fks.Where(fk => fk.PKTableSchema == table.TableSchema && fk.PKTableName == table.TableName).ToList());
        table.ChildForeignKeys.ForEach(fk => { fk.PKTableSchema = null; fk.PKTableName = null; });

      }

      SqlServerDatabaseSchema schema = new SqlServerDatabaseSchema()
      {
        LastRefreshed = DateTimeOffset.Now,
        Tables = tables,
      };

      Console.WriteLine($"Saving into {outputJsonSchema}...");
      File.WriteAllText(outputJsonSchema, Newtonsoft.Json.JsonConvert.SerializeObject(schema, Newtonsoft.Json.Formatting.Indented));
    }

    Console.WriteLine("Success!");
  }

  string GetClrType(SqlServerTable table, SqlServerColumn column)
  {
    string sqlDataType = column.SqlDataType;
    switch (sqlDataType)
    {
      case "bigint":
        return typeof(long).FullName;
      case "smallint":
        return typeof(short).FullName;
      case "int":
        return typeof(int).FullName;
      case "uniqueidentifier":
        return typeof(Guid).FullName;
      case "smalldatetime":
      case "datetime":
      case "datetime2":
      case "date":
        return typeof(DateTime).FullName;
      case "time":
        return typeof(TimeSpan).FullName; // SqlClient maps SQL Server TIME to TimeSpan
      case "datetimeoffset":
        return typeof(DateTimeOffset).FullName;
      case "float":
        return typeof(double).FullName;
      case "real":
        return typeof(float).FullName;
      case "numeric":
      case "smallmoney":
      case "decimal":
      case "money":
        return typeof(decimal).FullName;
      case "tinyint":
        return typeof(byte).FullName;
      case "bit":
        return typeof(bool).FullName;
      case "image":
      case "binary":
      case "varbinary":
      case "timestamp":
        return typeof(byte[]).FullName;
      case "nvarchar":
      case "varchar":
      case "nchar":
      case "char":
      case "text":
      case "ntext":
      case "xml":
        return typeof(string).FullName;

      // Vendor-specific types
      case "hierarchyid":
        return "Microsoft.SqlServer.Types.SqlHierarchyId"; // requires Microsoft.SqlServer.Types.dll (EF or Dapper 1.34+)
      case "geography":
        return "Microsoft.SqlServer.Types.SqlGeography";  // requires Microsoft.SqlServer.Types.dll (EF or Dapper 1.32+)
      case "geometry":
        return "Microsoft.SqlServer.Types.SqlGeometry";  // requires Microsoft.SqlServer.Types.dll (EF or Dapper 1.33+)

      // Any other type: log it and leave ClrType empty
      default:
        Console.WriteLine($"Unknown sqlDataType for {table.TableName}.{column.ColumnName}: {sqlDataType}");
        return null;
    }
  }

  public static T Clone<T>(T source)
  {
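    // Deep copy via a JSON round-trip (fully qualified, since this file has no "using Newtonsoft.Json;")
    var serialized = Newtonsoft.Json.JsonConvert.SerializeObject(source);
    return Newtonsoft.Json.JsonConvert.DeserializeObject<T>(serialized);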
  }

}
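The Clone() calls in the reader matter because every FK object belongs to two tables: it is listed in the child table’s ForeignKeys and in the parent table’s ChildForeignKeys, and each side nulls out different properties. The stand-alone sketch below illustrates what would go wrong without copying. It uses a hypothetical, reduced Fk class and a shallow MemberwiseClone copy instead of the reader’s JSON round-trip; Fk, Copy and FkAttachDemo are illustrative names, not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// HYPOTHETICAL stand-in for SqlServerForeignKey, reduced to the two properties involved.
public class Fk
{
    public string FKTableName { get; set; }  // child (FK) side
    public string PKTableName { get; set; }  // parent (PK) side

    // A shallow copy is enough for this reduced class because all properties are strings.
    public Fk Copy() => (Fk)MemberwiseClone();
}

public static class FkAttachDemo
{
    // Hands the same FKs to a child table and a parent table (optionally copying them first),
    // then nulls out the side that is redundant for each table, as the reader does.
    public static (List<Fk> childList, List<Fk> parentList) Attach(List<Fk> fks, bool copy)
    {
        var childList = fks.Select(fk => copy ? fk.Copy() : fk).ToList();
        childList.ForEach(fk => fk.FKTableName = null);   // redundant on the child table

        var parentList = fks.Select(fk => copy ? fk.Copy() : fk).ToList();
        parentList.ForEach(fk => fk.PKTableName = null);  // redundant on the parent table

        return (childList, parentList);
    }
}
```

With `copy: false` both lists hold the same object, so after the second ForEach the child table’s FK has also lost PKTableName, and the JSON would no longer say which table the FK points to. With `copy: true` each table keeps the information it needs. The real SqlServerForeignKey also carries a Columns list, which a shallow copy would share between both tables; a deep copy such as the JSON round-trip avoids that.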

CSX

The idea is to put as much code as possible into .cs files (and as little as possible into the CSX script), since IntelliSense and compile-time checks work better there than in CSX. The script file RefreshDatabaseSchema.csx basically loads the libraries, defines the connection string and paths, and invokes SqlServerSchemaReader.cs:

/// <summary>
/// This CSX Script will invoke SqlServerSchemaReader, which extracts the schema of a SQL database and saves it into a JSON file.
/// The easiest way to launch csi.exe (which is shipped with Visual Studio) to run this script is by using the PowerShell script RefreshDatabaseSchema.ps1
/// You can do that from Visual Studio (see instructions in RefreshDatabaseSchema.ps1) or you can just execute "Powershell RefreshDatabaseSchema.ps1"
/// </summary>

// System libraries
#r "System.Data.dll"

// Load 3rd-party libraries by paths relative to "$Env:userprofile\.nuget\packages\" (passed to csi.exe via /lib:)
#r "dapper\2.0.35\lib\netstandard2.0\Dapper.dll"
#r "newtonsoft.json\12.0.3\lib\netstandard2.0\Newtonsoft.Json.dll"

// CS files are better than CSX because IntelliSense and compile-time checks work better.
#load "SqlServerTable.cs"
#load "SqlServerColumn.cs"
#load "SqlServerForeignKey.cs"
#load "SqlServerForeignKeyMember.cs"
#load "SqlServerDatabaseSchema.cs"
#load "SqlServerSchemaReader.cs"

using System;
using System.IO;
using System.Runtime.CompilerServices;
using System.Data;
using System.Data.SqlClient;

// Helpers to get the location of the current CSX script
public static string GetScriptPath([CallerFilePath] string path = null) => path;
public static string GetScriptFolder([CallerFilePath] string path = null) => Path.GetDirectoryName(path);


// location relative to the CSX script
string outputJsonSchema = Path.GetFullPath(Path.Combine(GetScriptFolder(), "AdventureWorksSchema.json")); 
string connectionString = @"Data Source=MYWORKSTATION\SQLEXPRESS;
                            Initial Catalog=AdventureWorks;
                            Integrated Security=True;";

Func<IDbConnection> connectionFactory = () => new SqlConnection(connectionString);
var reader = new SqlServerSchemaReader(connectionFactory);
reader.ExportSchemaToJSON(outputJsonSchema);
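The GetScriptPath/GetScriptFolder helpers rely on [CallerFilePath], which makes the compiler pass the full path of the source file containing the call site as a compile-time constant. The same technique works in a regular .cs file, for example if you launch the reader from a Console App for debugging. ScriptLocation and ResolveNextToSource are hypothetical names used only for this sketch.

```csharp
using System;
using System.IO;
using System.Runtime.CompilerServices;

// Same helpers as in RefreshDatabaseSchema.csx, wrapped in a regular class.
// The compiler fills in "path" with the caller's source file path at compile time.
public static class ScriptLocation
{
    public static string GetSourcePath([CallerFilePath] string path = null) => path;

    public static string GetSourceFolder([CallerFilePath] string path = null) =>
        Path.GetDirectoryName(path);

    // HYPOTHETICAL helper: resolves a file name next to the calling source file,
    // the same way the script builds outputJsonSchema.
    public static string ResolveNextToSource(string fileName, [CallerFilePath] string path = null) =>
        Path.GetFullPath(Path.Combine(Path.GetDirectoryName(path), fileName));
}
```

For example, `ScriptLocation.ResolveNextToSource("AdventureWorksSchema.json")` returns a full path in the folder of the file that makes the call. In a compiled app this is the path on the build machine, which is usually fine for a local generator that runs from its source tree.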

PowerShell

Last, I’ll create a PowerShell script to invoke the CSX file. This is useful because it can locate csi.exe in multiple locations, and because it can pass the location of the per-user NuGet packages folder to csi.exe, so that the CSX can load libraries by their relative locations without hard-coding user-specific folders. RefreshDatabaseSchema.ps1:

# To Execute Powershell Scripts from Visual Studio:
# 1) Right-click the PS1 file - "Open With..."
# 2) Configure:
#      Program: Powershell.exe
#      Arguments: -noexit -File %1
#      Friendly Name: Execute PowerShell Script

# To execute CSX scripts you'll need CSI.EXE (C# REPL) which is shipped with Visual Studio
# but can also be installed by using the NuGet package Microsoft.Net.Compilers.Toolset - https://www.nuget.org/packages/Microsoft.Net.Compilers.Toolset/

# For more info about launching CSX scripts from PowerShell or from Visual Studio, check https://drizin.io/code-generation-csx-scripts-part1/

$dir = Split-Path $MyInvocation.MyCommand.Path 
$script = Join-Path $dir ".\RefreshDatabaseSchema.csx"


# Locate CSI.EXE by searching common paths
$csi = ( 
    "$Env:userprofile\.nuget\packages\microsoft.net.compilers.toolset\3.6.0\tasks\net472\csi.exe",
    "$Env:programfiles (x86)\Microsoft Visual Studio\2019\Enterprise\MSBuild\Current\Bin\Roslyn\csi.exe",
    "$Env:programfiles (x86)\Microsoft Visual Studio\2019\Professional\MSBuild\Current\Bin\Roslyn\csi.exe",
    "$Env:programfiles (x86)\Microsoft Visual Studio\2019\Community\MSBuild\Current\Bin\Roslyn\csi.exe",
    "$Env:programfiles (x86)\Microsoft Visual Studio\2017\Enterprise\MSBuild\15.0\Bin\Roslyn\csi.exe",
    "$Env:programfiles (x86)\Microsoft Visual Studio\2017\Professional\MSBuild\15.0\Bin\Roslyn\csi.exe",
    "$Env:programfiles (x86)\Microsoft Visual Studio\2017\Community\MSBuild\15.0\Bin\Roslyn\csi.exe"
) | Where-Object { Test-Path $_ } | Select-Object -first 1

if (!$csi)
{
    Write-Host "---------------------------------------" -for red
    Write-Host "Can't find csi.exe" -for red
    Write-Host "Please fix search paths above, or install NuGet Microsoft.Net.Compilers.Toolset" -for red
    Write-Host "---------------------------------------" -for red
    Exit 1
}


$stopwatch = [System.Diagnostics.Stopwatch]::StartNew()

Write-host "Starting csi.exe $script ..." -for yellow
& $csi /lib:"$Env:userprofile\.nuget\packages\" $script

Write-Host "Finished in $($stopwatch.Elapsed.TotalMilliSeconds) milliseconds"

# Since I configured "-noexit" parameter in Visual Studio I don't need this
#if ($host.Name -notmatch 'ISE') { Write-Host -NoNewLine "(Just press Enter to exit)" -for cyan; read-host; }  

Running PowerShell:

Result file AdventureWorksSchema.json:

Full source code is available here

