This is the second part in a series of posts looking at Product Extensibility in .NET Framework. In the first part of this series, I argued that it’s a viable business idea to run a highly customizable SaaS product where, instead of developing a full-fledged PaaS for customers to do their own customization (which wouldn’t pay off unless you have tens of thousands of customers), you develop and maintain the customizations yourself, following some architecture principles so that you don’t end up with a completely orphaned codebase for each customer (which would be a costly maintenance hell).

In this second part I’ll introduce some architectural foundations and tools that allow us to extend our Domain Objects (and their respective CRUD) with both new properties and new behavior, while keeping our core product upgradable. I’ll also stress the difference between real architecture problems that we need to solve and nice-to-have concepts that we don’t need to achieve because they would imply major maintenance effort.

Disclaimer

I love software architecture (and I have worked exclusively with it for some time), but I always try to be a pragmatic developer rather than an architecture astronaut, so I try to reject all designs that do not add obvious value to my code. Throughout these posts I’ll make many decisions that are based on my own experience, trying to be pragmatic, and I’ll do my best to always give some key reasons for those decisions (with lots of links to opinions of others who are much better developers than me). I believe that developers frequently forget to focus on business value and get lost building over-engineered systems with complex abstractions that turn a simple “Hello World” into a mission that can only be accomplished by rockstar developers with plenty of spare time. For that reason, I’ll try to keep the number of layers and abstractions to a minimum, just enough to reach the goals of this article. After all, as Alan Kay said, simple things should be simple, and complex things should be possible.

This initial article discusses a LOT of concepts, so I tried to organize the text in small paragraphs, and tried to introduce the concepts in the correct order.

Background

I’m doing consultancy work for a client who has a core product which is forked for a few thousand customers, and each fork is extended (customized) according to very different customer requirements. The extensions are developed in the open-box model, which Wikipedia describes as “the most flexible form of extensibility (…) [in which] changes are performed invasively in the original source code”. The problem with this model is that it is hard to maintain (e.g. bugs, upgrades, etc.) unless you have a well-designed architecture. Most of my decisions and designs are based on this client and his requirements.

Requirement – Modular Extensions

We have this core product, we have customer-specific extensions/customizations, and we also have some generic modules which can be installed for our clients as needed. This leads me to a plugin architecture (modular architecture), where all my extensions should be loaded automatically (without requiring explicit calls to register or invoke each one).

Unless specified otherwise, each module is independent from any other module.

Assumption – Independent Codebases

Now that you’ve just read about “modular plugin architecture”, please forget what you know about plugin architectures. We’re not developing plugins for Excel or for Photoshop. We’re not developing ABAP scripts for SAP. We DO have the source code for “our Excel”, and we don’t need to be so strict about the “holy product”. We do want modular extensions, but we don’t need a product that is untouchable and in which customizations may become difficult, expensive or even impossible. We also don’t want a Guardians Team that blocks you from touching the core product. We want to isolate common code and customer-specific extensions as best we can, but we don’t need those components to be totally decoupled. What we want is a well-structured codebase for our product that can be forked (or branched) for each customer and can receive whatever adjustment that customer requests, without modifying the base code so heavily that it becomes hard to merge a product upgrade. In other words, it’s totally acceptable to modify ANY part of the forked common code, as long as it is still possible to later merge updates to the common code without having to manually rewrite/review all customer extensions. We want to stick to the open-box model (where we can make modifications to the original source); we just want to make it easier for ourselves.

Rebuilding code should be easy and it’s not a problem. Maintaining individual codebases for each customer should be easy and it’s not a problem (I’ll cover this in future posts, but if you’re not using Git or some other DVCS, you’re forking the hard way). Actually, making some allowances in your architecture will save you tons of problems that shouldn’t exist in the first place. Being pragmatic is the key.

If you still don’t get the idea, please refer back to the first part of this series, where I stress a lot that you’re not Microsoft.

Definition – CRUD

In this series of posts, I’ll use CRUD mostly to refer to SQL code that is used as part of your transactional business operations, in other words I’m mostly talking about INSERTs/UPDATEs/DELETEs used over your Domain Objects, but also about simple SELECTs used for automatically loading those entities. On the other hand, when I’m talking about SELECTs that are used for reports and displayable content (grids/listings/etc.), I should refer to them as Reporting Queries. And as you know, we should use the right tool for the right job, so don’t expect that the same foundations that I use for CRUD will be used for Reporting Queries.

Assumption – Hand-written CRUD

I assume that you won’t be writing manual queries for your CRUD. Paraphrasing Jeremy Miller, if you’re writing SQL CRUDs by hand you’re stealing from your employer or client. Again, I’m not talking about Reporting Queries here. For complex reports it’s totally acceptable and desirable to hand-write SQL queries, but if you’re hand-coding your own Domain Objects CRUD, then you’re indeed stealing from someone.

Additionally, I assume that you understand the risk of SQL Injection (Bobby Tables says hi!), and I assume that you know that parametrized queries (SqlParameters) are the best solution to avoid that (and also to improve performance since we get a cached execution plan). I suppose also that you know that it’s boring and error-prone to write SqlParameters by hand.

Last, at the risk of sounding like Captain Obvious, the only way to achieve a maintainable codebase is to keep your common code and your customization code isolated. You obviously can’t isolate common code from customizations if they both share the same line of code. In the worst case you could keep custom code immediately after (or before) common code, but not on the same line. That means that you can’t use hand-written CRUD, because your extensions (custom columns, for example) would be on the same line as your common code, and no code comparison/merging tool could save you from a maintenance hell.

Assumption – Home-grown ORM

Since you won’t write your own CRUD, you should obviously use an ORM. And in the same sense that writing CRUD by hand is stealing from someone, writing your own ORM is probably not a good idea either, and Ayende has some very good arguments on why it’s harder than you thought. As someone wisely said on Hacker News: “If you’re not using an ORM, then you ultimately end up writing one. And doing a far worse job than the people who focus on that for a living. It’s no different from people who “don’t need a web framework”, and then go on to re-implement half of a framework. Learning any framework at a professional level is a serious time investment, and many student beginners or quasi-professional cowboys don’t want to do that. So they act like their hand-rolled crap is a badge of honor.” Ok. Enough about reinventing the square wheel.

Technology – Entity Framework 6 and Dapper

The most well-established ORM for .NET is Entity Framework (endorsed/developed by Microsoft itself), and the most well-established Micro-ORM for .NET is Dapper (developed by StackOverflow team). Entity Framework (abbreviated as EF) needs no introduction. It’s powerful, full-fledged, very well documented, and very consistent.

Dapper, on the other hand, is a lightweight library with a very specific objective, but it is an extremely useful tool. It’s mostly targeted at mapping SQL query results to POCOs, and at mapping CLR objects to SqlParameters in an easy way. [If for any reason you don’t use a full-fledged ORM like EF and still hand-write CRUD, Dapper can save you thousands of lines of boilerplate code without any tradeoff at all]. Both are very useful tools, and I like to use the right tool for the right job:

  • EF6 will automatically generate CRUD for my entities, has amazing support for relationships (eager loading, lazy loading, saving a graph of objects in the correct order, resolves concurrency problems, etc.). It’s strongly typed, which helps us being more productive (thanks to the best IDE and Intellisense), and also helps to catch errors during the build (I’m pragmatic enough to know that you won’t have 100% code coverage on your tests, especially because you shouldn’t be unit testing problems that have already been solved by someone else [or in this case by some other library]). Dapper also has some good extensions for generating CRUD and for working with relationships, but it’s not as mature as EF.
    For these reasons, I like to use EF6 for complex entity updates (with the benefits of type checking), lazy loading, etc.
  • Dapper makes it very straightforward to hand-write SQL queries and pass parameters from C# to the SqlCommand, has good support for multi-mapping (allowing me to manually write efficient queries), and stays closer to “bare-metal” SQL. Additionally, Dapper saves us from using EF for complex queries and bulk operations, where EF is known for having performance issues. Last, Dapper allows us to return dynamic types (while EF doesn’t) and expands list parameters for WHERE IN clauses.
    For these reasons, I like to use Dapper for Reporting Queries (where I may not need a DTO), batch operations, and very simple operations where I don’t need the benefits of the ORM’s strong typing.
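A minimal sketch of the Dapper features mentioned above (parameters from an anonymous object, list expansion for WHERE IN, and dynamic rows). It assumes the Dapper NuGet package; the Product class, the ProductReports class, and the table and column names are hypothetical and only serve as illustration.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Linq;
using Dapper; // NuGet package "Dapper" (assumed to be installed)

// Hypothetical POCO; table and column names below are illustrative only.
public class Product
{
    public int Id { get; set; }
    public string Name { get; set; }
    public int UnitsInStock { get; set; }
}

public static class ProductReports
{
    // Properties of the anonymous object become query parameters (@MinStock),
    // so no SqlParameter objects are written by hand.
    public static List<Product> LowStock(IDbConnection connection, int minStock)
    {
        const string sql =
            "SELECT Id, Name, UnitsInStock FROM Product WHERE UnitsInStock < @MinStock";
        return connection.Query<Product>(sql, new { MinStock = minStock }).ToList();
    }

    // Dapper expands an enumerable parameter into a parameter list for IN.
    // Note the syntax: "IN @Ids", without parentheses.
    public static List<Product> ByIds(IDbConnection connection, IEnumerable<int> ids)
    {
        const string sql =
            "SELECT Id, Name, UnitsInStock FROM Product WHERE Id IN @Ids";
        return connection.Query<Product>(sql, new { Ids = ids }).ToList();
    }

    // The non-generic Query returns dynamic rows, so a report needs no DTO.
    public static IEnumerable<dynamic> StockSummary(IDbConnection connection)
    {
        const string sql = "SELECT Name, UnitsInStock FROM Product ORDER BY Name";
        return connection.Query(sql);
    }
}
```

Each method takes an open IDbConnection (for example a SqlConnection), so the same helpers work inside or outside a transaction managed by the caller.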

I must confess that in the past I’ve made the mistake of using EF for things where EF was a bad decision, and where Dapper would have been a better fit. But my repulsion for ADO.NET blinded me to the point of using EF for complex queries where I was obviously hammering screws. (In my defense, I must say that this was a large Silverlight project, which required RIA services, and EF was the obvious choice for that.)

Assumption – Database is the King

I must confess that I’m old-school. I’ve learned database design with a senior DBA / Data Admin whose favorite part of the week was printing out a full 20pg database diagram (4x5 pages), sticking that ER diagram on the wall, and explaining to the developers all recent changes in the data model. In other words, I’m still much more comfortable with database-first design, rather than code-first design.

This means that I usually tend to think first in terms of Tables, and, paraphrasing Jon Smith, my philosophy is that Database is King. I usually start with my tables and take a bottom-up approach, reflecting my tables in the Business Layer as Persistent Entities (any class that can be persisted to the database). And although in my posts you’ll find some references to non-Persistent classes (like ViewModels or other DTOs), when I say “Entities” (or Domain Entities) I’m probably talking about Persistent Entities that are directly mapped to Database tables.

Technology – EF Reverse POCO Code First Generator

I believe that plain C# code is much friendlier than XML configuration (especially for version control reviews/compares), so when I want to automatically extract my data model (my entities) from the database, instead of using the regular “Database first” option (which would generate my model inside an EDMX XML file: ugly, and hard to adjust or version control), I prefer the “Code First from existing Database” option.

However, the default EF “Code First from existing Database” Wizard is not configurable, can’t be automated (I have hundreds of legacy codebases to upgrade to my architecture), and is not actively maintained. More than that, I’m not a big fan of Convention over Configuration: I prefer to see exactly my raw configurations/mappings in code. If I wanted magic conventions (like automatically pluralizing my table names) I probably wouldn’t be using C#. So instead of using Data Annotations, I really prefer using the Fluent API, which is also much more powerful.

With all those requirements in mind (pure C# code instead of XML, reverse-engineered model / database-first, and Fluent API), I found these great T4 templates, EntityFramework-Reverse-POCO-Code-First-Generator, which can be fully customized to our needs, and to which I had the pleasure of making a few contributions.

Object Orientation vs Service Orientation, Domain Entities vs DTOs

With so many different architectures to choose from, people get really confused. Instead of having my entities as simple DTOs (objects used only to transfer data) and controlling those objects in Service Classes (or even worse in “Manager classes”), I prefer to have my entities as POCOs (objects with state AND behavior), based on the good and old Object Oriented Programming (OOP), and sitting somewhere between traditional layered-architecture and DDD.

I like the principles of OOP, I think it makes my code clear, concise, and as I’ll show later I think inheritance is one of the best ways to provide a “common vs custom” architecture. Well, maybe the DTO vs POCO is a personal taste decision, but I’m glad to have a view similar to Ayende and Martin Fowler (who describes the Anemic Domain Model anti-pattern) on this.

Definition - Business Layer

My entities (as POCOs with state and behavior) belong to what I call Business Layer, like we used to do in the old and good three-tier model. Ok, call me old school again, I don’t care. You can call it Services Layer, or whatever you prefer, but I believe people confuse “Services” with SOA or Web Services (REST or whatever), so I prefer to use the old and good “Business” name to make it clear that all business rules belong there.

Similarly to DDD (where services are for the situation when you have an operation that doesn’t properly belong to any aggregate root), I also keep those Services which can’t fit into any specific entity in the same Layer. That’s why I see my POCOs and BLL as something between traditional layered-architecture and DDD.

Last, in this same layer I also add Repository Queries (different filters to load my entities according to some criteria), both for EF (as extensions to IQueryable<T>) and for Dapper.
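As a rough illustration of Repository Queries written as extensions to IQueryable<T>, here is a minimal sketch. The Product class and the Active and WithStockAtLeast filters are hypothetical names, not part of the original codebase. Because the filters only compose Where expressions, they work the same way over an EF DbSet<T> or an in-memory list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity used only for this example.
public class Product
{
    public int Id { get; set; }
    public string Name { get; set; }
    public bool Discontinued { get; set; }
    public int UnitsInStock { get; set; }
}

// Reusable filters that live in the Business Layer next to the entities.
public static class ProductQueries
{
    // Products that are still being sold.
    public static IQueryable<Product> Active(this IQueryable<Product> source)
    {
        return source.Where(p => !p.Discontinued);
    }

    // Products with at least the requested quantity available.
    public static IQueryable<Product> WithStockAtLeast(
        this IQueryable<Product> source, int quantity)
    {
        return source.Where(p => p.UnitsInStock >= quantity);
    }
}
```

With EF this would be used as db.Products.Active().WithStockAtLeast(10).ToList(), and the composed expression is translated into a single SQL query.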

In summary: My Business Layer will be composed of POCOs (or Entities, which I call interchangeably, although they are slightly different) with state and behavior (methods). Each instance of those Entities is what I’ll call a Domain Object, or just “instance”.

Repository and Unit of Work Patterns

Design patterns are a pretty good way of communicating software design concepts, but they are often misunderstood and misused. Part of this is probably due to new technologies that emerged and matured in the past 20 years (since the GoF book was first published), and the other part is due to people repeating patterns without really understanding the reasons behind them.

When developers use Entity Framework, it’s common that they misuse two design patterns: the Repository and the Unit of Work. The principle of the Repository is that your POCO should be persistent ignorant and that the Repository is an in-memory collection of objects responsible for hiding details of data access from the business layer. This is exactly what EF provides you as DbSet<T>, with methods for adding entities, removing, finding, etc.

The principle of the Unit of Work is that it will keep track of your objects, resolve the order of inserts, manage transactions and apply changes. This is exactly what EF provides you as the DbContext.

In other words, you don’t need to implement Repository or Unit of Work because EF already does that for us. As Ayende explains, adding an abstraction over another abstraction doesn’t actually give you anything. He also explains that “Getting data from the database is a common operation, and should be treated as such. Adding additional layers of abstractions usually only make it hard”. So please stop adding abstractions over abstractions. Start with direct and ‘naive’ architecture and develop it over time. If you’re still inclined towards writing an abstraction over EF, read this nice and pragmatic opinion. Additionally, if you write abstractions to “protect” your developers, stop it immediately, and educate them instead.

More than that, the whole concept of isolating your DAL from your BLL (back from the three-tier model) is also outdated, since Entity Framework (and NHibernate, and many other ORMs and Micro ORMs) is already persistence agnostic. Your DAL is Entity Framework and your BLL is your POCOs (or DTOs and Services/Managers if you prefer). I know this sounds obvious, but I have seen countless projects with completely empty BLLs and DALs (just proxying calls to the underlying EF or other ORM, or even worse calling stored procedures which were just another repetition of the DAL and of the BLL). People seriously misunderstand design patterns.

In summary: The Business Layer uses EF directly – there is no Data Access Layer because EF handles that for us.
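To make “uses EF directly” concrete, here is a minimal sketch in which a business operation works against the DbContext (the Unit of Work) and a DbSet<T> (the Repository) with no wrapper in between. It assumes the EntityFramework 6 NuGet package; ShopContext, Product and StockService are hypothetical names chosen for illustration.

```csharp
using System;
using System.Data.Entity; // EntityFramework 6 NuGet package (assumed)

// Hypothetical entity.
public class Product
{
    public int Id { get; set; }
    public string Name { get; set; }
    public int UnitsInStock { get; set; }
}

// The DbContext plays the Unit of Work role; each DbSet<T> is the Repository.
public class ShopContext : DbContext
{
    public DbSet<Product> Products { get; set; }
}

public static class StockService
{
    // A business operation that talks to EF directly, without a DAL wrapper.
    public static void Restock(int productId, int quantity)
    {
        using (var db = new ShopContext())
        {
            // Find looks the entity up by primary key and returns null if missing.
            var product = db.Products.Find(productId);
            if (product == null)
                throw new InvalidOperationException("Product not found: " + productId);

            product.UnitsInStock += quantity;

            // Change tracking decides which UPDATE to issue.
            db.SaveChanges();
        }
    }
}
```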

Same POCOs for EF and Dapper

I don’t think it makes sense to have different POCOs for EF and Dapper (let alone having both a POCO and a DTO for every entity), so I use the same POCOs for both EF and Dapper, and will use the same entities for transferring data whenever it’s possible to avoid creating DTOs for existing entities.

The advantage of using the same POCO for both EF and Dapper is that my business methods can work with entities loaded by both. The drawback is that it is possible that a developer tries to navigate through relationships in POCOs that were loaded by Dapper, which would throw a NullReferenceException, since POCO relationships can only be lazy-loaded by EF proxies. Additionally, I think we may face some CLR type mismatches between what EF and Dapper expect, but I haven’t yet faced that problem and I don’t think it would happen for common types.

Requirement: POCOs Inheritance for Behavior

Since POCOs have behavior, standard application behavior will be in the POCOs. However, I want to rely on OOP for my extensions, so I want to be able to use INHERITED POCOS, where I can use OOP features to override default behavior with my own behavior. In other words, if my application has a class called Product with a method bool HasStock(int quantity), I want to be able to override that method with my own implementation (which may or may not use the base implementation), and class inheritance (and method overriding) is the most elegant way to do this.
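The HasStock example can be sketched in plain C#. The stock rule and the safety reserve below are assumptions made up for illustration; only the class names and the method signature come from the text.

```csharp
using System;

// Core product entity with the default application behavior.
public class Product
{
    public int UnitsInStock { get; set; }

    // Default rule (assumed for this sketch): enough units on hand.
    public virtual bool HasStock(int quantity)
    {
        return UnitsInStock >= quantity;
    }
}

// Customer-specific extension that overrides the default rule.
public class MyCustomProduct : Product
{
    // Hypothetical customer requirement: always keep a reserve of 5 units.
    private const int SafetyReserve = 5;

    public override bool HasStock(int quantity)
    {
        // Reuses the base implementation with a stricter quantity.
        return base.HasStock(quantity + SafetyReserve);
    }
}
```

Code that only knows about Product still gets the custom rule whenever the instance is actually a MyCustomProduct, which is the whole point of relying on method overriding.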

Challenge: EF Entities Loading

When I create a new instance (Domain Object) for use with EF, I can choose my constructor. When I want to create a Product and add it to the DbSet<Product>, instead of using the regular Product class it’s easy to construct a derived class MyCustomProduct (with custom behavior; I’m not yet discussing custom data properties). That would work. However, when I’m loading the instances directly from the database I can’t explicitly choose the constructor for my entities (if I’m loading a specific entity) or for related entities (either with eager loading or lazy loading). In other words, if we call new DatabaseContext().Products.First() or if we lazy load order.Products.First(), it will not load the derived class. So we must somehow tell Entity Framework that even though our model is defined using the Product class, it should always use MyCustomProduct instead.

To our rescue, we can use EF Inheritance as a hack for behavior inheritance. When EF tries to load an entity which is defined as an abstract class, this is what happens:

  • If the entity has NO derived concrete classes, we would get this error:

    The abstract type 'ExtensibleAdventureWorks.Business.Entities.Sales_Store' has no mapped descendants and so cannot be mapped. Either remove 'ExtensibleAdventureWorks.Business.Entities.Sales_Store' from the model or add one or more types deriving from 'ExtensibleAdventureWorks.Business.Entities.Sales_Store' to the model.
    
  • If the entity has MORE THAN ONE derived concrete class, we get an error saying that the discriminator column is missing:

    Invalid column name 'Discriminator'.
    
  • If the entity has ONLY ONE derived concrete class, EF will find the correct child class and will use it instead of the abstract class. e.g.:

    db.Stores.Single(x=>x.id==1); // This brings the correct concrete class. 
    

By using this behavior, I can make all my entity classes abstract, create a single non-abstract child class for each one, and I can get BEHAVIOR overriding in my child classes. Parent abstract class can define the default behavior for my application, but when my entities are loaded they will always be child classes, and we’ll be able to use overridden methods.

My Class Hierarchy for Default vs Custom Behavior

Taking advantage of that EF behavior, I generate all my database tables as abstract classes and an empty child concrete class for each abstract class.

At first I was planning to use the same name for both Base and Derived classes, keeping them in different namespaces. Unfortunately that doesn’t work, because EF6 doesn’t allow two classes with the same name (even though they are in different namespaces) [this was fixed in EF Core]. Because of that I gave up on different namespaces and kept both the base (abstract) classes and the derived (concrete) classes in the same namespace, differing only by a leading underscore in the names of the abstract classes.

Another idea was having 3 levels in my hierarchy: first level would be an abstract class having only Data Members (properties and relationships), second level would also be abstract but would add the default behavior, and finally the third level would be a concrete class with the custom behavior. That works, but I decided it was overkill. Remember, I don’t want to over engineer anything, but you can use 3 (or more) levels if you prefer, and the first level could somehow be used as plain DTOs. Just remember to wear your heavy astronaut boots so that you don’t float too far on the galaxies of useless-abstraction (myth busted).

To sum up, each POCO will have just a base abstract class and a derived custom concrete class, both in the same namespace, different only by a leading underscore in their names.
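A minimal sketch of this two-level hierarchy, written with partial classes so that generated code and hand-written code can sit in separate files. The class names and the BusinessEntityId property follow the examples in this post; the Name property and the GetDisplayName method are hypothetical and only illustrate default versus custom behavior.

```csharp
using System;

// Generated part of the abstract base class: data members only.
public abstract partial class _Sales_Store
{
    public int BusinessEntityId { get; set; }
    public string Name { get; set; }
}

// Hand-written part of the base class: default application behavior.
public abstract partial class _Sales_Store
{
    public virtual string GetDisplayName()
    {
        return Name;
    }
}

// Generated part of the concrete class: empty, it only gives EF
// a single concrete descendant to materialize.
public partial class Sales_Store : _Sales_Store
{
}

// Hand-written part of the concrete class: customer-specific behavior.
public partial class Sales_Store
{
    public override string GetDisplayName()
    {
        return "Store #" + BusinessEntityId + ": " + base.GetDisplayName();
    }
}
```

A variable typed as _Sales_Store that holds a Sales_Store instance will run the overridden GetDisplayName, so callers get the custom behavior regardless of which of the two types they reference.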

This two-level design implies a few important points:

  • When reading an entity we can refer to the base type, even though we will get the derived type:

    _Sales_Store store = db.Sales_Stores.Single(x => x.BusinessEntityId == 2051); 
    
  • We can also cast to the derived type (which is better as I’ll explain below):

    Sales_Store store = (Sales_Store) db.Sales_Stores.Single(x => x.BusinessEntityId == 2051);
    
  • When creating an entity, always refer to the derived (concrete) type (obviously the parent is abstract and couldn’t be instantiated):

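    // Either let EF create the instance of the derived type...
    var store = db.Sales_Stores.Create<Sales_Store>();
    // ...or call the constructor directly:
    var plainStore = new Sales_Store();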
    
  • Creating the classes directly from the DbSet<T> won’t work because T is an abstract class:

    var store = db.Sales_Stores.Create(); // fails: _Sales_Store is abstract
    

    I’m not sure if I can change the templates so that all context and relationships will use the concrete types instead of the base types, but it might work, although EF plumbing is complex.

  • In brief, you should always instantiate the concrete type, even if you’re writing a method in an abstract class (either from the same entity or from another entity). This creates a dependency, since our base classes depend on the concrete classes. I don’t think this is a problem, since they are all in the same layer, but maybe with Dependency Injection we could decouple them. I don’t think it’s worth it, because we would lose the ability to call the regular class constructors, which I think is a good way of forcing mandatory parameters on object initialization. In other words, I’d rather have a well-designed and consistent API (and Domain Model) than have the base class decoupled from the concrete classes.

  • Most Business/Services won’t ever need to reference base classes, and can always use the concrete entities instead. I can’t think of a scenario where one would have to use the base classes, so probably always casting to the derived type is a nice idea.

  • Like in any class, if no constructor is defined for the concrete classes, the compiler will automatically create a public empty constructor. However, if any constructor is defined (like in my example of forcing mandatory members to be passed in constructor) we should create a parameterless constructor for EF. A private one is enough for the entity to be loaded, but it should be at least protected to allow lazy loading of relationships.
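The constructor rule from the last point can be sketched as follows. The mandatory name parameter is a hypothetical example of forcing required members through the constructor; the protected parameterless constructor is the one meant for EF.

```csharp
using System;

public abstract class _Sales_Store
{
    public int BusinessEntityId { get; set; }
    public string Name { get; set; }
}

public class Sales_Store : _Sales_Store
{
    // Parameterless constructor for EF. Protected (not private) so that
    // lazy-loading proxies, which derive from this class, can call it.
    protected Sales_Store()
    {
    }

    // Constructor for application code: the name is mandatory.
    public Sales_Store(string name)
    {
        if (string.IsNullOrWhiteSpace(name))
            throw new ArgumentException("A store needs a name.", "name");
        Name = name;
    }
}
```

Application code can then only write new Sales_Store("Bikes"), while the parameterless constructor stays out of the public API.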

For achieving the abstract/concrete inheritance described above, we make the following changes to the T4 templates:

  • Make generated entities abstract
  • Add an underscore before the name of abstract entity
  • Create an empty (derived) concrete class for each abstract class.

Both the abstract classes and the derived classes are generated as partials in a single file. If the developer needs to create default application behavior, he should create a new partial for the abstract entity, and if he needs to create custom behavior, he should create a new partial for the derived class. Another option would be to have T4 create an individual file for each class (both abstract and concrete), but that would pollute my sample with many empty classes, and it would also make it risky to rerun the T4 templates and overwrite your uncommitted customizations.

Requirement: Extend existing Entities and Add new Entities

We want to be able to extend our Data Model with new entities (new tables), and extend existing entities with new properties (new columns). We don’t want to modify our core product, but only to add these extensions to the instance of a single customer. We want to be able to later upgrade the product, adding new entities/properties which have been added to our master codebase, without breaking our customer-specific extensions.

Dynamically Loading EF Model Extensions

On my first attempt, I had this objective of keeping all customizations isolated from the product code, including model extensions, which should be contained in an isolated module.

Since Entity Framework does not allow more than one EntityTypeConfiguration for the same entity, for extending existing entities (by adding new properties and relationships) I created a generic interface IModelExtension<T>, which should be implemented by any class that needs to map NEW PROPERTIES to entity T. Then I created an extension method ConfigureExtensions(this EntityTypeConfiguration config) that loads all classes that implement IModelExtension<T>, and I called it (this.ConfigureExtensions()) at the end of each configuration class in my T4 template. In other words, I created this plugin architecture so that I could keep customizations for entity T across multiple modules, each one mapping new properties to entity T.

For extending the model with NEW entities I created an interface IModelExtension that should be implemented by any class that needs to map NEW ENTITIES to the model. Then I created an extension method LoadEntityFrameworkExtensions which loads all those extensions, and is called from the DbContext.OnModelCreating.

… and then I realized that I had spent 4 hours on something stupid.

Why on earth would I need to keep customizations isolated from common code if this is all generated code? Go back to the decoupling assumption: we don’t need to have product and extensions totally decoupled. This means that since our Data Model is automatically generated from the database, we don’t need to worry about keeping the common part and the extensions isolated in POCO column definitions and POCO mappings.

Dynamically loading EF Model Extensions is totally possible(*), but it’s not useful for our goals. We can just apply the DDL scripts for each module and each customization, and rely on good old code generation for updating our data model. Additionally, we don’t need to isolate POCO properties, POCO mappings, etc. We (obviously) only need to isolate generated code from hand-written code (which can be either application standard behavior or custom behavior/extensions).

Move on… let’s look for real problems.

(*) If you’re interested in Dynamically Loading EF Model Extensions, please check my branch EFDynamicModel, and I can write about it in a future post. If for any reason you don’t want to use code generation that may be useful.

Model Extensions and the Inner-Platform Effect

The most popular customization in any application is probably adding new fields to your entities (in other words, the customer wants to add a new field to a form). If we have code generation and we pass objects across our layers (instead of passing scalar values), that shouldn’t be hard: it should take only a few minutes and a rebuild, and our customer would have the new columns. (I will discuss UI in future posts; for now I’m only talking about POCOs/BLL.) However, instead of just doing this themselves as a customization (adding the columns and deploying an update, which would allow the new field to be used as a first-class citizen in any business rule), software vendors usually create a metadata-based feature that lets users create the new tables/columns/data by themselves. In other words, instead of doing something that should be simple (and easily isolated from the core product, so that it doesn’t block future upgrades), we delegate that to the users, as if this were empowering them, when actually we’re just leaving custom data properties disconnected from the rest of the application: since they are not part of our POCOs, they can’t be used in programmatic business rules. More than that, it usually doesn’t pay off, because the effort of creating and maintaining user-defined fields is usually much higher than the effort of just creating the fields yourself whenever a customer needs something new.

And there you have the Inner-Platform Effect, which happens when you design a system to be so customizable that it ends up becoming a poor replica of the platform it was built on. Sometimes doing this “customization” using the inner platform becomes so complicated that only a programmer (or a consultant) is able to do it, instead of the end user who was supposed to use your inner platform in the first place. As someone said, if your customer needs something totally flexible you should ship him a C# compiler.

Off the top of my head I can remember a lot of “configuration tools” which were developed to be used by the end user and ended up as half-baked tools that are at best a badly designed subset of their underlying platform. Sometimes these tools take the form of an application, sometimes they are just a complex and unreadable XML/JSON file, sometimes they are a bunch of parameters stored in the database, sometimes they are a scripting language, etc.

As I explained at the beginning of this post, creating a full-fledged PaaS is expensive and doesn’t pay off unless you have a very large user base, and creating an abstraction over an abstraction usually only makes things harder. So my point is: instead of letting the users create the fields themselves, you (as the SaaS vendor) should be the one creating those fields, but you’ll be creating them on your development platform (.NET/SQL Server/etc), and not on some half-baked framework that you developed while pretending you’re Borland.
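
To make the difference concrete, here is a minimal sketch (not taken from the sample repository) of the same credit-limit rule written once against a metadata-based field and once against a first-class property. All names in it (CustomerBase, CustomerWithUdf, CreditLimit, CreditRules) are hypothetical and exist only for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical core entity, standing in for a generated POCO.
public abstract class CustomerBase
{
    public int CustomerId { get; set; }
    public string Name { get; set; }
}

// Approach A: user-defined fields stored as metadata (name/value pairs).
public class CustomerWithUdf : CustomerBase
{
    public Dictionary<string, object> UserDefinedFields { get; } =
        new Dictionary<string, object>();
}

// Approach B: the vendor adds the column and regenerates the POCO,
// so the new field becomes an ordinary typed property.
public class Customer : CustomerBase
{
    public decimal CreditLimit { get; set; } // hypothetical customer-specific column
}

public static class CreditRules
{
    // Metadata version: a string key, a type check and a missing-key check
    // are needed wherever the field is used, and none of it is compiler-checked.
    public static bool CanOrder(CustomerWithUdf customer, decimal orderTotal)
    {
        return customer.UserDefinedFields.TryGetValue("CreditLimit", out object raw)
            && raw is decimal limit
            && orderTotal <= limit;
    }

    // First-class version: a rename or a type change breaks the build
    // instead of failing silently at runtime.
    public static bool CanOrder(Customer customer, decimal orderTotal)
    {
        return orderTotal <= customer.CreditLimit;
    }
}

public static class Program
{
    public static void Main()
    {
        var udfCustomer = new CustomerWithUdf { CustomerId = 1, Name = "Sample" };
        udfCustomer.UserDefinedFields["CreditLimit"] = 1000m;

        var typedCustomer = new Customer { CustomerId = 1, Name = "Sample", CreditLimit = 1000m };

        Console.WriteLine(CreditRules.CanOrder(udfCustomer, 500m));
        Console.WriteLine(CreditRules.CanOrder(typedCustomer, 500m));
    }
}
```

Both calls give the same answer for this data; the point is only how much ceremony the metadata version needs before a business rule can touch the field.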

Summary

This second post (which is the first technical one) required me to introduce some concepts before going further into programming examples. With so many different paradigms I thought it was important to reinforce my technical goals and to explain some technical decisions.

I can’t stress enough that we’re focused on solving real problems. Given my goals, some unreal problems would be decoupling default behavior and custom behavior, isolating product generated code from customizations generated code, having a product codebase that can’t be modified, or a single codebase shared among all customers, etc. I don’t care about those issues, they don’t affect my goals. I also don’t care about having a Domain Model that has a direct one-to-one mapping to my database.

I also gave good reasons to explain some architectural decisions like using database-first, using both EF and Dapper, using POCOs instead of DTOs, putting all my Business Rules + Entities + Services into a single Business Layer (without DAL or extra Repository/UnitOfWork patterns).

Then, I described a class design that allows me to have default behavior and custom behavior using plain OOP. Last, I described how adding new properties and new entities shouldn’t be a hard problem when you rely on code generation, and how forked repositories (and the open-box model) are a good alternative for offering your customers an unmatched level of customization.

Next Steps

In the next parts I’ll go into more technical examples and discuss new topics including:

  • Enums Extensibility, using Type-Safe Enums, for which we’ll develop custom mappings both for EF and for Dapper.
  • Business Transactions (and the Transactional aspect), OOP methods overriding, Publisher-Subscriber pattern, Domain Model Validation, and Dependency Injection.
  • Version control structure and how Git submodules can help us manage and control both common-code and custom modules/extensions, upgrades and merges.
  • Extensibility for Reporting Queries
  • Extensibility for ASP.NET MVC, Webforms, css/javascript.

Source Code

Source code for this post (part 2 for the series) is available at https://github.com/Drizin/ExtensibleAdventureWorks.

For this post (branch part2) you’ll find:

  • Scripts for the regular Adventure Works 2012 database (model here), which was used as base for my samples.
  • The T4 Templates in the Business Layer, customized for generating my POCOs as abstract classes (base classes) and for creating the concrete derived classes.
  • The generated POCOs and EF Mappings for all Entities.
  • Sample behavior added to abstract and concrete classes for tests.
  • Working infrastructure using Dapper and EntityFramework nugets (I used EntityFrameworkWithHierarchyId, since AdventureWorks uses table hierarchies)
  • Unit Tests for testing loading/saving concrete entities using both EF and Dapper and for testing Behavior Inheritance (derived POCOs)
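
As a rough illustration of the abstract/concrete split mentioned in the list above, the sketch below shows a base class carrying the default behavior and a concrete class overriding it. This is a hand-written approximation, not the output of the T4 templates in the repository; SalesOrderBase, SalesOrder and the discount rules are made-up names and numbers.

```csharp
using System;

// Hypothetical base class, standing in for a generated core-product POCO.
// It carries the default behavior of the product.
public abstract class SalesOrderBase
{
    public decimal SubTotal { get; set; }

    // Default product rule: 5% discount on orders of 1000 or more.
    // Declared virtual so that a customization can replace it.
    public virtual decimal CalculateDiscount()
    {
        return SubTotal >= 1000m ? SubTotal * 0.05m : 0m;
    }

    public decimal Total => SubTotal - CalculateDiscount();
}

// Concrete class, which is where a customer-specific rule would live.
public class SalesOrder : SalesOrderBase
{
    // Hypothetical customer rule: never less than a 2% discount,
    // otherwise fall back to whatever the product default gives.
    public override decimal CalculateDiscount()
    {
        return Math.Max(base.CalculateDiscount(), SubTotal * 0.02m);
    }
}

public static class Program
{
    public static void Main()
    {
        var small = new SalesOrder { SubTotal = 500m };
        var large = new SalesOrder { SubTotal = 2000m };

        Console.WriteLine(small.Total); // custom 2% rule applies
        Console.WriteLine(large.Total); // product default of 5% applies
    }
}
```

Because the rest of the code only ever works with the concrete SalesOrder, the base class can be regenerated or upgraded without touching the customer rule.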

Links to all posts of this series

After many years using (and enjoying) Git, I had to start using Subversion again on a daily basis.

Git keeps its ignore rules in a regular file (.gitignore, whose rules apply both to its own folder and to all subfolders), while Subversion stores them in a property (svn:ignore) set on each folder, which makes it harder to add new file masks to the ignore list.

Using TortoiseSVN you can just click “Add to ignore list”, but on the command line you don’t have that.

This PowerShell function helps with that. It reads the svn:ignore property of a folder, appends a new pattern to the list of ignores, and updates the property. Enjoy!

function Svn-Add-Ignore
{
   param(
     [Parameter(Position=0,mandatory=$true)]
     [string] $folder,
     [Parameter(Position=1,mandatory=$true)]
     [string] $filemask
   )

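   # Bail out early if the folder does not exist. "return" leaves only this function;
   # "exit" would also close an interactive PowerShell session.
   if (-not (Test-Path $folder))
   {
       Write-Host "Path $folder not found" -ForegroundColor Red
       return
   }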
   
   $ignores = (svn propget svn:ignore $folder)
   if($ignores)
   {
      $ignores = ($ignores -join "`n" | Out-String).Trim()   # convert multiple lines to single multiline string
      $ignores = $ignores + "`n" + $filemask
   }
   else
   {
      $ignores = $filemask
   }
   svn propset svn:ignore $ignores $folder
}

Sample usage:

Svn-Add-Ignore .\ "bin"
Svn-Add-Ignore .\ "obj"
Svn-Add-Ignore .\ ".vs"
Svn-Add-Ignore .\ "*.user"

This is the first in a series of posts looking at Product Extensibility in .NET Framework. In this post, I’m going to talk about business models for IT vendors, and how you can run a profitable SaaS business without sticking to the traditional models, by making some allowances in your architecture that will let you make your customers happy while keeping low maintenance costs.

Enterprise Software, Custom Development, and SaaS

Enterprise Software companies (like Oracle, Microsoft, SAP, etc) have their value proposition centered on the products they own, but they usually sell professional services attached to those products, aimed at training, support, product configuration, and some degree of customization. They will charge tons of money for those professional services, but their core value still resides in their products. You can purchase Oracle Database or Siebel CRM, and they will gladly customize that for you for a few hundred thousand dollars, but they surely won’t modify anything in the core of their product, no matter how much you beg (or pay) for that.

On the opposite side, Custom Development companies (including Consultancy companies which offer custom development services, like Accenture or Tata Consultancy Services) provide their value by doing exactly what their customers want, usually backed by a team of senior consultants/engineers and some well-established processes. Their value proposition is that they have a large network of consultants who can help with any problem you may have, and you also have someone to blame in case things go wrong.

Custom Development companies don’t really care how much customization you ask for. For anything that you ask for, they will overcharge you and happily develop it for you. For those companies, body shopping is exactly what they sell, and their technology or product (if any) is definitely not relevant to their business. If by any chance they already have a codebase that they can reuse(*), they won’t hesitate to completely modify it (or even rewrite it) if you ask for that, because that’s what they are there for. Your codebase won’t be reused for any other customer because it’s probably so tied to your business that it won’t make sense to anyone else. Similarly to Enterprise vendors, Custom Development companies will also cost you an arm and a leg. Similarly, they also have large costs for each new customer.

(*) I won’t call it a product - custom development companies and consultancies do not have products, even if they can reuse their past projects to some extent.

Finally, there are the Software-as-a-Service companies, which are based on economies of scale, where each new customer should increase their revenue at a very low marginal cost. The value proposition for those companies is usually their technology, combined with their low adoption barriers, the lack of vendor lock-in, and the aforementioned scalability. For a “pure” SaaS company, the more customers you have the better - but only as long as they don’t bother you too much or give you too much work, because the major income source is NOT body shopping, but the subscription fees (recurring revenue).

Although SaaS vendors have tried hard to commoditize application functionality for their customers, each customer is different and they are not happy with out-of-the-box solutions - they need customizations and extensions. The problem is that SaaS sits between Custom Development and Enterprise Software, in the sense that SaaS vendors can modify their core products for their customers, but they must also maintain a stable product codebase which should evolve for all customers, since they are responsible for hosting and maintaining their customers’ applications. Unfortunately most SaaS companies can’t find the correct balance between maintaining customizations and managing the product.

Let’s Play Microsoft

Developing a highly configurable and truly extensible product is very difficult and expensive. Making something configurable has a fixed cost that only pays off when you have a large number of customers who need that configuration. Making everything configurable not only has a large cost, but is also pointless since you don’t want to sell a framework (much less a compiler), but a stable and well-defined product. Making something extensible to end users requires a well-designed plugin architecture (or a well-designed API), and that comes with a price tag which usually is only compensated when you have a huge number of customers. Large companies like Microsoft and Oracle have teams with hundreds of engineers working exclusively on Product Development (or R&D, as you learn in an MBA), and they still have limitations. A SaaS company cannot afford that model, unless it has tens of thousands of customers willing to pay for non-commodity SaaS.

I worked for a few years in a SaaS company that followed that Enterprise model, and it was very unproductive. When a customer requested something from the Projects/Consultancy department, the request was reviewed by a consultant, who usually had to discuss it with the Product Development team, which would evaluate whether that request made sense to be incorporated into the product. In case it didn’t fit into the product they would (in the best case) provide a new extension point so that the Projects team could develop a new extension and plug it as a DLL into the product. Does this sound reasonable? If it sounds reasonable to you, it’s because you didn’t picture that most of the requests were cosmetic changes like: remove this button, rename that label, change that color, make this field read-only, etc. All those small changes had to pass through many areas. THAT was costly.

Even for non-trivial things (like pricing rules and discount rules), each customer had rules so specific that the product configurations were not enough for the majority of them. Of course some changes were possible without requiring a product modification (after all, there was a whole product development team for that), but that was the exception, not the rule. As a consequence of that structure, we could only reach enterprise customers (which usually are the ones who have more money than sense).

At some point the product team also decided to develop their own scripting language (VBA-like). Yes, that’s right - a SaaS company, with no more than a hundred customers, decided to develop their own scripting language, for internal use, because no one could touch the holy product, which was supposed to be so well designed that no educated developer would be able to modify the source code without fucking up everything.

If your company is doing something like that, remember: You are not Microsoft. In other words, don’t develop a plugin architecture or a scripting language or a complex API unless you are either a Platform as a Service company (like Salesforce) or you have tens of thousands of customers who will extend your product on their own. If you are developing developer tools for internal use, not only are you not adding value to the company, but your product managers should also leave technology passion aside and start thinking about business. Microsoft cannot develop automation for the worksheets of all their customers, SAP cannot develop ABAP customizations for all their customers, but your SaaS company can customize the instance of each customer, as long as you don’t want to make commodity SaaS.

There is a similar mistake which I’ve seen a few times: completely ignoring that your product should solve some very clear business needs, and turning it into a general tool for creating any program, like a 4th generation programming language. Unless you are competing with Microsoft (Visual Studio) or Borland, or trying to make your own GeneXus / FoxPro / Informix / Clipper / Progress / etc, please focus on your product, with some clear business value, and not on applications to make applications.

If you are a SaaS company and your engineers love building compilers and development tools, you should either teach them Business 101 or hire a results-oriented team.

Independent Codebases for each Customer

When you run a multi-tenant SaaS application, all your customers run on the same instance (and consequently on the same codebase). A multi-tenant application must surely provide some level of configuration: a customer can probably add their name, their logo, their colors and visual identity, and probably set up the users, roles and permissions. But the codebase is shared among all customers, so unless you have a quite elaborate plugin architecture (which has a price tag), the customizations are usually expensive and come with lots of limitations. Remember: You are not Microsoft.

On the other hand, when you run single-tenant applications (each customer with a dedicated instance - not necessarily with their own server), you can have an independent codebase for each customer, and that provides an unlimited level of customization.

Perfect. So I’ll run all my customers in single-tenant instances, and each one will have their own codebase repository, totally disconnected from my baseline product. If that works for Accenture, it can work for me, right?

Not really. As I explained before, the focus of Consultancy Body Shops is selling man-hour projects. They don’t have a core product, and they don’t care about your source code. Also, they do not host your software. A SaaS company has a core product which is its most valuable asset, it hosts the applications of its customers, and it should provide them with updates and fixes. It cannot just copy the codebase for each customer, customize it, and keep that code running unattended. Well, actually it can, but that’s not scalable, and it would be leaving money on the table: customers need upgrades, customers need bug fixes, customers don’t want to depend on a single engineer who understands their customizations. When you have a SaaS application with more than a few dozen customers, meeting all those requirements with independent codebases is either impossible or very expensive and naive.

Supposing that you are brave enough to run your customers with independent codebases, as soon as you reach a few hundred customers, every bug that is discovered and fixed will be difficult to apply to each of those hundreds of customers, not only because you would have to modify hundreds of different places, but also because those projects by now are probably very different from each other. Each new feature developed for one customer (or for the product) won’t be easily available to existing customers. Each new engineer who joins your team will have a learning curve trying to understand the customizations for each client. Every time an engineer goes on vacation, the person who covers for him will have a hard time understanding the new project. As time goes by, it becomes almost impossible to distinguish between what was part of your original product and what was a client customization. So you’ll end up with hundreds of different loosely-related applications, and you lose your most valuable asset: having a coherent product that is mastered by your engineers.

Remember, you don’t want to be as huge as Microsoft - you want to scale to as many customers as possible with as few engineers as possible.

Bottom line is: the major problem in managing a SaaS Product is the tradeoff between Extensibility vs Maintenance.

Extensibility for SaaS Companies who don’t want to be Microsoft

In my next posts I’ll describe some ideas for developing an architecture that is a balance between the “Holy Product” model and the “Repository per Client” model. You’ll be able to offer product enhancements and upgrades to your customers, while being able to completely customize the product programmatically for each customer. Customizing the application for each customer will be easier than ever, and that translates into lower costs for your customers and more revenue for you.

I’ll use my favorite technology stack for those posts (C#, SQL Server and Entity Framework) but you should be able to adapt most ideas to any other language/framework. I’ll explore how some tools/technologies/paradigms can be used and how some of them don’t make sense or are not practical. Off the top of my head I can think of topics like POCO inheritance, code generation, branches per customer, the publisher-subscriber pattern, Dependency Injection, and partial classes, but I’ll probably add more topics as I write the next articles and as I develop my code samples.

Links to all posts of this series