Architecting in the pit of doom: The evils of the repository abstraction layer

time to read 12 min | 2309 words

Originally posted at 3/10/2011

This is part of the series of posts about the Whiteboard Chat project. In this post, I am going to explore the data access methodology and why is it so painful to work with. In particular, I am going to focus on the problems arising in this method:

[Transaction]
[Authorize]
[HttpPost]
public ActionResult GetLatestPost(int boardId, string lastPost)
{
    DateTime lastPostDateTime = DateTime.Parse(lastPost);
    
    IList<Post> posts = 
        _postRepository
        .FindAll(new GetPostLastestForBoardById(lastPostDateTime, boardId))
        .OrderBy(x=>x.Id).ToList();

    //update the latest known post
    string lastKnownPost = posts.Count > 0 ? 
        posts.Max(x => x.Time).ToString()
        : lastPost; //no updates
    
    Mapper.CreateMap<Post, PostViewModel>()
        .ForMember(dest => dest.Time, opt => opt.MapFrom(src => src.Time.ToString()))
        .ForMember(dest => dest.Owner, opt => opt.MapFrom(src => src.Owner.Name));

    UpdatePostViewModel update = new UpdatePostViewModel();
    update.Time = lastKnownPost; 
    Mapper.Map(posts, update.Posts);

    return Json(update);
}

You can clearly see that there was a lot of though dedicated to making sure that the architecture was done right. Here is the overall data access approach:

image

So are using the Specification & Repository patterns, which seems okay, at first glance.  Except that the whole system is seriously over engineered, and it doesn’t really provide you with any benefits. The whole purpose of a repository is to provide an in memory collection interface to a data source, but that pattern was established at a time where the newest thing on the block was raw SQL calls. Consider what NHibernate is doing, and you can see that here, one implementation of an in memory collection interface on top of a data store is wrapped in another, and it does nothing for us except add additional code and complexity.

As an aside, even ignoring the uselessness of the repository pattern here, I really don’t like that the FindOne() method returns an IQueryable<T>, and I don’t really see a reason for that there.

Then there is the issue of the Specification itself. It is not quite pulling its own weight. To be frank, it is actually sinking down the entire ship, rather than helping any. For example, let us look at the GetPostLastestForBoardById implementation:

public class GetPostLastestForBoardById : Specification<Post>
{
    private readonly DateTime _lastPost;
    private readonly int _boardId;

    public GetPostLastestForBoardById(DateTime lastPost, int boardId)
    {
        _lastPost = lastPost;
        _boardId = boardId;
    }

    public override Expression<Func<Post, bool>> MatchingCriteria
    {
        get
        {
            return x =>
                x.Time > _lastPost &&
                x.Board.Id == _boardId;
        }
    }
}

Are you kidding me, all of this infrastructure, all of those nested generics, for something like this? Is this really something that deserves its own class? I don’t think so.

Worse than that, this sort of design implies that there is a value in sharing queries among different parts of the application. And the problem with that sort of thinking is that this premise is usually false. Except for querying by id, most queries in an application tend to be fairly unique to their source, and even when they are the actual physical query, they usually have different logical meaning. For example, “select all the late rentals” is a query that we might use as part of the rental system home screen, and also as part of the business logic to charge people for late rentals. The problem is that while on the surface, this query is the same, the meaning of what we are doing is usually drastically different. And because of that, trying to reuse that query in some manner is actually going to cause us problems. We have a single piece of code that now have two masters, and two reasons for change. That is not a good place to be. You’ll probably have to touch that piece of code too often, and have to make it more complicated than it would have been because it has dual purposes.

You can see an example of how this is problematic in the method above. In my last post, I pointed out that there is a SELECT N+1 issue in the code. For each of the loaded posts, we also load the Owner’s name. NHibernate provides easy ways to fix this issue, by simply specifying what is the fetch paths that we want to use.

Except… this architecture make is drastically harder to make something like that. The repository abstraction layer hides the NHibernate API that allows you to specify the fetch paths. So you are left with a few options. The most common approach for this issue is usually to re-architect specifications so that they can also provide the fetch path for the query. It looks something like this:

public class GetPostLastestForBoardById : Specification<Post>
{
    private readonly DateTime _lastPost;
    private readonly int _boardId;

    public GetPostLastestForBoardById(DateTime lastPost, int boardId)
    {
        _lastPost = lastPost;
        _boardId = boardId;
    }

    public override Expression<Func<Post, bool>> MatchingCriteria
    {
        get
        {
            return x =>
                x.Time > _lastPost &&
                x.Board.Id == _boardId;
        }
    }
    
    public override IEnumerable<Expression<Func<Post, bool>>> FetchPaths
    {
      get
      {
          yield return x => x.Owner;
      }
    }
}

Note that this design allows you to specify multiple fetch paths, which seems to fix the issue that we had in our API.  Expect that it doesn’t really work, even identical queries near one another (say, in different actions of the same controller) usually have different fetching requirements.

Oh, and we haven’t even talked about the need to handle projections.

Okay, so we can do it differently, we can let developer the option to configure the fetch paths for each query at the location where that query is made. Of course, you then get the following architecture:

image

If you think that this is ridiculous, I agree. If you don’t think so… well, never mind, I was told I can’t actually revoke someone’s Touch-The-Keyboard license.

But let us say that you have solved even that problem somehow, probably by adding yet another abstraction layer on specifying fetch paths. At that point, your code is so abstract, it deserve being put in the Museum for Modern Art, but I’ll let you get away with it.

The problem is that fetch paths are actually only part of the solution for things like Select N+1. One very powerful option that we have with NHibernate is the use of Future Queries, and we can utilize that feature to significantly reduce both the number of time that we have to go to the database and the size of the data that we have to read from it.

Except… this design means that I am pretty much unable to implement this sort of  behavior. I will have to drastically change the system design before I can actually do a meaningful performance improvement.

I see this sort of problem all the time when I am doing code reviews. And the problem with this is that it is usually a stop ship issue because we literally cannot fix things properly without doing a lot of very painful changes.

Now, allow me to explain the underlying logic behind this post. Reading from the database is a common operation, and it should be treated as such. There should be very few abstractions involved, because there isn’t much to abstract. For the vast majority of cases, there is no logic involved in read operations, and even when there is, it is most often cross cutting logic. The answer that I give for things like "well, so how can I apply security filtering” is that you throw that into an extension method and do something like this:

var query = 
  (
        from post in session.Query<Post>()
        where post.DueDate > DateTime.Now 
        select post
   )
   .FilterAccordingToUserPermissions(CurrentUser)
   .ToList();

This make it very easy to review such queries and see that the appropriate actions where taken, and this type of action is only required when you can’t handle this as truly a part of the infrastructure.

It is also important that I am talking specifically about read operations here. For write operations, the situation is (slightly) different. Write operations usually have business logic, and you want to keep that separate. I would still want to avoid doing anything too abstract, and I would certainly not introduce a repository for writes.

What I would do is either create a service layer for writes that handle the actual business of doing writes, but (and this is important), I would only do so if there is actually logic to be executed. If this is merely a simple CUD operation that we are executing, there is really very little point to going with a service. Yes, that means using NHibernate directly from the controller method to save whatever it is that you are editing, assuming that there is nothing that requires business logic there.

To be frank, for the pieces that actually require business logic, I would probably prefer to avoid an explicit service implementation and move to using an internal publish / subscribe (event publisher) in order to have a nicer internal architecture. But at that point, it is more of a preference than anything else.

To summarize, do try to avoid needless complexity. Getting data from the database is a common operation, and should be treated as such. Adding additional layers of abstractions usually only make it hard. Worse, it creates, in your own application code, the sort of procedural operations that made it so hard to work with systems using Stored Procedures.

You want to avoid that, it was a bad time, and we don’t want to go back there. But code such as the above is literally something that you would write for those sort of systems, where you have only the pre-defined paths for accessing the data store.

It is no longer the case, and it is time that our code started acknowledging this fact.