Why Defer Loading in Entity Framework isn’t going to work

time to read 5 min | 958 words

When I finished reading this post I let out a heavy sigh. It is not going to work. Basically, the EF is going the same way that NHibernate was in NHibernate 1.0 (circa 2005!).

Let me show you how. in the post, the example given is:

public class Category
{
    public int CategoryID { get; set; }
    public string CategoryName { get; set; }
    public virtual List<Product> Products { get; set; }
    ...
}

This looks like it would work, right? The way the EF is doing this is by creating a proxy of your class (similar to the way that NHibernate is doing that) and intercepting the call to get_Products.

It has a few problems, however. Let us start from the most obvious one, program to interface not to implementation. Exposing List<T> means that you have no control whatsoever about what is going on with the collection. It also means that they have to intercept the access to the property, not using the collection. That is bad, it means that they are going to have to do eager loading in all too many cases where NHibernate can just ignore it.

Let us more on a bit, and talk about the inability to support any interesting scenario. If we look at collections with NHibernate, we can take this:

Console.WriteLine(category.Products.Count);

To this SQL:

select count(*) from Products
where CategoryId = 1

With the approach that the EF uses, they will have to load the entire collection.

But all of those are just optimizations, not the make-or-break for the feature. Here is a common scenario that is going to break it:

public class Category
{
    private List<Product> _products;
    public int CategoryID { get; set; }
    public string CategoryName { get; set; }
    public virtual List<Product> Products { get { return _products; } set { _products = value; } }

    public void AddProduct(Product p)
    {
        // do something interesting here
        _products.Add(p);
    }
    ...
}

There is a reason why the default proxies for NHibernate force all members of the entities to be virtual. It is not just because we think everything should be virtual (it should, but that is not a discussion for now). It is all about being able to allow the user to use a real POCO.

In the scenario outlined above, what do you think is going to happen?

AddProduct is a non virtual method call, so it cannot be intercepted. Accessing the _products field also cannot be intercepted.

The end result is a NullReferenceException that will seem to appear randomly, based on whatever something else touched the Property or not. And however nice auto properties are, there are plenty of reason to use fields. Using auto properties just mask the problem, but it is still there.

Oh, and if we want to use our POCO class with EF, forget about the new operator, you have to use:

Category category = context.CreateObject<Category>();

Hm, this just breaks just about anything that relies on creating new classes. Want to use your entity as parameter for ASP.Net MVC action, to be automatically bounded? Forget about it. You have to create your instances where you have access to the context, and that is a database specific thing, not something that you want to have all over the place.

And I really liked this as well:

The standard POCO entities we have talked about until now rely on snapshot based change tracking – i.e. the Entity Framework will maintain snapshots of before values and relationships of the entities so that they can be compared with current values later during Save. However, this comparison is relatively expensive when compared to the way change tracking works with EntityObject based entities.

Hm, this is how NHibernate and Hibernate have always worked. Somehow, I don’t see this showing up as a problem very often.

Final thought, why is the EF calling it Defer Loading? Everyone else call it lazy loading.