Are ADO.NET Datasets dead?

If you are plugged in at all to the Microsoft world you have been seeing more and more LINQ and ADO.NET Entity Framework buzz.  You can't go to a conference or attend a User Group meeting without discussing one of these new data access technologies.  I'm not saying that's a bad thing... Unless you are a Dataset, Datatable or Dataview.

I wrote a post back in May trying to shed some light on when to use which technology.  You might want to read that post and the comments before reading on.  In that post I stated that if you are creating new applications or just adding on features to your current app I would use LINQ or EF instead of ADO Datasets.  There are many reasons.  Here are the most compelling reasons to use LINQ and EF.

  • Objects - The nicest thing about LINQ and EF is they provide some of the foundations for Object Oriented programming.  They literally force you to think and work in terms of objects and collections of objects instead of rows and tables of rows.  By setting up this foundation, these technology allow for the application to grow and use some of the more complex OO concepts without doing tons of work to fill and save objects from Datasets.  So, start simple and as the requirements on your application change you can use some more complex OO topics to solve those problems.
  • Memory - There are two schools of though on this topic.  Some people like to pull all the data back into memory and then work with it there.  Others don't like to fill memory with data that might not ever be used.  LINQ and EF allow for both.  When you build your LINQ and EF models you can use stored procedures same as you used to with Datasets and bring back the whole table if you desire.  Or you can use the LINQ syntax to do filtering and sorting, which uses the power of the relational database engine to get only the row(s) you need.  To me it is compelling to have the flexibility that LINQ and EF give me.
  • Performance - We can go round and round about the performance implications here.  People say stored procedures are faster because they are pre-compiled, but with the new technologies these LINQ queries are cached and optimized and the difference is often minimal.  To me this is the 80\20 rule.  80 percent of my queries will run the same or faster in LINQ and EF than they do in a SP.  The other 20 percent of my queries will run faster in stored procedures.  So, use both.  LINQ and EF allow for using SP or the LINQ syntax.  Plus, with LINQ you can bring back only the columns you need which can improve performance as well.  I'm not saying quit using stored procedures, but instead of just blindly using them because that's what you've always done, step back and analyze your requirements to see if it's worth it. 
  • Interop with Other Object Collections - The other benefit of working with LINQ and EF is that you can join your data objects with other object collections.  If you've seen my talk on LINQ and the demo related to joining my SQL data with file data retrieved using the System.IO objects.  It is so easy and would require lots of work with Datasets.

There are many more reasons out there, but these are a few of the ones that are most important to me.  Any one else want to add?

4 comments:

  1. Jiangning Tang Says: This comment has been removed by the author.
  2. Jiangning Tang Says:

    Actually, if your organization dictates that everything has to go through Stored procedures, Linq to SQL is awkward to work with and much of the benefits of using Linq to SQL are gone, like DataContext automatically tracking changes and automatic optimistic concurrency.

    Comparing Linq to SQL with TableAdapter, TableAdapter is much more productive than Linq to SQL.

    1. TableAdapter and related utilities make programming against stored procedure much easier, code for sorting, paging etc are automatically generated while using Linq to SQL, you have to write a lot of code for those common tasks.

    2. It's recommended that if you want to implement clean tiered solution, don't return IQuery&ltT&gt, instead IEnumerable&ltT&gt, which automatically downgrade your data access from Linq to SQL to Sql to objects.

    3. Using IQuery&ltT&gt invokes translating IQuery to T-SQL, which is a performance hit.

    4. Object orientation purists sometimes kill performance. Sometimes, it's better to think data as tables and rows rather than objects. Business is about data for the most part. You can express your data as objects, but in the end, the data are persisted in tables as rows. The great sin of ORM is duplicating effort to cache data. RDMS does a great job in managing most used data in memory cache by doing all sorts of amazing jobs, read-ahead, checkpoint, etc; ORM then comes in and do the same, demanding you work with their copy of data in memory. More often than not it's a waste of resources and sometimes create unexpected and strange behaviors. No longer can you know where your changes are. Changes are managed by ORM now. And for most business applications, you are better to save your data to the database as soon as possible, instead leaving those changes in memory and later commit them. I often like to compare the data objects with the game character objects. They are different. When you are playing game, all those characters are really alive; they have to be ready to respond your click or key punches; I would like to call them live objects. On the other hand, data objects are mostly dead after they are retrieved from the database and presented to the user. Users work on those data through forms (fields, lists, etc).

    Users are not working on data objects directly in the way we play computer games. Once you can see the difference between data objects and game character objects, then it really doesn't make sense to make your data objects alive in memory all the time and waste your time and resources to manage their life cycle. I heard a lot of horrible stories about ORM/Java, etc. And currently I am working on converting a Java/Hibernet project to .Net because the Java/Hibernet app is ridiculously slow to the point you just want to smash your computer to pieces or jump out of the window :)

    By the way, Entity Framework is built on top of ADO.NET, so ADO.NET will not go away. But I agree with you that classic ADO.NET type of programming should not be used. Best approach at this point is to use typed datasets and tableadapter.

  3. Peter Says:

    I really like your post. I think you hit on some great points. One thing that seems a bit harder to do with the EF and Linq to SQL is the ease of serialization. I long for the time when I can do .GetXML() on an EF object. That was very simple with datasets.

  4. Ben H Says:

    k2bumper - I agree that ReadXML and WriteXML are great methods off of Datasets. It does make it easy to serialize and pass Datasets around. However, I believe you could serialize and pass the objects from the EF as long as both applications know the type being used in the XML. It would take some work, but I bet you'd be surprised how easy it is. Also, take a look at ADO.NET Data Services (Formally know as Astoria). This gives you an easy way to pass your EF data as XML through a REST based service.