ORM for Database Abstraction or Stored Procedures for Data Structure Abstraction?

“Object-relational mapping (ORM, O/RM, and O/R mapping) in computer software is a programming technique for converting data between incompatible type systems in object-oriented programming languages. This creates, in effect, a “virtual object database” that can be used from within the programming language.”

With the ever rising popularity of ORMs, I have to ask myself the question:  When is it a good idea to to use an ORM?  It’s a difficult question to answer because it relies on a number of variables, not all of which are known at design time.  When designing enterprise systems, I’ve found a use for both architecture styles and have developed my own personal biases for each.  The following is by no means a comprehensive list, just a few high level points I’ve considered when making my decisions.

ORM

The concept of the ORM is to simply map data fields to logical object.  Nearly every ORM has the capability to map disparate data fields to logical objects, yet few take an ORM to this level of complexity.  The common ORM implementation is a one to one mapping of objects to data tables.  This helps with LINQ queries, IntelliSense, and readability while minimizing confusion of where the data ends up in the database.  However, this approach tightly couples business objects to the data structure which can cause bigger headaches in the long run.  If the need to flatten a data structure for performance, or normalize a table to minimize duplicated data arises the chance for bugs and logic conflicts increases drastically.  Here are some of the

The Upside

Business logic stays in code. Probably the best argument for an ORM is that it helps to enforce the idea that business logic belongs in a business logic layer rather than the database.  With an ORM, all logic is handled in application/service code which makes this rule easy to enforce.

No DBA required.  Since the ORM brings all the heavy lifting to the application code and the queries are managed by the , there’s no compelling reason to staff a SQL expert.  In my experience, having a SQL expert and an ORM is usually like mixing potassium and water.

Changing data stores is purported to be minimal effort.  In theory, a properly implemented ORM can repointed from once data source to another (with the same structure) without drastic manipulation to code.  Personally, I’ve never encountered a situation in which I’ve had all of these variables perfectly in place, or the business need to do this sort of change-over.  Changing from MSSQL to Oracle is a big deal for numerous reasons and is likely to be accompanies with business logic changes.  However, this seems to be one of the selling points for the ORM approach.

Transactions are relatively straight forward.  ORMs store all changes for a transaction in memory until the application commits those changes to the database.  In Entity Framework, this is handled with the AcceptAllChanges()command.  This allows for a group of related changes to be staged, committed, and rolled back with relative ease.

The Downside

A change in data architecture can take down an application. With the application and data entities being tightly coupled to the data structure, any change to the data store has the potential to break portions of the application.  This means that any change in the database architecture requires a change to the ORM and referencing logic.  This increases development/test time for all changes of this nature.

Slower than stored procedures. Because the queries generated are plain SQL queries, they don’t have the benefit of being compiled or cached at the database server.  Because of the nature of the ORM data access, the queries are generated at runtime.

Complex data retrieval or storage queries can generate some nasty SQL.  ORMs can produce some hairy queries.  Although it’s improved with recent versions, Entity Framework has a difficult time handling multiple joins efficiently.  This causes the runtime preparation of complex queries to have a significant impact on application performance.

Altering data structures is no fun.  I’ve yet to work on an application that didn’t require additional tables, additional columns, data type changes, or dome alteration of the data structure.  Moving things around with an ORM is less than intuitive and requires a good amount of coordination and testing.

Stored Procedures

There are a few big upsides to using stored procedures as an abstraction layer that have pushed my designs to this direction for several large systems.  Implementing this can require additional resources and time in development but allows for data structure flexibility and optimal performance.

“The data architect’s answer to why use stored procedures is that T-SQL is only language that you KNOW will last as long as the data. A stored procedure that was written ten years ago is still just as easy to call as one that will be written today, or in the future. Anything less is building a long term solution on today’s fad, and it’s just plain foolishness.” from Paul Nielsen’s article “Why use Stored Procedures?” on sqlblog.com  Read more…

The Up Side

Compiled on the database and data execution plans are cached.  Because of this, speeds will out-perform any ORM.

Written in native database language with access to additional performance optimizations and language features. ORMs have to translate logic to queries.  Even at their best, the can’t take everything into account.

Decouple application code from database structure.  The data structure can be changed and manipulated to improve performance without having to update code; only the referencing stored procedures.

Business objects represent business needs.  There’s no direct need to have a one-to-one relationship of business objects to the data structure.  This allows a bit more flexibility in design and alteration.

The Down Side

If you don’t already have a dedicated SQL developer on staff, it will behoove you to hire one.  Most application developers can handle simple to moderate stored procedures.  Once the application concerns switch to performance, traffic and data size, you’ll want to make sure you have a specialist on hand.  Data integrity is priority for most organizations.  Data management, storage, and integrity for a high volume database is a job for a specialist not a part time concern for application developers.

You will inevitably and repetitively be urged to put business logic into stored procedures.  Whether it makes sense for the function or the DBA is telling you that he could do things faster and easier in the stored procedures, this will come up.  Spreading business logic around to various layers of an applications is not only bad form, but it makes it exponentially harder to troubleshoot when a problem arises.

Object mapping needs to be handled manually. Without an ORM, the output of a stored procedure needs to be mapped to an object manually.

Messaging needs to be handled.  Unless you have a desire to parse through SQL errors, you’ll need to device a common messaging structure to communicate detailed information to the application.  Depending on the level of detail, this can be a significant effort.

Transactions become a bit more cumbersome.  Simple/moderate transactions can be handled relatively easy with an ORM.  With Entity Framework, SaveChanges() commits all in memory changes to the database in a transactional manner.

The Hybrid

A very viable alternative to going whole hog on one of the previous designs is to take a little of both.  Pick a less volatile section of the application and use an ORM.  Leave your DBA out of non-critical features and let them focus on the high traffic, performance dependent areas of your application.

The Up Side

The best of both worlds.  Performance for the areas that need it, rapid development for the features that don’t.  Every application has features or areas that draw less traffic and might require faster development cycles; perfect for an ORM.

Most production ready ORM solutions can handle Stored Procedure output mapping.  The big players can be used to map the output record set(s) to business objects, saving you the hassle.  This adds quite a bit of overhead, and there are lower cost alternatives.

The Downside

Confusion.  Unless there are clear boundaries drawn, it can be very confusing to developers as to which method of data access to use where.

Sharing data structure makes things complicated.  If the different data access methodologies share a data structure that needs to change, this creates a good amount of work and a great deal of testing.  Say a UserProfile table needs to be altered to require a Date of Birth and a phone number.  Not only to all stored procedures need to be altered to incorporate this structure change, but the ORM entities need to change as well or inserts will potentially fail.  I’m sure in the real world changes like this will likely be part or a larger development effort anyway, but it’s worth noting.

Other Options

In-Line SQL

I’ve added this just so I can make a arguments against this choice.  Explicitly writing SQL queries inline can be difficult to manage and very difficult to keep in sync with the data structure.  The chance for developers to write inefficient or buggy queries increases and the adherence to uniformity drops.  I go out of my way to avoid this option unless absolutely necessary.

No-SQL

Another viable option that’s rapidly gaining popularity are the No-SQL or non-relational persistence models.  MongoDB… CouchDB… RavenDB… it seems like a new NoSQL database is popping up every week targeted to a slightly different audience to solve slightly different problems.  It’s a completely different way of storing data and one that works extremely well in many instances.  From a rapid development site to one that needs high availability and easy sharding, it’s a very viable alternative to traditional relational databases.  If you can get away from the idea that data needs to be normalized and that complex queries are necessary to run an application, this could work for you.  I’ll save this debate for a different post, but here’s a very thorough article to scratch the surface on the pro’s and con’s: SQL vs. NoSQL: Which Is Better?

Discussion

Let it be known that I’m neither advocating nor opposing any of these approaches as the only viable solution.  I also recognize that not all viable database servers offer stored procedures, and that the above selections aren’t the only viable solutions for data storage and access.  If you have another point to highlight or an option that I haven’t listed, please feel free to share in the comments.

16 Comments

  1. I have to take exception to your statement that stored procedures will always outperform an ORM. That statement simply isn’t true. It certainly isn’t true for Hibernate.

    You also make the mistake of saying that because you use an ORM you don’t need a SQL expert. Again I disagree. Even if using an ORM development staff should know SQL. You’re still going to have to tweak how the ORM works to get the best performance and that means knowing how to read and write SQL. You can’t get away from that.

    There of course are options like MyBatis which is a hybrid approach. It works quite well.

    • In my mind, there is a difference between SQL expert and an application developer who knows SQL well. The expert being someone who is skilled in performance tuning of the queries and data and dedicated to managing this aspect of the application. Something, in my experience, most developers don’t have the time to dedicate to this on a consistently basis.

      Anticipating this response regarding performance, I spent some time trying to find any sort of performance comparison showing an ORM out performing stored procedures. I couldn’t find anything. If you have something you can share, I’d love to read it.

      • Unfortunately I don’t have anything I can share but I can speak from first-hand experience. At my last employer I worked on a large project, a warehouse management system. It was all Hibernate-based and we were able to get the performance equal to that of any stored procedures. And boy did we have some fervent followers of the “stored procedures are always better and faster” religion. Between Hibernate caching, the fact that it generates prepared statements, and a lot of tuning the performance was great. Was it easy? No, not by a long shot and I’m not sure I’d want to go that route again. Using an ORM well is not easy. If you’re not going to spend time tuning you’d probably be better off not using one.

        The thing is, at least on oracle, stored procedures aren’t some guarantee of good performance. There was a paper I read awhile back (couldn’t find it again) that compared the performance of java prepared statements vs. stored procedures and there wasn’t much of a difference. And it makes sense if you look at the life cycle of both….you don’t really save a lot with a stored proc. not enough that it’s going to mean its worth losing the benefits of having your business logic where it belongs. Does that mean they’re useless? Of course not but with the tools available these days I’d rather use them sparingly.

        • That’s very interesting. Obviously, Hibernate, when tuned correctly, can perform almost as good as JDBC. But a set of JDBC calls is still likely to be slower than a single SP call to a procedure which then runs a dozen statements. The mere fact that you’ll have to transfer data over the wire seems to indicate so. Would be really interesting to have more insight into your actual benchmark, as Hibernate clearly isn’t faster than SP’s “per se”.

      • Oops. forgot to mention….we did a lot of profiling on that warehouse app comparing the difference between the hibernate code and SQL written by an expert and put in a stored procedure. The zealots on the db team insisted on it. We won ;)

  2. Got a question, doesn’t the downside of “A change in data architecture can take down an application.” for an ORM also occur for the stored procedure?

    Like you said for the stored procedure, you still need to map the stored procedure returned message to some sort of object. Adding additional values to return from the stored procedure will cause a change to the underlying mapping, thus causing the same downside mentioned for ORM use.

    The main benefit I see for ORM utilization is wrapping the aggregate of changes in a unit of work through the ORM.

    Testing is still kind of iffy to me when putting the business logic in the code and not in stored procedures. Developing unit tests that hit the database through the ORM may work all and well for the test developed (important note is that a unit test should not be hitting an actual instance of a database), but falls flat on its face when actually used in a real database. Using C# as an example; if the IQueryable references a C# function within the where clause Entity Framework will have no idea how to generate the SQL, but the unit test using IEnumerable will have no problem.

    My advice if using the Repository Pattern, again with C#, is to return IEnumerable from all the public facing methods the repository interface implements. So if you have a PersonRepository class, the method IEnumerable GetPeopleWhoHaveTheFirstName(string firstName) would be present. If you find you need to refine your seach even more from within the domain layer, such as first name doesn’t cut it because you need to query by people that have the first name “X” and are over the age 21. Do not do this from within the domain object because you would be wasting memory. The new query public facing method would be IEnumerable GetPeopleWhoHaveTheFirstNameAndIsOverTheAgeTwentyOne(string firstName) would be optimal.

    Also, this gives you a big benefit of using in-line SQL, or calling stored procedures of the same name if you find the ORM you are using just isn’t cutting it, but the downside is attempting to get non-ORM solutions to work within some sort of transaction scope/unit of work.

    The example above is following a Domain Driven Design approach. As for testing the Repository abstraction, I find the use of integration tests using Behavior Driven Design to be the best approach. These integration tests would use the same underlying database (implementation, not same physical box) that the production database uses.

  3. The biggest upside of Stored Procedures could be they can enforce business rules across many different applications from many different vendors.

    Where I work, the idea of someone else writing into our DB with nothing but a SP to protect us would be pure lunacy. But I believe some shops are into this, and I suppose it would work as long as your SP’s were robust enough.

    Hopefully whatever degree of robustness you enforce internally is enough for external users too, but I’m sure that’s often not true.

  4. Oh and for us the biggest DOWNSIDE of SP’s would be deployment and versioning hassles, eg. coordinating changes to many SP’s across many systems and in sync with changes to the corresponding Java/whatever code that calls them.

    • One way I’ve seen this handled is by using schemas to build a namespace-like convention for stored procedures as well as better controlling access. Depending on the preference of the developer or the needs of the application, it could start with application name or functional area first. Using a common data access/service layer, given enough time, I’ve always run into an issue with versioning. With the namespacing convention, it’s been easier to keep multiple versions of a sproc organized.

      If you’re not using stored procedures, how do you handle a structural change? Say a required column was added or removed from a common table. Wouldn’t you still run into the same issues with coordinating across many systems that referenced that table?

  5. This article is based on the assumption that you need a DBA or really really skilled SQL developer to actually implement any good SQL. Why not just learn it yourself? Why is it utterly impossible for a Java (or C#, etc.) developer to also learn SQL? It’s not that hard, just a bit different.

    And if you have a tool like jOOQ, by the way, then inline SQL can be awesome as your SQL code compiles into Java

    • I’d hardly say the entire article is based on that assumption, but that seems to be the common sticking point for folks. Yes, a developer can be great at SQL. A developer can also be great at networking, graphics, or any other aspects of development. At some point in any organization/application efficiency and productivity is increased by having specialists handle sensitive areas.

      As a developer, I’ve done some design and graphics work, with varying success. I’d gladly turn it over to someone that’s artsy and dedicated to making things pretty. Same can be said for data. I can and have designed/managed databases in the past. The moment I turned it over to someone dedicated to that aspect, I learned volumes and got some great ideas for performance improvements.

      • Heh, I’m going to leave playing Rachmaninov Piano Concertos to the experts, too ;-)

        But here, we’re talking about two slightly distinct development paradigms: Imperative/OO/Functional/whatever (Java, C#) and declarative (SQL).

        In particular, the “No DBA required.” argument in favour of ORMs was a bit troubling to me. If the database schema is complex and contains lots of data and throughput, using an ORM won’t be a magic bullet against having skilled DB personnel. This and the “If you don’t already have a dedicated SQL developer on staff” argument make it sound like “serious” SQL is something dangerous, not manageable by “ordinary developers”. This simply doesn’t match my experience, but yours may vary, of course.

        • You’re right. I think folks are getting hung up on my use of the term “DBA”. From the comments, it seems that people are interpreting this article as a “Developers can’t do SQL” argument. This is hardly the case.

          The only question I intended to pose is whether or not the SQL should be managed by a person or an application. Should queries be handwritten and manually tuned or left up to an ORM? The term DBA was used (possibly in error) to describe a person that dedicated to the performance and maintenance of the database and the queries within. This person can be a developer, janitor, or DBA.