Is ORM so bad?

12 06 2012

ORM gets a bad name, and most of the time it deserves it. It produces nasty, horrible queries that don’t scale and lead to dog-slow applications in production. Well, that’s not entirely fair. Programmers who write code or SQL and never bother to check whether they have made stupid mistakes lead to dog-slow applications in production. The problem with ORM is that it puts lots of power into a programmer’s hands, lets them loose, and makes them think they can get away without thinking about what will happen when their application has more than 10 rows in a table. There is no magic to ORM: just like raw SQL, you have to tune it.

This is not rocket science. Load the tables you are querying with a representative number of records, run the query, and then tune the query again, and again, and again. Then run the query concurrently, then run it with concurrent updates. And tune, tune, tune. To do that, you have to read the manual on how to tune your chosen ORM, in exactly the same way you will have read the tuning manual for your chosen RDBMS. There is hardly any difference between tuning raw SQL and tuning ORM-generated queries.
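The first step of that loop (load representative data, look at what the database actually does, tune, look again) can be sketched with nothing but Python’s built-in sqlite3 module. This is a minimal illustration, not the setup used on any real project: the content table, its columns, the row count and the query_plan helper are all made up for the example, and the exact wording of the plan text depends on the SQLite version.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the human-readable detail of each EXPLAIN QUERY PLAN row."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The detail text is the last column in both old and new SQLite versions.
    return [row[-1] for row in rows]


# Hypothetical table, loaded with enough rows that a full scan matters.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE content (id INTEGER PRIMARY KEY, owner TEXT, title TEXT)"
)
conn.executemany(
    "INSERT INTO content (owner, title) VALUES (?, ?)",
    ((f"user{i % 1000}", f"item {i}") for i in range(50_000)),
)

sql = "SELECT id, title FROM content WHERE owner = ?"

# Before tuning: no index on owner, so the whole table is scanned.
before = query_plan(conn, sql, ("user42",))

# One tuning step: add an index, then check the plan again.
conn.execute("CREATE INDEX content_owner ON content (owner)")
after = query_plan(conn, sql, ("user42",))

print("before:", before)
print("after: ", after)
```

The same check applies whether the SQL was written by hand or generated by an ORM: capture the statement, ask the database for its plan, and repeat after every change.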

Where ORMs differ, get a very bad name, and make many give up and code SQL directly, is in how hard or easy they make it for you both to tune the query and to make it deliver precisely the results required. The bad ORMs (no naming and shaming here) make that painfully difficult, if not impossible. The good ORMs generate queries that, by and large, can only be bettered by the best DBA, and when they occasionally fail, they are easy to tweak.

If you’re thinking it would be easier to use some other query mechanism, you will probably have to invest just as much time and effort tuning code, update strategies or some other query language. There are really no magic bullets when it comes to making queries against large data sets go fast, unless your problem is trivial and pointer based.

So how do I tune ORM (or raw SQL, for that matter)?

There is little point in spending hours squeezing the last ms out of every query. It’s the worst queries that need attention first. One approach I use on DjOAE, an app using the Django ORM, is to set some limits, e.g. GET requests must take less than 10ms and perform no more than 10 SQL operations; POST requests may take up to 50ms and 30 SQL operations. If any request breaches those limits, all the raw SQL with timing information is dumped (preferably painted red). With the database suitably constrained (i.e. made very small, with almost no cache, so anything that isn’t a lookup runs like a dog) and loaded with representative data, any operation that needs tuning sticks out like a sore thumb. When I don’t see any more, I drop the thresholds, up the concurrency, and raise the data set size.
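A rough sketch of that kind of budget check is below. It is not the DjOAE code: QueryLog and check_budget are hypothetical names, sqlite3 stands in for the real database, and only the four limits come from the numbers above. In Django itself, django.db.connection.queries (populated when DEBUG is on) provides the same raw material of SQL text plus timing.

```python
import sqlite3
import time

# Limits from the text: method -> (max milliseconds, max SQL operations).
BUDGETS = {"GET": (10.0, 10), "POST": (50.0, 30)}


class QueryLog:
    """Hypothetical helper: runs SQL and records each statement with its time."""

    def __init__(self, conn):
        self.conn = conn
        self.entries = []  # list of (sql, elapsed_ms)

    def execute(self, sql, params=()):
        start = time.perf_counter()
        rows = self.conn.execute(sql, params).fetchall()
        self.entries.append((sql, (time.perf_counter() - start) * 1000.0))
        return rows


def check_budget(method, elapsed_ms, log):
    """Return report lines if the request broke its budget, else an empty list."""
    max_ms, max_ops = BUDGETS[method]
    if elapsed_ms < max_ms and len(log.entries) <= max_ops:
        return []
    report = [f"{method} over budget: {len(log.entries)} queries, {elapsed_ms:.2f}ms"]
    report += [f"  {ms:8.2f}ms  {sql}" for sql, ms in log.entries]
    return report


# Simulated GET request with the classic N+1 pattern: one list query,
# then one extra query per row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO item (name) VALUES (?)", [(f"n{i}",) for i in range(20)])

log = QueryLog(conn)
start = time.perf_counter()
ids = [row[0] for row in log.execute("SELECT id FROM item")]
for item_id in ids:
    log.execute("SELECT name FROM item WHERE id = ?", (item_id,))
elapsed_ms = (time.perf_counter() - start) * 1000.0

report = check_budget("GET", elapsed_ms, log)
print("\n".join(report))
```

Here the request breaks the GET budget on query count alone (21 statements against a limit of 10), so the full list of statements is dumped with timings, which is exactly the evidence needed to spot the per-row lookups.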

Once you have the detailed evidence, it’s as easy as falling off a log, and way more satisfying. You could probably do the same with any ORM, even Hibernate.

What about caching query sets in the app?

Caching should really be turned off for tuning. An application that can’t work without caching at the query layer will generally not scale, because at some point the cache will suffer too much contention/invalidation/replication (gulp, no not replication, you must be kidding) and the underlying slow queries will be exposed. Where caching is worth it is where the query is already fast, but repeated billions of times. If you’re using caching as a query tuning strategy, that’s fine, but it will bite you in the end.

Linear Classloaders and OSGi

4 09 2008

OSGi does not remove all classloader problems, as Peter Kriens notes:

“Hibernate manipulates the classpath, and programs like that usually do not work well together with OSGi based systems. The reason is that in many systems the class visibility between modules is more or less unrestricted. In OSGi frameworks, the classpath is well defined and restricted. This gives us a lot of good features but it also gives us pain when we want to use a library that has aspirations to become a classloader when it grows up.”

It turns out that some JPA solutions are OSGi friendly, others are not. It all depends on what is done to load the persistence.xml and the related classes, and then the proxy classes cause further classloader problems.
I guess, since the author, Peter, is OSGi Director of Technology, he knows what he is talking about.
Apparently EclipseLink was written to be OSGi friendly, and non-proxy, classloader-clever ORM solutions also work. Cayenne falls into this group and reportedly works OK inside OSGi, although I don’t know if that’s v2 or v3.

Running specific Maven Test with JVM Args

21 09 2007

I have some long-running tests in search, but I wouldn’t want anyone to run them as part of the normal build. The tests don’t have the word Test in the classname, which prevents them from running, but they can be invoked on the command line with -Dtest=classname

mvn -Dtest=SearchSoak test

Also, I have found that it’s sometimes necessary to add JVM args to the unit test. Reconfiguring the Surefire plugin makes this possible, in the pom

    <maven.test.jvmargs> </maven.test.jvmargs>

And then to run with a heap dump and YourKit connection

export DYLD_LIBRARY_PATH="$DYLD_LIBRARY_PATH:/Applications/YourKit Java Profiler"
mvn -Dtest=SearchSoak \
   -Dmaven.test.jvmargs='-XX:+HeapDumpOnOutOfMemoryError -agentlib:yjpagent' \