Having spent a month or so gradually improving performance in Nakamura, I thought I should share some of the progress. First off, Nakamura uses Sling, which uses Jackrabbit as the JCR, but Nakamura probably abuses many of the JCR concepts, so what I’ve found isn’t representative of normal Jackrabbit usage. We do a lot of ACL manipulation, we have lots of editors of content with lots of denys, and we have groups that are dynamic. All of this fundamentally changes the way the internals of Jackrabbit respond under load, and it’s possible that this has been the root cause of all our problems. Still, we have them, and I will post later with some observations on that.
We were due to do a release at the end of September, but load testing (way too late) showed that we had major performance problems. To be fair, I had heard reports from other projects of similar problems but arrogantly thought that much earlier load testing had shown that we didn’t have a problem; more fool me. We found that on read requests the server was essentially single threaded, never making use of extra cores. Below is a shot of the thread trace from YourKit. Red shows where threads are blocking and doing nothing; bear in mind that there are 200 threads and the screen only shows a few. All were blocking. No wonder it was fast with 1 thread but really slow with 200.
After a bit of investigation I found one SystemSession being shared by all request threads, blocking synchronously. Fortunately, using OSGi bundles it’s easy to rebuild a jar with patched classes, so we patched that area to attach sessions to workspaces and threads within the Access Control Manager. Although that reduced the blocking, we also had to reduce memory consumption, since we now had 200 system sessions all building up state. So I added SoftReferences and positive eviction to ensure that no session became too old and bloated, and should JVM GC activity get too high it would forcibly evict bloated sessions. It turns out the standard Sun JVM GC doesn’t look very far down SoftReferenced object trees to see what it can evict, and so it wasn’t that good at evicting caches many hops removed from the soft reference, which is why we age the sessions based on use and lifespan. This does have an impact on performance, but it also reduces long-term memory growth and results in less overall GC activity.

Coincidentally, this also eliminated a deadlock I had been chasing for several weeks. A writer thread would take out an exclusive write lock in the SharedItemManager and, holding on to that lock, start to update Items. In that process the ACLProvider would need a read lock using its SystemSession; however, in the meantime another thread had entered the same SystemSession via a synchronized block, excluding all other threads, and was waiting for a shared read lock in the SharedItemManager. Hey presto, deadlock. Pretty soon all reader threads queue up trying to get a shared read lock and the server stops responding. YourKit were kind enough to give me a license to use on Open Source, so I should say, it would have been much harder to find that one without the thread trace and monitors in YourKit (thanks guys).
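To make the aging idea concrete, here is a minimal sketch of a per-thread session cache that combines SoftReferences with positive, age-based eviction. The names (AgingSessionCache, SessionFactory) are illustrative, not Jackrabbit or Nakamura API, and the real patch keys sessions to workspaces as well; the point is simply that a session is rebuilt when the GC clears the reference *or* when it exceeds a maximum age, so bloated caches deep behind the reference don’t hang around just because the collector never reached them.

```java
import java.lang.ref.SoftReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch: one session per thread, softly referenced so the GC
// may reclaim it under pressure, plus positive eviction once the session
// passes maxAgeMs, since the GC rarely walks deep object graphs behind a
// SoftReference to discover evictable cache state.
public final class AgingSessionCache<S> {

    public interface SessionFactory<S> { S newSession(); }

    private static final class Entry<S> {
        final SoftReference<S> ref;
        final long bornAt = System.currentTimeMillis();
        Entry(S session) { this.ref = new SoftReference<>(session); }
    }

    private final ConcurrentMap<Thread, Entry<S>> cache = new ConcurrentHashMap<>();
    private final long maxAgeMs;
    private final SessionFactory<S> factory;

    public AgingSessionCache(long maxAgeMs, SessionFactory<S> factory) {
        this.maxAgeMs = maxAgeMs;
        this.factory = factory;
    }

    /** Returns the calling thread's session, rebuilding it if the GC
     *  cleared the SoftReference or the session has aged out. */
    public S get() {
        Thread t = Thread.currentThread();
        Entry<S> e = cache.get(t);
        S s = (e == null) ? null : e.ref.get();
        if (s == null || System.currentTimeMillis() - e.bornAt > maxAgeMs) {
            s = factory.newSession();       // fresh, un-bloated session
            cache.put(t, new Entry<>(s));   // positive eviction of the old one
        }
        return s;
    }
}
```

A real version would also track last-use time, so a busy session still gets retired by lifespan while an idle one can be dropped early.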
That was fine; it reduced blocking in the entry to the ItemManager bound to each session, but soon exposed blocking in the next layer down. Jackrabbit makes heavy use of LRUMaps from Commons Collections. These are non-thread-safe LRUMaps implementing an efficient LRU algorithm: absolutely fine if you really have single-threaded operations, but not so good if anything concurrent gets in. It turns out it’s not too hard to re-implement an LRUMap concurrently (OK, I cheated and extended ConcurrentHashMap), so I set about replacing some of the problematic ones with thread-safe versions.
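For illustration, here is one simple way to get a thread-safe LRU map in plain JDK code. This is not the ConcurrentHashMap-based replacement described above (those details aren’t shown here); it wraps an access-ordered LinkedHashMap in coarse synchronization, which is correct but serializes access, whereas a ConcurrentHashMap-based design scales better under contention.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// A minimal thread-safe LRU map: LinkedHashMap in access order evicts the
// least-recently-used entry once the cap is exceeded; synchronizedMap makes
// individual operations safe across threads.
public final class ThreadSafeLruMap {

    public static <K, V> Map<K, V> create(final int maxEntries) {
        Map<K, V> lru = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries; // evict least-recently-used
            }
        };
        return Collections.synchronizedMap(lru);
    }

    public static void main(String[] args) {
        Map<String, String> m = create(2);
        m.put("a", "1");
        m.put("b", "2");
        m.get("a");      // touch "a" so "b" becomes the eldest entry
        m.put("c", "3"); // evicts "b"
        System.out.println(m.keySet()); // prints [a, c]
    }
}
```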
By this stage, read concurrency was OK, with occasional waits for read locks in the SharedItemManager. There is a fix in Jackrabbit trunk for concurrency, due to be released in JR 2.2, but unfortunately there has been a lot going on between 2.1 and the head of trunk, and the stream of 30 or so patches made more of a mess than I was happy with, so we couldn’t make further progress on read performance. The next worrying thing we noticed was that under high file upload to a content pool, the server was back in single-threaded mode. After investigation it turned out that the content pool structure we are using, which is a large key pool, requires that every item has its own set of ACLs. Content models in Jackrabbit are normally hierarchical; that’s what it was built for. If every file has lots of ACLs, the invalidation traffic flowing through the Access Control Providers becomes dominant, and worse, the standard implementation shares the SystemSession reserved for request threads. This means that for every ACL modification, all threads on all requests get blocked. Removing that blockage by placing the modifications in a concurrent queue for later removal by the request thread appears to have eliminated all the blocking under load. Below is a trace from today. All our changes are in the server bundle of Nakamura; the next target is search performance, but I still have a nagging feeling that, given that we ignore David’s model, we should not be using JCR for content storage.
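The queued-invalidation pattern mentioned above can be sketched as follows. The names (AclInvalidationQueue, CacheInvalidator) are illustrative, not the actual Nakamura or Jackrabbit API, and in the real design each session would hold its own queue so that every reader sees every invalidation; the essential move is that ACL writers do a lock-free enqueue instead of synchronously invalidating through a shared SystemSession, and each request thread drains the pending work on its own way in.

```java
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch: ACL modifications are recorded as item ids in a
// lock-free queue by writer threads, then applied lazily by request threads,
// so a burst of ACL changes no longer blocks every request on a shared lock.
public final class AclInvalidationQueue {

    public interface CacheInvalidator { void invalidate(String itemId); }

    private final ConcurrentLinkedQueue<String> pending = new ConcurrentLinkedQueue<>();

    /** Called by writer threads: O(1), lock-free, never blocks readers. */
    public void markDirty(String itemId) {
        pending.offer(itemId);
    }

    /** Called by a request thread before reading: drain accumulated
     *  invalidations and apply them to this thread's own caches. */
    public int drainTo(CacheInvalidator invalidator) {
        int drained = 0;
        for (String id; (id = pending.poll()) != null; drained++) {
            invalidator.invalidate(id);
        }
        return drained;
    }
}
```

The trade-off is staleness between the write and the next drain, but for ACL caches that window is tiny compared to stalling all 200 request threads on every modification.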