Getting somewhere, not sure where

4 01 2011

Finally I feel like I am getting somewhere.

This is JConsole showing the heap, halfway through a load test uploading pooled content. There are 20 threads hitting the server from JMeter with no pauses, uploading files from 50K down to 1K in size. JMeter is reporting throughput of between 75 and 108 files per second, depending on whether I have the Terminal window with the log flying past and the JMeter GUI open. This is using the sparse sessions I mentioned earlier, with shallow sizes around 80 bytes (deep about 1K) and JDBC connections bound to threads, which is probably why the heap is static and flat at around 100MB. The X axis is time of day.

The other change that has been made is borrowed from the optimisations I did to the Sakai 2.4 Content Hosting Service. There the bodies of objects were serialised to byte[] before being injected into the DB. I had been using key value pairs directly in the DB, however, as with the early Jackrabbit persistence managers, that generates a lot of SQL insert and update traffic. Select traffic is mostly eliminated with one shared concurrent cache. Unlike the Jackrabbit Bundle persistence manager, which stores a range of nodes in a blob, I am storing one content map per blob on the basis that it keeps the session small and simple, and ensures that the updates hit fewer content maps. That vastly reduces the indexing load on insert and update at the DB, drops the SQL insert and update traffic by an order of magnitude and doesn't appear to impact serialisation/deserialisation. It's also closer to the column concepts used in the Cassandra driver for this sparse content map system, where the Thrift client is performing the serialisation/deserialisation for free.
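To make the "one content map per blob" idea concrete, here is a minimal sketch of packing a map of properties into a single byte[] and reading it back. It is an illustration only, not the actual Nakamura code: the class name ContentMapCodec, the String-only values and the length-prefixed layout are all assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical codec: one content map <-> one blob column value.
final class ContentMapCodec {
    private ContentMapCodec() {}

    // Writes an entry count followed by key/value pairs.
    // Note: writeUTF limits each string to 65535 encoded bytes.
    static byte[] encode(Map<String, String> content) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(content.size());
            for (Map.Entry<String, String> e : content.entrySet()) {
                out.writeUTF(e.getKey());
                out.writeUTF(e.getValue());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Reads the blob back into a map, preserving the written order.
    static Map<String, String> decode(byte[] blob) {
        Map<String, String> content = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(blob))) {
            int size = in.readInt();
            for (int i = 0; i < size; i++) {
                String key = in.readUTF();
                content.put(key, in.readUTF());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return content;
    }
}
```

With a layout like this, updating any number of properties on one object costs a single row update on the blob column, rather than one insert or update per key value pair.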

The downside is that the ability to use the DB to search on any field is gone. Actually that's no bad thing, since we didn't want to search on 90% of the fields anyway; they were just payload. This “indexing” has now been replaced by dedicated indexing tables for each column family that only contain the fields and values that need to be indexed. This vastly reduces the indexing and insert/update load on the DB, since it's only considering things that the developer really wants to index, and we are still using Solr as a “not quite so real time” ™ secondary index.
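As a rough sketch of how a dedicated index table stays small, the helper below picks out only the fields a developer has declared as indexed and turns them into index rows, leaving the rest of the map as opaque payload in the blob. The names IndexRows and Row, and the (field, value, rowKey) row shape, are hypothetical and only there to illustrate the idea.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper that derives index-table rows from a content map.
final class IndexRows {
    private IndexRows() {}

    // One row of an assumed index table: (field, value) points at rowKey.
    static final class Row {
        final String field;
        final String value;
        final String rowKey;

        Row(String field, String value, String rowKey) {
            this.field = field;
            this.value = value;
            this.rowKey = rowKey;
        }
    }

    // Emits a row only for fields listed in indexedFields; everything
    // else is payload and never reaches the index table.
    static List<Row> rowsFor(String rowKey, Map<String, String> content, Set<String> indexedFields) {
        List<Row> rows = new ArrayList<>();
        for (Map.Entry<String, String> e : content.entrySet()) {
            if (indexedFields.contains(e.getKey())) {
                rows.add(new Row(e.getKey(), e.getValue(), rowKey));
            }
        }
        return rows;
    }
}
```

For a content map with ten properties of which one is indexed, this produces one index row per write instead of ten indexed column values.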

If I look back through my blog archives I notice that this was the model that was proposed for Nakamura (then K2) right at the start. The main difference was that we were also writing our own component manager. With hindsight, writing our own component manager was dumb, but I should have stuck to my guns on the indexing mechanism and done more of my own testing.

For those in the Sakai 2.x world who are wondering, this content system is rather like Content Hosting Service: without the 100 or so API methods, with per-object hierarchical ACLs, with git-like versioning, with hard links, with soft links and with the ability to use column databases as well as traditional RDBMSs. I also suspect from the heap usage in this test that it's a bit lighter on resources.

Testing Sparse

3 01 2011

The depressing thing about profiling and load testing is that the more you do it, the more problems you discover. In Nakamura I have now ported the content pool (an amorphous pool of content identified by ID rather than path) from Jackrabbit to the Sparse Content Map store that I mentioned some weeks back. Load testing shows that the only concurrency issues are related to the Derby or MySQL JDBC drivers, and I am certain that I will need to jump through some hoops to minimise the sequential nature of RDBMSs on write. From earlier tests that's not going to be the case for the Cassandra backend, which should be fully concurrent on write. Load testing also shows memory leaks. Initially this was the typical case of prepared statements being opened in a loop and leaking without being closed. Interestingly, Derby is immune to this behaviour and recognises duplicates, so no leaks, but MySQL leaks badly, and always creates an internal result set implementation when a prepared statement is created. That was easy to fix: a thread-local hash map ensures that only one prepared statement of each sharded type is opened, and then all are closed at the end of the update operation.
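The prepared statement fix can be sketched roughly as below. This is not the Nakamura code: StatementCache and Opener are hypothetical names, and the cache is written against AutoCloseable so the example runs without a database. With JDBC the opener would be something like `sql -> connection.prepareStatement(sql)`, and closeAll() would be called in a finally block at the end of the update operation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-thread cache: at most one open statement per SQL
// string per thread, all closed together when the operation ends.
final class StatementCache<S extends AutoCloseable> {

    // Assumed factory hook; for JDBC this would wrap prepareStatement.
    interface Opener<S> {
        S open(String sql) throws Exception;
    }

    private final Opener<S> opener;
    private final ThreadLocal<Map<String, S>> open = ThreadLocal.withInitial(HashMap::new);

    StatementCache(Opener<S> opener) {
        this.opener = opener;
    }

    // Returns the statement already opened by this thread for this SQL,
    // opening it only on first use.
    S get(String sql) {
        Map<String, S> statements = open.get();
        S statement = statements.get(sql);
        if (statement == null) {
            try {
                statement = opener.open(sql);
            } catch (Exception e) {
                throw new IllegalStateException("Failed to open statement", e);
            }
            statements.put(sql, statement);
        }
        return statement;
    }

    // Closes every statement opened by the calling thread.
    void closeAll() {
        Map<String, S> statements = open.get();
        for (S statement : statements.values()) {
            try {
                statement.close();
            } catch (Exception e) {
                // Best effort: keep closing the remaining statements.
            }
        }
        statements.clear();
    }
}
```

Because the map is thread local, no locking is needed around get() or closeAll(), which fits with binding JDBC connections to threads.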

The next new memory leak to appear is in the Jackrabbit Session. I am not certain if it's new, or old and just exposed by higher load. Sling has a PluggableDefaultAccessManager that is used by Jackrabbit to allow AccessManagers to be plugged in to the repository. Unfortunately the Jackrabbit configuration model is based on bean utils type injection, and the AccessManagers are bound to JCR Sessions, which are free floating objects. In order to plug AccessManagers in via OSGi, Sling has to have a service tracker to keep track of the AccessManagerFactories, which in turn must track all the resources created to ensure cleanup. Sadly the OSGi singleton service model doesn’t quite fit with the Jackrabbit free floating Session model, resulting in the AccessManagerFactory maintaining references to the Sessions, which don't always get cleaned up. Hey presto, a leak. After about 8h of uploading content to the pool, which still uses JR for login/logout operations, I have 20K JR Sessions that the GC can't clean up, and the JVM is getting swamped by GC cycles and doing almost no useful work.

Switching to the DefaultAccessManager that comes with JR, we lose the ability to plug anything in via OSGi, but the leak goes, only to expose the real underlying cause. In our Q1 release we had so many problems with concurrency in JR Sessions that all led back to one SystemSession per workspace managing security. On one hand this one session is good, since all the ACL resolution is cached for all sessions, making ACL resolution fast. On the other hand, JCR sessions are not thread safe and so are heavily synchronized to prevent total JCR workspace meltdown in the event of concurrent access. Now if your use of JR or Sling is read mostly per workspace (or read-write with a handful of users) then there is no problem. Equally, if all your ACLs fit in memory, you are probably going to avoid concurrent access of the single SystemSession supporting ACL resolution. Unfortunately for us, our use of JR is mostly read-write and the number of ACLs we have, partially due to the content pool mentioned above, is orders of magnitude greater than what would fit in memory and avoid the singleton SystemSession. So in Q1 we modified JR to bind SystemSessions to threads for as long as they were needed, using concurrent finaliser queues to release resources correctly. That works. It does increase the memory footprint, but does avoid LRUMap contention when the SystemSession gets populated with a universe of ACLs. Unfortunately, under the load we can now put on the system it also causes a Session leak in the AccessManagerFactory, which is a singleton. I really should go and fix that, but just at the moment I have consumed all the spare time tracking the problem down, and so the simple solution is to replace it with the DefaultAccessManager from Jackrabbit, drop pluggability via OSGi and say goodbye to the leaks, and hello to the finalizer closing SystemSessions (which also needs to be fixed).

On a more positive note, the Sparse Content Map was intended to be lightweight so that it would pass through GC cycles with minimal impact. Compared to JR sessions, the SparseSession has a shallow size of 80 bytes per session and a deep retained size of 167 bytes per session. JR XASessionImpls have a shallow size of 232 bytes and a retained size of 39K, so at least the “lite” in the class name is somewhere close to reality. As with the JR Session, traffic to the RDBMS is mostly updates, with a central shared concurrent cache eliminating most reads. Unlike JR, the write and read operations are 100% concurrent with no synchronization.
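A shared cache that eliminates most reads could look something like the sketch below. It is only an approximation of the idea, assuming a hypothetical Store interface in front of the RDBMS or Cassandra and a write-through update. It uses java.util.concurrent.ConcurrentHashMap, whose computeIfAbsent may briefly block other writers to the same key, so it is not a claim about how the Sparse Content Map achieves its lock-free behaviour.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical read-through, write-through cache shared by all sessions.
final class SharedContentCache {

    // Assumed storage abstraction standing in for the real backend.
    interface Store {
        Map<String, String> load(String key);

        void save(String key, Map<String, String> content);
    }

    private final ConcurrentHashMap<String, Map<String, String>> cache = new ConcurrentHashMap<>();
    private final Store store;

    SharedContentCache(Store store) {
        this.store = store;
    }

    // Hits the store only when the key is not cached yet; a null from
    // the store is returned as null and nothing is cached.
    Map<String, String> get(String key) {
        return cache.computeIfAbsent(key, store::load);
    }

    // Writes to the store first, then refreshes the cached copy so
    // later reads do not need a select.
    void update(String key, Map<String, String> content) {
        store.save(key, content);
        cache.put(key, content);
    }
}
```

With this shape the sessions themselves hold no content, which is consistent with the small retained sizes reported above: the heavy state lives once in the shared cache rather than once per session.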