Jackrabbit searching

25 03 2009

Searching in Jackrabbit with jcr:path as the primary search constraint is expensive… avoid it. Constraints on node properties and node types are fast.
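As a rough illustration, the two query shapes might look like this in JCR 1.0 SQL. The node type `my:article`, the property `my:status` and the path `/content/site` are hypothetical names chosen for the example. The statements are shown as plain strings; in a real repository they would be handed to the workspace's query manager.

```java
/** Illustrative JCR 1.0 SQL statements; node type, property and path names are hypothetical. */
final class QueryShapes {
  // Shape the note advises against: the path is the primary constraint.
  static final String BY_PATH =
      "SELECT * FROM nt:base WHERE jcr:path LIKE '/content/site/%'";

  // Shape the note recommends: constrain by node type and a property instead.
  static final String BY_TYPE_AND_PROPERTY =
      "SELECT * FROM my:article WHERE my:status = 'published'";

  public static void main(String[] args) {
    System.out.println(BY_PATH);
    System.out.println(BY_TYPE_AND_PROPERTY);
  }
}
```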





Alternative Locking for a Jackrabbit Cluster

20 03 2009

From the previous 2 posts you will see that I have been working on fixing some concurrent update issues with Jackrabbit in a cluster. The optimising, merging nature of Jackrabbit's conflict resolution strategy certainly gives it performance, but it does not guarantee that the data will always be persisted. Handling those exceptions would work in a perfect world, but I don’t have one of those to hand.

The solution, for the moment at least, appears to be to lock the nodes prior to modification, locating the closest persisted ancestor to hold the lock. Unfortunately the Jackrabbit lock manager uses the Journal records when performing locks, so I have written an in-memory lock manager that replicates the map of locks over the cluster without using the database. This, if it proves reliable, should eliminate the need to access a shared database on every lock and unlock operation. The unit tests show that with no contention a lock takes 0.02ms and clearing a set of about 10 locks takes 0.004ms. Obviously, under massive contention the lock time becomes dependent on the throughput. Sadly, the locking system is susceptible to deadlock since we cannot guarantee the order of locking; however, since updates follow the same code paths, the locking and unlocking order is liable to be the same. It's the same problem as exists inside the DB, except that the scope of the locks we are dealing with is probably smaller.
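A minimal sketch of the idea, not the actual lock manager: paths are plain strings, the class name `AncestorLockManager` and the set of persisted paths are hypothetical, and the replication of the lock map across the cluster is left out entirely (the map here lives in a single JVM).

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical in-memory lock table keyed by node path (single JVM only). */
final class AncestorLockManager {
  // path -> owner; in a cluster this map would have to be replicated (not shown).
  private final ConcurrentMap<String, String> locks = new ConcurrentHashMap<>();
  private final Set<String> persistedPaths;

  AncestorLockManager(Set<String> persistedPaths) {
    this.persistedPaths = persistedPaths;
  }

  /** Walks up from path until a persisted node is found; "/" is assumed to always exist. */
  String closestPersistedAncestor(String path) {
    String current = path;
    while (!"/".equals(current) && !persistedPaths.contains(current)) {
      int slash = current.lastIndexOf('/');
      current = slash <= 0 ? "/" : current.substring(0, slash);
    }
    return current;
  }

  /** Locks the closest persisted ancestor; returns the locked path, or null if another owner holds it. */
  String tryLock(String path, String owner) {
    String target = closestPersistedAncestor(path);
    String holder = locks.putIfAbsent(target, owner);
    return (holder == null || holder.equals(owner)) ? target : null;
  }

  /** Releases the lock only if the given owner holds it. */
  boolean unlock(String lockedPath, String owner) {
    return locks.remove(lockedPath, owner);
  }
}
```

Because a new, unsaved node has no persisted state of its own, two writers creating siblings under the same saved parent end up competing for the same lock entry, which is the behaviour described above.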





Impact of Locks in a cluster

20 03 2009

I thought JCR locking was a potential solution, but there are some issues. With Jackrabbit, each lock generates a journal entry, and it looks like there might be some journal activity generated just by attempting to get a lock.

Using the locking mechanism from the previous post, I ran a test of one update to a property on one node, performed 200 times by each of ten threads concurrently. That leads to 19K journal updates. If I unwind the threads and the operations into loops performing the 2000 operations sequentially, I get about 4000 journal entries. This means that locking multiplies the number of database operations in Jackrabbit by roughly 4 to 5 times under load (19K versus 4K). Since these are write operations, and they need to be replayed on all application server nodes in a cluster, that might not be acceptable.

There are 2 other approaches to this problem: accept that the exception can happen and handle it in the same way you would an optimistic lock failure, or create an RDBMS lock scheme.

The optimistic lock failure recovery has complications since it requires perfect transactional isolation in the application code.
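To make the isolation requirement concrete, here is a rough sketch of optimistic-failure recovery as a retry loop. `ConflictException` and `OptimisticRetry` are hypothetical names; the exception stands in for a conflict such as `InvalidItemStateException`. The sketch only works if the unit of work leaves nothing behind when it fails, which is exactly the complication noted above.

```java
import java.util.concurrent.Callable;

/** Hypothetical stand-in for a conflict such as javax.jcr.InvalidItemStateException. */
class ConflictException extends Exception {
  ConflictException(String message) {
    super(message);
  }
}

final class OptimisticRetry {
  /** Runs the whole unit of work again after a conflict, up to maxAttempts times. */
  static <T> T run(Callable<T> unitOfWork, int maxAttempts) throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    ConflictException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return unitOfWork.call();
      } catch (ConflictException e) {
        // Assumes the failed attempt was fully rolled back and is safe to repeat.
        last = e;
      }
    }
    throw last;
  }
}
```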

The RDBMS locking table might work, provided the root persisted node can be identified. It may also be possible to implement this with a cluster-replicated cache, avoiding any DB overhead.





JCR Locks and Concurrent Modifications

20 03 2009

Heavy concurrent modification of a single node in Jackrabbit will result in an InvalidItemStateException, even within a transaction.
The solution is to lock the node. The code below performs a database-like lock on the node, timing out after 30s if no lock was obtained. The lock needs to be explicitly unlocked, as it's a cluster-wide lock on the node.

I suspect, however, that the propagation rate will not be fast enough to maintain consistency over a cluster, but then again… nothing will be fast enough without impacting performance. The slightly annoying feature of this is that you must perform locking manually. This is IMVHO a bit crazy, since at some point, if you don't lock and you write to the node, you will get an exception, and if you are in a transaction (as you should be) you won't be able to recover from the exception, since it will require a rollback and a complete redo of the whole transaction.

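  import javax.jcr.Node;
  import javax.jcr.RepositoryException;
  import javax.jcr.lock.Lock;
  import javax.jcr.lock.LockException;

  private static final long LOCK_TIMEOUT_MS = 30000;

  public Lock getNodeLock(Node node) throws RepositoryException {
    try {
      // Reuse the existing lock if this session already holds its token.
      Lock lock = node.getLock();
      if (lock.getLockToken() != null) {
        return lock;
      }
    } catch (LockException e) {
      // The node is not locked yet; fall through and try to lock it.
    }
    long deadline = System.currentTimeMillis() + LOCK_TIMEOUT_MS;
    long sleepTime = 100;
    int tries = 0;
    while (System.currentTimeMillis() < deadline) {
      try {
        // Deep, open-scoped lock: it outlives the session, so it must be unlocked explicitly.
        return node.lock(true, false);
      } catch (LockException ex) {
        // Held elsewhere: back off a little more each time (up to 500 ms) and retry.
        // Any other RepositoryException is a real failure and propagates to the caller.
        tries++;
        if ( sleepTime < 500 ) {
          sleepTime = sleepTime + 10;
        }
        if ( tries%100 == 0 ) {
          System.err.println(Thread.currentThread() + " Waiting for "+sleepTime+" ms "+tries);
        }
        try {
          Thread.sleep(sleepTime);
        } catch (InterruptedException e) {
          // Preserve the interrupt for the caller and stop waiting.
          Thread.currentThread().interrupt();
          throw new LockException("Interrupted while waiting to lock " + node.getPath());
        }
      }
    }
    throw new LockException("Failed to lock node " + node.getPath() + " within " + LOCK_TIMEOUT_MS + " ms");
  }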




Jackrabbit Observation

19 03 2009

Not Observation in the ObservationManager sense, but an observation about JCR and Jackrabbit that has been confusing me and still is. If I put access control on JCR, I don't get notification of an access control failure until I try to save an item or, if in a transaction, at the commit (need to check that). This means that the failure doesn't happen in the code where the problem is. I am not certain that is right since, given a permission denied on save, you might take alternative action, but if you have to wait until the end of the transaction… how can you?

A second thing that is consuming a certain amount of my free thought cycles at the moment is the issue of locking. I would have thought that opening a transaction and adding a node in the tree would lock the parent until the transaction had been committed. However, this does not appear to be the case. Does this mean that I have to explicitly lock parent nodes, or nodes on modification? If so… what a pain… is there a better way?





Faster Jackrabbit

12 03 2009

As with everything, there are right ways to do things and wrong ways. It looks like Jackrabbit is the same: doing lots of saves generates lots of version history in Jackrabbit and results in lots of DB traffic, which makes all JCR operations slow. If you can, do one save per request cycle; binding a transaction manager to the JCR objects means that all SQL activity is performed at the end of the request cycle in one block. Having seen the impact of a small amount of tuning on write performance, I think there will be scope for more.
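One way to picture "one save per request cycle" is a small request-scoped helper that only records that something changed and saves once at the end. This is just a sketch under assumptions: `Saveable` and `RequestCycle` are hypothetical names, `Saveable` stands in for whatever exposes the save call (for example a JCR session), and no transaction manager is involved.

```java
/** Hypothetical stand-in for the object whose save() flushes pending changes. */
interface Saveable {
  void save() throws Exception;
}

/** Collects "something changed" signals during a request and saves at most once. */
final class RequestCycle {
  private final Saveable session;
  private boolean dirty;

  RequestCycle(Saveable session) {
    this.session = session;
  }

  /** Called by code that modifies nodes, instead of saving immediately. */
  void markDirty() {
    dirty = true;
  }

  /** Called once when the request finishes; saves only if something changed. */
  void end() throws Exception {
    if (dirty) {
      session.save();
      dirty = false;
    }
  }
}
```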