Abstracting Sakai URLs

4 06 2009

For years the Sakai community has suffered with unspeakable URLs. Giving educators URLs that they can only communicate in writing, and cannot spread by word of mouth, must be a barrier to teaching. As we rewrite Sakai on top of Apache Sling, I am determined to ensure that a tutor at Cambridge can say to a confused student, in a busy street at lunchtime, “go to camtools quantumwell2008 and look for the lecture notes”, with some confidence that the student will enter camtools.cam.ac.uk/quantumwell2008 and find what they need.

Competition for simple URLs generates challenges. A land grab to claim base-level URLs will create a huge number of unique root URLs. However much we might like to impose a content model on the participants, the URLs are their space, and any abstraction made purely for technical purposes has to be avoided. Why should I ask the tutor to put his content in /2008/physics/tutorials/group54/week12, which is only marginally better than the unspeakable /portal/site/edf435-edf-a237-bdc12, or something close, of earlier versions of Sakai?

To make Sakai’s URLs usable we have to accept that in places there will be millions of potential elements at a single level of URL space. This leaves us with a technical problem. Put even thousands of items into a folder in JCR and write performance and concurrency suffer. Experiments show that fewer than 1000 child nodes per folder delivers acceptable performance. This is not unique to JCR: all hierarchical filing systems hit performance limits on the number of items per folder at some point, if the folder optimization has been used to speed underlying access. I am not advocating patching Jackrabbit to make it capable of storing millions of items in a folder, since doing that will certainly expose other problems.

So we have accepted that we need to support millions of siblings in URL space. To achieve this in JCR and maintain scalability, we are rewriting the URL space through a filter that hashes the path so that no folder contains more than 255 items. With 4 levels of hashing the theoretical limit becomes somewhere around 4E9 items (255^4 ≈ 4.2E9), assuming the underlying infrastructure can cope with that many items.
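
To make the idea concrete, here is a minimal sketch of how a request path could be turned into a hashed storage path. It is not the code from the patch. The class and method names, the choice of SHA-1, and the layout of three two-character hash folders followed by the original name are all assumptions made for illustration. Only the first path segment is hashed; anything after it is appended unchanged.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, for illustration only: not part of Sakai or Sling.
final class HashedPath {

    // Assumed layout: three hash folders, then the named item as the fourth level.
    static final int HASH_LEVELS = 3;

    // Maps "/name/rest" to "/xx/yy/zz/name/rest", where xx, yy, zz are hex
    // bytes taken from a digest of "name". Each hash folder therefore has
    // at most 256 children, whatever the number of top-level names.
    static String toStoragePath(String requestPath) {
        String trimmed = requestPath.startsWith("/") ? requestPath.substring(1) : requestPath;
        int slash = trimmed.indexOf('/');
        String name = slash < 0 ? trimmed : trimmed.substring(0, slash);
        String rest = slash < 0 ? "" : trimmed.substring(slash);

        byte[] digest = sha1(name);
        StringBuilder path = new StringBuilder();
        for (int i = 0; i < HASH_LEVELS; i++) {
            // Mask to an unsigned value so every folder name is two hex digits.
            path.append('/').append(String.format("%02x", digest[i] & 0xff));
        }
        return path.append('/').append(name).append(rest).toString();
    }

    // SHA-1 is an assumption; any digest that spreads names evenly would do.
    private static byte[] sha1(String value) {
        try {
            return MessageDigest.getInstance("SHA-1")
                    .digest(value.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }
}
```

Calling HashedPath.toStoragePath("/quantumwell2008") gives a path of the form /xx/yy/zz/quantumwell2008, and the same input always gives the same result, so the real location can be computed from the request URL alone without storing any mapping.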

After a great deal of experimentation and reading of the Sling code base, the implementation turned out to be embarrassingly simple: write a servlet that rewrites the URL, resolves the real JCR Sling Resource and re-dispatches that resolved resource back into Sling. In all, about 20 lines of code. The only hard part was patching Sling to allow it to bind non-existent URLs to servlets (see http://codereview.appspot.com/67146). So we now have a mechanism where we can designate nodes within the content tree to be abstracted in this way, creating highly scalable stores of information, simply by setting a property on the node.



3 responses

5 06 2009
Michael Marth

Sling also supports vanity URLs. Setting the property sling:vanityUrl of a node to the URL you want should, I believe, do what you want to achieve.


5 06 2009

If I read the patch correctly there are 2 problems with the approach.
Firstly, a vanityUrl is a static mapping, so the target node is statically mapped. /products has a property that points to /23/ed/3f/myPage. What happens when I want /products/enterprise/specification.pdf? In the abstracting model that should automatically be /23/ed/3f/myPage/enterprise/specification.pdf.

The second problem, which is really the one being solved, is that we have millions of items at /, which will never work in a JCR (or many other file systems). With vanity URLs we still need millions of items at /, since each /product is a node in the JCR pointing to the real location in the JCR. With the approach in the post there are 0 items at / and the real path in the JCR is computed from the request URL itself, hence it scales.

It’s perfectly possible that I have read the patch incorrectly; if I have missed something please say. (And thanks for the pointer, it will come in useful.)

5 06 2009
Michael Marth

In the example you quote the node at /23/ed/3f/myPage would have a property sling:vanityUrl with value “/products”. There would not be a node located at /products. That should solve the second problem. The first problem you quote is not solved by vanityUrl.
