OSGi and Snapshot versions.

9 06 2009

If you call your versions 1.2.0-SNAPSHOT and reference them as such in a manifest file, they won't load in OSGi, or at least not in Felix, because the version is split on '.' and the leading segments are expected to be numbers. In the manifest the version must be parsable, for example 1.2.0.SNAPSHOT, which looks a bit odd but works. It looks like the bnd tool does this for you.
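To make the rule concrete, here is a minimal sketch of the conversion in Python. It is not how bnd does it; the function name to_osgi_version and the regular expression are my own assumptions, and it only handles the simple major.minor.micro-qualifier shape.

```python
import re

# Hypothetical helper for illustration only; this is NOT the bnd implementation.
# Accepts up to three numeric segments followed by an optional qualifier that is
# introduced by either '-' (Maven style) or '.' (OSGi style).
_VERSION = re.compile(r"^(\d+)(?:\.(\d+))?(?:\.(\d+))?(?:[-.](.+))?$")


def to_osgi_version(maven_version: str) -> str:
    """Turn a Maven-style version into major.minor.micro[.qualifier]."""
    match = _VERSION.match(maven_version)
    if match is None:
        raise ValueError(f"cannot convert {maven_version!r}")
    major, minor, micro, qualifier = match.groups()
    # Missing numeric segments default to 0.
    numeric = ".".join(part or "0" for part in (major, minor, micro))
    if qualifier is None:
        return numeric
    # An OSGi qualifier may only contain letters, digits, '_' and '-'.
    qualifier = re.sub(r"[^A-Za-z0-9_-]", "_", qualifier)
    return f"{numeric}.{qualifier}"


print(to_osgi_version("1.2.0-SNAPSHOT"))
```

The only real work is swapping the '-' before the qualifier for a '.', and padding the numeric part out to three segments.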





File meta data

5 06 2009

Digital repositories need metadata associated with content. At the most basic level this can be as simple as properties associated with the file nodes in JCR.

But…

By default, files uploaded to Sling can't have properties added to them, because the nt:resource node type in the spec only allows a small, defined set of properties. Obviously we really want to have properties on files, as all documents have metadata, both technical and authoritative. There is a way to make nt:resource (or any other node) accept properties and unstructured children: a mixin.

Here's how

First we need a definition for the mixin. This is in CND format, which is loaded into Sling with the standard ContentLoader extension to an OSGi bundle. Here is one I did earlier for Sakai Sites.

// Add a mixin that allows nodes to accept other properties, (eg nt:file)
[sakai:propertiesmix] > mix:versionable mixin
 - '*' (undefined) copy
 - 'sakai:labels' (string) copy multiple
 + '*' (nt:unstructured)

This says that the mixin type sakai:propertiesmix allows any property of an unspecified type to be added, and the sakai:labels property to be added as a multi-valued string. It also allows multiple children of type nt:unstructured. Because it extends mix:versionable, adding it also makes the node versionable.

If you need to add this capability to any node, here is what to do, assuming the SakaiK2 server is on localhost:8080.

Create a temp folder as admin

curl -F"test=test" http://admin:admin@localhost:8080/tmp

Upload a file called patchfile from the local file SLING-251.patch

curl -F"patchfile=@SLING-251.patch" http://admin:admin@localhost:8080/tmp

Looking at the properties of the node I just created, we get:

curl http://admin:admin@localhost:8080/tmp.tidy.2.json
{
 "test": "test",
 "jcr:primaryType": "nt:unstructured",
 "patchfile": {
   ":jcr:data": 16062,
   "jcr:primaryType": "nt:resource",
   "jcr:mimeType": "application/octet-stream",
   "jcr:uuid": "24030fa7-6bf2-49ec-b52b-1dfbf1a6a5d9",
   "jcr:lastModified": "Fri Jun 05 2009 13:06:36 GMT+0100"
 }
}

Add the sakai:propertiesmix mixin

curl -F"jcr:mixinTypes=sakai:propertiesmix" http://admin:admin@localhost:8080/tmp/patchfile

Looking at the properties of the node again, notice the mixin and the versioning properties that come with it:

curl http://admin:admin@localhost:8080/tmp.tidy.2.json
{
 "test": "test",
 "jcr:primaryType": "nt:unstructured",
 "patchfile": {
   "jcr:versionHistory": "cdfeb02d-aeed-4311-be9e-d6a402566f42",
   "jcr:isCheckedOut": true,
   "jcr:baseVersion": "e816ccb9-df4a-4ad1-9772-be2079982131",
   ":jcr:data": 16062,
   "jcr:primaryType": "nt:resource",
   "jcr:mixinTypes": [
     "sakai:propertiesmix"
   ],
   "jcr:mimeType": "application/octet-stream",
   "jcr:uuid": "24030fa7-6bf2-49ec-b52b-1dfbf1a6a5d9",
   "jcr:predecessors": [
     "e816ccb9-df4a-4ad1-9772-be2079982131"
   ],
   "jcr:lastModified": "Fri Jun 05 2009 13:06:36 GMT+0100"
 }
}

Add a new property to the file

curl -F"something=else" http://admin:admin@localhost:8080/tmp/patchfile

and the result is

curl http://admin:admin@localhost:8080/tmp.tidy.2.json
{
 "test": "test",
 "jcr:primaryType": "nt:unstructured",
 "patchfile": {
   "jcr:isCheckedOut": true,
   "jcr:versionHistory": "cdfeb02d-aeed-4311-be9e-d6a402566f42",
   "something": "else",
   "jcr:baseVersion": "e816ccb9-df4a-4ad1-9772-be2079982131",
   "jcr:mixinTypes": [
     "sakai:propertiesmix"
   ],
   "jcr:primaryType": "nt:resource",
   ":jcr:data": 16062,
   "jcr:uuid": "24030fa7-6bf2-49ec-b52b-1dfbf1a6a5d9",
   "jcr:mimeType": "application/octet-stream",
   "jcr:lastModified": "Fri Jun 05 2009 13:06:36 GMT+0100",
   "jcr:predecessors": [
     "e816ccb9-df4a-4ad1-9772-be2079982131"
   ]
 }
}




Inverted Index Scalability

4 06 2009

Search mechanisms based on inverted indexes work because the number of terms in the search space is considerably smaller than the search space itself; otherwise, why would you bother to invert? So most search engines work well on languages. The human brain is quite capable of learning a controlled vocabulary that enables it to communicate concepts with other humans. Like a search engine, it would suffer if it had to learn a single token for every piece of knowledge that ever existed. Communication would be highly efficient, but rather boring: single words followed by long and contemplative periods of thought.

As we tag content with identifiers that have no meaning other than to represent some metadata about those terms, we risk expanding the vocabulary by which we communicate that knowledge to an extent where it becomes incommunicable. So a search index that indexes metadata to enable precise re-location and search will eventually fail as the controlled vocabulary of the terms within the inverted index grows beyond the search space itself. I am certain that, without careful consideration of the indexable content and metadata in a Jackrabbit-based system, we will stress the scalability of the Lucene-based search index: billions of properties, all with unique terms?
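A toy illustration of the point, in Python rather than Lucene. The corpus, the vocabulary and the function name build_inverted_index are all made up for the example; it only compares how many distinct terms end up in the index for word-like content versus one unique identifier per item.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


# Hypothetical corpus: 1000 documents written with an 8-word vocabulary.
vocabulary = ["lecture", "notes", "quantum", "well", "physics",
              "tutorial", "week", "group"]
prose = {
    i: " ".join(vocabulary[(i + k) % len(vocabulary)] for k in range(5))
    for i in range(1000)
}

# Hypothetical metadata: every document carries one identifier nobody else has.
tagged = {i: f"id-{i:06d}" for i in range(1000)}

prose_index = build_inverted_index(prose)
tagged_index = build_inverted_index(tagged)

# Distinct terms in each index for the same number of documents.
print(len(prose_index), len(tagged_index))  # prints "8 1000"
```

With word-like content the term dictionary stays tiny next to the corpus. With unique identifiers it is exactly as large as the corpus, so the inversion buys nothing.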





Abstracting Sakai Urls

4 06 2009

For years the Sakai community has suffered with unspeakable URLs. Giving educators URLs that they can only communicate in text, and cannot spread by word of mouth, must be a barrier to teaching. As we rewrite Sakai based on Apache Sling I am determined to ensure that a tutor at Cambridge can say, in a busy street, at lunch time, to a confused student: “go to camtools quantumwell2008 and look for the lecture notes”, with some confidence that the student will enter camtools.cam.ac.uk/quantumwell2008 and find what they need.

Competition for simple URLs generates challenges. A land grab to claim base-level URLs will create a huge number of unique root URLs. However much we might like to impose a content model on the participants, the URLs are their space, and any abstraction purely for technical purposes has to be avoided. Why should I ask the tutor to put his content in /2008/physics/tutorials/group54/week12, which is only marginally better than earlier versions of Sakai with an unspeakable /portal/site/edf435-edf-a237-bdc12 or something close?

To make the URLs of Sakai usable we have to accept that in places there will be millions of potential elements at a single level in URL space. This leaves us with a technical problem. Put even thousands of items into a folder in JCR and write performance and concurrency suffer. Experiments show that fewer than 1000 child nodes per folder delivers acceptable performance. This is not unique to JCR: all hierarchical filing systems exhibit performance limitations with the number of items per folder at some level, if the folder optimization has been used to speed underlying access. I am not advocating patching Jackrabbit to make it capable of storing millions of items in a folder, since doing that will certainly expose other problems.

So we have accepted that we need to support millions of siblings in URL space. To achieve this in JCR and maintain scalability we are rewriting the URL space through a filter that hashes the path so that no folder contains more than 255 items. With 4 levels of hashing the theoretical limit becomes somewhere around 4E9 items, assuming the underlying infrastructure can cope with that many items.

After a great deal of experimentation and reading of the Sling code base, the implementation was embarrassingly simple. Write a servlet that rewrites the URL, resolves the real JCR Sling Resource and re-dispatches that resolved resource back into Sling. In all, about 20 lines of code. The only hard part was to patch Sling to allow it to bind non-existent URLs to Servlets (see http://codereview.appspot.com/67146). So we now have a mechanism where we can designate nodes within the content tree to be abstracted in this way, creating highly scalable stores of information… simply by setting a property on the node.
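The path hashing can be sketched roughly as follows. This is Python for brevity, not the Sling filter itself; the function name hashed_path, the use of SHA-1 and the /sites base path are my assumptions for the example, not what the real code uses.

```python
import hashlib


def hashed_path(base: str, name: str, levels: int = 4) -> str:
    """Spread `name` under `base` using `levels` two-hex-digit hash folders.

    Assumption: SHA-1 of the name picks the folders. Any stable hash would do.
    """
    digest = hashlib.sha1(name.encode("utf-8")).hexdigest()
    # Each level takes the next two hex digits, so it has at most 256 siblings.
    buckets = [digest[i * 2:i * 2 + 2] for i in range(levels)]
    return "/".join([base.rstrip("/"), *buckets, name])


# The user still sees /quantumwell2008; the repository stores the hashed path.
print(hashed_path("/sites", "quantumwell2008"))

# Four levels of two hex digits give 256**4 distinct buckets, about 4.3E9.
print(256 ** 4)  # prints 4294967296
```

The same name always hashes to the same folders, so the rewrite needs no lookup table. It is a pure function of the public URL.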







