Configuring Jackrabbit

31 08 2006

There are some interesting things that happen when you try to configure Jackrabbit. Firstly, the configuration appears to be stored in the repository home, so, once created, changing the startup configuration probably won't change the repository… except that some settings appear to get through. For instance, the driver class and the database URL appear to get through, but the username and password don't. I'm not certain exactly what is happening; my guess is that the per-workspace configuration gets copied into the repository home when the workspace is first created, while some repository-level settings are still read at startup.

The other strangeness is that the standard SimpleDbPersistenceManager uses DriverManager.getConnection(url, username, password) to get its connection to the database. I haven't found out (yet) whether there is one persistence manager per JVM, or one per session.

If it's one per JVM, it's probably OK; one per session is not so good, as that would require pooling, or the opening and closing of a connection for every session. For the moment, I've implemented a version (an almost exact copy) of the Jackrabbit SimpleDbPersistenceManager that looks the DataSource up from Spring, and uses that.
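As a rough illustration, this is the shape of that change. The class and the getConnection() override point are my assumptions for illustration, not Jackrabbit's actual API; the point is simply that connections come from a Spring-managed (and potentially pooled) DataSource rather than from DriverManager:

```java
import java.sql.Connection;
import java.sql.SQLException;
import javax.sql.DataSource;

/*
 * Sketch only: let the persistence manager borrow connections from a
 * Spring-configured DataSource instead of calling
 * DriverManager.getConnection(url, username, password) itself.
 * The static holder and the getConnection() override point are
 * assumptions for illustration, not Jackrabbit's real API.
 */
public class SpringDataSourcePersistenceManager /* extends SimpleDbPersistenceManager */ {

    // Populated by Spring at startup, e.g. from a bean that is handed
    // the DataSource and pushes it in here before the repository starts.
    private static DataSource dataSource;

    public static void setDataSource(DataSource ds) {
        dataSource = ds;
    }

    // Hand back a (possibly pooled) connection rather than opening a raw one.
    protected Connection getConnection() throws SQLException {
        return dataSource.getConnection();
    }
}
```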

The weird bit about Jackrabbit in a Spring environment is that it uses its own IoC, quite separate from Spring. I'm not entirely certain how this is all going to work in a cluster, but it looks like it might require some shared disk to work. I hope not; the documentation appears to say it's cluster friendly, but I can see some Lucene segments on local disk, which doesn't look very cluster-like.





Jackrabbit CHS service working!

31 08 2006

It's working… unbelievable 🙂 Hats off to Apache Jackrabbit, as it looks stable and easy to work with. Also, the existing Sakai CHS code is quite solid; I've mostly done cut and paste, and only found one or two places where abstract dependencies cross the API boundaries.

The next task is to see if the original Sakai DAV servlet will work with the new CHS. The Resources tool does, but I expect there is tighter binding in that area.





Boddington/Tetra/Hierarchy

30 08 2006

The penny dropped. I went to Inverness to meet the Tetra/Boddington crowd from UHI, and the similarities and differences between Tetra/Boddington and Sakai became clear. They are both collaborative environments for education and research, but Boddington has a hierarchical organization where Sakai has a flat organizational structure. If you add a hierarchy superstructure to organize Sakai sites and entities, and a mechanism for viewing that structure, then Sakai starts to look a lot like Boddington.

Boddington's architecture is interesting. Being from the mid-1990s, when the inter-webapp context capabilities of Tomcat and other containers were not fully understood, Boddington deploys as a single WAR with a single classloader. This appears to have two results: development is centralized, and the separation of concerns within the architecture is blurred. It started life before JSPs and XML were around, so it has an interesting templating language which is concise, with a short request stack, but not that sophisticated. There is nothing wrong with this, but it does add constraints to the development process.

If we add an organizational hierarchy to Sakai so that Tetra can use parts of the Sakai framework to organize its tools and sites, then most of the knowledge and IP developed since the mid-1990s can be ported to a more modern service-oriented architecture. Eventually, as real SOAs appear (e.g. Apache Tuscany), Tetra can start to fulfill some of the JISC ELF dreams of a loosely coupled SOA web services architecture.





TiddlyWiki

30 08 2006

At first I thought… not another wiki… but this one looks interesting. TiddlyWiki is a wiki in a single HTML page, with the entire wiki engine written in JavaScript. It's perfectly possible: JavaScript has good regex support, and all the wiki engine needs to do is rewrite DIVs with the rendered content.
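To make the point concrete, here is a toy of the regex-rewriting idea, in Java for brevity (the real engine does this in JavaScript against the DOM); only TiddlyWiki-style bold and italic markup is handled:

```java
import java.util.regex.Pattern;

/*
 * Toy illustration of regex-based wiki rendering: markup in, HTML out.
 * A real engine applies the same kind of rewriting to the text of each
 * DIV before putting the rendered HTML back into the page.
 */
public class TinyWikiRenderer {

    // ''bold'' -> <b>bold</b>, //italic// -> <i>italic</i>
    private static final Pattern BOLD = Pattern.compile("''(.+?)''");
    private static final Pattern ITALIC = Pattern.compile("//(.+?)//");

    public static String render(String wikiText) {
        String html = BOLD.matcher(wikiText).replaceAll("<b>$1</b>");
        html = ITALIC.matcher(html).replaceAll("<i>$1</i>");
        return html;
    }

    public static void main(String[] args) {
        // Prints: A <b>bold</b> and <i>italic</i> example.
        System.out.println(render("A ''bold'' and //italic// example."));
    }
}
```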

Bill Steel posted to the Sakai-Dev list about embedding this, which makes a lot of sense: an HTML page in Content Hosting containing the markup, with the engine in a static JS file. There are some drawbacks… the engine is 200K of JavaScript… the whole wiki is in a single page… there isn't a mechanism to update the page other than a PUT back to the server… but all of those could be overcome. If there were a small amount of backend support, to do things like diffs, then for notebook operations this makes sense. Obviously it's really an online environment; I'm not certain how you would do a huge wiki, or render the wiki to PDF/RTF in JavaScript, or provide full-text search… but an inverted index in JavaScript is perfectly possible.





Content Hosting JSR-170 with Jackrabbit Continued….

30 08 2006

So the CHS based on Jackrabbit is largely written and working. It can be found in Sakai contrib (https://source.sakaiproject.org/contrib/tfd/trunk). Surprisingly, it works! After a few hiccups with node types and with getting the Sakai representation of additional properties into the JCR node structure, it all works. I think I've only had to fix about three bugs so far, mainly where slashes were missing from IDs.
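For flavour, this is roughly what storing content plus extra properties looks like through the plain JSR-170 API. I've used nt:unstructured here because it accepts arbitrary properties; the stricter nt:file/nt:resource pair is exactly where the node-type hiccups come from. Names and credentials are illustrative, not the actual CHS code:

```java
import java.io.ByteArrayInputStream;
import java.util.Calendar;
import javax.jcr.Node;
import javax.jcr.Repository;
import javax.jcr.Session;
import javax.jcr.SimpleCredentials;

/*
 * Sketch: store some content and hang additional application-level
 * properties off the same JCR node, using only JSR-170 calls.
 */
public class JcrStoreExample {

    public static void store(Repository repository, byte[] body) throws Exception {
        Session session = repository.login(
                new SimpleCredentials("admin", "admin".toCharArray()));
        try {
            Node node = session.getRootNode().addNode("example", "nt:unstructured");
            node.setProperty("mimeType", "text/plain");
            node.setProperty("data", new ByteArrayInputStream(body));
            node.setProperty("lastModified", Calendar.getInstance());
            // Sakai-style extra metadata just becomes more properties
            node.setProperty("displayName", "Example file");
            session.save();
        } finally {
            session.logout();
        }
    }
}
```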

It does need more testing, and there are some startup issues to be sorted out, as I'm only running against HSQLDB at the moment. I'd also like to see if we can't get the Jackrabbit DAV servlet working as Sakai DAV… there are some concerns about compliance there.





Search Bugs

30 08 2006

There has been a slow stream of search bugs coming from Cape Town. It shows that search is being used in anger, which is good, but it also shows that there wasn't enough testing in QA, and that my own approach to testing wasn't 100%. But that's not untypical of a developer who doesn't think up every type of user input.

All the bugs are logged in Jira, and most of them have been trivial to fix: a couple of lines of code each.





Big Packets in MySQL

4 08 2006

Once again we discover that putting lots of data into the DB as blobs is not the best thing to do. In this case it is the search segments with MySQL. If you chuck 4–5 GB of blobs into MySQL InnoDB tables, then everything runs a little slowly on that table. You can't perform real seeks on the blobs, and when you try to retrieve them you find a practical size limit somewhere below 2 GB, as you might expect on a normal 32-bit filesystem. That is not because MySQL can't cope with the data, but because the connectors and other things on the network time out.
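For what it's worth, if the retrieval failures show up as the classic "Packet for query is too large" error, the relevant server setting is max_allowed_packet, which caps the size of any single packet, and so of any blob sent in one round trip. A hedged sketch of raising it at runtime; the URL, credentials and 64 MB value are examples only, and this buys headroom rather than curing the underlying problem:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

/*
 * Raise MySQL's max_allowed_packet for new connections. SET GLOBAL needs
 * the SUPER privilege and does not survive a server restart; put the same
 * value in my.cnf ([mysqld] max_allowed_packet=64M) to make it permanent.
 */
public class RaisePacketLimit {
    public static void main(String[] args) throws Exception {
        Connection c = DriverManager.getConnection(
                "jdbc:mysql://localhost/sakai", "sakai", "secret");
        try {
            Statement s = c.createStatement();
            s.execute("SET GLOBAL max_allowed_packet = 67108864"); // 64 MB
            s.close();
        } finally {
            c.close();
        }
    }
}
```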

When I wrote the clustered search engine for Sakai, one requirement was that it would work out of the box in a cluster, so we put everything into the DB. That's fine for up to 20–30K documents under light load, but put a few more in, and a bit of load on, and the above problems start to expose themselves.

Fortunately, the storage architecture of the search engine in Sakai is not dissimilar to parts of Jackrabbit. There is a persistence manager and a virtual filesystem. Obviously it has different aims and is far simpler, but it should be relatively easy to make it use a shared filesystem rather than the DB, as it only has one class where all the storage comes together (and that's one small class).
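Something like the following is the shape of that single point of contact. The interface and names are hypothetical, not the actual Sakai search classes, but with every read and write funnelled through one abstraction, a shared-filesystem implementation can sit alongside the JDBC-blob one:

```java
import java.io.InputStream;

/*
 * Hypothetical sketch of the one small class where segment storage comes
 * together. A DB-blob implementation and a shared-filesystem implementation
 * would both live behind this interface.
 */
public interface SegmentStore {

    // Persist a finished Lucene segment under a cluster-wide name.
    void saveSegment(String name, InputStream data, long length) throws Exception;

    // Stream a segment back, e.g. to refresh a node's local index copy.
    InputStream openSegment(String name) throws Exception;

    void deleteSegment(String name) throws Exception;
}
```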

Anyway, I’ll see what happens.