Configuring Jackrabbit

31 08 2006

There are some interesting things that happen when you try to configure Jackrabbit. Firstly, the configuration appears to be stored in the repository home, so once the repository has been created, changing the startup configuration probably won’t change the repository… except that some settings appear to get through. For instance, the driver class and the database URL appear to get through, but the username and password don’t. I’m not certain exactly what is happening.

The other strangeness is that the standard SimpleDbPersistenceManager uses DriverManager.getConnection(url, username, password) to get its connection to the database. I haven’t found out (yet) whether there is one persistence manager per JVM or one per session.

If it’s one per JVM, it’s probably OK; one per session is not so good, as that will require pooling or opening and closing of sessions. For the moment, I’ve implemented a version (an almost exact copy) of the Jackrabbit SimpleDbPersistenceManager that looks the datasource up from Spring and uses that.
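To make the difference concrete, here is a minimal sketch of the two ways of obtaining a connection. It is not the Jackrabbit code: ConnectionSource and both implementing classes are hypothetical names invented for illustration, and the Spring lookup that would supply the DataSource is left out. Only DriverManager.getConnection and DataSource.getConnection are real JDBC calls.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import javax.sql.DataSource;

/** Hypothetical seam: the one place a persistence manager asks for a JDBC connection. */
interface ConnectionSource {
    Connection open() throws SQLException;
}

/** Direct, unpooled connection: every call opens a new physical connection. */
final class DriverManagerConnectionSource implements ConnectionSource {
    private final String url;
    private final String username;
    private final String password;

    DriverManagerConnectionSource(String url, String username, String password) {
        this.url = url;
        this.username = username;
        this.password = password;
    }

    @Override
    public Connection open() throws SQLException {
        return DriverManager.getConnection(url, username, password);
    }
}

/** Borrows from a DataSource supplied from outside (assumed here to be pooled). */
final class DataSourceConnectionSource implements ConnectionSource {
    private final DataSource dataSource;

    DataSourceConnectionSource(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    @Override
    public Connection open() throws SQLException {
        // Whether this is cheap depends entirely on the DataSource implementation.
        return dataSource.getConnection();
    }
}
```

With the second form, whether there is one persistence manager per JVM or one per session matters less, since the pool behind the DataSource decides how many physical connections exist.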

The weird bit about Jackrabbit in a Spring environment is that it uses its own IoC, quite separate from Spring. I’m not entirely certain how this is all going to work in a cluster, but it looks like it might require some shared disk. I hope not: the documentation appears to say it’s cluster friendly, but I can see some Lucene segments on local disk, which doesn’t look very cluster-like.

Jackrabbit CHS service working!

31 08 2006

It’s working… unbelievable 🙂 Hats off to Apache Jackrabbit, as it looks stable and easy to work with. The existing Sakai CHS code is also quite solid: I’ve mostly done cut and paste and only found one or two places where abstract dependencies cross the API boundaries.

The next task is to see if the original Sakai DAV servlet will work with the new CHS. The Resources tool does, but I expect there is tighter binding in the DAV area.


30 08 2006

The penny dropped. I went to Inverness to meet the Tetra/Boddington crowd from UHI, and the similarities and differences between Tetra/Boddington and Sakai became clear. They are both collaborative environments for education and research, but Boddington has a hierarchical organization where Sakai has a flat organizational structure. If you add a hierarchy superstructure to organize Sakai sites and entities, and a mechanism for viewing that structure, then Sakai starts to look a lot like Boddington.

Boddington’s architecture is interesting. It dates from the mid 1990s, when the inter-webapp context capabilities of Tomcat and other containers were not fully understood, so Boddington deploys as a single WAR with a single classloader. This appears to have two results: development is centralized, and the separation of concerns within the architecture is blurred. It started life before JSPs and XML were around, so it has an interesting templating language which is concise, with a short request stack, but not that sophisticated. There is nothing wrong with this, but it does add constraints to the development process.

If we add an organizational hierarchy to Sakai, so that Tetra can use parts of the Sakai framework to organize its tools and sites, then most of the knowledge and IP developed since the mid 1990s can be ported to a more modern service-oriented architecture. Eventually, as real SOAs appear (e.g. Apache Tuscany), Tetra can start to fulfill some of the JISC ELF dreams of a loosely coupled SOA web services architecture.


30 08 2006

At first I thought… not another wiki… but then this looks interesting. TiddlyWiki is a wiki in an HTML page, with the entire wiki engine written in JavaScript. Perfectly possible: JavaScript has good regex support, and all the wiki engine needs to do is rewrite DIVs with the rendered content.

Bill Steel posted to the Sakai-Dev list about embedding this, which makes a lot of sense: an HTML page in Content Hosting containing the markup, and the engine in a static JS file. There are some drawbacks… the engine is 200K of JavaScript… all of the wiki is in a single page… there isn’t a mechanism to update the page other than a PUT back to the server… but all of those could be overcome. If there was a small amount of backend support, to do things like diffs, then for notebook operations this makes sense. Obviously it’s really an online environment. I’m not certain how you would do a huge wiki, or render the wiki to PDF/RTF in JavaScript, or provide full text search… but an inverted index in JavaScript is perfectly possible.

Content Hosting JSR-170 with Jackrabbit Continued….

30 08 2006

So the CHS based on Jackrabbit is largely written and working. It can be found in Sakai contrib. Surprisingly, it works! After a few hiccups with node types and with getting the Sakai representation of additional properties into the JCR node structure, it all works. I think I’ve only had to fix about three bugs so far, mainly where /’s were not put into IDs.

It does need more testing, and there are some startup issues to be sorted, as I’m only running against HSQLDB at the moment. I’d also like to see if we can get the Jackrabbit DAV servlet working as Sakai DAV… there are some concerns about compliance with this.

Search Bugs

30 08 2006

There has been a slow stream of search bugs coming from Cape Town. It shows that search is being used in anger, which is good, but it also shows that there wasn’t enough testing in QA and that my own approach to testing wasn’t 100%. But that’s not untypical of a developer, who doesn’t think up every type of user input.

All the bugs are logged in Jira and most of them have been trivial to fix, a couple of lines of code.

Big Packets in MySQL

4 08 2006

Once again we discover that putting lots of data into the DB as blobs is not the best thing to do. In this case it is the search segments in MySQL. If you chuck 4 – 5 in blobs into MySQL InnoDB tables, then everything runs a little slowly on that table. You can’t perform seeks on the blobs (not real ones), and when you try to retrieve them you find a practical size limit somewhere below 2G, as you might expect on a normal 32-bit file system; not because MySQL can’t cope with the data, but because the connectors and things on the network time out.

When I wrote the clustered search engine for Sakai, one requirement was that it would work out of the box in a cluster, so we put everything into the DB. That’s fine for up to 20-30K documents under light load, but put a few more in, add a bit of load, and the above problems start to expose themselves.

Fortunately the storage architecture of the search engine in Sakai is not dissimilar to parts of Jackrabbit. There is a persistence manager and a virtual filesystem. Obviously it has different aims and is far simpler, but it should be relatively easy to make it use a shared file system space rather than the DB, as it only has one class where all the storage comes together (and that’s one small class).
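As a rough sketch of what “one class where all the storage comes together” could look like when pointed at a shared file system: the SegmentStore interface and SharedFileSegmentStore class below are hypothetical names, not the classes in the Sakai search code, and the sketch assumes every cluster node mounts the same directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical single point through which index segments are saved and loaded. */
interface SegmentStore {
    void save(String name, byte[] data) throws IOException;
    byte[] load(String name) throws IOException;
}

/** Keeps each segment as a plain file under one directory (assumed to be shared). */
final class SharedFileSegmentStore implements SegmentStore {
    private final Path root;

    SharedFileSegmentStore(Path root) throws IOException {
        this.root = root.toAbsolutePath().normalize();
        Files.createDirectories(this.root);
    }

    @Override
    public void save(String name, byte[] data) throws IOException {
        Path target = resolve(name);
        // Write to a temporary file and then move it into place, so that readers
        // are less likely to pick up a half-written segment.
        Path tmp = Files.createTempFile(root, "segment-", ".tmp");
        Files.write(tmp, data);
        Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
    }

    @Override
    public byte[] load(String name) throws IOException {
        return Files.readAllBytes(resolve(name));
    }

    /** Accepts only plain file names that sit directly inside the root directory. */
    private Path resolve(String name) {
        Path p = root.resolve(name).normalize();
        if (!root.equals(p.getParent())) {
            throw new IllegalArgumentException("Not a plain segment name: " + name);
        }
        return p;
    }
}
```

A DB-backed implementation of the same interface would let the two be swapped by configuration, which is the point of having the storage come together in one place.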

Anyway, I’ll see what happens.

Content Hosting API in 170

2 08 2006

Almost there. One thing that I’m not sure is a good idea: the group attributes associated with entities inside Content Hosting are stored in Content Hosting itself. So if you want to let Content Hosting manage its security underneath, by talking to AuthZGroups, then you have to stop it asking itself.

‘Is it ok to look at this node to find out if I can look at this node to find out if I can look at this node to find out if I can look at this node to find out if I can look at this node….’

Fortunately there is a thing called a Security Advisor, which can be used to remove security constraints below a certain point in the request cycle. It’s also a really good way to make the whole thing go much faster and to take load off the AuthZGroups resolution.
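The idea can be sketched in a few lines. This is a simplified stand-in, not Sakai’s real security API: AdvisedSecurity, Advice, Advisor and Resolver are all names made up for the illustration. An advisor pushed for the duration of a block of work gets to answer permission questions first, so the expensive (and here potentially recursive) resolution is never reached.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

/** Simplified, hypothetical model of an advisor-aware permission check. */
final class AdvisedSecurity {
    enum Advice { ALLOWED, NOT_ALLOWED, PASS }

    /** Short-circuits a check; PASS means "no opinion, ask the next one". */
    interface Advisor {
        Advice advise(String user, String function, String reference);
    }

    /** The expensive path, e.g. resolving group membership for the reference. */
    interface Resolver {
        boolean resolve(String user, String function, String reference);
    }

    // One advisor stack per thread, so advice applies only to the current request.
    private final ThreadLocal<Deque<Advisor>> advisors = ThreadLocal.withInitial(ArrayDeque::new);
    private final Resolver resolver;

    AdvisedSecurity(Resolver resolver) {
        this.resolver = resolver;
    }

    boolean isAllowed(String user, String function, String reference) {
        // push() adds at the head, so iteration visits the newest advisor first.
        for (Advisor advisor : advisors.get()) {
            Advice advice = advisor.advise(user, function, reference);
            if (advice != Advice.PASS) {
                return advice == Advice.ALLOWED;
            }
        }
        return resolver.resolve(user, function, reference);
    }

    /** Runs the work with the advisor in force, and always removes it afterwards. */
    <T> T withAdvisor(Advisor advisor, Supplier<T> work) {
        Deque<Advisor> stack = advisors.get();
        stack.push(advisor);
        try {
            return work.get();
        } finally {
            stack.pop();
        }
    }
}
```

Wrapping the internal node lookups in withAdvisor with an advisor that always answers ALLOWED breaks the “is it OK to look at this node” loop, and it also skips the resolver entirely for those lookups.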

More on Patents

2 08 2006

It turns out Blackboard’s lawyers have stated that they have no intention of going after open source, since it is not commercial (open source, that is). But then open source projects have commercial partners.

Interestingly, on one of the claims Sakai falls wide, since it doesn’t have static sets of roles. Rather, they are fluid through the whole system. Moodle does have more static sets, but not for long. And once we all get Shibbed, that part of Blackboard’s patent is really old hat.

But whatever, there is lots of prior art going way back to 1960, and even Blackboard’s original system came out of higher ed… unless they believe they have a patent on general Internet groupware?

Blackboard Patent

1 08 2006

It doesn’t come as any surprise that Blackboard has a number of patents, some of them granted in places like Australia, NZ and the US, and some pending in the EU and other places.

They have also declared their intent to pursue these patents by filing against Desire2Learn, so they are going to attack their competitors, soft targets in supportive jurisdictions first. Let’s hope that in countries outside the US these patents will be seen as software patents and not be granted. I think that in the UK at least, most patents have to show that there is no prior art and that the patent is not a pure implementation patent of a pre-existing invention.

Some European universities go back to the 1400s or earlier, and some were doing distance learning back in the 1960s, so perhaps that counts as prior art? There are certainly a number of examples of online course management systems (CMS) in the US prior to the claims, and the roles of Administrator, TA, Instructor and Student have existed in most universities for a few decades. I can only assume that those who granted these patents did not see any of this as relevant, granting them and leaving the arguments to the courts?

If granted, will Blackboard file against open source communities? Will they file against their customers who happen to use or develop open source? If they are granted their patents in many countries and they enforce them aggressively, we may see the end of software development for educational purposes in higher education, except for Blackboard Building Blocks… but then let’s not worry too much: Blackboard has been known to make IPR claims on Building Blocks developed by their customers. Perhaps they should do all the “thought leading research” for us in the future. Anyway, who cares about innovation in higher ed? Clearly not the patent offices, or anyone who thinks that pure software patents are defensible.

The problem for Blackboard may be that most open source communities don’t have any commercial model and don’t have a single point of focus, so Blackboard might have to file against each university or individual developer in turn.

And what will happen to all the research environments and Web-based communities? Are they also in Blackboard’s sights?

Don’t get me wrong, I’m not against patents, just ones that try to claim innovation and invention where implementation is closer to the truth.