So much for launch dates.

16 05 2007

Caret have been helping launch another Darwin website, this time Darwin Correspondence. And this time we thought we had the launch under control. The media were expected to publish tomorrow morning, by which time the site would be up and ready, but we were wrong. The BBC decided to publish a story a little earlier than expected.

It’s a good way to focus the mind: fixing bugs, deploying patches and changing DNS and hostnames with about 200 users on the site while you’re doing it. If you’re lucky, the site will still be up in the morning.

Technology-wise, it is a hybrid. A Lucene search engine, driven by Maven 2 plugins, indexes about 100MB of reasonably compact XML into a few thousand pages of Darwin’s letters. Search is available across multiple indexes and delivers static HTML. The whole thing is then wrapped in a Joomla CMS for the editorial content. It currently takes an Apple Xserve running Continuum about 25 minutes to rebuild and deploy the index, but the process is almost fully automatic. The build also runs some Zoomify operations on images to make them available at high resolution over the web.

And some more news: 

My MacBook is contributing to global warming. (Jackrabbit Cluster Testing Session)

14 05 2007

Only by a few watts, while testing Jackrabbit in a cluster. I have been testing a 2-node cluster of Sakai with JCRService using 2 data sets. One is lots of small files (9,215 files in 178MB, some of them indexable); the other is 100 or so files of mixed type totalling 315MB. The single-node cluster (i.e. no cluster) works seamlessly, uploading 4 files/second for the smaller files and growing the heap by about 100MB every 2 minutes, after which GC brings it back down to a baseline of about 160MB; that is probably about 1MB per request. Looking at the CPU time, most of it appears to be spent syncing the data down to the DB, which is good: it indicates that the JCR is running as fast as the DB can accept data.
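A rough way to reproduce a files/second figure like this is to time a batch of WebDAV uploads. This is only a sketch: the upload_rate function, the DAV_URL collection and the admin/admin default credentials are hypothetical placeholders, and it assumes the small-file data set sits in one flat directory, since a WebDAV PUT fails when the parent collection does not exist.

```shell
#!/usr/bin/env bash
# Sketch: upload every file in a flat directory over WebDAV with curl and report
# the rate in files/second. DAV_URL, DAV_USER and DAV_PASS are hypothetical
# placeholders; point them at a real collection and account before use.
DAV_URL=${DAV_URL:-http://localhost:8080/dav/test}

upload_rate() {
  local dir=$1 count=0 start elapsed f
  start=$(date +%s)
  # -maxdepth 1 keeps this flat: a PUT into a missing collection would fail.
  while IFS= read -r f; do
    # -T sends the file as an HTTP PUT; -f turns HTTP error responses into a failure.
    curl -s -f -u "${DAV_USER:-admin}:${DAV_PASS:-admin}" -T "$f" \
      "$DAV_URL/$(basename "$f")" > /dev/null || return 1
    count=$((count + 1))
  done < <(find "$dir" -maxdepth 1 -type f)
  elapsed=$(( $(date +%s) - start ))
  # Integer division; treat a sub-second run as one second.
  (( elapsed > 0 )) || elapsed=1
  echo "$count files in ${elapsed}s: $((count / elapsed)) files/s"
}
```

Because the rate is whole seconds and integer files/second, it is only meaningful for runs of a minute or more, which also matches the timescale of the heap growth described above.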

I am running with the standard Jackrabbit 1.3 persistence managers and cache journal, with everything going to the DB (well, standard apart from a bug I fixed in the journal code). Each node has its own repository home.

I have uploaded the larger dataset several times and copied it within the JCR over WebDAV several more times, so there is at least 1GB of content.

This isn’t a full test, but it’s a start, and it indicates that Jackrabbit 1.3 works well in a cluster when using pure DB persistence for versions, bodies and nodes (via SimpleDbPersistenceManager).

I also tried the file journal, but found some hangs under high load. This suggests that transactional stability matters for the journal, and that the distributed mutex mechanism in the file journal may have concurrency issues. I haven’t verified this, but the behavior in JProfiler points that way.

As I said, this test uses the standard classes, which manage their own connections to the DB. I intend to look into using the Sakai datasource to open connections; however, some work may be needed, since Jackrabbit appears to manage its connections actively rather than allocating 1 connection per request thread.

mvn eclipse:eclipse does some cool things

13 05 2007

Not having tried this much before, I wasn’t certain exactly what it did, but once you have fully populated poms in your project, a quick mvn eclipse:eclipse will sync all the .settings, .project and .classpath files that Eclipse uses. The really great bits are: 1) it knows about transitive deps and adds those where necessary; 2) it knows about source code, and if you do a mvn -DdownloadSources=true eclipse:eclipse it will download the sources, wire in the javadoc, etc. It also knows about other projects in the same multi-module build, referencing those directly.

Potentially, if we adopted the layout structure that the plugin builds, we wouldn’t need any .classpath, .project or .settings files in SVN, as all developers would need to do is a quick mvn -DdownloadSources=true eclipse:eclipse to generate all the Eclipse-specific files from the pom.xml files.

This would be a huge benefit, reducing the work of syncing the Eclipse settings and allowing those who want to use IDEA or another supported IDE to do the equivalent.
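A minimal sketch of that workflow, assuming a Maven 2 checkout with a pom.xml in each module and the maven-eclipse-plugin’s eclipse:clean and eclipse:eclipse goals. The function names (module_dirs, regen_eclipse, ignore_eclipse_files) and the RUN=echo dry-run switch are hypothetical, and the svn:ignore list is an assumption about what each module would want ignored.

```shell
#!/usr/bin/env bash
# Sketch: regenerate the Eclipse files from the poms and keep them out of SVN.
# Run from the top of the checkout. Set RUN=echo for a dry run that only prints
# the mvn/svn commands (a hypothetical convention, not a Maven feature).

# Print each Maven module directory (any directory holding a pom.xml), skipping build output.
module_dirs() {
  find "${1:-.}" -name pom.xml -not -path '*/target/*' -exec dirname {} \; | sort
}

# Regenerate .project, .classpath and .settings from the poms, attaching source jars.
regen_eclipse() {
  ${RUN:-} mvn eclipse:clean || return 1
  ${RUN:-} mvn -DdownloadSources=true eclipse:eclipse
}

# Ignore the generated files in every module. svn propset replaces any existing
# svn:ignore value, so target is listed too (an assumption about current ignores).
ignore_eclipse_files() {
  local dir patterns
  patterns=$(printf '%s\n' .classpath .project .settings target)
  module_dirs "${1:-.}" | while IFS= read -r dir; do
    ${RUN:-} svn propset svn:ignore "$patterns" "$dir"
  done
}
```

Running RUN=echo ignore_eclipse_files first shows which directories would be touched before any property is changed.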

Maven1 Maven2 synchronization

11 05 2007

Maven 2 is now building in trunk, but the other problem I’m trying to fix is which jars are deployed and where they are deployed to. There are a number of bash commands I’m using to automate this process; I’ve recorded them here so I don’t forget them.

To do the maven1 and maven2 builds, I am deploying to dummy, empty deployment targets, tomcatm1deploy and tomcatm2deploy:

# maven 2
mvn -o -Dmaven.test.skip=true -Dmaven.tomcat.home=/Users/ieb/Caret/sakai22/tomcatm2deploy/ clean install sakai:deploy
# maven 1
maven -Dmaven.test.skip=true -Dmaven.tomcat.home=/Users/ieb/Caret/sakai22/tomcatm1deploy/ cln bld dpl
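If the dummy targets are reused between runs, stale files from an earlier build show up as false differences. A small wrapper that empties both targets before building avoids that. It is a sketch that assumes it is run from the source root; BASE, fresh_target, build_both and the RUN=echo dry-run switch are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: recreate both dummy deployment targets empty, then run the two builds
# into them, so nothing left over from an earlier run turns up in the comparison.
# Assumptions: run from the Sakai source root; BASE defaults to a hypothetical path.
# Set RUN=echo for a dry run that only prints the build commands.
BASE=${BASE:-$HOME/Caret/sakai22}

# Empty a deployment target by deleting and recreating it; ${BASE:?} refuses an empty BASE.
fresh_target() {
  rm -rf "${BASE:?}/$1" && mkdir -p "$BASE/$1"
}

build_both() {
  fresh_target tomcatm2deploy && fresh_target tomcatm1deploy || return 1
  # maven 2: offline (-o), tests skipped
  ${RUN:-} mvn -o -Dmaven.test.skip=true -Dmaven.tomcat.home="$BASE/tomcatm2deploy/" \
    clean install sakai:deploy || return 1
  # maven 1: the same cln bld dpl goals as the command above
  ${RUN:-} maven -Dmaven.test.skip=true -Dmaven.tomcat.home="$BASE/tomcatm1deploy/" \
    cln bld dpl
}
```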

Then, to compare the deployed unpacked components and packed wars:

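# give both listings a common root name, apply the per-build filters, and sort so the order matches
find tomcatm1deploy -print | sed 's/tomcatm1deploy/deploy/' | grep -v WEB-INF/tld | sort > m1deploy
find tomcatm2deploy -print | sed 's/tomcatm2deploy/deploy/' | sed 's/M2/dev/' | grep -v maven | grep -v web.xml | sort > m2deploy
diff m1deploy m2deploy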
This gives a list of differences that can be inspected manually to find where the deployment profiles differ.
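The tree comparison can also be wrapped as a reusable function. Listing each tree relative to its own root means the directory names never need rewriting, and process substitution removes the temporary files. This is a bash sketch; list_tree, m1_filter, m2_filter and compare_trees are hypothetical names, and the filters are copied from the commands above.

```shell
#!/usr/bin/env bash
# Sketch: compare two deployment trees by path relative to each root, so the
# top-level directory names never need rewriting with sed.

# Print every path under a deployment root, relative to that root.
list_tree() {
  (cd "$1" && find . -print)
}

# Maven 1 side: drop tag library descriptors.
m1_filter() { grep -v 'WEB-INF/tld'; }

# Maven 2 side: map M2 version strings to dev, drop maven metadata and web.xml.
m2_filter() { sed 's/M2/dev/' | grep -v maven | grep -v web.xml; }

# Diff the filtered, sorted listings; exit status 0 means the layouts match.
compare_trees() {
  diff <(list_tree "$1" | m1_filter | sort) <(list_tree "$2" | m2_filter | sort)
}
```

For example, compare_trees tomcatm1deploy tomcatm2deploy prints the same kind of list as the diff above.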

The next step is to compare the jars packed inside the wars. This is a little more complicated: write the list of wars in each deployment to a file, sort it, then iterate through the list printing only the jar entries of each war, and compare the final results.

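# find the wars and store them
find tomcatm1deploy -name '*.war' | sort > m1wars
find tomcatm2deploy -name '*.war' | sort > m2wars
# list the jars inside each war; strip the deployment directory from the START/END
# markers, otherwise every marker line differs between the two sides
while IFS= read -r i; do echo "START ${i#tomcatm1deploy/}"; jar tvf "$i" | grep '\.jar$' | cut -d'/' -f2- | sort; echo "END ${i#tomcatm1deploy/}"; done < m1wars > m1deploywars
while IFS= read -r i; do echo "START ${i#tomcatm2deploy/}"; jar tvf "$i" | grep '\.jar$' | cut -d'/' -f2- | sort; echo "END ${i#tomcatm2deploy/}"; done < m2wars | sed 's/M2/dev/' > m2deploywars
# compare
diff m1deploywars m2deploywars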
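The war listing can also be wrapped as a function. This bash sketch uses jar tf, which prints entry names only, so there is no need to cut the columns of the verbose jar tvf listing, and it strips the deployment root from each war path so the START/END markers line up on both sides. The names war_jars and deployment_war_jars are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list the jars bundled in each war of a deployment, one START/END block
# per war, with war paths made relative to the deployment root.

# Print the jar entries inside one war, sorted; `jar tf` prints names only.
war_jars() {
  jar tf "$1" | grep '\.jar$' | sort
}

# Print a START/END block of bundled jars for every war under a deployment root.
deployment_war_jars() {
  local root=${1%/} war
  find "$root" -name '*.war' | sort | while IFS= read -r war; do
    echo "START ${war#"$root"/}"
    war_jars "$war"
    echo "END ${war#"$root"/}"
  done
}
```

For example, diff <(deployment_war_jars tomcatm1deploy) <(deployment_war_jars tomcatm2deploy | sed 's/M2/dev/') gives the comparison without the intermediate files.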
Finally, you have to dig through the poms and project.xml files to make the two deployments the same. This can mean overriding transitive dependencies.