Development and Testing cluster up

29 08 2007

I used to do Sakai cluster testing with multiple Tomcat instances on the same box, sharing out the ports between them. This is OK, but when 5 developers try to do the same thing on the same box you soon run out of meaningful port numbers, and a careless killall can bin a few hours' worth of testing.

Hence the current interest in virtualisation in this blog. Well, the cluster is now up and running, after getting rid of some problems with inconsistent ARP responses. Each root partition contains Java, Maven and Tomcat and not much else. The source is built on a separate server in a shared NFS-mounted home directory, and then a few simple ssh scripts update the cluster nodes automatically from that central location. This makes it much easier to restart, redeploy or reconfigure the cluster. The shared filesystem for content and search is provided over NFS, and the MySQL instance runs on the same central box.
This is probably nothing like a production environment, but the aims are slightly different, as the update cycle needs to be the same as for a single-node developer environment. So far it's working well: virtualisation is providing good separation between the cluster nodes and making good use of the resources, and it's easy to manage the root images on LVM with scripts. Creating a new node now takes only a few minutes.

I haven't tried yet, but I suspect the kernel and / filesystem required to bring Java and Tomcat up are very small.





Xen Virtual Sakai Servers with an NFS Home

28 08 2007

The aim is to create a number of Xen virtual servers with as much configuration as possible already in place to bring up Sakai. As per the previous posts, I am using an old Sarge box with a Dom0 Xen installation and a bridged Xen network. Each client will hopefully bring its interface up over DHCP, but I could build clients with fixed IPs.

This is a development installation where we trust the users of the Dom0 machine, so I am going to sync the user information with Dom0 and NFS mount the Dom0 home directories. To make this work in the virtual server, it needs the correct modules.

The virtual servers are built on LVM logical volumes, using xen-tools. I tried doing manual builds, but it takes forever to mount, configure and so on, all of which is scripted in the xen-create-image script.

So the xen-create-image command populates a 4GB root filesystem with all the prerequisites for the OS and for mounting NFS home directories. I then run a custom role that appends an NFS mount to the end of /etc/fstab, installs the JDK, maven2 and Tomcat from tarballs and does some local configuration of the Tomcat installation.

Since the role script to do all this is part of the xen-tools configuration, all I have to do to create a new virtual machine for running a Sakai cluster node is

xen-create-image --hostname sakai1 --memory 1024M --swap 1024M --role sakaisetup

wait about 3 minutes as Debian Etch is installed and then

xm create sakai1.cfg

to boot the new virtual server.

xm console sakai1

Gives me a console window

xm shutdown sakai1

powers down the virtual machine, and

xen-delete-image sakai1

deletes the image from the LVM

There is a GUI manager, but this is all so simple, why bother... and there is VMware... but Xen doesn’t cost anything other than time, and it is available via apt-get.





Xen installation on Debian, commands to remember

24 08 2007

Just a note to myself on a Xen installation on Sarge. Having managed to boot the machine into an unresponsive state several times, here is how not to do it.

Apt will get the packages

If using LVM, make certain it is LVM2.
If using udev, you will need initramfs installed first to make the lvm2 parts work.
apt-get install lvm2
apt-get install iproute
apt-get install bridge-utils
apt-get install xen-hypervisor-3.0.4-1-i386-pae

PAE will allow you to run with more than 4GB of memory

apt-get install linux-image-2.6.18-5-xen-686
apt-get install bridge-utils iproute sysfsutils libc6-xen xen-tools
This might replace Apache and a whole bunch of other things.

Fix the bridge setup and xen network

Reboot.

Do the rest of the xen setup

Problems:

The 2.6.18 kernels do not provide LVM 1 support in the initrd boot images by default, and the boot will hang at “Waiting for root file system” until busybox appears. This appears to affect only udev kernels with an initramfs-built boot image. These boot images are gzipped cpio archives, unlike mkinitrd images, which are real filesystems.
Solution: upgrade to LVM2, run update-initramfs on the kernel and then check that the initrd image has the correct contents.

You should see an lvm2 script in local-top and vgchange in /sbin. If not, check the initramfs-tools areas (/etc/initramfs-tools and /usr/share/initramfs-tools) to make certain that the lvm2 script is there, then check for a copy_exec in the function hooks script in /usr/share….

To unpack an initramfs image do

mkdir unpack
cd unpack
gunzip -c -9 /boot/test-initrd.img-2.6.18-1-686 | cpio -i -d -H newc --no-absolute-filenames

Gotchas:

If you boot up and bring xend up without specifying routing parameters and IP addresses for the box, you will end up with no addressable IP on the new physical Ethernet device. So if (network-script network-bridge) is in xend-config.sxp, peth0 will hijack eth0, expose no IP address and not respond to ARP. Fine if you’re on the console, not so good when you are 20 miles away with no BIOS access.

Solution:

Fix the bridging setup before you reboot, or have a 2nd Ethernet interface to break with impunity.





Ajax and UTF8

21 08 2007

I have suddenly found out that the faith I had placed in the JavaScript escape(), encodeURI() and encodeURIComponent() functions to encode correctly was misplaced. Here is the problem: a traditional form submits UTF-8 perfectly and all works. An AJAX form only works if there are no non-ASCII characters, and it only fails with certain characters that are high up in the range. It turns out that the %20%u5FFD encoding produced by escape() doesn’t work when submitted as application/x-www-form-urlencoded to Tomcat, even with charset=utf-8 or Character-Encoding: UTF-8; . A space has to be sent as + and each character as the %XX escapes of its UTF-8 bytes (+%E5%BF%BD for a space followed by U+5FFD) to make it work. If you bring up tcpdump and look at the raw TCP packets you will see that Firefox uses the latter for direct posts.

Unfortunately there is no single JavaScript encoder to do this :(, but it’s not that hard.

// encodeFormValue is an illustrative name for the helper
function encodeFormValue(formVar) {
   // encodeURIComponent emits upper-case %XX escapes of the UTF-8 bytes,
   // so no %uXXXX sequences (an escape() artefact) are left to convert
   var result = encodeURIComponent(formVar);
   // form encoding writes a space as + rather than %20
   result = result.replace(/%20/g,"+");
   return result;
}

It’s not exactly the perfect way, but it works.
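
To put an encoder like the one above to work on a whole AJAX form, the fields can be joined into a single request body. What follows is a minimal sketch and not the code used in Sakai: the function names encodeFormValue and encodeFormBody and the sample field names are made up for illustration. It relies only on the standard encodeURIComponent(), which writes each character as the %XX escapes of its UTF-8 bytes.

```javascript
// Hypothetical helper names; only encodeURIComponent() is a standard function.

// Encode one name or value for an application/x-www-form-urlencoded body.
function encodeFormValue(value) {
  // encodeURIComponent yields %XX escapes of the UTF-8 bytes;
  // form encoding additionally writes a space as "+".
  return encodeURIComponent(String(value)).replace(/%20/g, "+");
}

// Join a plain object of fields into "name=value&name=value".
function encodeFormBody(fields) {
  var pairs = [];
  for (var name in fields) {
    if (Object.prototype.hasOwnProperty.call(fields, name)) {
      pairs.push(encodeFormValue(name) + "=" + encodeFormValue(fields[name]));
    }
  }
  return pairs.join("&");
}

// Example: a space and U+5FFD in the same form (field names are invented).
var body = encodeFormBody({ title: "hello world", text: "\u5FFD" });
// body is "title=hello+world&text=%E5%BF%BD"
```

The resulting string would then be passed to XMLHttpRequest's send() after setting the Content-Type request header to application/x-www-form-urlencoded; charset=UTF-8.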





Caching

17 08 2007

It’s been a while since I have had a chance to look at caching. Most of us reach for HashMap or something more concurrent, but there are plenty of cache implementations and the use cases are relatively well understood: put an object, get it back, and clear it. JCACHE (JSR-107) never really did appear to produce a standard. Perhaps it was one of those standards that was just too close to the Map concept to be interesting for those involved; however, several of the cache providers look like they support it. But in Sakai we have already been using a cache for some time. We have a sophisticated internal cache in the form of the memory service. This works in a cluster and several years back was state of the art. Since then the java.util.concurrent work has delivered really sensible concurrent hash maps, Commons Collections has LRU hash maps and collections, and the higher-level caching providers have moved on.
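
The put, get and clear contract, together with the least-recently-used eviction that LRU maps provide, fits in a few lines. This is an illustrative sketch in JavaScript, not Sakai's memory service or any of the libraries named here; the class name LruCache and its capacity parameter are assumptions made for the example.

```javascript
// Hypothetical LruCache: a Map remembers insertion order, so the first key
// is always the least recently used once reads re-insert their entry.
class LruCache {
  constructor(capacity) {
    this.capacity = capacity;
    this.entries = new Map();
  }

  get(key) {
    if (!this.entries.has(key)) {
      return undefined;
    }
    const value = this.entries.get(key);
    // Re-insert to mark the entry as most recently used.
    this.entries.delete(key);
    this.entries.set(key, value);
    return value;
  }

  put(key, value) {
    // Remove any existing entry so the new one moves to the end.
    this.entries.delete(key);
    this.entries.set(key, value);
    if (this.entries.size > this.capacity) {
      // Evict the oldest entry, which is the first key in the Map.
      const oldest = this.entries.keys().next().value;
      this.entries.delete(oldest);
    }
  }

  clear() {
    this.entries.clear();
  }
}

const cache = new LruCache(2);
cache.put("a", 1);
cache.put("b", 2);
cache.get("a");      // "a" is now the most recently used entry
cache.put("c", 3);   // over capacity, so "b" is evicted
```

A sketch like this only covers a single process; it does nothing about replication or invalidation across a cluster.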

When I looked at Jackrabbit and its cluster implementation I was struck by how its cache was both transaction aware via an XA/JTA binding and how it propagated those transactions over the cluster. Others that appear to address these issues include OSCache and JBoss TreeCache. There are plenty of commercial offerings in this area that we can’t look at, and some that just don’t have a friendly license. Today I went back to look at ehcache, which we have scattered all over Sakai. I noticed that it also has moved on since version 1.1. Since 1.2 (current version is 1.3) it supports cluster-wide caches with replication either by copy or by invalidation over the whole cluster. The architecture looks pluggable.

All of this is very interesting, but unless we all use the same central cache or memory service, there is no chance that Sakai’s memory footprint will be actively managed in production, and it’s going to be an uphill struggle transferring the load away from the database and into the app servers without impacting the speedup. There is no point in having a wonderfully powerful, parallel, scalable application if the algorithm is 10 times slower than the serial version.





Mobile Search Contexts

6 08 2007

Looks like the world outside Sakai is continuing to focus on search that has context. Taptu provides search results where the content is the sort of thing that you actually want on a mobile phone, not just the entire web recoded for a mobile phone. I understand that they are using Lucene and Velocity, and from the looks of it are buying lots of disk for the indexes. I wouldn’t expect a Sakai instance to need a 750GB index, but it looks like Taptu are already up at 10x that. See http://www.taptu.com/blog/ for the blog, which also looks obsessed with, and well informed on, mobile devices.