23 06 2006

XPDL is a large schema (1000+ lines). I wanted to create an entity model from the schema, but I also wanted that entity model to make sense. So I looked at HyperJaxB, JaxB, Castor and a number of other technologies. These are all great technologies, except for two things. With a complex real-life schema, all tend to represent the object model like a DOM; it's not that easy to make the object model understand the XSD. For instance, a construct that means a list of attributes should map to just that, not a new object that implements a container that contains attributes. My second requirement is that the generated model should persist via Hibernate. This is where the JaxB-like mapping technologies fall over. The bean model that is created is so complex that it looks completely mad when mapped into a database. It's almost impossible to do anything with.

So in the spirit of change, I threw all the JaxB code out. Then I remembered that to get a good entity model with Hibernate, and an efficient database schema, the most effective way is to edit the hbm file by hand and work from there. But XSD is XML, and so is the HBM file. A quick transform later, with some annotation hints in the XSD, and we have converted 1000 lines of XSD into a reasonable hbm entity model. Unlike the HyperJaxb model, which contained 344 entities, this one contains 36.

All you do is read your XSD, annotate what you want to be sets, and specify entities to ignore, plus data types and lengths. Then apply the transform and you get an hbm file. Plug that into Hibernate Eclipse Synchronizer and you get the model objects and the DAOs, and away you go.
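The transform step itself needs nothing beyond the JDK's built-in XSLT support. A minimal sketch of driving it from Java (the class and method names here are mine, not part of the attached transform):

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsdToHbm {
    // Apply an XSLT stylesheet (e.g. the xsd-to-hbm transform) to an XML
    // source and return the result, here the generated hbm mapping.
    public static String transform(String xslt, String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }
}
```

In practice you would point the two StreamSources at the annotated xsd and the xsl files rather than strings.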

The only thing this doesn't do yet is marshal the Java model into and out of XML, but since I want mixed-namespace, complex XML, I will probably hand-code the SAX. At least with 36 beans this is manageable; I would have needed JaxB for 344 beans!

Attached are the current version of the transform and an example annotated schema. This is not the complete schema, since that might just be confusing. You can find the xsd here and the xsl here.

IronGrid, P6Spy, IronTrack, IronEye

20 06 2006

Picking up an old Hibernate book and flicking to the back, I noticed IronTrack. I remember this from back in 2003, but never used it. When you start searching for IronGrid in Google, it looks like they don't exist any more, a pity since it looked like a good way of seeing what a DB was doing. There are commercial products that do the same, but this was supposed to be open source. However, there is talk of a trial license, which is a new model to me: open source, but if you want the binaries, you need a license. Strange. Perhaps that's why they don't exist any more.

However, there is a Sourceforge project, with 2 developers, no released files, no activity and no stated license. Digging a little deeper, the code is in CVS. It appears to be Apache 1.1 licensed, and the licensing code has been commented out. For posterity, I've built the jars and uploaded binaries and source. You can run the binary with java -jar irontracksql.jar or java -jar ironeyesql.jar

Workflow models

17 06 2006

I had a complete and working workflow engine that had most of the concepts present in XBPLM. Then I realized that it was entity focused and not service focused. To make things work really well in Sakai, or any other SOA, services need to minimize the size of the entity model that is pulled into memory and focus much more on delivering a result. The workflow engine was teetering on becoming an EJB monster without the EJB container.

So I threw it all away.

Having built a working prototype, the next version is always better understood. With the help of Hibernate, Synchronizer and Spring, the new model and service is all up in less than 8 hours… and it's service oriented with a better structured design…. What else can I go and delete?

One downside to Synchronizer: you have to do a huge amount of refactoring if you want to target Java 1.5… but seeing all the type safety in 1.5 makes you want to move.

Workflow Implementation

14 06 2006

I’ve resurrected the Workflow implementation that was started at the end of the last JISC VRE project meeting. Wonderful how meetings give you time to think. Leaving it alone for such a long time has done two things: helped me forget what I was doing, and let me solve the same problems in a different but better way.

The workflow engine is a lightweight, in-memory engine that has built-in transactions and rolls back on failure of any of the activities. Hence, in a discrete section of flow, it either reaches a steady state or it rolls all the way back to the starting point. The whole operation is contained within a thread and a storage transaction manager, which means that the execution engine can explore all the pathways, evaluating or executing all the tasks. Unlike a lambda-calculus approach (XLang, BPML), this avoids predetermining an execution route through the discrete segment of workflow. In fact, it's rather simple and mirrors some of the execution processes of a multi-threaded CPU. It has an execution stack that winds up and down as the flow pointer traverses the flow pathways. If all goes wrong before the final commit, the storage transaction is rolled back, any XA-compliant transaction managers that took part in the workflow segment are notified, and the flow stack and flow modifications are thrown away, leaving the workflow in the same state it was in before the operation started.

As with other transaction managers, the flow state is attached to the thread and only committed once everything is ready to commit. If you think of this in DB terms, it would be characterized as READ_COMMITTED. It should also be possible to create a SERIALIZABLE mode by synchronizing blocks of the flow.
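A minimal sketch of that idea, with all names hypothetical: state changes are buffered against the current thread and only applied at commit, so a rollback simply discards the stack and nothing ever becomes visible:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of a thread-bound flow transaction. Changes are
// enlisted as deferred actions; commit() applies them in enlistment order,
// rollback() throws the whole stack away, leaving state untouched.
public class FlowTransaction {
    private static final ThreadLocal<FlowTransaction> CURRENT =
            ThreadLocal.withInitial(FlowTransaction::new);

    private final Deque<Runnable> pending = new ArrayDeque<>();

    public static FlowTransaction current() { return CURRENT.get(); }

    // Record a state change; nothing is visible until commit().
    public void enlist(Runnable commitAction) { pending.push(commitAction); }

    // Apply all buffered changes, oldest first.
    public void commit() {
        while (!pending.isEmpty()) pending.pollLast().run();
    }

    // Discard the stack: state stays exactly as it was before the flow ran.
    public void rollback() { pending.clear(); }
}
```

A real engine would also notify enlisted XA resources at this point; the sketch only shows the thread-attached, deferred-commit shape.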

The one area to watch in this approach is Business Process Objects (BPOs) that hand off responsibility to non-XA-compliant components. They must ensure either that the transaction flow passes synchronously into the component, only returning ready-for-commit when the component is able to do so, or that the component takes full responsibility. State modifications that go outside the scope of the transaction cannot be rolled back, and I don't fancy implementing a 'whoa there, give me my data back, I need to make a change' protocol.

IMS-LD and Workflow

12 06 2006

There has been a thread on sakai-pedagogy on Learning Design, sparked by Mark Norton. This discussion triggered a long-held thought: that IMS-LD is a specialized form of workflow that could be implemented and enacted in a generic workflow environment. I don't know how true this is, or if there is a sufficiently complete mapping to make this possible, but experimentation will help us discover if this is the case.

I am in the process of writing a lightweight workflow engine that does not specify or bind to anything that is available as part of the Sakai framework. The intention is to provide those services using components from the Sakai framework. This workflow service will not use any heavyweight components (e.g. EJBs) and so will not require JBoss or another EJB container.

Currently I think that it will be possible to instance a process by loading an XML definition, potentially embedded inside an IMS-CP, that will manipulate and control items on the Sakai entity bus (i.e. content, messages, tools, etc.). Once instanced, a process design becomes an instance of a Runtime Entity and manages its own state, storing what it needs inside its own state persistence service. The workflow service will understand roles in the Sakai context so that it can distinguish between users in different roles. It will have the concept of time, allowing the process flow of a Runtime Entity to pause and restart at the command of the user or by design of the process.

The closest workflow standard that has these concepts is the Workflow Management Coalition's XPDL standard, which we *may* use as the native expression of the workflow definition. Whichever definition format is used, there will be Java code to load those definitions into the internal object model of the process.

One of the tasks that needs consideration is the transformation and loading of IMS-LD into an XPDL-like workflow service. Not knowing enough about IMS-LD at present, I don't know exactly where all the gaps are.

At this point you might be thinking: why haven't I mentioned IMS-SS, SCORM 2004 or BPEL? IMS-SS, and SCORM 2004 which contains IMS-SS, are a more specialized form of workflow which potentially constrains the learner further. BPEL is a machine-to-machine workflow, suited to implementing black-box functionality but not to interacting with a human. BPEL is ideal for specifying the activities that an XPDL-based workflow might control.

JCR (JSR-170), JackRabbit and ContentHostingService

12 06 2006

As I start to look more at ContentHostingService and the JCR API, it looks like they address the same problem from different positions in the content stack. The Sakai ContentHostingService API has been bound to by many of the tools in Sakai; changing this API would be a significant investment with widespread impact. The JCR API is a standard, and although complete, appears to be missing some features that the Sakai ContentHostingService uses. My interpretation, perhaps incorrect, is that JCR does not support the fine-grained access control that exists within the Sakai ContentHostingService. There is, however, a mechanism for injecting the concept of a user into the JCR, but that mechanism feels like it represents a user of the JCR rather than a user of Sakai. This, perhaps, is a philosophical standpoint.

In some RDBMS applications, users are real users of the database, with permissions controlled within the database. In many web applications, the user becomes a concept of the application, and the connection to the database represents a role the application is playing with respect to the database.

One approach with JCR is to say that the application's connection to the JCR represents a role within the JCR, and the application user is simply a user of the application. If that is the case, then a Content Hosting Service stack in Sakai might look like this:

Tools and Other services connect to

a Content Hosting Service which is implemented as

a thin Content Hosting Service shim implementation that talks to

JackRabbit JCR.

Concepts that are core to Sakai are implemented in the shim, and concepts that are core to the JCR are implemented by the JCR. Care will have to be taken to ensure that there are no tight bindings between the shim and the JCR. If we do this, deployers could choose to use any JCR that had sufficient features.
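To make the layering concrete, here is an illustrative sketch, not the real Sakai or JCR APIs: the shim depends only on a narrow repository interface, and an in-memory stub stands in for JackRabbit, which is exactly the loose binding that would let a deployer swap in another JCR.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only. The Sakai-facing API stays stable while the
// shim delegates storage to whatever repository sits behind it. A real
// shim would talk to a javax.jcr.Session; here a map stands in for the JCR.
interface ContentHostingService {
    void addResource(String id, byte[] content);
    byte[] getResource(String id);
}

interface RepositoryBackend {           // the only repository-shaped dependency
    void store(String path, byte[] data);
    byte[] load(String path);
}

class ContentHostingShim implements ContentHostingService {
    private final RepositoryBackend repo;
    ContentHostingShim(RepositoryBackend repo) { this.repo = repo; }
    public void addResource(String id, byte[] content) { repo.store(id, content); }
    public byte[] getResource(String id) { return repo.load(id); }
}

class InMemoryBackend implements RepositoryBackend {
    private final Map<String, byte[]> nodes = new HashMap<>();
    public void store(String path, byte[] data) { nodes.put(path, data); }
    public byte[] load(String path) { return nodes.get(path); }
}
```

Sakai-core concepts (fine-grained access control, the Sakai user) would live in the shim methods; anything node- or version-shaped would pass straight through to the backend.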

Having had a quick look at the JCR and Jackrabbit implementation, I feel that this approach might be significantly less work than implementing a full Content Hosting Service from scratch.

Clustered Index in Search

9 06 2006

After spending many hours with a hot CPU, I finally came up with an efficient mechanism for indexing in real time in a database-clustered environment. I could have put the Lucene index segments directly into the database with a JDBCDirectory from the Compass Framework, but unfortunately the MySQL configuration of Sakai prohibited the emulation of seeks within BLOBs, so the performance was hopeless. I'm not convinced that emulating seeks in BLOBs actually helps, as I think the entire BLOB might still be streamed to the app server from the database.

Normally you would run Lucene using the Nutch Distributed File System, which borrows some concepts from the Google File System. NDFS is a self-healing, shared-nothing file system tuned for use with Lucene… but it's not easy to set up from within a Java app, and it has to have some nodes dedicated to certain tasks.

Failing that, you might run rsync at the end of each index cycle to sync the index onto the cluster nodes. I think this was the preferred method prior to NDFS. However, it's a bit difficult to get to EXT3 inodes from within a Java app, and Sakai runs on Windows and Unix, so I can't rely on the native rsync command.

The solution that has just gone out to the QA community was to use the local FSDirectory to manage local copies of the index segments and, once an index write cycle is complete, distribute the modified segments via the database. In testing, I tried this against MySQL with about 10GB of data in 200-300K documents. It worked OK. I'm waiting with bated breath to see how many JIRA items are posted against this, as everything that flows over the Sakai entity bus is seen by the indexer. Nice to have a component that gets tested whatever is done in QA!
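A sketch of the change-detection half of that cycle, with the blob upload to the database omitted and all names mine: after a write cycle, only the segment files touched since the last checkpoint need to go to the database.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hedged sketch: after a local FSDirectory write cycle completes, work out
// which files in the index directory are new or modified since the last
// checkpoint. A real implementation would then push those files into the
// database as blobs for the other cluster nodes to pull down.
class SegmentSync {
    // Return the names of files modified after the given checkpoint time.
    static List<String> modifiedSince(File indexDir, long checkpoint) {
        List<String> changed = new ArrayList<>();
        File[] files = indexDir.listFiles();
        if (files != null) {
            for (File f : files) {
                if (f.isFile() && f.lastModified() > checkpoint) {
                    changed.add(f.getName());
                }
            }
        }
        return changed;
    }
}
```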

Section Group Support for Wiki in Sakai

9 06 2006

There is already some support for Groups and Sections in Sakai RWiki. This is basic support that connects a wiki SubSpace to a worksite group. If the connection is made (by using the name of the group as the SubSpace name), permissions are taken from the group's permissions. There is a wiki macro that will generate links to all the potential group/section SubSpaces in a worksite (see the list of macros in the editing help page).

This is a simple approach that is probably understandable, but it's not exactly sophisticated or flexible. So, being a glutton for UI punishment, we have started to open up the concept further.

The concept is that for any node in the wiki hierarchy, that's wiki pages or wiki SubSpaces, you (a maintain or admin user) can configure which permissions 'realm' is associated with the node, edit the permissions on the roles, add/delete roles in that 'realm', modify permissions associated with a role, and add/remove users from a role.

A can of worms! The challenge is not in creating the functionality; anything is possible. The challenge is in creating a UI that doesn't confuse the hell out of anyone other than the developer that created it.

One view on this is that it's better to stick with simple statements that control the permissions and not expose the full power of the underlying permissions system. Such a statement might be 'Lock this page'. I think I agree with that for access-type users, but for a user who is maintaining a worksite, this may not be enough power. I am going to have to do many mock-ups to uncover all the issues. The advanced permissions editing may not make 2.2.

LGPL: What is an acceptable extension?

6 06 2006

Sesame is LGPL licensed, with a clarification on Java binding. The net result of the statement in the README is that you can use Sesame in another project without that project having to be LGPL. That's great! Well, it's great if you want to use the LGPL library in the way the developers intended. When it comes to reimplementing an underlying driver, you are faced with three choices: implement the driver so that it's compatible with the internal implementation, implement your own algorithm, or use something else.

Implementing your own driver, keeping it compatible with the original driver and underlying storage structure, is almost certainly an LGPL extension that should also be licensed as LGPL and released back to the net.

Implementing your own driver with its own algorithm will probably give you the right to claim that the code is just using the LGPL code as a library. Then you can choose your own license.

If your project already has a non-xGPL license, then neither of the above options is palatable: you face a fork in the 'virtual' code base, or changing your license. I don't know the answer, so I'm choosing the do-nothing option. I won't be using the DataSource-based, non-DDL Sesame RDBMS driver or fixing any of the bugs in it, since it's just too close to the original, uses too many of its ideas, and would have to be licensed LGPL. So Sesame will be available inside Sakai search as an add-in module that uses its own database connection. I guess that not many will want to use this connection strategy, which is non-standard in Sakai terms.

There is hope: if an RDF triple store becomes as widespread and acceptable as an RDBMS, then we will be able to treat it just like MySQL or any other database. Time will tell.

Sesame RDBMS Drivers

6 06 2006

I’ve written a DataSource-based Sesame driver, but one thing occurs to me in the Sakai environment: most production deployments do not allow the application servers to perform DDL operations on the database. Looking at the default implementations, that's the non-DataSource ones, they all perform lots of DDL on the database at startup and in operation. This could be a problem for embedding Sesame inside Sakai. I think I am going to have to re-implement the schema generation model from scratch. It might even be worth using Hibernate to build the schema, although it's not going to make sense to use Hibernate to derive the object model; the queries are just too complex and optimized.