Jackrabbit, Oak, Sling, maybe even OAE

30 08 2012

Back in January 2010 the Jackrabbit team started asking its community what it wanted to see in the next version of
Jackrabbit. There were some themes in the responses: high(er) levels of write concurrency, millions of child nodes, and cloud-scale clustering with NoSQL backends. I voted for all of these.

I was reminded of this activity by the announcement of an Oak Hackathon, or rather an Oakathon, being organised this September at the .adaptTo conference in Berlin. The Oakathon seems intended to get users up to speed on using Oak, which suggests it might be ready for users to take a look. So I did. The code checks out, builds and passes all its integration tests. No surprises there from the Jackrabbit team.

I am not going to pretend I understand the storage model being used or how it addresses the requirements that came out of the Jackrabbit survey, but the persistence implementation looks like it could be adapted to a sharded schema over many DB instances, or even on top of Cassandra. The storage model looks something like a git tree. It seems to solve the many-child-nodes issue that Sparse Map Content solved for OAE in a slightly different, and more efficient, way: a DAG structure with pointers to a child tree rather than a parent pointer. I won't be able to tell, without some testing, whether it resolves the concurrency issues that forced me to squeeze Sparse Map Content in under the Sling repository API layers, but the approach looks like it might. Certainly the aims of the roadmap cover the needs of OAE, and go beyond the scale and concurrency required.
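To make the git-tree comparison concrete, here is a toy sketch of the idea as I understand it: immutable nodes addressed by content hash, each holding pointers to its children, so a write copies only the path from the changed node up to a new root while old roots stay readable. All names and helpers here are mine for illustration; this is not Oak's actual MicroKernel API.

```python
import hashlib
import json

# Flat content-addressed store: hash -> immutable node (dict of name -> child hash).
store = {}

def put_node(children):
    """Store an immutable node and return its content hash."""
    data = json.dumps(children, sort_keys=True)
    h = hashlib.sha1(data.encode()).hexdigest()
    store[h] = children
    return h

def get_node(h):
    return store[h]

def set_child(root, path, leaf_hash):
    """Return a NEW root hash with `path` pointing at leaf_hash.

    Only nodes along the path are copied; sibling subtrees are shared,
    and the old root remains valid for concurrent readers."""
    if not path:
        return leaf_hash
    name, rest = path[0], path[1:]
    children = dict(get_node(root)) if root is not None else {}
    children[name] = set_child(children.get(name), rest, leaf_hash)
    return put_node(children)

def resolve(root, path):
    """Walk child pointers from the root down to the node at `path`."""
    h = root
    for name in path:
        h = get_node(h)[name]
    return h
```

Because parents point at children (rather than children carrying a parent pointer), adding a millionth sibling touches only one node and its ancestors, and a reader holding an old root hash sees a consistent snapshot regardless of concurrent writes.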

Best of all, it already has a Sling Repository implementation, so it should be relatively easy to spin up Sling on Oak and run all the tests that caused OAE to move from Sling on Jackrabbit to Sling on a hybrid of SMC and Jackrabbit.

Massively Online

22 08 2012


In mid 2008, during the Sakai conference in Paris, amongst the early-summer European heat and the over-friendly crowding of the Paris Metro, I was part of a small group of friends who had the idea that Sakai could recapture its innovative lead in Higher Education. There were representatives of commercial organisations at the conference looking nervously at our plans and taking potential customers to one side. Over the months that followed our plans grew, fuelled by enquiries about scale. The application we started building then performed every operation by pointer, so that its cost scaled as N to a power considerably less than 1. This was not done purely to satisfy the enquiries about collaboration communities of up to 12M users. It was done to enable one of the very early aims of the project.

There is an iTunes U lecture from Stanford that I remember watching on an iPod while travelling through the suburbs of Chicago in a taxi on the way to the airport. It told the story of a medical device company (a case study) where a new CEO wanted to drive the bottom line throughout the entire organisation. Monthly reports were created based on the current month's data, unheard of in the company, and where headings could not be filled the words "Insufficient Data" were placed. This so incensed the readers of these reports that soon every line was filled. Had blanks been left, perhaps human curiosity would not have taken over. Most of us dislike the unknown and attempt to fill it with information.

Earlier that year Google had announced OpenSocial, a standard intended to make it easier for developers to build applications that integrated with a social network, perhaps an answer to Facebook's app environment. The underlying motivation for this initiative was not really to pander to developers, but rather to increase the quality, volume and hence value of the data that Google was able to collect on its product, you and me, or rather what we find relevant.

In 2008, I wanted to do two things: make Sakai easier for developers to develop for, and make it a suitable platform for running at massive scale, not for the sake of scale but for the sake of collecting data, big data, from which to make informed decisions about the actions we take in Higher Education. Those actions don't inform or drive purchase decisions; they inform and drive whole lifetimes of achievement. Saving a drop-out student or creating a cancer researcher changes lives.

That was 2008. In 2009 I presented a keynote speech at the North Sydney Institute for the AuSakai conference. Although the speech was poorly delivered, the message had developed after attending Hadoop/Pig/Mahout sessions at Apache conferences. Higher Ed needed to learn from the analytics and data analysis being employed on the wider web. Our data was far richer in metadata than that of a marketing web site, and failing to put the infrastructure in place to collect that rich metadata without reduction would put Higher Ed at a disadvantage.

Fast forward to today. The organisations that approached me with requirements for scale have moved on. One group, MIT and Harvard, formed edX, delivering courses at massive scale, collecting data and using the results to mark, learn and improve. Other groups took development in-house, and newcomers like Coursera have exploded onto the scene, delivering the scale and insight that Sakai 3 was being built to deliver.

With the benefit of hindsight I can see the differentiator. Sakai 3, now Sakai OAE, tried to solve the use case of content authoring for course delivery to small-group teaching. Analytics becomes irrelevant in that environment, and the application is incapable of making the compromises necessary to be adapted for scale. It is like pushing a boulder up a hill. Contrast that with the workflow of Coursera and edX: the boulder is already at the top of the hill. Massive scale makes the business model work, so content authoring and creation can take on a wholly different nature. The problems to be solved are those of huge numbers of users interacting with a small number of courses, not of thousands of users and thousands of courses all interacting with each other in random ways.

The future. I suspect the quality and precision of teaching, backed by the analytics and evidence that world-class materials can deliver through the edX and Coursera platforms, will make those platforms the new Google of education. Just as no small online advertising platform can compete in generalist ad placement against Google, the platforms that don't have the ability to collect data and use it appropriately will become irrelevant. I can't tell who will win the race to grab this new land; sadly, I think what I started in 2008 never got out of the blocks.

Evolution of networks

2 08 2012

In late 1993 I remember using my first web browser. I was on an IBM RS6000 called "elysium". It had a 24-bit 3D graphics accelerator, obviously vital for the blue-on-grey text that turned purple when you clicked it. I also needed it to look at the results of analysis runs: the stress waves flowing through the body shell of the next BMW 3 Series, calculated by a parallel implementation of an iterative dynamic solver I was working on for MSC-Nastran. I was at the Parallel Applications Centre in Southampton, UK, working on European projects doing large-scale parallelisations, generally in engineering but always solving practical problems: processing seismic survey data from the South Atlantic in 6 days rather than 6 weeks on HP Convex clusters; Monte Carlo simulations of how brain tissue would absorb radiation during radiotherapy, avoiding subjecting the cancer patient to two hospital visits. It was interesting work, and in my naive youth I felt at times I was doing good for humanity. Using a web browser was becoming part of my normal life. I used it to visit the few academic sites that contained real information, to access papers and research that previously we would have received in paper form from the British Library. This was the first network: exciting, raw, generating a shift in how I communicated and how effective I was.

I remember, some time that year, hearing about how this largely academic network was growing. It wasn't going to be dominated by .edu and .ac.uk; there were going to be many, many more .coms. I was worried by this, concerned that the influx would bury a new-found source of information. I was being selfish. One of my friends had foresight. He registered aa.com at a time when all web addresses were no more than TLAs. Some years later he had to hand it over to American Airlines. My own web address came a few years later, at a time when registration was a paper process accompanied by proof of registration at Companies House. This first network evolved with an influx of people and organisations performing a land grab. Soon all the TLAs were gone. Soon the number of porn sites far outweighed those with research content, but my worst fears were not realised, as the search engines also evolved. Thanks to those early pioneers I can still find what I need.

Fast forward to 2003: the dot com boom had come and gone. Sites that pushed content grew and fell, as did the strange concept of money from nothing. Some survived. A new sort of network started: a network of humans, using a site called MySpace. Initially it was cool and useful. It grew at a crazy pace with little or no control, allowing anyone to do almost anything on a web page. And they did. Pretty soon it became one of the most unpleasant places to be on the internet. Then Murdoch bought it. No one talks about it any more. The MySpace sign, if there is one, doesn't appear on TV adverts beside those of Twitter or Facebook. If you look at it today it looks like a pale but still mildly unpleasant version of Facebook. I would guess a high proportion of the MySpace pages are not those of normal human beings; they are a mixture of bots and corporates, all out to extract vast quantities of last cents from some hair-thin opportunity. At least only 25M of us have been fooled by it.

Meanwhile, Facebook was growing. Facebook had the good sense not to allow the kind of user-generated content that had made MySpace so unpleasant. It started as a private club, on some quite unsound social footings, at Harvard. Then it invited only a select few Higher Ed institutions, first in the US, later overseas. Cambridge, cam.ac.uk, where I now work, was added relatively late on. It was a cool and fun place to be, to start with. I remember one particularly sharp Cambridge student who stared at me when I suggested she might like to use a reading list app, and told me to "stay out of my Facebook". I was now older, and certainly not cool. As it opened up, everyone wanted to have a Facebook page, and everyone did. 500M of us, oh no wait, that's now 900M of us, sorry, I blinked. Or did we? The IPO had a valuation that predicted revenue streams in the gazillions, but the post-IPO headache is kicking in strongly for some.

When Facebook was relatively small and raw, few were interested. Friends meant something. You could be very certain that everyone you could connect with was a normal human, and with faith in human nature, everyone has some good qualities. Those qualities made them worth connecting with. Facebook will eventually go the way of MySpace: in attempting to extract value from its product (us and our connections), it has attracted millions of fake us'es and devalued both the concept of a friend and what it can sell those connections for. Everyone is born with a finite amount of "Friend" currency. We all choose how to spend it. Some invest in 4 or 5 lifelong friends who stick together through thick and thin. Others spread it thinly over hundreds. Perhaps the celebs who feel lonely have spread their Friend currency so thinly that they can't identify any real friends any more. No one has 500 valuable friends.

This is the evolution of networks. There is only a finite amount of value for any one node within the network. Initially, in an unconnected state, there is lots of potential, but as the network grows and each node makes more connections, outside entities step in to extract value from those connections. As the connections are made, the value gets dissipated and lost until the space returns to primeval slime. Financial markets have a word for this: arbitrage. Eventually every last opportunity is exploited until all is balanced and there are no opportunities left. Beyond that point we venture into an unreal world of fake promises and pyramid selling.

I hope we can learn from how past networks have evolved. Twitter almost did during the Arab Spring; sadly, it too has been overrun by bots and organisations vying to exploit and extract. G+ might prove sustainable, but what else? Coursera? Academia is a proven breeding ground.