This week I had a visit from Edwin Shin (a Fedora Commons committer), who nipped over from France, where he now lives. Compared to the Med, Cambridge is cold this year. We spent the week looking at the progress made on Sakai K2, which is based on Apache Sling, from the viewpoint of digital repositories. There looks to be a lot of common ground. We both see long-term storage as being cloud based, with interesting options among storage mechanisms like Apache Cassandra, Project Voldemort, etc. The component structure provided by OSGi (Apache Felix in the case of Sling) also has some strong benefits. He started to embed a Fedora RDF component into Sling/K2 as an OSGi component. I wonder how much Fedora/DSpace functionality could be covered by standard components like this. We might achieve the same economies we saw with Sakai, where 1.8 million lines of code came down to about 60K, just because we avoided “not-invented-here”.
Spending 4 hours in a car driving to Oxford to give a presentation at OSS Watch gave me an opportunity to think. Perhaps getting mildly lost in Milton Keynes on the way back consolidated my thoughts. For those who don't live in Europe, Milton Keynes is a new town built in the 1960s, absorbing a village of the same name. It's laid out on a grid pattern like many US cities, which is rather alien to Europeans, who have become used to winding roads that promise the reward of a destination. “The Great North Road” runs between London and the north and, for the authorities in London, led to the Great North. Had it been named by those in Newcastle, it might have been called “The Crowded South Road” to discourage any brain drain. But the interesting thing that struck me as I pulled over to search Google Maps for just where “H5” went in Milton Keynes (H5 is the name of one of the grid roads) was that humans are unable to make sense of large amounts of unfamiliar information. For the average European (inhabitants of Milton Keynes excluded), the grid pattern of Milton Keynes, with its symbolic naming of the major arteries, is confusing, just as the winding roads of Europe, with their ancient names and strange numbers like “A1”, must feel like a trip along the blood vessels of some strange animal to the average US city dweller. But even then, we are all given a frame of reference, or a language, that enables our small brains to navigate this space. In the UK, the road names give us a way: if you follow “Cambridge Road” out of the East End of London, you stand some chance of ending up in Cambridge; before urban sprawl, that chance was a certainty. Imagine a world where there were no maps and no visibility beyond the end of your nose, except for a device. That device let you say where you wanted to go, and it would go out into the cloud of information and tell you the way. This is the world of search and the cloud.
The compartmentalisation and ordering has been abstracted to such an extent that all containers are removed and everything exists within a massive amorphous cloud. We have developed highly efficient tools to locate information within that cloud, eliminating any need to pre-categorise anything. But are we missing something? We are humans after all, and we have become adept at sharing and communicating by compartmentalising what is important. We talk of main roads, autobahns, highways and interstates, and know that although there are smaller, less-travelled routes to one side, we could take a detour, follow our noses, make discoveries and probably get back on the highway at the next junction. The trigger might be a signpost tempting us off the trunk route. Cloud and search do not really provide us with this structure, and the point of this post is that when you try to interface a compartmentalised or hierarchical mechanism with a search-based cloud system, it generates tension.
A team at Cambridge and I have been arguing about the UX issues surrounding file storage. There is a desire to create a cloud-based storage system where users throw files into a central pot of information. The information has no organising structure, although each file can be retrieved by its URI. There is nothing complicated or difficult about this. The URI is totally meaningless, like a TinyURL before you follow it, but the file has metadata attached (e.g. tags) that makes it possible to find by search. This enables us to create multiple views of the information. Free-text search is, of course, also available. On one side of the argument is the opinion that these views should be a single level deep, with no hierarchy, e.g. /tags/caesium-137 would return all files tagged with caesium-137. In the middle is the view that humans need structure, so we should allow the views to have some taxonomy, e.g. /physics/fission/caesium-137. On the other side is the view that the meaningless URI should have meaning as well.
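To make the first two positions concrete, here is a minimal Python sketch, assuming an in-memory store. The class name CloudStore, the /files/ URI prefix and the rule that a taxonomy path means "files carrying every tag on the path" are all hypothetical choices for illustration, not the design the team settled on.

```python
import uuid
from collections import defaultdict


class CloudStore:
    """Hypothetical flat store: every file sits in one pot under an opaque URI."""

    def __init__(self):
        self.files = {}               # opaque URI -> file content
        self.tags = defaultdict(set)  # tag -> set of URIs carrying that tag

    def put(self, content, tags=()):
        # The URI is deliberately meaningless; only the metadata makes the file findable.
        uri = "/files/" + uuid.uuid4().hex
        self.files[uri] = content
        for tag in tags:
            self.tags[tag].add(uri)
        return uri

    def get(self, uri):
        return self.files[uri]

    def flat_view(self, tag):
        # Single-level view: /tags/<tag> lists every file with that tag.
        return sorted(self.tags.get(tag, set()))

    def taxonomy_view(self, path):
        # Hierarchical view (assumption): each path segment is read as a tag,
        # and the view holds only files that carry all of them.
        segments = [s for s in path.strip("/").split("/") if s]
        if not segments:
            return []
        result = set(self.tags.get(segments[0], set()))
        for segment in segments[1:]:
            result &= self.tags.get(segment, set())
        return sorted(result)


store = CloudStore()
decay = store.put(b"decay measurements", tags=["physics", "fission", "caesium-137"])
survey = store.put(b"soil survey", tags=["ecology", "caesium-137"])
print(store.flat_view("caesium-137"))                       # both URIs
print(store.taxonomy_view("/physics/fission/caesium-137"))  # only the decay file's URI
```

In this sketch the taxonomy path is just another query over the same flat metadata; neither the stored file nor its URI changes, which is what keeps the third position (meaningful URIs) distinct from the first two.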
The tension comes from the ability to list the contents at any one level, and consequently the ability of other systems to interface with the structure. Listing /tags is equivalent to listing every tag: viable within a single subject area, but impractical across the whole of human knowledge. The problem is exacerbated when tools that assume a hierarchical structure are interfaced. Many of them assume that the hierarchy has been designed to limit the number of children at each level to a manageable number. Filesystems, and the tools that act on them, don't traditionally expect to support millions of child nodes; in fact, most file system browsers become unusable beyond a few thousand items. So as soon as you interface these tools to a cloud-based information store, they fail, because they are not clever enough to tell the user what options there might be without listing all the options at the next level.
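One way around the listing problem is for a view to return a bounded page plus a continuation cursor, and to summarise what lies below a prefix rather than enumerate it. The Python sketch below assumes the child names at one level (say, every tag under /tags) are available as a sorted list; list_children and prefix_summary are hypothetical helpers, not part of any filesystem or WebDAV API.

```python
import bisect


def list_children(sorted_names, prefix="", start_after=None, limit=50):
    """Return one page of names starting with `prefix`, plus a continuation cursor.

    `sorted_names` must be in ascending order. The cursor is the last name on the
    page, or None when nothing matching is left.
    """
    start = bisect.bisect_left(sorted_names, prefix)
    if start_after is not None:
        # Resume just past the cursor returned by the previous call.
        start = max(start, bisect.bisect_right(sorted_names, start_after))
    page = []
    i = start
    while i < len(sorted_names) and sorted_names[i].startswith(prefix) and len(page) < limit:
        page.append(sorted_names[i])
        i += 1
    more = i < len(sorted_names) and sorted_names[i].startswith(prefix)
    return page, (page[-1] if more and page else None)


def prefix_summary(sorted_names, prefix="", width=1):
    """Count matches grouped by the next `width` characters after `prefix`.

    This tells the user which narrower options exist without listing every name.
    It scans the matching range; a real index would keep these counts up to date.
    """
    counts = {}
    for name in sorted_names[bisect.bisect_left(sorted_names, prefix):]:
        if not name.startswith(prefix):
            break
        key = name[: len(prefix) + width]
        counts[key] = counts.get(key, 0) + 1
    return counts


tags = sorted(["caesium-134", "caesium-137", "carbon-14", "cobalt-60", "iodine-131"])
page, cursor = list_children(tags, prefix="c", limit=2)
print(page, cursor)  # ['caesium-134', 'caesium-137'] caesium-137
print(list_children(tags, prefix="c", start_after=cursor, limit=2))
print(prefix_summary(tags))  # {'c': 4, 'i': 1}
```

A browser built on calls like these never asks for "all children"; it shows the summary first and lets the user narrow the prefix, much like a signpost offering a few directions instead of a list of every street.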
The real underlying question is: if we were to undo history and start intelligent life with a search engine, would Parmenides have thought about ontologies, organising every aspect of existence into shared hierarchies through which we could communicate with one another? Perhaps the human brain craves structure even if, in its default form, it is suboptimal.
Comments : Comments Off on Clouds are search based, humans are not.