Languages and Threading models

17 05 2012

Since I emerged from the dark world of Java, where anything is possible, I have been missing the freedom to do whatever I want with threads to exploit as many cores as are available. With a certain level of nervousness I have been reading commentary on most of the major languages surrounding their threading models and how they make it easy or hard to utilize or waste hardware resources. Every article I read sits somewhere on a scale between absolute truth and utter FUD. The articles towards the FUD end of the scale always seem to cite benchmarks created by the author of the winning platform, so they are easy to spot. This post is not about which language is better or which app server is the coolest thing; it's a note to myself on what I have learnt, in the hope that if I have read too much FUD, someone will save me.

To the chase: I have looked at Java and Python, touched on Ruby, and thought about serving pages in event-based and thread-based modes. I am only considering web applications serving large numbers of users, not compute-intensive, massively parallel or GUI apps. Unless you are lucky enough to be able to fit all your data into memory, or even shard the memory over a wide-scale cluster, the web application will become IO bound. Even if you have managed to fit all data into core memory you will still be IO bound on output, as core memory and CPU bandwidth will forever exceed that of networks, and 99% of webapps are not CPU intensive. If it were not that way, the MPP code I was working on in 1992 would have been truly massively parallel, and would have found a cure for cancer the following year. How well a language performs as the foundation of a web application comes down to how well that language manages the latencies introduced by non-core IO, not how efficiently it optimises inner loops. I am warming to the opinion that all languages and most web application frameworks are created equal in this respect, and it's only in the presentation of what they do that there is differentiation. An example: a Python-based server running in threaded mode compared to Node.js.

Some background. Node.js uses the Chrome V8 JavaScript engine, which compiles JavaScript down to native machine code. It runs as a single thread inside a process on one core, delivering events to code that performs work exclusively until it releases control back to the core event dispatch, normally by returning from the event handling code. The core of Node.js generally uses an efficient event dispatch mechanism built into the OS (epoll, kqueue etc). There is no internal threading within a Node.js process, and to use multicore hardware you must fork separate OS-level processes which communicate over lightweight channels. Node.js gets its speed from ensuring that the single thread is never blocked by IO from doing work: the moment IO would block, the single thread in Node.js moves on to performing some other useful work. Being single-threaded it never has to think about inter-thread locking. That is my understanding of Node.js.
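The dispatch pattern is easy to sketch outside JavaScript. Below is a minimal, purely illustrative single-threaded event loop in Python using the standard selectors module (which wraps the same epoll/kqueue mechanisms); the socketpair stands in for a real client connection. This is my own sketch of the pattern, not Node.js internals:

```python
import selectors
import socket

sel = selectors.DefaultSelector()
results = []

def on_readable(conn):
    # Called by the dispatch loop only when conn is readable, so recv won't block.
    data = conn.recv(1024)
    if data:
        results.append(data)
    else:
        # Peer hung up: stop watching this connection.
        sel.unregister(conn)
        conn.close()

a, b = socket.socketpair()  # stand-in for a real client connection
sel.register(b, selectors.EVENT_READ, on_readable)
a.sendall(b"hello")
a.close()  # client hangs up, so the loop can drain and exit

# The core dispatch: one thread, many sockets, each callback runs to
# completion before control returns here.
while sel.get_map():
    for key, _ in sel.select():
        key.data(key.fileobj)

print(results)  # [b'hello']
```

A callback that never returns would stall every other connection, which is exactly the constraint the Node.js programming style is built around.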

Python (and Ruby to some extent), when running as a single process, allows the user to create threads. By default these are OS-level threads (pthreads), although there are other models available; I am talking only about pthreads here, which don't require programmer intervention. Due to the nature of the Python interpreter there is a global interpreter lock (GIL) that only allows one Python thread to use the interpreter at a time. Threads are allowed to use the interpreter for a set interval, after which they are rescheduled. Even if you run a Python process on a multicore system, my understanding is that only one thread per process will execute at a time. When a thread enters blocking IO it releases the lock, allowing other threads to execute. Like Node.js, to make full use of multicore hardware you must run more than one Python process. Unlike Node.js, it is the internal implementation of the interpreter, and not the programming style, that ensures the CPU running the Python process switches between threads so that it is always performing useful work. In fact that's not quite the whole story, since the IO libraries in Node.js also have to relinquish control back to the main event loop to ensure they do not block.
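A quick way to convince yourself that blocking IO releases the GIL is to time a handful of IO-bound threads. This is just an illustrative sketch, with time.sleep() standing in for a blocking network or disk call:

```python
import threading
import time

def fake_io():
    time.sleep(0.2)  # the interpreter releases the GIL while blocked here

start = time.time()
threads = [threading.Thread(target=fake_io) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start

# Five 0.2s waits overlap, so the total is roughly 0.2s, not 1.0s.
print(f"{elapsed:.2f}s")
```

Replace the sleep with a CPU-bound loop and the overlap disappears: only one thread can hold the GIL at a time.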

So, provided the mechanism for delivering work to the process is event based, there is little difference in the potential for Ruby, Python or Node.js to utilize hardware resources effectively. They all need one process per hardware core. Where they differ is in how the programmer ensures that control is released on blocking. With Python (and Ruby, IIUC), control is released by the core interpreter without the programmer even knowing it is happening. With Node.js, control is released by the programmer invoking a function that explicitly passes control back. The only thing a Python programmer has to ensure is that there are sufficient threads in the process for the GIL to pass control to when IO latencies are encountered, and that depends on the deployment mechanism, which should be multi-threaded. The only added complication for the Node.js model is that the IO drivers need to ensure that every subsystem that performs blocking IO has some mechanism of storing state not bound to a thread (since there is only one). A database transaction for one request must not interact with that for another. This is no mean feat and I will guess (not having looked) is similar to the context switching between native OS-level threads. The only thing you can't do in Node.js is perform a compute-intensive task without releasing control back to the event loop. Doing that stops a Node.js process from serving any other requests. If you do that in Python, the interpreter suspends the pthread and reschedules it after a set interval: proof, in some sense, that multitasking is a foundation of the language rather than an artifact of the programmer's code base.
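That last point is easy to demonstrate. In the sketch below (my own illustration, not a benchmark), two purely CPU-bound threads both make progress because the interpreter preemptively reschedules them on a time slice; in an event-loop model a busy callback would let nothing else run until it returned:

```python
import threading

# Shared progress counters for two CPU-bound workers.
progress = {"a": 0, "b": 0}

def spin(name):
    # Pure computation: never sleeps, never does IO, never yields on purpose.
    while progress["a"] < 500_000 and progress["b"] < 500_000:
        progress[name] += 1

ta = threading.Thread(target=spin, args=("a",))
tb = threading.Thread(target=spin, args=("b",))
ta.start()
tb.start()
ta.join()
tb.join()

# The interpreter's scheduler sliced time between them, so neither
# counter is zero: both threads ran despite never releasing control.
print(progress["a"] > 0 and progress["b"] > 0)
```

The same two workers written as callbacks in a single-threaded event loop would run strictly one after the other.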

The third language I mentioned is Java. Having spent most of the last 16 years coding Java-based apps I have enjoyed the freedom to use every hardware core available from a single process, all sharing the same heap. I have also suffered the misery of having to deal with interleaving IO, synchronization and avoiding blocking over shared resources. Java is unlike the other languages in this respect, since it gives the programmer both the tools and the responsibility to make best use of the hardware platform. Often that tempts the programmer into thinking they can eliminate all blocking IO by eliminating all non-core-memory IO. The reality is somewhat different, as no application that scales and connects humans together will ever have few enough connections between data to localise all the data used in a request to a single board of RAM. From my MPP years, this was the domain decomposition bandwidth. It may be possible to eliminate IO from disk, but I doubt that a non-trivial application can eliminate all backend network IO. In a sense, the threading model of Java tempts the developer to try to implement efficient hardware resource utilization, but doesn't help them in doing so. The same can be said for many lower-level compiled languages. Fast and dangerous.

Don’t forget, with web applications, it’s IO that matters.


The trouble with Time Machine

9 05 2012

Every now and again Time Machine will spit out a “Can't perform backup, you must re-create your backup from scratch” or “Can't attach backup”. For anyone who was relying on its rollback-time feature this is a reasonably depressing message, and it does typify modern operating systems, especially those of the closed source variety. At some point, having spent all the budget on pretty user interfaces and catered for all use cases, the deadline-driven environment decides, “Aw, stuff it, we will just pop up a catch-all, you're-stuffed-mate dialog box”. 99% of users rant and rave and delete their backup, starting again with a sense of injustice. If you're reading this and have little or no technical knowledge, that's what you should do now.

If you get down to bare nuts and bolts you will find that a Time Machine backup is not that dissimilar to a BackupPC backup of 10 years ago. It makes extensive use of hard links to snapshot the state of the disk. It performs this in folders with thousands of files, creating a uniformly distributed tree. That all works fine, except when it doesn't. Anyone who has used hard links in anger on a file system will know it tends to put the file system under a lot of stress, resulting in more filesystem corruptions than normal. File systems are not that transactional, so if an operation fails part way through, the hard links may start to generate orphaned links.

Now Time Machine runs fsck_hfs when it attaches the sparse bundle file system which is the Time Machine backup. Unfortunately it doesn't try that hard to fix any problems it finds, and couldn't possibly corrupt its pretty UI by telling the user that it might have a problem with the user's cherished backup of life's memories. Not good for marketing, losing your loyal customers' photos when you promised them it wouldn't happen. Fortunately, those messages are logged in /var/log/fsck_hfs.log. If you use Time Machine and are finding the attach stage takes forever, take a look in there for the words “FILESYSTEM DIRTY”. That indicates that the last time Time Machine tried to attach the drive, the file system check was unable to check the file system and correct any errors, and so it marked it DIRTY. It is possible to correct one of these filesystems; however, with all those hard links the likelihood is that even if fsck_hfs -dryf /dev/diskXs1 does correct the errors and put the filesystem into a FILESYSTEM CLEAN state, it won't be a usable and valid backup. When your laptop exits your house with a man wearing a stripy jumper and tights over his head, your children (and you) will cry, realising that the backup in the cupboard is corrupt.

What advice can I give you?

  1. Check your backups regularly
  2. If you use Time Machine, open the “Console” program, type DIRTY into the search box, and if you find that word, go out and buy another backup disk…. quick.

For those that want to try and recover a Time Machine backup.

chflags -R nouchg /Volumes/My\ Time\ Capsule/mylaptop.sparsebundle
hdiutil attach -nomount -noverify -verbose -noautofsck /Volumes/My\ Time\ Capsule/mylaptop.sparsebundle
tail -f /var/log/fsck_hfs.log
# If you see  "The Volume could not be repaired"
# then you need to run
fsck_hfs -dryf /dev/rdiskXs2
# where X is the number of the disk listed when you ran hdiutil attach.
# I can almost guarantee that the disk will not be recoverable and you will see tens of thousands
# of broken hard link chains. Fixing those will probably corrupt the backup.
# which is why this is futile.

If you are using a Time Capsule, power cycle it first, connect your machine to it over 1000BaseT and make sure no other machines are accessing it. Don't use Wifi unless you want to grow old and die before the process completes.



Perhaps I am being a little unfair here. The same unreliability could happen with any backup mechanism that is vulnerable to corrupted backups as a result of the user shutting the lid, the computer going to sleep, or a power failure. Time Machine and Time Capsule's weakness is that it's all too easy to disconnect the network hard disk image, and once you do that the Time Capsule end has no way of shutting down the backup process in a safe way. Do that enough times (I have found once is enough) and the backup is corrupt and unrecoverable, beyond even what the HFS+ journal can repair.

I was also a bit unfair on BackupPC, which is initiated from the server, and so although it may create nightmare file systems, it can leave the backup image in a reasonable state when the server loses sight of the client.

Time Machine on an attached drive appears more reliable, but a lot less useful.

PyOAE renamed DjOAE

2 05 2012

I’ve been talking to several folks since my last post on PyOAE and it has become clear that the name doesn’t convey the right message. The questions often center around the production usage of a native Python webapp or the complexity of writing your own framework from scratch. To address this issue I have renamed PyOAE to DjOAE to reflect its true nature.

It is a Django web application, and the reason I chose Django was that I didn't want to write yet another framework. I could have chosen any framework, even a Java framework if such a thing existed, but I chose Django because it has good production experience with some large sites, a vibrant community, and has already solved most of the problems that a framework should have solved.

The latest addition to that set of already-solved problems that I have needed is data and schema migration. DjOAE is intended to be deployed in a DevOps-like way, with hourly deployments if needed. To make that viable the code base has to address schema and data migrations as they happen. I have started to use South, which not only provides a framework for doing this, but automates roll forward and roll back of database schema and data (where possible). For the deployer the command is ever so simple.

python manage.py migrate

This queries the database to work out where it is relative to the code, and then upgrades it to match the code.

This formalizes the process that has been used for years in Sakai CLE into a third-party component used by thousands, and avoids the nightmare scenario where all data migration has to be worked out when a release is performed.

I have to apologise to anyone upstream for the name change as it will cause some disruption, but better now than later. Fortunately clones are simple to adjust, as git seems to only care about the commit sha1 so a simple edit to .git/config changing

url = ssh://
url = ssh://

should be enough.

If you are using the standard settings you will need to rename your database. I did this with pgAdminIII without dropping the database.