Node.js vs SilkJS

28 09 2012

synchronous ducks

Everyone on the planet has heard of Node.js. Every developer, at least. SilkJS is relatively new, and it makes an interesting server to compare against Node.js because it shares so much of the same code base. Both are based on the Google V8 JavaScript engine, which compiles JS to native code before executing it. Node.js, as we all know, uses a single thread driven by an OS-level event queue. What is often overlooked is that a single thread means a single core of the host machine. SilkJS is a threaded server using pthreads, where each thread processes a request, leaving it up to the OS to interleave the threads while they wait for IO to complete. Node.js is often referred to as async and SilkJS as sync, and the advantages of the two approaches are the source of many flame wars.

There is a good summary of the differences, and the reasons for each approach, on the SilkJS website. In essence, SilkJS claims a less complex programming model, one that does not require the developer to constantly think of everything in terms of events and callbacks in order to coerce a single thread into doing useful work while IO is happening. The cost is that this approach hands the interleaving of IO over to the OS, letting it decide when each pthread should run. OS developers will argue that that is exactly what an OS should be doing, and certainly, to get the most out of modern multicore hardware, there is almost no way to avoid running multiple processes or threads to use all the cores. There is some evidence in the benchmarks (horror, benchmarks, that's a red rag to a bull!) from Node.js, SilkJS, Tomcat 7, Jetty 8, Tornado etc. that using multiple threads or processes is a requirement for making use of all cores. So what is that evidence?

Well, first read why not to trust benchmarks. Once you've read that, let's assume that everyone creating a benchmark is trying to show their software off at its best.

The Node.js 0.8.0 release gives a requests/second benchmark for a 1K response: 3585.62 requests/second.

Over at Vert.x there was a comparison of Vert.x and Node.js showing Vert.x running at 300,000 requests/s. You do have to take it with a pinch of salt after you have read another post with some detailed analysis, which points out that testing performance on the same box, with no network and no latency, is theoretically interesting but probably not informative for the real world. What is more important is whether the server can stand up reliably, forever, with no downtime, while performing normal server-side processing.

So SilkJS, in one of its more reasonable benchmarks, claims it runs at around 22,000 requests per second delivering a 13K file from disk at a very high level of concurrency (20,000 connections). Again it's hard to tell how representative the benchmark is, since many of those requests are pipelined (no socket-open overhead), but one thing is clear: with a server capable of handling that level of concurrency, some of the passionate arguments supporting async servers running one thread per core are lost. Either way works.

One side of the SilkJS argument bears some weight: with 200 server threads, what happens when one needs to do something that is not IO bound? Something mildly non-trivial that might use a tiny bit of CPU. With 1 server thread we know what happens: the server queues everything up while the one server thread does that computation. With 200, the OS manages the time spent working on the one busy thread. There is a simple answer for the single-threaded server, offload any real processing to a threaded environment, but then you might as well use an async proxy front end to achieve the same.
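The other common workaround in a single-threaded server is to slice the CPU work up by hand so queued IO events can run between slices. A sketch of that technique (the chunk size and the toy computation are illustrative assumptions, not anything Node.js or SilkJS prescribes):

```javascript
// Sketch: keeping a single event-loop thread responsive by slicing a
// CPU-bound job into chunks with setImmediate. Chunk size is arbitrary.
function sumRange(n, done) {
  let total = 0;
  let i = 0;
  function chunk() {
    const end = Math.min(i + 10000, n); // process 10000 items per turn
    for (; i < end; i++) total += i;
    if (i < n) {
      setImmediate(chunk); // yield so queued IO events can run
    } else {
      done(total);
    }
  }
  chunk();
}

sumRange(100000, function (total) {
  console.log('sum:', total); // 0 + 1 + ... + 99999
});
```

Note what has happened: a five-line loop has become a callback-driven state machine, which is exactly the complexity the threaded camp objects to.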

There is a second part of the SilkJS argument that holds some weight. What happens when one of the SilkJS workers dies? Errors that kill processes happen for all sorts of reasons, some of them nothing to do with the code in the thread. With 199 threads remaining the server continues to respond; with 0 it does not. At this point everyone who is enjoying the single-threaded simplicity of an async server will, I am sure, be telling me their process is so robust it will never die. That may well be true, but processes don't always just die; sometimes they get killed. The counter-argument is: what happens when all 200 threads are busy running something? The threaded server dies too.

To be balanced, life in an async server can be wonderfully simple. There is absolutely no risk of thread contention, since there is only ever one thread, and it doesn't matter how long a request might be pending on IO, as all IO is theoretically non-blocking. It doesn't matter how many requests there are, provided there is enough memory to represent the queue. Synchronous servers can't do the long-held requests required by WebSockets and CometD. Well, they can, but the thread pool soon gets exhausted. The ugly truth is that async servers also have something that gets exhausted: memory. Every operation in the event queue consumes valuable memory, and in a garbage-collected system the cost of collection is significant. Although it may not be apparent at light loads, at heavy loads, even if CPU and IO are not saturated, async servers suffer from memory exhaustion, and/or from garbage collection trying to avoid memory exhaustion, which may appear as CPU exhaustion. So life is not so simple: thread contention is replaced by memory contention, which is arguably harder to address.

So what is the best server architecture for a modern web application?

An architecture that uses threads for requests that can be processed and delivered in milliseconds, consuming no queue memory and delegating responsibility for interleaving IO to the OS, the resident expert at that task. Coupled with an architecture that recognises long, IO-intensive requests as such and delegates them to an async part of the server. And above all, an architecture on which a simple and straightforward framework can be built, to allow developers to get on with the task of delivering applications at webscale rather than wondering how to achieve webscale with high-load reliability. I don't have an answer, other than that it could be built with Jetty, but I know one thing: the golden bullets on each side of this particular flame war are only part of the solution.

Modern WebApps

12 03 2012

Modern web apps, like it or not, are going to make use of things like WebSockets. Browser support is already present, and UX designers will start requiring that UI implementations get data from the server in real time. Polling is not a viable solution for real deployment, since at a network level it causes the endless transfer of useless data to and from the server: each request asking, every time, "what happened?" and the server dutifully responding "like I said last time, nothing". Even with minimal response sizes, every request comes with headers that eat network capacity. Moving away from the polling model will be easy for UI developers working mostly in the client and creating tempting UIs for a handful of users. Those attractive UIs generate demand, and soon the handful of users becomes hundreds or thousands. In the past we were able to simply scale up the web server, turn keep-alives off, distribute static content and tune the hell out of each critical request. As WebSockets become more widespread, that won't be possible. The problem here is that web servers have been built for the last 10 years on a thread-per-request model, and many supporting libraries share that assumption. In the polling world that's fine: the request gets bound to a thread, the response is generated as fast as possible, and the thread is unbound. Provided the response time is low enough, the request throughput of the server will stay high enough to service all requests without exhausting the OS's ability to manage threads/processes/open resources.
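The polling overhead is easy to put numbers on. A back-of-envelope sketch (the 500-byte header overhead, one-poll-per-second rate and 100K clients are illustrative assumptions, not measurements):

```javascript
// Sketch: bandwidth spent on "anything happened?" polling where the
// answer is almost always "nothing". All figures are assumptions.
const headerBytes = 500;      // assumed request + response header overhead
const pollsPerSecond = 1;     // one poll per second per client
const clients = 100000;       // 100K open pages

const bytesPerDay = headerBytes * pollsPerSecond * clients * 86400;
console.log('GB/day of mostly-empty traffic:',
            (bytesPerDay / 1e9).toFixed(1));
```

Several terabytes a day of headers saying "nothing happened" is the cost a WebSocket's single long-lived connection avoids.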

Serving a WebSocket request with the same model is a problem. The request is bound to a thread, but the response is not generated: the request waits, mid-flight, pending some external event. Some time later that event happens, and the response is delivered back to the client. The traditional web server environment will have to support as many concurrent requests as there are users with a page pointing at your server open in one of their many tabs. If you have 100K users with a browser window open on a page holding a WebSocket connection, then the hosting infrastructure will need to support 100K in-progress requests. If the web server model is process-per-request, somehow you have to provide resources to support 100K OS-level processes. If it's thread-per-request, then 100K threads. Obviously the only way of supporting this level of idle-but-connected requests is to use an event-processing model. But that creates problems.
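The scale of the problem is mostly stack memory. A sketch of the arithmetic (the 1 MB thread stack is a common default but an assumption here, as is the per-connection state figure for the evented case):

```javascript
// Sketch: memory cost of thread-per-request vs event-per-connection
// for 100K idle-but-connected clients. Figures are rough assumptions.
const connections = 100000;

const stackPerThreadMB = 1;              // assumed default thread stack size
const threadedMB = connections * stackPerThreadMB;      // ~98 GB of stacks

const statePerConnKB = 10;               // assumed per-connection event state
const eventedMB = connections * statePerConnKB / 1024;  // under 1 GB

console.log('thread-per-request:', threadedMB, 'MB');
console.log('evented:', Math.round(eventedMB), 'MB');
```

Two orders of magnitude either way, the conclusion is the same: idle connections must be cheap, and threads are not cheap.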

For instance, anyone writing PHP code will know it will probably only run in process-per-worker mode, as many of the PHP extensions are not thread safe. Java servlets are similar: the Servlet 3 spec has constructs to release the processing thread back to the container, but many applications are still being developed on Servlet 2.4 and 2.5, and most frameworks are not capable of suspending requests. Python using mod_wsgi doesn't have a well-defined way of releasing the processing thread back to the server, although there is some code originating from Google that uses mod_python to manipulate the connection and release the thread back to Apache httpd.

There are new frameworks (e.g. Node.js) that address this problem, and there is a considerable amount of religion surrounding their use: the believers able to show unbelievable performance levels on benchmark test cases, and the doubters able to point to unbelievably complex and unfathomable real application code. There are plenty of other approaches to the same problem that avoid spaghetti code, but the fundamental message is that to support WebSockets on the server side, an event-based processing model has to be used. That is the direct opposite of how web applications have been delivered to date, and regardless of the religion, that creates a problem for deployment.

Deployment of this type of application demands that WebSocket connections can be unbound from the thread servicing the request at the moment the request becomes a WebSocket connection. The nasty twist is that every box handling the request needs to be able to do that, including any web tiers or load balancers, because any HTTP connection can be upgraded from the HTTP protocol to the WebSocket protocol during a request. Fortunately, sensible applications will only support WebSocket on known URLs, which gives the LB and web tiers an opportunity to route, but prior to routing every component in the chain must be using a small number of threads to service a large number of open and active sockets.
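Routing on known URLs can happen before any upgrade completes, because the intent is visible in the request headers. A hypothetical sketch of the check a front-end proxy might make (the `/ws/` path prefix is a made-up convention, not part of any spec):

```javascript
// Sketch: deciding whether a request is a WebSocket upgrade aimed at a
// known endpoint, so a front end can route it before binding a thread.
// The '/ws/' prefix is an assumed convention for this example.
function isWebSocketUpgrade(headers, url) {
  const connection = (headers['connection'] || '').toLowerCase();
  const upgrade = (headers['upgrade'] || '').toLowerCase();
  return connection.indexOf('upgrade') !== -1 &&
         upgrade === 'websocket' &&
         url.indexOf('/ws/') === 0;
}

console.log(isWebSocketUpgrade(
  { connection: 'Upgrade', upgrade: 'websocket' }, '/ws/chat'));   // true
console.log(isWebSocketUpgrade(
  { connection: 'keep-alive' }, '/index.html'));                   // false
```

A front end making this decision can send upgrades to an evented container and leave everything else on the traditional thread-per-request path.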

This doesn’t mean that an entire application framework must be thrown away, but it does mean that whatever handles the WebSocket request, upgrade and eventual connection must be event based. Nor does it mean that everyone must learn how to read and write spaghetti code, managing every aspect of threading and concurrency while rewriting every library to be non-blocking and asynchronous. Fortunately there are some extremely capable epoll-based containers (including Node.js, despite its insistence on JS) that can be used either as web-tier proxies or as ultimate endpoints. Some of them, such as the Python-based Tornado server, provide frameworks supporting the WSGI standard and are hence capable of running Django-based applications for the non-WebSocket portion. As can be seen from real benchmarks, these servers offer the performance levels expected of event-based processing together with support for traditional frameworks using real blocking resource connections.