Google CourseBuilder, a scalable course delivery platform ?

15 09 2012

This week I discovered Google CourseBuilder, the latest entry into the MOOC arena. It’s a Google App Engine application that Google Research used to host a MOOC to 155K students a few months ago. It follows a simular pedagogy to that used by other MOOC providers with high quality video lessons, that give the student the feeling they are working one on one with the lecturer. Google have open sourced the code under and Apache 2 license which gives us all an insight into the economies of scale that a MOOC represents. Unlike the traditional Virtual Learning Environment where the needs of staff are catered for in the user interface, Google CourseBuilder currently delegates all the functionality to spreadsheets, editing snippets of javascript and html. There is no reason why it could not be given an user interface, but when you consider what its is trying to do you realise that staff user interfaces for course creation are less important than the delivery of the course at scale. Consequently the application itself is tightly focused on delivering the course as quickly and as simply as possible to as many users as possible. Google App Engine makes this easy, even for meer mortals. Once you have accepted that nothing is really for free, and you do have to pay for bandwidth used and energy in at some point scaling this application upto 100K or even 1M users requires little or no effort on  your part. You also, at the moment, have to accept if you are going to reach that many students, you are going to have to ask for a little bit of help from someone to write some HTML, drive a spreadsheet and write a bit of Javascript as well as hit the “deploy” button on the App Engine SDK. I say, at the moment, because it isn’t going to be that hard to create an administrative UI, and thats what I have been doing for a few hours this week.

So the reality is, very few lecturers are going to create a course that will be delivered to 155K students, and if they succeed in going viral, the drop out rate is likely to be very high. The course Google ran issued 22K certificates, indicating a drop out rate of 85%. Its still an impressive number when many campuses are no where near that size however, most institutions would not survive with that level of drop out and all would be looking at ways of reducing it. Institutions invest more in their students and so need lower levels of drop out. As a result, their courses are smaller, they don’t have the economies of scale and can’t invest as much in the delivery of each individual course. All is not lost however, the opportunity that Googles CourseBuilder represents could be utilized if there was a small reduction in effort associated with course creation and course delivery.

The video attached to this blog post shows how that might be achieved. This is a modified version of Google CourseBuilder that allows a single Google App Engine to host more than one course. It could easily host a course catalogue from an small institution or medium size faculty. That course catalogue is uploaded via a spreadsheet. Individual courses containing units and lessons are also uploaded via seperate spreadsheets.

Students sign in using their Google ID, Google Apps for Education ID, or OpenID. They then register with the the courses they want to take. If you want to give it a try there is a App Engine Instance running at http://cbmultidemo.appspot.com/, bear in mind its a free instance so may become unavailable.

At the moment the administrative interface is very basic, but the intention is to build that up to allow courses to be created without needing to resort to technical resources. So far I have spent about 4h eliminating most of the code base editing and adding multi course capability. The code base is available as a fork of the Google CourseBuilder project and can be deployed by anyone with a Google ID. Since the original code was written in Python, using a modern variant of the GAE framework porting to Django would be trivial  with those who have concern about running on Google infrastructure. Obviously in doing so, you will have to work out how to do the scaling, see Instagram for pointers on that.





PyOAE

10 04 2012

For those that have been watching my G+ feed you will have noticed some videos being posted. Those vidoes are of the OAE 1.2 UI running on a server developed in Python using DJango. I am getting a increasing stream of questions about what it is, hence this blog post.

What is PyOAE?

PyOAE is a re-implementation of the OAE server using DJango and a fully relational database schema. I use PostgeSQL and/or SqlLite3 but it would probably wok on any RDBMS supported by DJango. The implementation uses the OAE 1.2.0 UI code (the yet to be released 1.2.0 branch) as its specification and runs that UI unmodified.

Is it a port of Nakamura to Python?

It is not a port of the Java code base called Nakamura that is the official server of OAE and shares no common code or concepts. It does not contain the SparseMap content system or a Python port of SparseMap.

When will it be released ?

It is not, yet, functionally complete and so is not ready for release. When it is, I will release it.

Will it scale ?

The short answer is I don’t have enough evidence to know at this stage. I have some circumstantial evidence and some hard evidence that suggests that there is no reason why a fully RDBMS model using DJango might not scale.

Circumstantial: DJango follows the same architectural model used by many LAMP based applications. Those have been shown to be amenable to scaling. WordPress, Wikipedia etc. In the educational space MoodleRooms scaled Moodle to 1M concurrent users with the help of MySQL and others.

Circumstantial: Sakai CLE uses relational storage and scales to the levels required to support the institutions that want to use it.

Circumstantial: DJango has been used for some large, high traffic websites.

Hard: I have loaded the database with 1M users and the response time shows no increase wrt to the number of users.

Hard: I have loaded the database with 50K users and 300K messages with no sign of increasing response time.

Hard: I have loaded the content store with 100K users and 200K content items and seen no increase in response time.

However, I have not load tested with 1000’s of concurrent users.

Will it support multi tenancy ?

Short answer is yes, in the same way that WordPress supports multi tenancy, but with the potentially to support it in other ways.

Is it complex to deploy ?

No, its simple and uses any of the deployment mechanisms recommended by DJango.

Does it use Solr?

At the moment it does not, although I havent implemented free text searching on the bodies of uploaded content. All queries are relational SQL generated by DJango’s ORM framework.

What areas does it cover?

Users, Groups, Content, Messaging, Activities, Connections are currently implemented and tested. Content Authoring  is partially supported and I have yet to look at supporting World. Its about 7K lines of python including comments covering about 160 REST endpoints with all UI content being served directly from disk. There are about 40 tables in the database.

What areas have you made different ?

Other than the obvious use of an RDBMS for storage there are 2 major differences that jumps out. The content system is single instance. ie if you upload the same content item 100 times, its only stored once and referenced 100 times. The second is that the usernames are not visible in any http response from the server making it impossible to harvest usernames from an OAE instance. This version uses FERPA safe opaque IDs. Due to some hard coded parts of the UI I have not been able to do the same for Group names which can be harvested.

Could a NoSQL store be used ?

If DJango has an ORM adapter for the NoSQL store, then yes although I havent tried, and I am not convinced its necessary for the volume of records I am seeing.

Could the same be done in Java?

Probably, but the pace of development would be considerably less (DJango development follows a edit-save-refresh pattern and requires far fewer lines of code than Java). Also, a mature framework that had solved all the problems DJango has addressed would be needed to avoid the curse of all Java developers, the temptation to write their own framework.

Who is developing it?

I have been developing it in my spare time over the past 6 weeks. It represents abut 10 days work so far.

Why did you start doing it?

In January 2012 I heard that several people whose opinion I respected had said Nakamura should be scrapped and re-written in favour of a fully relational model I wanted to find out if they were correct. So far I have found nothing that says they are wrong. My only regret is that I didn’t hear them earlier and I wish I had tried this in October 2010.

Why DJango ?

I didn’t want to write yet another framework, and I don’t have enough free time to do this in Java. DJango has proved to have everything I need, and Python has proved to be quick to develop and plenty fast enough to run OAE.

Didn’t you try GAE ?

I did try writing a Google App Engine backend for OAE, however I sound found two problems. The UI requirements for counting and querying relating to Groups were incompatible with BigStore. I also calculated a GAE hosted instance would rapidly breach the free GAE hosting limits due to the volume of UI requests. Those two factors combined to cause me to abandon a GAE based backend.