I'm in the midst of developing a real-time publishing system, and I wanted to give an example of what it is and why I want to do it.

Gizmodo's live blog

Today, Apple is promoting a live press event, and Gizmodo is live blogging their coverage on http://live.gizmodo.com/.  It's a really basic page, but it updates automatically as new changes are available.  Their update engine is also pretty basic, but illustrates exactly what I want to do ... just in a very inefficient fashion.

The core of their code (I'm summarizing it) is written in JavaScript:

[cc lang=javascript width="580" height="500"]
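var timeoutId; // handle for the pending poll

function update( version ) {
    var entries = jQuery( '#entries .entry' );
    // The first entry's id attribute carries the post number after a 6-character prefix.
    var lastPostId = entries.length ? parseInt( entries.eq( 0 ).attr( 'id' ).substr( 6 ), 10 ) : 0;

    if ( typeof lastPostId == 'number' && !isNaN( lastPostId ) ) {
        // The success callback only runs if the file exists; a 404 is silently ignored.
        jQuery.get( version + 'update_' + lastPostId + '.html', function( data ) {
            if ( typeof data != 'undefined' && data != '' ) {

                // Process data

                // Got an update: cancel the 10-second wait and ask again almost immediately.
                clearTimeout( timeoutId );
                timeoutId = setTimeout( function() { update( version ); }, 10 );
            }
        } );
    }

    // Default: poll again in 10 seconds.
    timeoutId = setTimeout( function() { update( version ); }, 10000 );
}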
[/cc]

How this works

Essentially, this script repeatedly tries to fetch "update_XXX.html" from the server. XXX in this case is the update number (they were on update 408 at the time of this writing).

[caption id="attachment_3504" align="alignright" width="221" caption="Gizmodo's AJAX polling method results in repeated 404 Not Found errors."][/caption]

If the file exists, the script processes the new data, adds it to the page, and tries to fetch the next update (XXX+1) just 10 milliseconds later.

If the file doesn't exist, the script waits for 10 seconds and tries again.

This is an example of AJAX polling: the page uses AJAX to check in with the server at a fixed interval and ask whether anything has changed. It's a simple way to retrieve data, but it's inefficient in several ways.

For one, the browser is firing a new HTTP GET request once every 10 seconds, whether or not the server has anything new. Every request made before the next update is published comes back as a 404 Not Found error. It's very much like a kid in the back seat asking "are we there yet" every mile, hoping you've finally made it to McDonald's.

It works, but it's not the best way to do things.

The system I'm currently working on will use server-triggered events to update data instead. Think of it like the too-mature kid sitting in the back reading a book and waiting until you've stopped the car and are ready to order your Big Mac. Not only is the kid less annoying, but you have fewer distractions to worry about while you're driving.

The problem with polling

With AJAX polling, every browser is requesting a file from the server every 10 seconds while the user sits on a page. For a site like this one, that's maybe 100-150 requests every 10 seconds in addition to normal traffic. For a site like Gizmodo, that's several thousand requests every 10 seconds in addition to normal traffic.
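To put rough numbers on that, here is a back-of-the-envelope sketch. The function name `pollingLoad` is made up for illustration, and the 5,000-viewer figure is only an assumed stand-in for "several thousand", not a measured value.

```javascript
// Back-of-the-envelope polling load. Viewer counts are illustrative guesses.
function pollingLoad(viewers, intervalSeconds) {
  // Each open page sends one request per polling interval.
  const perSecond = viewers / intervalSeconds;
  return {
    perSecond: perSecond,
    perHour: perSecond * 3600
  };
}

const smallSite = pollingLoad(150, 10);   // upper end of "100-150" from the text
const largeSite = pollingLoad(5000, 10);  // assumed stand-in for "several thousand"

console.log(smallSite); // { perSecond: 15, perHour: 54000 }
console.log(largeSite); // { perSecond: 500, perHour: 1800000 }
```

Those requests are sent whether or not anything was published during the hour.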

When you imagine the single "are we there yet" scenario, it's not too bad. Now imagine you're driving a bus full of kids, all asking "are we there yet" every few seconds. Now imagine an entire school full of impatient kids. An entire city full of them.

Quickly, this becomes overwhelming.

Solution: Server-sent events

The alternative is to leave the updating entirely to the server. When new content's ready, it tells the browsers directly. If no content has changed, no extra HTTP messages travel across the wire. Now the server is able to work on other tasks and can serve regular traffic without a chorus of impatient browsers asking if anything's changed.
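As a rough illustration of the idea, here is a minimal sketch using the `text/event-stream` wire format and Node's built-in `http` response methods. This is not the system I'm building, just one possible shape for it; the names `formatEvent`, `Broadcaster`, and `handleStream` are hypothetical, and nothing here starts a server.

```javascript
// Sketch of server push using the Server-Sent Events wire format.
// formatEvent, Broadcaster and handleStream are hypothetical names.

// Serialize one update as an SSE message: an "id:" line, one "data:" line
// per line of payload, and a blank line to terminate the message.
function formatEvent(id, payload) {
  const dataLines = String(payload)
    .split(/\r?\n/)
    .map(function (line) { return 'data: ' + line; })
    .join('\n');
  return 'id: ' + id + '\n' + dataLines + '\n\n';
}

// Tracks open connections and writes to them only when there is news.
class Broadcaster {
  constructor() {
    this.clients = new Set();
  }

  // `res` is anything with a write() method, e.g. a Node http.ServerResponse.
  // Returns a function that removes the client again.
  subscribe(res) {
    this.clients.add(res);
    return () => this.clients.delete(res);
  }

  // Push one update to every connected client; returns how many were notified.
  publish(id, payload) {
    const message = formatEvent(id, payload);
    for (const res of this.clients) {
      res.write(message);
    }
    return this.clients.size;
  }
}

// Request handler for a streaming endpoint: keep the response open
// and register it with the broadcaster until the client disconnects.
function handleStream(broadcaster, req, res) {
  res.writeHead(200, {
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache',
    'Connection': 'keep-alive'
  });
  const unsubscribe = broadcaster.subscribe(res);
  req.on('close', unsubscribe);
}
```

On the browser side, a page could open one long-lived connection with `new EventSource( url )` and add each entry from the `onmessage` handler as it arrives. Between updates, nothing is sent at all.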

The entire city full of kids waits for you to tell them that you're there. Not a single "are we there yet" in earshot to interrupt the pleasant music coming across the radio.

Tell me, which over-crowded car would you want to drive? I bet your server feels the same way.