Yesterday I had the opportunity to present a talk in Brazil explaining how to use the JavaScript web worker API to crunch data asynchronously.

It was a blast!

Unfortunately, not everyone could be there, so I wanted to give you all a glimpse of the awesomeness that is the worker API.

Initial Idea

In 2011, I attended a fantastic conference in Portland about real-time web communications.  One of my favorite presentations there was by Scott Hanselman about the new SignalR system built by Microsoft.  Together with Paul Batum, Scott demonstrated a fantastic application that used multi-threading on the server and web sockets in the client to stream information about a simulation to multiple clients at once.

The simulation was an attempt to solve the "million monkey problem;" a theory that a million monkeys typing on a million typewriters will eventually produce the works of Shakespeare.

Fast forward a few years and I was invited to speak at jQuery Russia about web workers. Struggling to come up with an applicable implementation of the API, I fell back to Scott's example and built a demonstration of how the million monkey problem could be solved in pure JavaScript.

I moved the asynchronicity to the browser, using multiple web workers to multi-thread the processing layer. At the time, I also wrapped the workers' return events in jQuery events, triggering an event on the document body whenever a worker became available so I could re-assign work.

It functioned, but wasn't nearly as clean as I'd wanted. Luckily, another presenter at jQuery Russia was talking about the Deferred API of jQuery, and absolutely blew my mind.

Reworking a Concept

I went back to the drawing board and completely rebuilt my codebase. Now, I have:

  • The main program script (controls the UI)
  • A taskrunner script (queues web workers, processes data, resolves/notifies a Deferred to communicate with the UI script).
  • A worker script (actually processes the data)

The code is orders of magnitude cleaner than the original, and runs significantly faster as well.  The task runner queues up three web workers (all using the same worker script) and wraps them each in a $.Deferred object. When the workers return data, the task runner is able to resolve the workers' Promise objects, informing other parts of the script that it's ready to proceed.

The task runner itself creates a Deferred object when the main program script calls its [cci]run()[/cci] method. Every time the workers complete a new generation, the task runner will call [cci].notify()[/cci] on its Deferred object so the main script can keep track of what's going on.

Internally, the runner queues up its web workers and dispatches their various tasks.  As each one finished, the runner will check and see if there's any more work to be done - if so, it dispatches a new job. When no work remains, the runner will resolve the Deferred objects wrapping the workers as they return.  When all of the workers' Deferred objects are resolved, the runner either resolves its own Deferred or notifies the UI thread (see above).

The application itself is a study in asynchronous chaining and event management. It was enormously fun to write, and is even more fun to show off to others!

Genetic Algorithms

The biggest question I get when I show this to others, though, isn't about the workers crunching data in the background - it's about the simulation itself. How am I breeding monkeys in my computer?

First off, the monkeys aren't really monkeys, but are objects of type [cci]Shakespeare.Genome[/cci]. The object requires a target text and an actual text for instantiation - the target text is what we want our monkeys to type, the actual text is what their typewriters would have shown after a few seconds of furious keyboard mashing.

Internally, the object also computes the fitness of each genome by taking the sum of the squares of the differences between each character in the target and actual text. A lower fitness score means the monkey is closer to achieving the goal.

The task runner will create a "population" of 200 random "monkeys." It will then order the population based on their fitness and immediately kill off the lower half of the set. Then the task runner will instruct its asynchronous workers to create a new generation of 200 "monkeys" based on the 100 it allowed to breed.

The workers will select two objects at random from the available 100. They will calculate a crossover point between the two - the point on the actual text string where it will slice and swap the strings from one "parent" object to the other. There is an 85% chance of genetic crossover.

The workers will then calculate a mutation for each object - picking a point at random on the text string and either incrementing or decrementing the character value. There is a 50% chance of genetic crossover.

Once the workers have created a new generation of 200 monkeys, the task runner will report back to the UI thread (thanks to its Deferred object's [cci].notify()[/cci] method) the generation count, the rate at which generations are being created, and the fittest Genome object in the set.

Showing Off

If you want to actually look at the code, feel free to check out the repository on GitHub.

You'll need Bower to install the code's dependencies (jQuery, Twitter Boostrap), but everything is relatively self-contained. You'll also need to run the application from within a web server (i.e. Vagrant, XAMPP, or equivalent) - I set mine up to run from [cci]http://monkeys.dev[/cci]. Browsers don't allow web workers to be created using the [cci]file:///[/cci] protocol, so you'll need to be able to load the example from an [cci]http://[/cci] URL.

The code demo will automatically queue a chunk of Hamlet's famous "to be or not to be" soliloquy. On both my Mac and my PC, using all of the major browsers, the code will take < 10 minutes to run to completion (breed a monkey capable of reproducing the soliloquy).

For comparison, a non-worker version of the exact same script takes > 100 minutes to run to completion.

That's a significant difference.

If you want to view my slides from the talk, they'll be imported to seoslides on this site shortly. For now, you can check out the Reveal.js version from GitHub. Again, use Bower to install dependencies and you're good to go.

Breeding digital monkeys is a fun application of workers, but it's not very helpful in day-to-day business. What other applications do you have in mind for web workers?