Web2: hello, world

In June of 2003, getting on for two years ago, I compiled my third-year project at Warwick University. It was an insanely ambitious, typically over-the-top scheme, predicated on an absurdly vain notion: the web is all wrong, and I know how to fix it. Oddly enough, since then, everybody seems to have been agreeing with me* -- at least with the first part.

WebTorrent/Coral

The first vindication of my designs for a new web architecture was Coral. In my final report (published June 2003), I said:

One of the problems with the nature of the web is that while the web itself is distributed, individual web sites are not: popular sites are subject to the Slashdot effect and other major events which cause flash crowds and overload the server. ... [The problem] is simply that the client-server model is not adequate for such a highly volatile environment as the web. Enter WebTorrent. A computer with WebTorrent will access websites by their ordinary URLs. If a user wants a web page that no one has downloaded recently, it is downloaded directly. If lots of users are viewing a website, however, the WebTorrent clients are able to discover each other and download the site from each other using P2P methods. The result is distributed web caching. Particularly interesting is the fact that since this model requires no changes to the website itself, it is not a method of distributing a single website, but it simultaneously solves the same problem for the entire web. The more WebTorrent clients exist on any network, the more useful and robust the network becomes.
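
In modern JavaScript, the core of that fetch path might look something like the sketch below. To be clear, this is a hypothetical illustration, not code from the report: findPeersFor, fetchFromPeer, fetchFromOrigin and announce are invented stand-ins for whatever peer-discovery and transfer machinery (a DHT, say) a real client would use.

    // Hypothetical sketch of a WebTorrent client's fetch path.
    // findPeersFor, fetchFromPeer, fetchFromOrigin and announce are
    // invented helpers standing in for real peer-discovery and
    // transfer machinery; this is not an API from the report.
    async function webTorrentFetch(url) {
      // Ask the peer network who has downloaded this URL recently.
      var peers = await findPeersFor(url);

      for (var i = 0; i < peers.length; i++) {
        try {
          // Popular page: pull it from another client, not the server.
          return await fetchFromPeer(peers[i], url);
        } catch (err) {
          // Peer gone or unreachable; try the next one.
        }
      }

      // No one has it cached: fall back to an ordinary direct
      // download, then advertise ourselves as a source for it.
      var page = await fetchFromOrigin(url);
      announce(url);
      return page;
    }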

Coral first came to my attention in August 2004. From their website:

CoralCDN is a decentralized, self-organizing, peer-to-peer web-content distribution network. CoralCDN leverages the aggregate bandwidth of volunteers running the software to absorb and dissipate most of the traffic for web sites using the system. In so doing, CoralCDN replicates content in proportion to the content's popularity, regardless of the publisher's resources -- in effect democratizing content publication.

Coral is an implementation of WebTorrent, built by people much cleverer than me. Instead of distributed proxies, they decided on a single proxy at a central point -- a design flaw, in my opinion, which has decreased the usefulness and the popularity of the service, since the proxy can itself be (and was) Slashdotted by sudden loads.

Strike 2: Ajax

Now, with all the hype around BitTorrent, it's easy to say that WebTorrent was an obvious idea. And that's what I said. But recently something else has happened, and taken the web development community by storm: Ajax. It is a buzzword, coined in the aforelinked article, apparently derived from "Asynchronous JavaScript And XML", but really anything that sounded cool would have done. I've not seen a concise definition of Ajax before, but let me have a go:

Ajax is a set of technologies that allow the construction of lightweight, responsive web applications by using JavaScript and XML processing to communicate efficiently with the server and modify the interface in place, instead of relying on the server to regenerate the entire interface every time a server-side action is taken.

Oh well, not very concise, but it'll do. Why is Ajax useful? Because it circumvents the document-centric nature of the web. What is the document-centric nature of the web? Again from my report:

HTML has a problem in that it is document-centric. ... HTML is designed for the express purpose of describing a single document: it is structured to define the head and body of the document, and sections within the document. ... This document-centric approach produces, in even fairly small sites, a large amount of repetition. As individual websites grow, the task of maintaining a coherent "look and feel" across hundreds of similar pages manually becomes laborious and time-consuming. To solve this problem, an entire industry has sprung up: the field of Content Management Systems (CMS) ....

And why is this a problem when designing a web application? As I said in the report:

The web is not designed to provide the level of user experience required for desktop-like applications: these require almost immediate response to user action. This is another effect of the document-centric nature of HTML. Applications generally provide a largely stable user interface with minor changes as a result of user action. HTML is not equipped to provide incremental changes to an interface; to change an HTML page it is necessary to re-describe (and hence re-transmit) the entire HTML document. This is both slow and inefficient.

Or as the coiners of Ajax put it, 20 months later:

Obviously, if we were designing the Web from scratch for applications, we wouldn’t make users wait around. Once an interface is loaded, why should the user interaction come to a halt every time the application needs something from the server? In fact, why should the user see the application go to the server at all?
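
To make the pattern concrete, here's a minimal sketch of the Ajax idea: fetch just the data you need in the background, then patch the one corner of the page that changed, instead of reloading the whole document. (The /stock-price URL and the price element are invented for the example.)

    // Minimal Ajax sketch: update one element in place rather than
    // re-requesting and re-rendering the entire page.
    // The URL and the element id are invented for this example.
    function refreshPrice() {
      var req = new XMLHttpRequest();
      req.open("GET", "/stock-price?symbol=ACME", true); // true = asynchronous
      req.onreadystatechange = function () {
        if (req.readyState === 4 && req.status === 200) {
          // Patch just the bit of the interface that changed.
          document.getElementById("price").innerHTML = req.responseText;
        }
      };
      req.send(null);
    }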

So, you can see why I'm beginning to panic. I'm sitting on a report full of fabulous ideas, but I'm not implementing them, and other people are finally getting around to it. So it's time to take action. It has to be now: the technology is finally here in the browsers, and the web development landscape is beginning to change with Ajax. A bucketful of technologies has been developed since I wrote my original report, and they've done a lot of the heavy lifting for me. So I've taken it upon myself to start implementing Web2. I'm doing it because it's useful, I'm doing it so I get the credit, but most of all I'm publishing it, in the most public forum that will have me. The alternative is, after all, to be damned. I'm establishing prior art: no one (I'm looking at you and you, here) is going to patent ideas I came up with years ago and prevent others from using them. It's public domain and/or bust.

Web2, say Hello

So hop on over to that URL to see an introduction to the very first part of one section of web2, which I implemented this weekend.

What I got wrong

Of course, I wasn't bang on the money with web2. Amongst other things, I recognised that the only way to make web applications responsive was to improve the client. However, the way I envisaged this was by creating a browser plugin, or indeed an entirely new browser. What the coiners of Ajax (and those clever people at Google, Flickr, et al., who actually implemented the stuff those guys were talking about) realised is that the client can improve itself: JavaScript can be used to build the client-side engine, without the user needing to install any software.

This wasn't obvious, for a number of reasons:

  1. JavaScript is extremely buggy and notoriously browser-specific. This is still true, of course. JavaScript is an awful abortion of a language that routinely requires rewriting the same function several times over in different object models in order to get even the simplest actions to work (see the sketch after this list). Luckily, with ECMAScript and the DOM, things are getting better.
  2. Large JavaScript applications are inefficient. As usual, the power of Moore's law catches us by surprise. Two years ago even relatively simple JavaScript was still slow, even on then-current machines. Now machines are so pointlessly overpowered that processing power is not even an issue.
  3. The XMLHttpRequest object. Frankly, if I'd known it existed, I'd have started work on this much sooner. Better late than never.
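
As a taste of what point 1 meant in practice, even getting hold of the XMLHttpRequest object from point 3 required probing several vendor-specific incarnations in turn. Here's a sketch of that classic cross-browser dance (the createRequest name is my own):

    // One object, three names: probe each incarnation until one works.
    function createRequest() {
      if (window.XMLHttpRequest) {
        return new XMLHttpRequest();                     // Mozilla, Safari, Opera
      }
      try {
        return new ActiveXObject("Msxml2.XMLHTTP");      // newer Internet Explorer
      } catch (e) {
        try {
          return new ActiveXObject("Microsoft.XMLHTTP"); // older Internet Explorer
        } catch (e2) {
          return null;                                   // no Ajax support at all
        }
      }
    }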

For the remainder of the blurb on web2, including what it does, what it will do, and what else I have planned, head on over to the web2 site now.


* Yes, I'm aware this all reads a lot like the ramblings of an arrogant child prodigy. Hopefully, it'll make everybody respect me as much as they do him. More so, since I'm older, dammit.