25 January, 2009

On Microformats

Back in March last year I declared that the next phase of the web is the emergent web, an accidental explosion of functionality caused when a large number of simple APIs start interacting with each other. At the same time, I declared that semantically marked-up data is impractical. I also had harsh words for microformats. I called them "junk" and "ludicrously inefficient".

But the weird thing is that microformats are still sort of... popular. I mean, not really popular; they don't have mass adoption yet. But nerd-popular. Lots of clever people are talking about them and implementing them. There is some value to be extracted by making the semantic nature of the data we publish on the web explicit. There has to be, or else all these clever people wouldn't be fighting with the frankly inconvenient and ill-defined world of microformats as they currently stand.

So why do people like semantic data? Because semantic data is important. By definition, it's the meaning of the data, the magic that changes raw data into information. That has to be important. So I have to examine myself: if I like semantic data, why do I instinctively recoil from microformats?

The trouble with microformats

The main problem with microformats is that there are not a lot of tools available for interpreting semantic data right now. It's a chicken-and-egg problem: the lack of tools means nobody marks up their data, and the lack of data means nobody bothers to write any tools. And if we're being honest, the lack of practical ideas for what to do with microformatted data, even on microformats.org, has probably got something to do with it. Nobody is giving me a right-here, right-now good reason to build microformats into my website.

There are several secondary problems. Since the microformat data is embedded within the body of the HTML, a hypothetical microformat-reading tool would have to ingest the entire page, search it for instances of every single known microformat, and validate each one. At a small-scale, browser-plugin level that might be practical, but it seriously limits the utility of the data. Each microformat is itself ad hoc, but once defined it can't really be modified or extended.
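To make that cost concrete, here is a minimal sketch of such a hypothetical reader, written in Python with only the standard library. The KNOWN_ROOTS table, the scan function and the sample page are all made up for illustration, and the table lists only a few root class names. The point is simply that the tool has to look at every tag on the page to find one small piece of data, and it doesn't even attempt the validation step.

```python
from html.parser import HTMLParser

# Root class names for a few microformats (an illustrative subset, not a full registry).
KNOWN_ROOTS = {"vcard": "hCard", "vevent": "hCalendar event", "hreview": "hReview"}


class MicroformatScanner(HTMLParser):
    """Looks at every start tag in a page and records microformat root classes."""

    def __init__(self):
        super().__init__()
        self.found = []       # (microformat name, tag name) pairs
        self.tags_seen = 0    # how many tags we had to inspect along the way

    def handle_starttag(self, tag, attrs):
        self.tags_seen += 1
        # The class attribute may be absent or empty, so fall back to "".
        classes = (dict(attrs).get("class") or "").split()
        for name in classes:
            if name in KNOWN_ROOTS:
                self.found.append((KNOWN_ROOTS[name], tag))


def scan(html):
    """Hypothetical helper: returns the roots found and the number of tags inspected."""
    scanner = MicroformatScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.found, scanner.tags_seen


PAGE = """
<html><body>
<div id="nav"><a href="/">Home</a></div>
<div class="vcard"><span class="fn">Jane Doe</span></div>
<p class="intro">Nothing semantic here.</p>
</body></html>
"""

found, tags_seen = scan(PAGE)
print(found, tags_seen)
```

Every new microformat means another entry in the table and another set of rules to check against every element of every page.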

Finally, and very importantly, the way microformats use class names is wrong. Not technically wrong: the HTML spec says class names, in addition to being used for CSS selectors, are "for general purpose processing by user agents", which basically means "do what you like". But wrong in the practical sense that microformats would require us to change the way we use class names right now. Class names are, in the practical world of web development, the way you link your HTML elements to your CSS. You set them up arbitrarily, and then you build your CSS around them. If you need to change the look of your HTML, you can change the name of the class to suit the new styles you've created.

Microformats as designed break that: by defining meanings for specific class names in specific combinations, they impose a structure on your markup that needs to be known in advance, limiting -- no matter how lightly -- the flexibility with which you can mark up your HTML. Carving out namespaces in class names is also dangerous because the names either have to be unique -- and hence not human readable -- or human readable, and hence prone to collision. As a dyed-in-the-wool web developer who has spent 12 years building web pages nearly every day, I just feel it's wrong to do it that way.
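A small illustration of the collision problem, again as a Python sketch under stated assumptions. NaiveHCardReader and read_hcard are hypothetical names, the reader knows only two hCard properties ("fn" and "title"), and it assumes well-formed markup with no void elements such as br or img. In the sample markup a designer has used class="title" to style a heading, and the reader can't tell that apart from the hCard "title" property, which is meant to be a job title.

```python
from html.parser import HTMLParser


class NaiveHCardReader(HTMLParser):
    """Collects text from elements classed 'fn' or 'title' inside a 'vcard' element."""

    PROPERTIES = {"fn", "title"}

    def __init__(self):
        super().__init__()
        self.stack = []          # one set of property names per open element
        self.vcard_depth = None  # stack depth at which the current vcard opened
        self.values = {}

    def handle_starttag(self, tag, attrs):
        classes = set((dict(attrs).get("class") or "").split())
        if self.vcard_depth is None and "vcard" in classes:
            self.vcard_depth = len(self.stack)
        # Property class names only count while we are inside a vcard.
        props = classes & self.PROPERTIES if self.vcard_depth is not None else set()
        self.stack.append(props)

    def handle_endtag(self, tag):
        if not self.stack:
            return
        self.stack.pop()
        if self.vcard_depth is not None and len(self.stack) == self.vcard_depth:
            self.vcard_depth = None  # the vcard element itself just closed

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack:
            # Attribute the text to the properties of the innermost open element.
            for prop in self.stack[-1]:
                self.values.setdefault(prop, []).append(text)


def read_hcard(html):
    """Hypothetical helper: returns a dict of property name -> list of text values."""
    reader = NaiveHCardReader()
    reader.feed(html)
    reader.close()
    return reader.values


CARD = (
    '<div class="vcard">'
    '<h2 class="title">Contact us</h2>'   # designer's styling hook, not a job title
    '<span class="fn">Jane Doe</span>'
    '<span class="title">Editor</span>'   # the actual job title
    '</div>'
)

card = read_hcard(CARD)
print(card)
```

The reader reports both "Contact us" and "Editor" as titles. The only way to avoid that is for the designer to know the reserved names in advance and steer around them.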

The joy of microformats

The mistake I made in March was to decide that because microformats were wrong, semantic markup was wrong too. Just because microformats are getting it wrong doesn't mean that they aren't a good idea. And there's much to like about them, too: ad-hoc, community-generated and easily extensible; these are great qualities that are very "weblike".

Adding meaning to web pages is also a wonderful idea: you knew what the data meant when you typed it in, so why lose that once it becomes a web page? Tim Berners-Lee's ideal of the web as a primary information store has not come to pass, but that doesn't mean we have to hide all of our semantic relationships in our databases. Exposing them to the world is a good idea, once it can be done cheaply and easily -- something microformats manage -- and consumed equally cheaply and easily, where I believe they currently fail.

This is all by way of preamble. My next post is going to be about my ideas for fixing the problems in microformats.