Introducing Cascading Semantic Descriptions
Cascading Semantic Descriptions, or CSD, are my idea for a new way of expressing microformats. In my last post I talked about what was good about microformats and what was bad. Now I'm going to put forward my suggestions for how to fix them, and in the process make them a whole lot more flexible, useful, and powerful. Remember: the problem microformats are trying to solve is "how do we add semantic information to web pages?"
Semantic information is web metadata; it should act like it
Semantic information is a type of metadata: information about information. However, HTML has lots of other types of metadata already: in the HEAD of any HTML document you can have the META tag which can contain the information itself (e.g. keywords and descriptions) or you can have a LINK tag which relates the document to other documents, such as RSS feeds, or CSS. I think the most interesting example here is CSS, which is literally a document full of more metadata, specifically data about how the contents of the document should be rendered by the browser, visually or otherwise. One could argue that JavaScript is also a type of metadata, describing how the document should respond to user action.
There are a few important things to note about how existing metadata formats work on the web:
- They are separate from the data itself, either in the HEAD or another document entirely
- They are not HTML in nature. HTML is used to relate the document to its metadata, but the metadata is not itself HTML.
- They are progressive enhancements, layering additional complexity and functionality onto the core document without significantly altering the form.
Thus microformats are an "unweblike" type of metadata in that they are none of these things: they are embedded into the content, they are arguably part of the document's HTML, and they necessarily alter the form and structure of the document -- not necessarily visually, but you have to alter your code for it to become a microformat. This already suggests microformats need to be reformulated.
Why be weblike?
There are a bunch of good arguments for maintaining the principles of web metadata I mentioned above:
- Separate metadata is easily ignored, meaning it is more likely to be backward compatible. This is a key aspect of progressive enhancement in general.
- A domain-specific syntax means metadata can be efficiently expressed. XSLT, which writes a transformation language in XML, is the ultimate example of the opposite: a good idea expressed in an unsuitable syntax.
- There is less chance of technology conflict. If two technologies came along that both required rigid class name definitions as microformats do, it is quite possible they would conflict.
- Keeping machine-readable metadata links in the document head instead of the body means they are also easily discovered and efficiently indexable. This is a key feature that microformats currently lack.
- Technologies with a small in-document footprint are more easily retrofitted into existing systems. If you have a huge and costly CMS, the prospect of modifying all your markup and thus probably all your CSS to accommodate microformats is prohibitively costly. This needs to be overcome.
Furthermore, there is an excellent counter-example for the current formulation of microformats: presentational markup. Back in the 90s, we added tags like FONT and attributes like BGCOLOR to HTML. This solved the immediate problem but as pages grew more complex it created more: bulky markup, laborious maintenance, and an unpleasant mixing of content and presentation which made specialized web jobs (editor vs. designer) difficult.
Microformats need scalability
Microformats currently have the same problems, for the same reason: their creators are thinking primarily in terms of one or two microformat implementations on a page of HTML, discovered and used client-side by browser plugins and the like. If you wanted -- as really should be the goal -- to mark up every single piece of content on your page in a semantically meaningful way, layering microformat pattern upon pattern, your code structure would become incredibly rigid and the CSS required to arbitrarily display your content progressively more complex.
Two more points against: firstly, a search engine trying to index the entire web for semantic data would have to read your entire page, parse it, and then search it for all known combinations of all known microformats. On the scale of the modern web, that's a gigantic additional cost to the search engines, and one that would hinder adoption. Secondly, in a reasonably large website, the people developing the software that generates markup are probably not going to be the people creating and defining the content of pages: to keep the jobs separate, you need a mechanism to separate semantics from structure, in the way that CSS separates structure from presentation.
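To make the indexing point concrete, here is a minimal sketch in Python of how cheaply a crawler could discover metadata that is linked from the HEAD: it reads LINK tags and stops as soon as the head is finished, without ever looking at the body. The rel value "semantics" and the ".csd" file names are hypothetical, invented for this illustration; nothing in this post defines how a page would actually point to its description.

```python
from html.parser import HTMLParser


class HeadLinkScanner(HTMLParser):
    """Collect matching <link> elements from the head and ignore the body."""

    def __init__(self, rel):
        super().__init__()
        self.rel = rel
        self.hrefs = []
        self.head_done = False

    def handle_starttag(self, tag, attrs):
        if self.head_done:
            return
        if tag == "body":
            # A page with no explicit </head> still ends its head here.
            self.head_done = True
            return
        if tag == "link":
            attributes = dict(attrs)
            rels = (attributes.get("rel") or "").lower().split()
            if self.rel in rels and attributes.get("href"):
                self.hrefs.append(attributes["href"])

    def handle_endtag(self, tag):
        if tag == "head":
            self.head_done = True


def find_description_links(html, rel="semantics", chunk_size=1024):
    """Return hrefs of head <link> tags with the given rel.

    "semantics" is an assumed rel value, not part of any spec.
    The page is fed in chunks so scanning can stop once the head ends.
    """
    scanner = HeadLinkScanner(rel)
    for start in range(0, len(html), chunk_size):
        scanner.feed(html[start:start + chunk_size])
        if scanner.head_done:
            break
    return scanner.hrefs


PAGE = """<html><head>
<title>Example</title>
<link rel="stylesheet" href="/style.css">
<link rel="semantics" href="/people.csd">
</head><body>
<link rel="semantics" href="/ignored.csd">
<p class="vcard">...</p>
</body></html>"""

print(find_description_links(PAGE))  # ['/people.csd']
```

Compare that with embedded microformats, where the same crawler has no choice but to parse the whole body and test it against every pattern it knows about.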
Goals of CSD
We want to add semantic information to web pages. Our solution needs to be:
- Lightweight
- Simple
- Easily adopted into existing markup
- Elegantly expressed
- Easily parsed
- Efficiently indexed
It should also, as much as possible, build upon all the excellent work that has already been done in defining microformats themselves and formulating existing patterns.
So with that in mind, you should head over to Cascading Semantic Descriptions at Emergent Web to read the draft spec document, learn from the examples, and more as I get around to building it all.

Comments
Ryan Grove
Search engines are only just beginning to understand and make good use of existing semantic data via established microformats. Giving individual content creators the power to extend, remix, and combine these formats in such powerful ways is a boon to creators, but a serious burden on consumers.
It may be worth it, though. When CSS was introduced, it had similar benefits and drawbacks to web developers and browser authors respectively, but the benefits outweighed the drawbacks and it ended up making the web a whole lot awesomer.
Laurie
Steven Bedrick
Secondly, it seems like you may have accidentally stumbled upon a metadata corollary to Greenspun's Tenth Rule (essentially, that any sufficiently complicated C program will eventually come to contain an ad-hoc implementation of something approximating a functional programming language): any sufficiently complex approach to metadata will eventually come to approximate a buggy and incomplete version of RDF/OWL.
The beauty of microformats is their utter simplicity; once you start mixing and matching them in user-described ways, it seems to me that you may as well be using RDF and save yourself a lot of hassle (n.b.: that has *got* to be the first time anybody's ever typed *that* sentence!)
Now, one of the big problems with RDF has always been that there's not really an agreed-upon way to use it with HTML: in-lining RDF statements and using separate RDF sidecar files are two ideas that get around a bit. What I'd really like from CSD -- and what CSD seems like it'd be able to provide -- is a nice web-friendly way to link elements on my page with an RDF file somewhere, so that in addition to being able to specify which hCalendar field a particular tag represents, I could also specify an RDF URI for a particular selector. For example, be able to specify that the span containing my name was the dc:author, and so forth. That'd make me very happy.
Anyway, I'll be curious to see what happens with CSD -- good work so far! Looking forward to those posts on your scalability ideas.
Laurie
Glad you like the idea. Technically I'm a computer scientist, what with the degree and all :-) I don't see *too* much potential for computationally-expensive operations. In terms of loops and recursion, it would be technically possible to define a compound microformat which included another microformat which defined the first, but since that would have to be done at the spec level, not at the CSD level, it would be easier to spot and account for.
CSD does not provide a mechanism for defining new microformats -- there would be no point, since nobody would know what they meant. It is for mapping existing known microformats to existing HTML. For the same reason, I wouldn't include RDF ability here. RDF is for defining arbitrary semantic relationships, which CSD definitely does not want to do.
Personally, I think our computers are just not smart enough (yet): the spontaneous AI that RDF seems to rely upon to become useful is not going to appear any time soon.
thomblake
Laurie