Why I'm not sold on RSS

I don't know if I'm the only one but I've just never gotten on with RSS (under the umbrella of which I include Atom too). Nothing I've read about it resolves these open questions:

  • What is RSS for?
  • Why is RSS the best way to do... whatever it is that it's for?

I think RSS's history lends weight to the idea that nobody really has the answers to those questions.

I discussed this recently with Dee on IRC, and I think we both started to understand one another's views on the subject. He said he was a fan because it allowed him to set up notifications when websites changed, and since then I've been using it for the same purpose. I added a couple of feeds to Thunderbird and yes, I can now easily see when a website has new content. But that hasn't really addressed the difficulties I have with the concept:

Linear view of a complex data structure

One of the strangest things about RSS is that it's been shoehorned into a variety of applications where it isn't ideal. Most dynamic sites will have a very rich structure and RSS is merely one projection of that structure onto a sequence. It's usually chronological, but it's always inflexible.

You are robbed of the richness of structure that the web interface provides. It's possible that the web author has gone to great lengths to provide a user-friendly way to navigate around the site and you're missing it by viewing merely a sequence of excerpts.

All blogs have a list of posts which corresponds to the RSS feed. The problem is that there's usually much more on the page that you will miss out on.

Suppose I blog about cocktails (cocktails are a really good example, which I spotted in Sean Kelly's screencasts about Plone: they are colourful, visual and rich in content, with histories and ingredients). Maybe in the sidebar of each post I have a widget that links to other cocktails - a random cocktail, and "if you like this cocktail, you may enjoy these". Suppose I allow the posts to be filtered by what ingredients are available, and also by category/tag. The webpage also lets you sort the list in ways other than chronologically - by votes or by alcohol content. I've also got AJAX which pulls up a little glossary popup on clicking arbitrary terms. All very slick, integrated and non-linear. It's still a blog because the front page shows the most recently added articles. But an RSS feed of the blog wouldn't expose any of the richness. And surely that richness is what makes the difference between a brilliant site and Yet Another Blog?

To an extent there's a conscious choice that you might be cutting yourself off from that functionality by choosing to read it using RSS, but that doesn't detract from the argument that RSS isn't particularly appropriate.

A simpler example - a photoblog. Each post has a mini-gallery attached. The RSS feed can't (as standard) describe a gallery as nested within the articles - it can only provide the XHTML markup to describe how it looks.
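To make the photoblog case concrete, here is a minimal Python sketch. The `Photo` and `Post` classes and the `to_feed_item` function are hypothetical names of my own, not part of any RSS library, and the item is reduced to just a title and description for brevity. It shows how a nested gallery ends up as a single string of markup once it is squeezed into a basic feed item.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class Photo:
    url: str
    caption: str


@dataclass
class Post:
    title: str
    body: str
    # The gallery is real structure on the site: a list nested inside the post.
    gallery: list[Photo] = field(default_factory=list)


def to_feed_item(post: Post) -> dict[str, str]:
    """Flatten a post into the title/description pair of a basic feed item.

    The gallery survives only as markup inside the description string, so a
    consumer can no longer tell "this post has N photos" without parsing HTML.
    """
    markup = f"<p>{escape(post.body)}</p>" + "".join(
        f'<img src="{escape(photo.url)}" alt="{escape(photo.caption)}"/>'
        for photo in post.gallery
    )
    return {"title": post.title, "description": markup}


post = Post("Sunset", "Taken at dusk.", [Photo("/a.jpg", "West pier")])
item = to_feed_item(post)
```

The `Post` object knows it has one photo; the `item` dictionary only knows it has some text.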

WordPress has to provide two feeds, one for posts and one for comments, because a single flat feed can't nest comments under the posts they belong to.

As I understand it, W3C's RDF format can describe these kinds of data structures and that seems much more appropriate. The argument that RSS is 'Really Simple' is nonsense: RSS is universally generated and consumed by software, not humans, so the complexity of the description is irrelevant to users.

Nobody knows what RSS is for

RSS isn't for anything specific. People use it in different ways. I recently looked through a list of aggregators on Windows, and I tried half a dozen, which lets me be somewhat authoritative on the subject. The aggregators vary between software which merely pops up a notification when there's new content, and full-blown power tools for notifying, merging and reading dozens of feeds. There is almost no commonality among feature sets. Firefox's RSS folders don't notify, they just silently list items. And then there's Planet, which creates a webpage by merging feeds.

This makes it difficult, as far as I'm concerned, to conceptualize uses for RSS. If authors don't know what their audience would like to do, then how can they know that RSS is providing the capabilities they want to provide? And so I suspect many authors just blindly provide a basic RSS feed, in case visitors want to do something with it.

RSS nominally does syndication, i.e. it can allow publishers to collect and republish new articles or adverts from other sources. This requires metadata, and providing the requisite metadata requires some knowledge of the problem domain - knowledge which blog maintainers and users don't have and which RSS doesn't encapsulate (although Dublin Core, DC, does).

More importantly, what does RSS do for me that I can't do without a new format? I can already poll for updates on a page by sending an HTTP request like

HEAD / HTTP/1.0
If-Modified-Since: time-I-last-checked

and waiting for 200 OK rather than 304 Not Modified.
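The polling idea above can be sketched in a few lines of Python using only the standard library. This is a rough illustration under some assumptions: the function names and the `conn_factory` parameter are my own, `http.client` speaks HTTP/1.1 rather than 1.0, and it only works if the server actually honours `If-Modified-Since`, which many dynamic sites don't.

```python
import http.client
from email.utils import formatdate


def build_headers(last_checked: float) -> dict[str, str]:
    """Turn a Unix timestamp into an If-Modified-Since header (HTTP date, GMT)."""
    return {"If-Modified-Since": formatdate(last_checked, usegmt=True)}


def changed_since(host: str, last_checked: float, path: str = "/",
                  conn_factory=http.client.HTTPSConnection) -> bool:
    """Send a conditional HEAD request and report whether the page has changed.

    conn_factory is a hypothetical hook so the connection class can be swapped
    (for example for plain HTTP, or for a fake in tests).
    """
    conn = conn_factory(host, timeout=10)
    try:
        conn.request("HEAD", path, headers=build_headers(last_checked))
        status = conn.getresponse().status
    finally:
        conn.close()
    if status == 304:  # Not Modified: nothing new since last_checked
        return False
    if status == 200:  # OK: the server considers the page changed
        return True
    raise RuntimeError(f"unexpected status {status}")
```

Something like `changed_since("example.org", time.time() - 3600)` would then ask whether the front page has changed in the last hour, with `example.org` standing in for a real host.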

If I were intending to improve on such a scheme I would think about using notification rather than polling, or simply notifying that there's something to poll.

Non-normalized XML

RSS is bad XML. It's rather ambivalent about using namespaces, allowing namespaces for DC and others, but the main problem is that it doesn't use namespaces to embed HTML content (and in most versions, doesn't provide a default namespace). Instead it usually embeds the HTML as escaped character data, either in an explicit CDATA section or with character entities. (I think it's possible to do the right thing, but it doesn't appear that this happens in the wild.)

This is undeniably bad form. It means that to parse RSS you need not only an XML parser but also a tag-soup SGML parser. And all the functionality that has been built for XML is lost: validation, transformation, query, character-set awareness. Embedding structured data within XML CDATA is equivalent to storing non-normalized data in a relational database.
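A small Python sketch of what gets lost, using the standard library's `xml.etree.ElementTree`. Both sample items are made up for illustration, and the second one, with namespaced XHTML inside `description`, is a hypothetical "right thing" rather than something I've seen a feed emit. The point is only that the XML parser can see the link in one case and not in the other.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"

# The usual approach: the HTML is escaped with character entities.
ESCAPED = (
    '<item><title>Mojito</title>'
    '<description>&lt;p&gt;Rum, &lt;a href="/lime"&gt;lime&lt;/a&gt; and mint.&lt;/p&gt;</description>'
    '</item>'
)

# Hypothetical alternative: the markup is embedded as namespaced XML.
EMBEDDED = (
    '<item xmlns:h="http://www.w3.org/1999/xhtml"><title>Mojito</title>'
    '<description><h:p>Rum, <h:a href="/lime">lime</h:a> and mint.</h:p></description>'
    '</item>'
)


def link_targets(item_xml: str) -> list:
    """Collect href values of XHTML <a> elements visible to the XML parser."""
    root = ET.fromstring(item_xml)
    return [a.get("href") for a in root.iter(XHTML + "a")]


def description_child_count(item_xml: str) -> int:
    """Count the element children the XML parser finds inside <description>."""
    return len(ET.fromstring(item_xml).find("description"))


escaped_links = link_targets(ESCAPED)    # the link is buried in a text string
embedded_links = link_targets(EMBEDDED)  # the link is a real element
```

In the escaped case `description` is a leaf node holding one opaque string, so getting at the link means handing that string to a second, HTML-aware parser.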

The argument that "well, we need to put HTML in there if that's what people are blogging in" doesn't hold water. The requirements of the format must be defined by what the consumers of the feed need. Data consumers want to work with a known data type, not whatever language I happen to be blogging in.

Atom is better XML but still allows for mixed content formats.
