
Generating feed identifiers can be tricky

There is a bug in Roller whereby the guid of the RSS entry changes if the date changes. It seems to be affecting Javablogs at the moment.

Debates on the means and structure of feed guids have taken up a lot of time on the Atom WG (we call them ids over there); debate has occasionally been heated.

Roller's generator is a good example of what not to do if the goal is to create a stable identifier. The issue is that Roller synthesizes ids from feed data (in this case, a date), and that data is mutable. Unless the generator is one-shot, the id will change each time the data changes, which is undesirable. Even if the generator is one-shot, you will be left with an id that is dissonant with the data, unless the id is obfuscated by something like a hash. The temptation not to use a hash is understandable, since the Roller guid is in the form of a URL. However, the key problem is that the Roller id is not so much an id as a signature or digest.
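To make the failure mode concrete, here is a minimal Python sketch. It is not Roller's actual code: the `Entry` class, the `synthesized_guid` function and the `example.org` base URL are all made up for illustration. It only shows what happens when a guid is recomputed from a mutable date on every render.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Entry:
    # Hypothetical entry model; the field names are illustrative, not Roller's.
    slug: str
    published: date


def synthesized_guid(entry: Entry, base: str = "http://example.org/blog") -> str:
    """Recompute the guid from the entry's current data on every call."""
    return f"{base}/{entry.published:%Y/%m/%d}/{entry.slug}"


entry = Entry(slug="feed-ids", published=date(2005, 2, 13))
before = synthesized_guid(entry)

# Editing the publication date silently yields a different guid, so an
# aggregator sees what looks like a brand-new entry.
entry.published = date(2005, 2, 14)
after = synthesized_guid(entry)

print(before)           # http://example.org/blog/2005/02/13/feed-ids
print(after)            # http://example.org/blog/2005/02/14/feed-ids
print(before == after)  # False
```

The value behaves like a digest of the entry's current state: it tracks the data rather than naming the entry.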

Someone who's been tasked with generating globally stable identifiers might frown on the Roller code, but mixing up identifiers with signatures is an easy mistake to make in a web context - there are seemingly contradictory aspects to consider. I also think that using a date in a stable URL identifier and then recomputing the id is a particularly easy mistake to make - URL path segments of the form YYYY/MM are a popular Cool URI technique for everything from versioned namespaces, to W3C specifications, to blogs. Using them as source material for ids is understandable.

This also highlights a usability issue with using URLs as identifiers. Cool URIs, according to W3C doctrine, don't change, but Cool URLs are also meant to be comprehensible to human beings. Roller guids meet the latter criterion but not the former. URLs double up as locators (addresses) as well as identifiers. In RSS 2.0 this is signalled on the guid element by the 'isPermaLink' attribute, which tells you whether the guid can be used as an address (making the guid the moral equivalent of a URL).
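As a small illustration of the two roles, the sketch below builds RSS 2.0 guid elements with Python's standard `xml.etree.ElementTree`. The helper name `guid_element` and both example values are assumptions made for this example; the only thing taken from RSS 2.0 is the `isPermaLink` attribute itself, which is treated as true when omitted.

```python
import xml.etree.ElementTree as ET


def guid_element(value: str, is_permalink: bool) -> ET.Element:
    """Build an RSS 2.0 <guid> element (hypothetical helper)."""
    # isPermaLink="true" says the guid is also a dereferenceable address;
    # "false" says it is only an opaque identifier.
    guid = ET.Element("guid", {"isPermaLink": "true" if is_permalink else "false"})
    guid.text = value
    return guid


# A guid that doubles as a locator.
locator = guid_element("http://example.org/blog/2005/02/feed-ids", True)

# A guid that is purely an identifier (example UUID, not a real entry).
opaque = guid_element("urn:uuid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6", False)

print(ET.tostring(locator, encoding="unicode"))
print(ET.tostring(opaque, encoding="unicode"))
```

Setting the attribute to false is what lets a feed keep a readable permalink in the link element while the guid stays opaque and stable.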

So, what's the answer? In Roller's case, the first thing to do is decouple id generation from mutable data like dates so as to produce a time-stable identifier. The downside is that this is probably not going to look like a 'Cool URI'.
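One way this decoupling might look, sketched in Python under some loud assumptions: the `Entry` class and `mint_id` function are invented for illustration, and `urn:uuid` is just one possible opaque scheme, not something Roller or the Atom draft mandates. The point is only that the id is minted once, stored with the entry, and never derived from the other fields again.

```python
import uuid
from dataclasses import dataclass, field
from datetime import date


def mint_id() -> str:
    # urn:uuid is one possible opaque scheme, chosen here only for illustration.
    return f"urn:uuid:{uuid.uuid4()}"


@dataclass
class Entry:
    # Hypothetical entry model; in practice entry_id would be persisted
    # alongside the entry and carried through export and import.
    title: str
    published: date
    entry_id: str = field(default_factory=mint_id)


entry = Entry(title="Feed ids", published=date(2005, 2, 13))
original_id = entry.entry_id

# Editing mutable data leaves the stored id untouched.
entry.published = date(2005, 2, 14)
entry.title = "Generating feed ids"

print(entry.entry_id == original_id)  # True
```

As expected, the result is stable but not something a human would want to read or type.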

By the way, the current Atom spec (draft-05) text on identity constructs has this to say about stability:

When an Atom document is relocated, migrated, syndicated, republished, exported or imported, the content of its Identity construct MUST NOT change. Put another way, an Identity construct pertains to all instantiations of a particular Atom entry or feed; revisions retain the same content in their Identity constructs.

We hope that's enough to guide developers away from the pitfalls. The problem with being more specific, i.e. saying "the content of its Identity construct MUST NOT be computed or be sourced from mutable data items", is that spec writers tend not to want to base their specs on what are considered 'implementation details' rather than 'architectural constraints' - although implementation details matter a lot in this particular case.


February 13, 2005 03:25 PM

Comments

Phil Ringnalda
(February 13, 2005 06:11 PM #)

Yeah, still waiting to see some sign that all the talk about identity is getting through to implementors. With the current Movable Type default Atom template, an entry changes identity if you change hosts, if you change the year of publication (at least a little better than changing on a changed day), if you change the path to the archives, or if you export and then import (because that doesn't preserve either the blog ID number or the entry ID number). About the only thing that doesn't affect it is a change in title. Though that's nothing compared to the feed/id, since MT feeds completely change identity the first of January every year. Sure hope nobody's planning on actually using identity, one of the main reasons for having Atom, any time soon.

Michael Koziarski
(February 13, 2005 08:24 PM #)

There's no reason that the ID can't be *generated* as it is currently. It just needs to be stored against the entry immediately and never re-generated.

I think the main problem with most blogging software is that it doesn't seem to have the concept of 'external id' for its entries.

Robert Sayre
(February 14, 2005 03:03 AM #)

OMG THE SPEC IS DONE AND THE IMPLEMENTATIONS AREN'T PERFECT.

Identifiers in syndication are doing spectacularly well. You just have to think of it like a house settling into its foundations. Email Message-Id headers work pretty well... after decades of work.

Aleksander Slominski
(February 14, 2005 05:08 AM #)


Actually I would argue that if the entry content changes then it deserves to get a new id, but the address (locator) for it should stay unchanged.

And I am rather unhappy with seeing GUID/UUID used for addresses (locators) - this is just plain ugly: ReallyUglyPermalinks

I think that HTTP redirects should be used more freely to implement easy renaming of entries, including changes to their permalinks: refactoring-rename and redirections-and-aliases

That should help to avoid the problem of FragileLinks