Friendfeed - nice service. I pulled this Atom Entry out of my Friendfeed feed:
<entry>
<title type="text">Free as in electricity (via Blog)</title>
<updated>2008-07-12T15:33:12Z</updated>
<published>2008-07-12T15:33:12Z</published>
<id>tag:friendfeed.com,2007:1d0c3524-1c88-d01e-5864-21953bef1ca1</id>
<link href="http://www.dehora.net/journal/2008/07/12/free-as-in-electricity/"
rel="alternate" type="text/html"/>
<content type="xhtml" xml:base="http://friendfeed.com/">
<div xmlns="http://www.w3.org/1999/xhtml"
style="font-size:10pt;font-family:Arial,sans-serif;color:#222222">
...
</div>
</content>
</entry>
Here's the original Entry from my feed:
<entry>
<id>http://id.dehora.net/entry/2008/07/12/ea18ed947f5ba3e06ff02528f7e6b150</id>
<title>Free as in electricity</title>
<updated>2008-07-12T15:59:34Z</updated>
<author>
<name>Bill de hÓra</name>
</author>
<link href="http://www.dehora.net/journal/2008/07/12/free-as-in-electricity/" rel="alternate"></link>
<content type="html"><p> ... </p> </content>
</entry>
Some observations:
- The Friendfeed updated value is about 26 minutes earlier than mine (15:33:12Z versus 15:59:34Z). The gap doesn't seem to be a multiple of anything, so let's assume it's clock skew. What's interesting is that, in a newest-first aggregated feed, Friendfeed's Entry could show up after mine.
- The Friendfeed Entry doesn't have an author element. The enclosing feed did have one, and its author is "Friendfeed". Because an Atom Entry without an author inherits the feed's author, the author of that Entry is "Friendfeed", not me. I guess that makes sense. Or maybe it doesn't.
- Friendfeed added their own id. It's a Tag URI that has a uuid embedded. This makes me think their backend is decentralised or at least has the potential to be. The date part of the tag appears to be hardcoded to 2007.
- Friendfeed munged the original title with a suffix. They added type="text" as well. I wonder if that was derived or assumed, and what would happen if the source type was set to, say, xhtml. In any case, what's nice about my use of Atom here is that Atom Text constructs have well-defined escaping rules: unlike RSS 2.0, there's no need to second-guess the upstream provider (unless it's borken).
- Friendfeed have copied the link@rel=alternate across. They've added a type of text/html. Again I wonder if that was derived or assumed. It would be interesting to have a JSON alternate.
- Friendfeed have written out their own content. That's fine; what's notable is that they don't provide a summary of the original. Or maybe that's because I don't serve summaries, just full content.
- Friendfeed use xml:base. It would be interesting to see what would happen if relative URLs were in the source title, which seems to have been copied into their content.
- Friendfeed are not proxying or rewriting upstream links. A naive system would do that for tracking reasons, and eventually melt down under load. The only interesting redirect walling they perform is to route the "link to this entry" option on the site back to their permalink and not mine. The slug for the permalink is the same as the uuid in the tag URI.
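A quick check of the timestamps and the Tag URI from the two entries above, as a Python sketch using only the standard library. parse_atom_date and split_tag_uri are illustrative helpers, not anything Friendfeed publishes. The sketch shows the 26:22 gap, that a newest-first sort puts the Friendfeed copy below the original, and that the specific part of the tag parses as a UUID. The tag is split with plain string operations following the tag:authority,date:specific shape from RFC 4151, which assumes the authority contains no colons.

```python
from datetime import datetime
import uuid


def parse_atom_date(value: str) -> datetime:
    """Parse an RFC 3339 date as used in atom:updated/atom:published."""
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def split_tag_uri(tag: str):
    """Split an RFC 4151 tag URI into (authority, date, specific)."""
    if not tag.startswith("tag:"):
        raise ValueError("not a tag URI: " + tag)
    tagging_entity, specific = tag[len("tag:"):].split(":", 1)
    authority, tag_date = tagging_entity.rsplit(",", 1)
    return authority, tag_date, specific


# Values copied from the two entries above
friendfeed = {"id": "tag:friendfeed.com,2007:1d0c3524-1c88-d01e-5864-21953bef1ca1",
              "updated": parse_atom_date("2008-07-12T15:33:12Z")}
original = {"id": "http://id.dehora.net/entry/2008/07/12/ea18ed947f5ba3e06ff02528f7e6b150",
            "updated": parse_atom_date("2008-07-12T15:59:34Z")}

skew = original["updated"] - friendfeed["updated"]
print("Friendfeed's updated is earlier than the original's by", skew)  # 0:26:22

# Newest-first ordering, as most aggregators display entries
newest_first = sorted([friendfeed, original], key=lambda e: e["updated"], reverse=True)
for entry in newest_first:
    print(entry["updated"].isoformat(), entry["id"])

authority, tag_date, specific = split_tag_uri(friendfeed["id"])
# uuid.UUID raises ValueError if the specific part is not UUID-shaped
print(authority, tag_date, uuid.UUID(specific))
```

Sorting on updated alone is what pushes the later copy below the original; anything that trusts atom:updated for ordering inherits whatever clock the republisher used.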
But for a syndication wonk, by far the most interesting thing is that Friendfeed aren't using the Atom source element. The feed could have looked like this:
<entry>
<title type="text">Free as in electricity (via Blog)</title>
<updated>2008-07-12T15:33:12Z</updated>
<published>2008-07-12T15:33:12Z</published>
<id>tag:friendfeed.com,2007:1d0c3524-1c88-d01e-5864-21953bef1ca1</id>
<link href="http://www.dehora.net/journal/2008/07/12/free-as-in-electricity/"
rel="alternate" type="text/html"/>
<content type="xhtml" xml:base="http://friendfeed.com/">
<div xmlns="http://www.w3.org/1999/xhtml"
style="font-size:10pt;font-family:Arial,sans-serif;color:#222222">
...
</div>
</content>
<source>
<id>http://dehora.net/journal/atom.xml</id>
<author>
<name>Bill de hÓra</name>
</author>
</source>
</entry>
Not many people seem to use Atom source. It can contain the whole original entry, but the most useful parts to pass along are the id and the updated elements. Why? Well, aggregators and clients can immediately pick those out, tell you if you've smelt what the Friendfeed is cooking, and mark it as read. In this particular case the fact that Friendfeed's outer updated date is earlier than the original's might confuse some code, depending on the implementation logic. Possibly they should consider treating updated as an offset to the original Entry timestamp. It might not be precise relative to a particular server/system, but globally it will be more accurate and the partial ordering would be preserved. You could put the source author in there as well ;)
Wrong. Oops, big mistake. The atom:source element copies over Feed metadata, not Entry metadata. The author element could be copied across if it was in the feed, but there's no way I can see to preserve the original id/updated pair if they are rewritten by an aggregator. Ben Darnell commented: "The spec's answer to determining whether you've seen an item in another feed is that atom:ids should be globally unique, so equal atom:ids are proof that two entries are the same. However, many aggregators don't trust the web to use ids correctly and end up generating their own." So I guess this means there's no way to track an Entry that has its id rewritten except by eyeballing feeds and searching for the URLs of the original atom:links. Maybe this is a gap in the Atom spec?
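That fallback (trust equal atom:ids first, then look for shared atom:link URLs) can be sketched with Python's standard library. This is a heuristic, not something the Atom spec defines: entry_keys and same_entry are hypothetical helpers, the two entries are trimmed to id and link and given the Atom namespace that the snippets above leave out, and hrefs are compared as plain strings with no URI normalisation or xml:base resolution.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def entry_keys(entry):
    """Return (atom:id, set of alternate link hrefs) for an atom:entry element."""
    entry_id = (entry.findtext(ATOM + "id") or "").strip()
    alternates = set()
    for link in entry.findall(ATOM + "link"):
        # A link with no rel attribute is treated as rel="alternate" (RFC 4287)
        if link.get("rel", "alternate") == "alternate" and link.get("href"):
            alternates.add(link.get("href"))
    return entry_id, alternates


def same_entry(a, b):
    """Return how two entries matched: "id", "alternate-link", or None."""
    id_a, links_a = entry_keys(a)
    id_b, links_b = entry_keys(b)
    if id_a and id_a == id_b:
        # Equal atom:ids are the spec's proof that two entries are the same
        return "id"
    if links_a & links_b:
        # Fallback for rewritten ids: exact string match on alternate hrefs
        return "alternate-link"
    return None


ORIGINAL = """<entry xmlns="http://www.w3.org/2005/Atom">
  <id>http://id.dehora.net/entry/2008/07/12/ea18ed947f5ba3e06ff02528f7e6b150</id>
  <link href="http://www.dehora.net/journal/2008/07/12/free-as-in-electricity/" rel="alternate"/>
</entry>"""

REPUBLISHED = """<entry xmlns="http://www.w3.org/2005/Atom">
  <id>tag:friendfeed.com,2007:1d0c3524-1c88-d01e-5864-21953bef1ca1</id>
  <link href="http://www.dehora.net/journal/2008/07/12/free-as-in-electricity/"
        rel="alternate" type="text/html"/>
</entry>"""

original = ET.fromstring(ORIGINAL)
republished = ET.fromstring(REPUBLISHED)
print(same_entry(original, republished))  # alternate-link
print(same_entry(original, original))     # id
```

The link match only works here because Friendfeed don't proxy upstream links; a republisher that rewrote them for tracking would defeat it, which is the provenance gap in question.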
13 Comments
Actually, that's not quite how atom:source works. It's supposed to be used to copy feed-level metadata along with a copied entry, so the correct source here would be <source><id>http://www.dehora.net/journal/</id><title>Bill de hOra</title>...</source>. There's no way to indicate differences between a copied entry and the original, which is why Google Reader had to use an extension attribute gr:original-id for this purpose (http://www.tbray.org/ongoing/When/200...).
The spec's answer to determining whether you've seen an item in another feed is that atom:ids should be globally unique, so equal atom:ids are proof that two entries are the same. However, many aggregators don't trust the web to use ids correctly and end up generating their own.
From http://www.atomenabled.org/developers... :
<blockquote>When an Atom Document is relocated, migrated, syndicated, republished, exported, or imported, the content of its atom:id element MUST NOT change. Put another way, an atom:id element pertains to all instantiations of a particular Atom entry or feed; revisions retain the same content in their atom:id elements. It is suggested that the atom:id element be stored along with the associated resource.</blockquote>
How is that a gap in the spec? We put IDs into Atom for two reasons: because URI equivalence is Hard with a capital H, and because IDs should be eternally static and globally unique so they would, y’know, *identify* entries, even if they came from an intermediary or after someone had switched weblog software or moved hosts or whatever.
If aggregators don’t trust publishers to mint IDs correctly, what makes you think they will trust publishers to correctly assign any other property of the entry intended for the same purpose?
Aristotle: "and because IDs should be eternally static and globally unique so they would, y’know, *identify* entries"
But we didn't mandate a technique to create them.
"If aggregators don’t trust publishers to mint IDs correctly, what makes you think they will trust publishers to correctly assign any other property of the entry intended for the same purpose?"
ids by their very nature are different to the other elements.
The point is that there's no place to carry them forward.
Bill: "The point is that there's no place to carry them forward."
Yes there is, in the id element.
If you don't do that, and the spec says you must, then you have created a problem.
Sam: "In the id element.
If you don't do that, and the spec says you must, then you have created a problem"
You're assuming these are the same Entry - arguably they're not. But *by definition*, the Friendfeed Entry isn't the Entry I served. So, once you overwrite the Entry id you can't track it - that's the point - there's no provenance, no natural key equivalent of "via".
Assaf: "I think this would be more interesting if we looked at the use case"
For me it would be - this entry is about that entry. It's suspiciously like a classic RDF use case; or maybe we can define a new rel value - however, via seems wrong here.
<a href="http://intertwingly.net/blog/2007/09/13/Planet-Pruning">Minor deviation</a> into how Google Reader treats <id/> elements, and no, I don't think Google have altered the behaviour since this was first raised.