Atom/RSS: relating entries and feeds

I threw an XSLT stylesheet together that maps Atom onto RDF triples. It's still an alpha, but it's producing decent information. Here are some thoughts that came out of that exercise.

Feeds and Entries

One decision to make was how to relate an entry to its feed - after all, the point of having RDF is to have a graph relating all the information items to each other and having a feed subgraph detached from the entry graphs isn't that useful!

It was interesting then to find out that Atom doesn't relate an entry to a feed. It was even more interesting to find out that none of the RSS formats do this. What they do is relate a feed to an entry. This is implicit in the XML document structure - an entry is a child of the RSS feed so we can assume it belongs to that feed. In the case of RSS1.0, a feed has to explicitly state its entries using RDF constructs. On the other hand the feed document structure won't always be around, as we'll see shortly.
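To make the implicit feed-to-entry relationship concrete, here is a minimal Python sketch that reads it off the document structure. It is not the XSLT mentioned above. The sample feed, the example.org URIs and the `ex:hasEntry` predicate name are all made up for illustration; only the Atom 0.3 namespace comes from the output shown later in this post.

```python
import xml.etree.ElementTree as ET

ATOM = "http://purl.org/atom/ns#"  # Atom 0.3 namespace

# Hypothetical sample feed; the URIs are placeholders.
SAMPLE = """<feed xmlns="http://purl.org/atom/ns#" version="0.3">
  <title>Example journal</title>
  <link rel="alternate" type="text/html" href="http://example.org/journal/"/>
  <entry>
    <title>First</title>
    <id>http://example.org/journal/2004/05/first</id>
  </entry>
  <entry>
    <title>Second</title>
    <id>http://example.org/journal/2004/05/second</id>
  </entry>
</feed>"""

def feed_to_entry_triples(xml_text):
    """Infer (feed, hasEntry, entry) triples purely from element nesting."""
    feed = ET.fromstring(xml_text)
    # Assumption: the feed-level link's href stands in for the feed URI.
    feed_uri = feed.find(f"{{{ATOM}}}link").get("href")
    triples = []
    for entry in feed.findall(f"{{{ATOM}}}entry"):
        entry_id = entry.findtext(f"{{{ATOM}}}id")
        # "ex:hasEntry" is an invented predicate name, not part of Atom or RSS.
        triples.append((feed_uri, "ex:hasEntry", entry_id))
    return triples

for triple in feed_to_entry_triples(SAMPLE):
    print(triple)
```

Every triple here runs from the feed to the entry, and it exists only because the entry happens to sit inside the feed element. Nothing in the entry itself names the feed.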

Composite Feeds

There is an increasingly common syndication use case called "composite feed" (aka synthetic feed). This is a feed made up of entries from other feeds. Bob Wyman et al's pubsub service relies on composite feeds, as do javablogs, java.net and a lot of others. As RSS/Atom usage grows, this kind of feed is bound to become more common. In terms of filtering, theming and aggregating similar content, you could make a non-specious argument that composite feeds are potentially more valuable than individual ones. Aggregator authors in principle could also find an entry-to-feed relationship useful to avoid displaying duplicate entries (atom:id can also be leveraged for this purpose). However none of the current RSS formats will support this use case - you have to infer or guess the source from the entry URI or hope you can introspect the dereferenced entry's entity body for a feed URI.
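As a rough illustration of the duplicate problem, the sketch below drops repeated entries from a composite feed by keying on atom:id. It assumes entries have already been parsed into dictionaries; the `via` field and the `feed-a`/`feed-b` labels are hypothetical and only mark which composite feed delivered each copy.

```python
def dedupe_entries(entries):
    """Keep the first entry seen for each atom:id, preserving order."""
    seen = set()
    unique = []
    for entry in entries:
        if entry["id"] not in seen:
            seen.add(entry["id"])
            unique.append(entry)
    return unique

# Hypothetical aggregator input: the same entry arrives via two composite feeds.
COMPOSITE = [
    {"id": "http://www.dehora.net/journal/2004/05/mt3",
     "title": "MT3: are you not entertained?", "via": "feed-a"},
    {"id": "http://example.org/blog/2004/06/other",
     "title": "Another entry", "via": "feed-a"},
    {"id": "http://www.dehora.net/journal/2004/05/mt3",
     "title": "MT3: are you not entertained?", "via": "feed-b"},
]

unique = dedupe_entries(COMPOSITE)
print(len(COMPOSITE), "entries in,", len(unique), "entries out")
```

This tells you two entries are the same, but not which feed either of them originally came from.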

Relating feeds and entries

Here's a picture of the RSS1.0 relationship:

FeedHasEntry.png

Here, a feed points to its child entries (rss:items). This is useful, but does not cover off the composite feed case where you want a detached entry to point to its origin feed.

At the moment the XSLT I wrote inserts an atom:feed tag into each atom:entry. Here's a sample taken from the output:

  <atom:entry rdf:about="http://www.dehora.net/journal/2004/05/mt3">
    <atom:feed rdf:resource="http://www.dehora.net/journal/"/>
    <atom:title>MT3: are you not entertained?</atom:title>
    <atom:link>
      <rdf:Description 
       rdf:about="http://www.dehora.net/journal/2004/05/mt3" 
       atom:rel="alternate" 
       atom:type="text/html" 
       atom:href="http://www.dehora.net/journal/2004/05/mt3"/>
    </atom:link>
    <atom:modified>2004-05-21T20:57:11Z</atom:modified>
    <atom:issued>2004-05-21T20:57:11+00:00</atom:issued>
    <atom:id rdf:resource="http://www.dehora.net/journal/2004/05/mt3"/>
    <atom:created>2004-05-21T20:57:11Z</atom:created>
    <atom:summary>foo</atom:summary>
    <atom:author>
      <rdf:Description rdf:about="mailto:bill@dehora.net">
        <atom:name>dehora</atom:name>
        <atom:url rdf:resource="http://www.dehora.net/journal"/>
        <atom:email>bill@dehora.net</atom:email>
      </rdf:Description>
    </atom:author>
    <dc:subject xmlns="http://purl.org/atom/ns#"></dc:subject>
    <atom:content atom:type="text/html" atom:mode="escaped"/>
  </atom:entry>

The picture of that relationship looks like this:

FeedHasEntryHasFeed.png

Which implies that you can find your way back to the origin feed when the entry is detached from it. [By the way, there is a problem with applying this inference in the general case - kudos to anyone that spots it.]
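The same back-link idea can be sketched in Python rather than XSLT. This is an illustration of the technique and not the stylesheet itself: the function names are invented, the input feed is cut down to a single entry, and the feed URI is passed in by hand.

```python
import xml.etree.ElementTree as ET

ATOM = "http://purl.org/atom/ns#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def add_feed_backlinks(feed_el, feed_uri):
    """Insert an atom:feed element pointing at feed_uri into every entry."""
    for entry in feed_el.findall(f"{{{ATOM}}}entry"):
        backlink = ET.Element(f"{{{ATOM}}}feed")
        backlink.set(f"{{{RDF}}}resource", feed_uri)
        entry.insert(0, backlink)

def origin_feed(entry_el):
    """Return the feed URI recorded in an entry, or None if there is none."""
    backlink = entry_el.find(f"{{{ATOM}}}feed")
    return None if backlink is None else backlink.get(f"{{{RDF}}}resource")

# Cut-down stand-in for a real feed document.
feed = ET.fromstring(
    '<feed xmlns="http://purl.org/atom/ns#">'
    "<entry><id>http://www.dehora.net/journal/2004/05/mt3</id></entry>"
    "</feed>"
)
add_feed_backlinks(feed, "http://www.dehora.net/journal/")

# Serialise the entry on its own to simulate detaching it from the feed.
detached = ET.fromstring(ET.tostring(feed.find(f"{{{ATOM}}}entry")))
print(origin_feed(detached))
```

Once the entry carries the pointer itself, the lookup no longer depends on the enclosing feed document being around.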

Entry uber alles

Over the last month there have been some discussions in the Atom community that seem to lean in favour of pushing information down into the entry from the feed.

I suspect that the argument to support composite feeds will only continue to grow and by the time Atom gets to 1.0 atom:entry will have to be a first class resource that can be fully detached from its originating atom:feed while maintaining some kind of link back to that feed. Much the same can be said for any RSS format.

For Atom, one option is to place atom:feed inside atom:entry in the format itself rather than make the inference as I did. The more I think about it the more I think it's needed and I hope to put a proposal together for Atom soon. With it you don't need to make hazy guesses or embed RDF/XML constructs into the markup.

Ephemera

Other things have come up from this exercise. The conversion into RDF of atom:link is clearly a mess:

  <atom:entry rdf:about="http://www.dehora.net/journal/2004/05/mt3?id">
   ...
    <atom:link>
      <rdf:Description
        rdf:about="http://www.dehora.net/journal/2004/05/mt3?id"
        atom:rel="alternate" 
        atom:type="text/html"
        atom:href="http://www.dehora.net/journal/2004/05/mt3"/>
    </atom:link>
    ...
  </atom:entry>

but it's hard to know whether this is the result of atom:link being something of a woolly construct or of RDF/XML tag noise - I suspect it's a bit of both. Also there's more going on in the RDF/XML for author than I would like; again this may have something to do with how authors are modelled in Atom.


June 11, 2004 12:08 PM

Comments

Henry Story
(June 15, 2004 05:27 PM #)

I agree that the case for having the link from the entry to the feed is a good one. But it is not clear that one needs a bidirectional link. After all one is the inverse of the other. And in RDF everything is just a triple. So when you have one, you have the other too.

Henry Story
(June 15, 2004 08:21 PM #)

On the bidirectional link, I see the point of them now, if one wants to have feeds point to entries directly.

On the other hand, can one perhaps not get everything one wants by having entries point to entries, as described here:
http://www.imc.org/atom-syntax/mail-archive/msg04771.html

Trackback Pings

Listed below are links to weblogs that reference Atom/RSS: relating entries and feeds:

» Relating Atom entries and feeds from phil wilson
This is a very interesting area to me at the moment, and it seems strange that an explicit child -> parent relationship hasn't previously been defined in the Atom syntax

Tracked on June 23, 2004 02:05 PM