

Keith Alexander said TheyWorkForYou.com markup "is a dataset that would really benefit from being available as RDF". If you don't know it, TheyWorkForYou lets citizens track their representatives in the UK Parliament, and other assemblies, by making things like the Hansard available online. TheyWorkForYou is a brilliant site; every democracy should have one.

Looking at the markup formats, I can see why Keith is interested in a mapping. It's attribute-driven XML, not unlike a specialized OPML. Attribute-driven XML tends to be markup that contains properties about a thing: the attribute/value pairs are property/value pairs about something, one attribute usually designates the noun, and the element itself tends to describe the class of the thing. In TheyWorkForYou's case that special attribute is called 'id', and you can see how that's a straight mapping onto rdf:about once you resolve the ids into absolute URLs. The reason developers tend to like attribute markup is that it can be slightly more convenient to program to initially: it avoids having to deal with significant whitespace and the processing of child elements, as there aren't any.
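As a rough sketch of that mapping, here is how an attribute-driven element could be turned into RDF-style triples. The sample element, the `BASE` and `VOCAB` URLs, and the function names are all made up for illustration; none of this is TheyWorkForYou's actual schema or a published vocabulary.

```python
import xml.etree.ElementTree as ET

# Assumptions: BASE and VOCAB are placeholder URLs, and SAMPLE is an
# invented element in the attribute-driven style described above.
BASE = "http://example.org/id/"
VOCAB = "http://example.org/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

SAMPLE = """<publicwhip>
  <member id="uk.org.publicwhip/member/1" firstname="Ann"
          lastname="Example" party="Example Party"/>
</publicwhip>"""


def element_to_triples(elem):
    """Map one element to (subject, predicate, object) triples."""
    # The 'id' attribute names the noun; resolving it against BASE
    # gives the absolute URL that would go in rdf:about.
    subject = BASE + elem.attrib["id"]
    # The element name is the class of the thing.
    triples = [(subject, RDF_TYPE, VOCAB + elem.tag)]
    # Every other attribute is a property/value pair about the noun.
    for name, value in elem.attrib.items():
        if name != "id":
            triples.append((subject, VOCAB + name, value))
    return triples


def document_to_triples(xml_text):
    """Collect triples for every element that carries an 'id'."""
    root = ET.fromstring(xml_text)
    triples = []
    for elem in root.iter():
        if "id" in elem.attrib:
            triples.extend(element_to_triples(elem))
    return triples


if __name__ == "__main__":
    for triple in document_to_triples(SAMPLE):
        print(triple)
```

Note that the class ends up as just one more triple about the subject, which is what makes it cheap to add further classifications later.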

There are a few issues with attribute-based markup, however. One is extension: you keep adding attributes and eventually end up with co-occurrence constraints between the attributes, ordered evaluation of attributes, or a private subset of XML whose sole constraint is where you can put element content. Another is expression: ultimately, how the thing is classified is just another property of the thing, and making the class an element is something of an optimization for a particular domain (something can be classified in many different ways). The markup then ends up sort of "inside out", where the elements are the classification scheme the authors cared about most, and the entities tend to get duplicated under different elements. When it comes to integrating this data, you'll probably need to transform it "outside in". Another is indexing: for larger datasets, you will need to do work to relate and index the ids (effectively these are joins to answer questions about the entity itself). Yet another issue is textual. Putting descriptions and general text into attributes is awkward; there's a fair bit of OPML out there that does this and it's not pleasant to look at. Attributes really aren't designed for carrying text. In any case, these are technical nits about data formatting, and don't really detract from the fact that having this data openly available is wonderful.
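To make the "outside in" transformation and the indexing work a little more concrete, here is a minimal sketch that regroups records by entity id instead of by element. The records, the element names and `index_by_id` are hypothetical, and it assumes the same id really does appear under more than one element.

```python
from collections import defaultdict

# Assumption: invented records, already flattened to
# (element name, attributes) pairs.
RECORDS = [
    ("member", {"id": "person/1", "name": "Ann Example",
                "party": "Example Party"}),
    ("minister", {"id": "person/1", "position": "Example Secretary"}),
    ("member", {"id": "person/2", "name": "Bob Sample",
                "party": "Sample Party"}),
]


def index_by_id(records):
    """Join records on 'id' so each entity is described in one place."""
    index = defaultdict(lambda: {"classes": set(), "properties": {}})
    for element_name, attrs in records:
        entity = index[attrs["id"]]
        # The element name becomes one classification among many.
        entity["classes"].add(element_name)
        # Merge the remaining attributes; if two records disagree on a
        # property, the later record silently wins in this sketch.
        for name, value in attrs.items():
            if name != "id":
                entity["properties"][name] = value
    return dict(index)


if __name__ == "__main__":
    for entity_id, entity in index_by_id(RECORDS).items():
        print(entity_id, sorted(entity["classes"]), entity["properties"])
```

An in-memory dictionary is fine for a sample like this; a larger dataset would want the same join done in a database or a triple store.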

So, can RDF help with all this? Absolutely. It's often easier to name things with URLs and loosely tag them with classifications. This allows federation: others can classify and describe without having write access to the originating system, as each noun will have a global name. Then again, there's a straight mapping onto Atom, using atom:id to name the entities and atom:category for TheyWorkForYou's classifications and entity types.
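The Atom mapping could look something like the sketch below. The `BASE` and `SCHEME` URLs and `to_atom_entry` are placeholders of mine, and the result is only a fragment: a valid Atom entry would also need at least atom:title and atom:updated.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
# Assumptions: placeholder URLs for naming entities and for the
# category scheme.
BASE = "http://example.org/id/"
SCHEME = "http://example.org/terms/"


def to_atom_entry(entity_id, classes):
    """Build a partial atom:entry: atom:id names the entity and one
    atom:category is added per classification."""
    entry = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(entry, f"{{{ATOM}}}id").text = BASE + entity_id
    for term in sorted(classes):
        ET.SubElement(
            entry,
            f"{{{ATOM}}}category",
            {"term": term, "scheme": SCHEME},
        )
    return entry


if __name__ == "__main__":
    entry = to_atom_entry("person/1", {"member", "minister"})
    print(ET.tostring(entry, encoding="unicode"))
```

Because categories are repeatable, anyone can add another classification to the same atom:id without touching the ones already there.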

Finally, there's TheyWorkForYou's "rest.cgi", an API into the data. Long-time REST proponents will appreciate naming a CGI 'rest' for all kinds of reasons.

May 27, 2007 12:36 PM

