
Content trumps Architecture

David Megginson is asking about content in terms of the REST style:

RESTafarians can argue that the lack of content standardization is a good thing, because it leaves the architecture flexible enough to deal with any kind of resource, from an XML file to an image to a video to an HTML page (moving the last two using XML-RPC or SOAP can be less than pleasant). On the other hand, the lack of any kind of standard content format makes it hard actually to do anything useful with RESTful resources once you've retrieved them.

Open content has been a social problem, not a technical one. I think the REST style did the right thing under the circumstances in shying away from over-specifying what it calls representations, since there was no chance of obtaining agreement 10 years ago; or 5 years ago; or even last year. I'd speculate that specifying content to the level David is interested in would have hurt adoption, as it would have been an easy excuse not to use the architecture in question (all aside from the protocol's fundamental that data and control are orthogonal).

Today, that punt seems smart. Sjoerd Visscher's comment is an example of this.

You shouldn't use elements from foreign vocabularies like the Dublin Core. There are often subtle semantic differences between the defined meaning of the elements and the way they seem to be applicable in other vocabularies. Those differences tend to reveal themselves only in practice (when the semantics are actually used, still quite rare on the web). It's better to only use your own vocabulary, and then provide a translation (like f.e. XSLT) to the other vocabularies. When the differences show up, you only have to change the translation, not your format.

Now, Sjoerd's a good guy, and there is some truth in this. And while I firmly agree that content transformation is the optimal approach to data integration, it's not hard to see how his observation, that semantic drift can occur through content reuse, could be leveraged as an excuse to maintain more choices than you strictly need. People historically have not agreed on content at the level David is arguing for. In that sense it's not a REST-specific problem. To paraphrase Jim Highsmith, content trumps architecture, people trump content, politics trumps people.
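To make the "own vocabulary plus a translation" idea concrete, here is a minimal sketch. It uses a plain Python mapping table where Sjoerd suggests XSLT, and the in-house element names (`headline`, `writer`, `published`, `mood`) and the `metadata` wrapper are invented for illustration; only the Dublin Core namespace and the `title`, `creator` and `date` element names are real. The point it tries to show is that the mapping lives outside the format.

```python
import xml.etree.ElementTree as ET

# The Dublin Core elements namespace.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical in-house element names mapped to Dublin Core element names.
# If a semantic mismatch shows up in practice, only this table changes;
# the in-house format stays as it is.
LOCAL_TO_DC = {
    "headline": "title",
    "writer": "creator",
    "published": "date",
}


def to_dublin_core(local_xml: str) -> ET.Element:
    """Translate a document in the in-house vocabulary to Dublin Core elements."""
    source = ET.fromstring(local_xml)
    target = ET.Element("metadata")  # wrapper name is an assumption
    for child in source:
        dc_name = LOCAL_TO_DC.get(child.tag)
        if dc_name is None:
            continue  # no agreed translation for this element, so leave it out
        ET.SubElement(target, f"{{{DC_NS}}}{dc_name}").text = child.text
    return target


entry = (
    "<entry>"
    "<headline>Content trumps Architecture</headline>"
    "<writer>Bill de hOra</writer>"
    "<mood>wry</mood>"
    "</entry>"
)
record = to_dublin_core(entry)
print(ET.tostring(record, encoding="unicode"))
```

Note that the table itself is one more thing to maintain, which is the cost side of the argument.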

This won't always be the case. The plumbing and infrastructure are at the level now where the obvious bottlenecks will seem to be content related. Those who are not IT mavens will surely someday run out of patience with RSS v SOAP v XMLRPC v Atom, or with SUO v OWL v Cyc v WSDL. This is what happened to plumbing and infrastructure; it will happen to some extent with content. Don't hold your breath for a perfect language, but do expect content formats to rationalize as some light is shed on the matter.

Another reason to standardize content is to commoditize the very services David mentions - the likes of Google, Amazon, Flickr. Almost all 'Web2.0' services are predicated on data franchises, not the API or platform franchises, as was the case in the 1980s and 1990s. Wanting to make the data liquid and drive value out of Web2.0 businesses to somewhere else will effect content standardization and interoperation.

David mentions some formats and those are worth highlighting:

  • Dublin Core: DC is way useful, but it has the problem of being underspecified and being associated with RDF when too many people thought RDF sucked.
  • xsi:type: I find WXS types overrated for interchange and I tend to agree with the piece David links to by Norm Walsh. In general much of the value of XSD is tied up in being able to bind onto programming language type systems; but they mismatch sufficiently badly that the programming languages will probably have to have their type systems coerced to make things effective [1].
  • xlink, xml:id: xlink seems to have languished. I think the W3C was overambitious - instead of focusing on cleaning up href and incrementally improving it, there was an overhaul which resulted in ideas like link bases and n-way links, which killed it for practical use. It's too early to say on xml:id; some people seem buzzed about having it in XML, but in the near term I think RSS and Atom linking constructs are much more likely to gain traction.

While RDF got a mention, rdf:type did not. I think rdf:type annotation is more flexible and less brittle over the Internet as it's closer in spirit to Postel's law than xsi:type (not to say xsi:type isn't useful - xsi:type has a very 'middleware' feel to it). An important aspect of RDF is that typing is optional, a hint to processors (and if those are RDF processors they will tend to be robust in that regard).
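Here is one way to read "typing is a hint", as a rough sketch rather than how any particular RDF processor works. Statements are plain `(subject, predicate, object)` tuples, the `example.org` URIs and the `describe` function are made up for illustration, and only the rdf:type URI is real. A known type selects a specialised handler; a missing or unknown type falls back to generic handling instead of being rejected.

```python
# The real rdf:type property URI.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def describe(subject, triples, handlers):
    """Summarise a subject, treating any rdf:type statement as an optional hint."""
    types = []
    props = {}
    for s, p, o in triples:
        if s != subject:
            continue
        if p == RDF_TYPE:
            types.append(o)
        else:
            props.setdefault(p, []).append(o)
    # A recognised type picks a specialised handler...
    for t in types:
        handler = handlers.get(t)
        if handler is not None:
            return handler(props)
    # ...but no type, or an unrecognised one, is not an error.
    return ("generic", props)


# Hypothetical vocabulary and data.
EX = "http://example.org/vocab#"
triples = [
    ("urn:a", RDF_TYPE, EX + "Photo"),
    ("urn:a", EX + "caption", "Sunset"),
    ("urn:b", EX + "caption", "Untyped thing"),
    ("urn:c", RDF_TYPE, EX + "Hologram"),
    ("urn:c", EX + "caption", "Type nobody here knows"),
]
handlers = {EX + "Photo": lambda props: ("photo", props)}

typed = describe("urn:a", triples, handlers)
untyped = describe("urn:b", triples, handlers)
unknown = describe("urn:c", triples, handlers)
```

A processor that refused `urn:b` or `urn:c` outright would be behaving in the stricter, more 'middleware' way.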

Overall the content model most likely to become the gold standard in the next few years is Atom, plus a few things that tunnel some extra semantics through, such as InterWiki URLs, a cut-down RDF, or WHAT Web Forms [2].

[1] More likely is that a smaller de-facto set of types that interoperate will be found due to the efforts of the likes of SOAP Builders.

[2] Mark Baker might have something to say about that - RDF Forms

February 24, 2005 01:00 AM


Sjoerd Visscher
(February 24, 2005 06:49 PM #)

What do you mean by "to maintain more choices than you strictly need"? I'm arguing to restrict choices. Either you use an existing vocabulary as a whole, or you create a new one. You shouldn't go looking around for single elements to use from other vocabularies.

Bill de hOra
(February 24, 2005 08:45 PM #)

Ah, ok, that's not how I read your comment. I'll edit that part. Although I think that if you transform between vocabularies you end up maintaining choices (and transformations).

Sjoerd Visscher
(February 24, 2005 10:09 PM #)

Oh, absolutely. But the alternative to maintaining your transformations is to alter the format or the applications. The good part is that a lot more people are available to maintain transformations, than there are to alter formats or applications. My aggregator doesn't support Atom, RSS isn't going to be changed to be Atom compatible, but Atom to RSS transformation services are abundant.

Ken Meltsner
(February 25, 2005 08:24 PM #)

Content (data, metadata, unstructured text) usually lasts longer than the application that created it. This has been one of the driving forces behind relational database systems as separate systems shared by a group of related applications. (And why object persistence using an RDBMS is so hard to get right -- object persistence works "best" when it's tightly coupled to a single application, but databases are most useful when they're usable by more than one application or version.)

The problem is that long-term use requires good definitions for the content's meaning, and that's an extremely difficult task. We used to handle it with human-friendly methods -- the binders full of documentation, or later, tools that extracted and indexed all data references and definitions in a single system -- but now we want our machines to understand it as well. And that ties our definitions to poorly-understood and complicated tools like description logics, ontologies, etc.

If we don't mind that our definitions will require human assistance, straight XML, plus a little XSLT or another magic transformation technology, is fine. If we're really set on machine-friendly definitions, we're stuck with a bunch of research-level problems in knowledge representation techniques. In my opinion, a little bit of human effort is a lot cheaper than solving the full problem automatically.
