
Communication Languages

Mark Nottingham is noodling on data models:

If you're the only person who ever has to look at the XML or write software to work with it, you’re fine, because you'll say exactly what you mean (no more, no less). When you start involving other people, however, things get complex pretty quickly.

That's because you have to agree on what’s being said, and just as importantly, what isn't. What's that attribute? Is the ordering of the children of that element significant? What's the relationship between 'item' and 'entry'? And so forth.

In a nutshell, I see this problem as one of choosing a data model along with an appropriate constraint (i.e., schema) language, and then figuring out how to get from that data model to angle brackets. Therefore, I present a choose-your-own-adventure guide to using XML in data-oriented formats, with the pros and cons of each choice.

Mark gives 2.5 options:

  • 1. Your data model is based on XML: "This path is, in my opinion, the major reason behind the wailing that we hear when people actually try to use Web services and XML (lots of people seem to agree). It isn't pretty, and I don't see it easing significantly, despite the advent of better bindings of XML into languages, or better schema languages. I suspect that Infoset-as-metamodel is the root of the problem."
  • 2a. Static, fixed mapping from that data model to XML: "Basically, these approaches are using XML as a serialisation format, in the sense that they're using it to mindlessly serialise an object or other model into XML. The integration into the XML stack is almost accidental where it happens, and for these reasons, I don't think this is much of an option."
  • 2b. An application-specific mapping from that data model to XML: "many XML-based specifications are actually described in a separate data model, even if it's just a set of XPath expressions. Disconnecting from the constraints of the Infoset frees you to think about what the data model should be, not what it should look like in bits."

Option 3 would be to focus on the application protocols rather than the content models of the payloads. That would mean new methods and header metadata, or possibly a tad more formalism than the current processThis() (MEST) or HTTP POST (REST) styles. Certainly you'd be starting with a set of well-defined communication primitives that could be recombined or extended, rather than providing either a bucket method for any semantics the protocol designers did not foresee or a method free-for-all.

You can think of it this way: if Lisp and Smalltalk represent some kind of maximum for expressivity in programming languages, then speech act languages like KIF and FIPA-ACL represent a maximum for expressivity in Internet application protocols. Enough research and experimentation has been conducted on software agent communication languages over the last twenty years to give us an inkling of how this might work. The primary problem with this option is social: my guess is that it's more difficult to innovate with application protocols than with content models, since app protocols tend to be baked into the infrastructure as of the beginning of the 21st century. And we do tend to think of protocols in engineering terms (interfaces, structures, bits, wires) rather than as linguistic phenomena. It's not called the Internet Language Task Force, after all.
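To make the idea of a small, extensible set of communication primitives a little more concrete, here is a rough sketch in Python. It borrows a handful of performative names from FIPA-ACL; everything else (the `Message` and `Agent` classes, the `on` and `receive` methods) is made up for illustration and does not come from any spec or library. The point is only that an unrecognised speech act gets an explicit "not-understood" reply instead of vanishing into a bucket method.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict


class Performative(Enum):
    # A small subset of FIPA-ACL performative names.
    INFORM = "inform"
    REQUEST = "request"
    QUERY_IF = "query-if"
    AGREE = "agree"
    REFUSE = "refuse"
    NOT_UNDERSTOOD = "not-understood"


@dataclass(frozen=True)
class Message:
    # Hypothetical message shape: the speech act is explicit metadata,
    # separate from whatever the content happens to say.
    performative: Performative
    sender: str
    receiver: str
    content: str


Handler = Callable[[Message], Message]


class Agent:
    """Hypothetical receiver that dispatches on the performative."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._handlers: Dict[Performative, Handler] = {}

    def on(self, performative: Performative, handler: Handler) -> None:
        # Extending the agent means registering another primitive,
        # not overloading a single catch-all method.
        self._handlers[performative] = handler

    def receive(self, msg: Message) -> Message:
        handler = self._handlers.get(msg.performative)
        if handler is None:
            # No handler registered: say so explicitly.
            return Message(Performative.NOT_UNDERSTOOD, self.name,
                           msg.sender, msg.content)
        return handler(msg)
```

An agent that registers a handler for `Performative.REQUEST` can answer with `AGREE` or `REFUSE`, and anything it has no handler for comes back as `NOT_UNDERSTOOD`.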

Given the circumstances, one hack, then, would be to tunnel an extensible protocol language through the popular deployed protocols in a way that is consistent with the protocols' performatives. RDF is not that language but could serve as some kind of Linear B for a more evolved protocol language. The concept of protocol neutrality in WS remains misguided, as do bucket methods in app protocols.
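Here is one way the tunnelling hack might look over HTTP, sketched in Python. The mapping from speech acts to HTTP methods is my own guess at what "consistent with the protocol's performatives" could mean (queries ride on the safe GET, everything else on POST), and the `X-Performative` header, the `content` query parameter, and the `tunnel` function are all hypothetical names, not part of any standard. Nothing is sent over the wire; the function only builds a description of the request.

```python
from dataclasses import dataclass, field
from typing import Dict
from urllib.parse import urlencode

# Assumed mapping: query-style acts use the safe method, the rest use POST.
METHOD_FOR = {
    "query-if": "GET",
    "query-ref": "GET",
    "inform": "POST",
    "request": "POST",
}


@dataclass
class HttpRequest:
    # Plain description of a request; no network I/O happens here.
    method: str
    target: str
    headers: Dict[str, str] = field(default_factory=dict)
    body: bytes = b""


def tunnel(performative: str, path: str, content: str) -> HttpRequest:
    """Wrap a speech act in an HTTP request whose method agrees with it."""
    if performative not in METHOD_FOR:
        raise ValueError(f"no HTTP mapping for performative {performative!r}")
    method = METHOD_FOR[performative]
    # Hypothetical header that carries the finer-grained speech act.
    headers = {"X-Performative": performative}
    if method == "GET":
        # Safe requests carry their content in the query string.
        query = urlencode({"content": content})
        return HttpRequest(method, f"{path}?{query}", headers)
    body = content.encode("utf-8")
    headers["Content-Type"] = "text/plain; charset=utf-8"
    headers["Content-Length"] = str(len(body))
    return HttpRequest(method, path, headers, body)
```

The deployed protocol still sees an ordinary GET or POST, so intermediaries keep working, while the endpoints can agree on a richer vocabulary in the header.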

March 2, 2005 10:42 PM
