
XML versus APIs

In the last few days I've seen two interesting, and conflicting, posts about XML processing.

Adam Bosworth is starting a series on processing XML. Forget raw XML: apparently dealing with DOM and SAX is too clumsy for programmers, so we need to bind XML to APIs.
The article starts out really well:

Mapping XML into program data structures inherently risks losing semantics and even data because any unexpected annotations may be stripped out or the schema may be simply too flexible for the language.

Yes. Moving information between XML and code can be a risk.
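To make that risk concrete, here is a deliberately naive binding, sketched in Java. The `Stock` class, the `<note>` element and the figures are all hypothetical; the only point is that a bound type keeps what its author anticipated and nothing else.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class NaiveBinding {
    // Hypothetical bound type: it has slots only for the fields its author expected.
    static final class Stock {
        final double price, revenues, expenses;

        Stock(double price, double revenues, double expenses) {
            this.price = price;
            this.revenues = revenues;
            this.expenses = expenses;
        }
    }

    // Reads the text of the first element with the given name as a double.
    static double child(Element root, String name) {
        return Double.parseDouble(root.getElementsByTagName(name).item(0).getTextContent().trim());
    }

    // XML -> object: any element without a matching field is silently dropped here.
    static Stock bind(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Element root = doc.getDocumentElement();
        return new Stock(child(root, "price"), child(root, "revenues"), child(root, "expenses"));
    }

    // Object -> XML: only what survived binding can be written back out.
    static String unbind(Stock s) {
        return "<stock><price>" + s.price + "</price><revenues>" + s.revenues
                + "</revenues><expenses>" + s.expenses + "</expenses></stock>";
    }
}
```

Round-tripping `<stock><price>30</price><revenues>50</revenues><expenses>40</expenses><note>Watch Q4</note></stock>` through `bind` and `unbind` drops the note, and it also rewrites `30` as `30.0`, because the lexical form of the number never made it into the object.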

Today's programmer has two tools available to parse and manipulate XML files: the Document Object Model (DOM) and Simple API for XML (SAX). Both, as we shall see, are infinitely more painful and infinitely more prolix than the previous code example.
While the DOM can be used to access elements, the language doesn't know how to navigate through the XML's structure or understand its schema and node types. Methods must be used to find elements by name. Instead of the previous simple instruction, now the programmer must write something like:
Tree t = ParseXML("somewhere");
PERatio = number(t.getmember("/stock/price")) / (number(t.getmember("/stock/revenues")) - number(t.getmember("/stock/expenses")));
In this example, number converts an XML leaf node into a double. This is not only hideously baroque, it's seriously inefficient. Building up a tree in memory uses up huge amounts of memory, which must then be garbage collected - bad news indeed in a server environment.
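For comparison, here is roughly what the quoted pseudocode looks like against the DOM and XPath classes that ship with the JDK. It is a sketch: the class name and sample figures are illustrative, it assumes each path selects a single leaf element, and the formula is the quote's price / (revenues - expenses), not a textbook P/E.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class DomPeRatio {
    // Evaluates an XPath expression against the tree and converts the
    // resulting string to a double (the job "number" does in the quote).
    static double number(XPath xpath, Document doc, String path) throws Exception {
        return Double.parseDouble(xpath.evaluate(path, doc).trim());
    }

    static double peRatio(String xml) throws Exception {
        // The whole document is built as a tree in memory before any lookup.
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        double price = number(xpath, doc, "/stock/price");
        double revenues = number(xpath, doc, "/stock/revenues");
        double expenses = number(xpath, doc, "/stock/expenses");
        return price / (revenues - expenses);
    }
}
```

With `<stock><price>30</price><revenues>50</revenues><expenses>40</expenses></stock>` as input, `peRatio` returns 3.0.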

Now I have huge respect for Adam Bosworth, but try as I might, I can't agree with the line this article takes. If you start binding XML to APIs and making things API-centric, you risk going back to the non-interoperable systems quagmire that XML is supposed to get us out of. As well as this, it moves the developer to an API/object-oriented view of the world rather than a document-oriented mindset. This is a mistake. APIs and objects have traditionally not interoperated, not even with backend databases and other object systems in the same administrative domain. What objects and APIs have done is allow us to build large, maintainable systems. Objects help us talk to the machines and build comprehensible systems. Interoperability, getting machines and systems in different domains, with different owners, running on different technology, to talk to each other, is a different problem again, and not something that objects were designed to solve.

I don't dispute the points about inefficiency, or even that dealing with DOM and SAX can be awkward and clumsy. But this seems like an argument from performance and optimization (very J2EE!) rather than a real usability concern with XML processing APIs. If it were a usability concern, I would expect the argument to be that we need better APIs, as ER Harold and Microsoft keep telling us, not that we need to hide the XML completely (if you're a Java programmer working with XML, take the time to look at the .NET System.Xml library). Maybe some years out we can think about hiding the AngleBracketedUnicodeText.
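To be fair to the performance argument, SAX does avoid the tree. A sketch of the same hypothetical `<stock>` calculation as a SAX handler shows the trade: memory stays flat, but the programmer now tracks nesting depth and buffers character data by hand. The class and handler names are illustrative, and the code assumes the three values are leaf children of a `<stock>` root.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class SaxPeRatio {
    // Picks up /stock/price, /stock/revenues and /stock/expenses as the
    // parser streams past them; no document tree is ever built.
    static final class StockHandler extends DefaultHandler {
        final Map<String, Double> values = new HashMap<>();
        private final StringBuilder text = new StringBuilder();
        private String root;
        private int depth;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            depth++;
            if (depth == 1) {
                root = qName; // remember the document element's name
            }
            text.setLength(0); // start collecting this element's text afresh
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Text may be delivered in several chunks, so accumulate it.
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            boolean wanted = qName.equals("price") || qName.equals("revenues") || qName.equals("expenses");
            // Only direct children of a <stock> root (depth 2) are recorded.
            if (depth == 2 && "stock".equals(root) && wanted) {
                values.put(qName, Double.parseDouble(text.toString().trim()));
            }
            depth--;
        }
    }

    static double peRatio(String xml) throws Exception {
        StockHandler handler = new StockHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.values.get("price")
                / (handler.values.get("revenues") - handler.values.get("expenses"));
    }
}
```

The SAX contract allows `characters` to be called more than once for a single run of text, which is why the handler buffers rather than reading the value in one call.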

On xml-dev, Tim Bray puts forward the opposite argument (xml-dev - Re: [xml-dev] Typing and paranoia). I'll quote:

There's a deep tension here that won't go away. Some of us really
REALLY want to be able to deal with the bits on the wire and REALLY like
the open-ness and interoperability that gives us. Others really REALLY
want to take the bits on the wire away and present us instead with an
API that has 117 entry points averaging 5 arguments and try to convince
us that this is somehow equivalent. XML, for the first time in my
professional career, represents a consensus on interoperability: that it
is achieved by interchanging streams of characters with embedded markup.
Since about 15 seconds after XML's release, the API bigots have been
trying to recover from this terrible mistake and pretend that the syntax
is ephemeral and the reality is the data structure, just read chapters 3
through 27 of the API spec, buy the programmer's toolkit, sign up for
professional services and hey-presto, you'll be able to access your own
data, isn't that wonderful!?!?

I'm not sure people who like APIs are bigoted, but I am sure that if you think you can eradicate XML from your programs in favour of APIs and object models, there will come a day when your systems decay and cease to interoperate. If you really value interoperability, if you really want to get systems hooked up and keep them hooked up, you will want to stay close to the XML.
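Staying close to the XML doesn't have to mean string hacking. In this minimal sketch, the hypothetical `<stock>` document carries an element the program has never heard of (`a:note`, in a made-up `urn:example:analyst` namespace). The code edits the one node it cares about and serializes the tree back, so the kind of unexpected annotation a data binding may strip survives the round trip.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class StayCloseToXml {
    // Replaces the text of the first <price> element and writes the whole
    // document back out; everything else passes through untouched.
    static String updatePrice(String xml, String newPrice) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // keep prefixes and namespace declarations intact
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

        Element price = (Element) doc.getElementsByTagName("price").item(0);
        price.setTextContent(newPrice);

        // Identity transform: serialize the (slightly edited) tree as-is.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```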

Tim Bray is right: XML wipes the floor, hands down, with any previous attempt to get systems interoperating with each other, especially when you combine it with MIME-ish protocols like HTTP.

I see the same tension coming to the Open Office community's doorstep. Currently there are two flavours of Oo programming. You can unzip an .sxw file, manipulate the XML, and zip the results back up. Or you can go through the Oo API. In the last month, I've done both. Now, the surface area of the Oo API is vast. There are hundreds of objects to know about, and there's learning how to interact with the Oo object broker, UCB. If that wasn't enough, there's the Oo IDL format and a scripting language to get to grips with. In fact it's more like a platform, à la the JDK, than an API. The XML format is no lightweight either (the spec document is a 500+ page .pdf), but my impression so far is that it's a more tractable and cohesive approach than the API platform.
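The unzip, edit, zip route can be sketched with nothing more than `java.util.zip`. This assumes the document body sits in the archive's `content.xml` entry; the name `SxwRewriter`, the string-level edit function, and the choice to copy entries in their original order while keeping stored (uncompressed) entries stored, so that a `mimetype` entry written uncompressed stays that way, are illustrative assumptions rather than anything taken from the Oo spec.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.UnaryOperator;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class SxwRewriter {
    // Copies a zipped Oo document from in to out, passing content.xml through
    // the edit function and every other entry through unchanged, in order.
    // Closing the zip streams also closes in and out.
    static void rewrite(InputStream in, OutputStream out, UnaryOperator<String> edit) throws IOException {
        try (ZipInputStream zin = new ZipInputStream(in);
             ZipOutputStream zout = new ZipOutputStream(out)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                byte[] data = zin.readAllBytes(); // reads just the current entry
                if (entry.getName().equals("content.xml")) {
                    String xml = new String(data, StandardCharsets.UTF_8);
                    data = edit.apply(xml).getBytes(StandardCharsets.UTF_8);
                }
                ZipEntry copy = new ZipEntry(entry.getName());
                if (entry.getMethod() == ZipEntry.STORED) {
                    // Stored entries must declare their size and CRC before writing.
                    CRC32 crc = new CRC32();
                    crc.update(data);
                    copy.setMethod(ZipEntry.STORED);
                    copy.setSize(data.length);
                    copy.setCompressedSize(data.length);
                    copy.setCrc(crc.getValue());
                }
                zout.putNextEntry(copy);
                zout.write(data);
                zout.closeEntry();
            }
        }
    }
}
```

A real edit would parse `content.xml` (with DOM, say) rather than treat it as a string; the string function just keeps the sketch short.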

December 6, 2002 11:19 AM

