Die, default namespaces, die

I'm currently implementing XMPP core (client and server) in Python. It's meant to be a fun/educational thing. But there aren't many compliant XMPP stacks out there, and there seems to be inertia within the Jabber community in getting momentum around a compliant stack. So I want to have a crack at an open, plugin-extensible reference implementation. XMPP is a fascinating technology. Within, a rant about design decisions that are taking the fun out of things for the moment.

Slump.

This is from the XMPP Core draft (draft-ietf-xmpp-core-24), section entitled XML Usage within XMPP:

A default namespace declaration is REQUIRED and is used in all XML streams in order to define the allowable first-level children of the root stream element. This namespace declaration MUST be the same for the initial stream and the response stream so that both streams are qualified consistently. The default namespace declaration applies to the stream and all stanzas sent within a stream (unless explicitly qualified by another namespace, or by the prefix of the streams namespace or the dialback namespace). - 11.2.2 Default Namespace

"A default namespace declaration is REQUIRED". Wow. I've never seen that before.

It goes on to say:

An implementation MUST NOT generate namespace prefixes for elements in the default namespace if the default namespace is 'jabber:client' or 'jabber:server'. An implementation SHOULD NOT generate namespace prefixes for elements qualified by content (as opposed to stream) namespaces other than 'jabber:client' and 'jabber:server'. - 11.2.2 Default Namespace

So here's what I think the problem is. Any XML content going into an XMPP stream/stanza that is not namespace qualified will inherit either the jabber:client or jabber:server namespace. That's exactly how default namespaces are expected to work. When the content is lifted out it will have to be pulled out of those namespaces, otherwise the markup will be trashed by merely having passed through the XMPP application layer. Most likely fixing up such content will require a design-conflicting hack to obviate the default mechanism - off the shelf namespace aware tools will correctly preserve the namespace binding and leave the embedded markup borked. Granted, what XMPP calls Stanzas and markup folks sometimes call fragments are coming in as discrete chunks, and you could silently ignore the default namespaces that came beforehand, but as far as I can tell, you're supposed to treat the entire XML stream as a logical document so the normal rules apply (and it would be weird to spec the rules and then not apply the rules). There are layering problems here.
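To make the inheritance concrete, here's a minimal sketch using Python's standard xml.etree.ElementTree. The stanza, the address and the un-namespaced note/title content are made up for illustration; only jabber:client and the streams namespace name (http://etherx.jabber.org/streams) come from XMPP itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical stream: un-namespaced <note> content dropped into a stanza
# inside a stream whose default namespace is jabber:client.
STREAM = """\
<stream:stream xmlns="jabber:client"
               xmlns:stream="http://etherx.jabber.org/streams">
  <message to="juliet@example.com">
    <note><title>hello</title></note>
  </message>
</stream:stream>"""

root = ET.fromstring(STREAM)

# ElementTree reports names as {namespace-uri}local-name.
tags = [el.tag for el in root.iter()]
for tag in tags:
    print(tag)

# The same content parsed on its own is in no namespace at all.
standalone_tag = ET.fromstring("<note><title>hello</title></note>").tag
print(standalone_tag)
```

The note and title elements come back as {jabber:client}note and {jabber:client}title, while the same markup parsed on its own is plain note. Nothing in the content changed; only the envelope did.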

I'm surprised to see an XML application envelope insisting on a default namespace on the document element. I don't know anything at all about the design decisions, but having seen some namespaced weirdness in my time, my first reaction is that this is not something you want to be designing into Internet technologies.

Choices.

I'm fairly conflicted on this. Entertaining non-compliance is a real option despite my design goals. Still, it's just a draft and it may be possible to get this language turned around, assuming I have the wherewithal to present a coherent argument to the XMPP WG and not just a tirade on a blog - but my experience is that technical criticism or judgement on the merits of XML Namespaces is not always wanted, no more, say, than criticism of Web Services is wanted. People do believe Namespaces are an important technology. I suspect the reason this hasn't come up as an issue with XMPP yet is that, as I said, compliant XMPP code is thin on the ground and XMPP doesn't seem to be used much yet to carry XML markup, something that will certainly change. And it seems that Jabber code has been fast and loose in the past when it came to XML processing. Consider this outtake, again from XMPP Core:

The element names of the <stream/> element and its <features/> and <error/> children MUST be qualified by the streams namespace prefix in all instances. An implementation SHOULD generate only the 'stream:' prefix for these elements, and for historical reasons MAY accept only the 'stream:' prefix.

'SHOULD', here, means "do it, unless you have a really good reason not to". I think the only reason you would specify the above is if you were worried about breaking legacy implementations that were doing dodgy things. My reason not to obey this is it's broken to assign significance to namespace prefixes in this way. In some places it's acceptable to apply significance to a prefix (XPath comes to mind), but not here.
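A small sketch of why the prefix shouldn't matter, again with xml.etree.ElementTree. The 's' prefix is an arbitrary choice of mine; the point is only that a namespace-aware parser sees the same element name either way.

```python
import xml.etree.ElementTree as ET

STREAMS_NS = "http://etherx.jabber.org/streams"

# Same element, same namespace, two different prefixes.
with_stream_prefix = '<stream:features xmlns:stream="%s"/>' % STREAMS_NS
with_other_prefix = '<s:features xmlns:s="%s"/>' % STREAMS_NS

a = ET.fromstring(with_stream_prefix).tag
b = ET.fromstring(with_other_prefix).tag

# To a namespace-aware parser the prefix is just a local abbreviation.
same_name = a == b
print(a)
print(same_name)
```

An implementation that only accepts 'stream:' is matching on the literal characters before the colon rather than on the namespace name, which is the dodgy behaviour the MAY seems to be accommodating.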

Maybe we'll get lucky and no non-namespaced markup will ever get sent over XMPP ;)

Permathread.

Now, this issue of namespace defaulting has come up before in Atom. There was a thread a while back with the premise, 'we need to say something about xmlns="" for content', a thread I mostly instigated. My basic argument was that since you can't guarantee that Atom will be the outer document format, you should advise on being robust with respect to namespace pollution in content. So we talked about it and I got strong pushback on saying anything of the sort in the format spec - 'put it in the guide', 'not core' - that sort of thing. However, pushing Atom around using XMPP/Jabber is a hot area at the moment and both technologies are receiving interest for applications outside their target domains (syndication and instant-messaging). A number of people think XMPP and Atom are a good fit, myself included. But the jabber:client and jabber:server default namespaces by design ensure that XML content embedded within an XMPP stream will be polluted. Well behaved namespace aware tools will carry those two default namespaces right down into the content, and leave them there after the markup has passed out of XMPP. There's nothing in either specification that would indicate a problem. For Atom, I suspect it's something we might have to revisit at some point. As Tim Bray puts it - Broken As Designed.

Who pays?

The question then is: should Atom pick up the tab on this? Should XMPP? This is a great question, because it presents a working group with wiggle room to disavow responsibility for ensuring interoperation and robustness between XML formats used at different layers.

I'm not involved in specifying XMPP, but as a member of the Atom-WG I think Atom should do its utmost to enable its users' content to be robustly carried irrespective of the external packaging format. In terms of specification text, we are talking about a couple of sentences that do not impact on any other aspect of the format - in fact telling users they should shroud their content in xmlns="" before dropping it into an Atom feed doesn't impact on Atom at all.

No wonder then people feel it's superfluous or unnecessary to specify this. And of course, if Atom was all the markup in the world, there is no point. You see, within the bounds of any given spec or layer, default namespaces are a non-problem; and this is perhaps why Atom folks are deeply reluctant to say anything. It's only as you try to get the various markups to play nice you get bitten, and you tend to get bitten with respect to an enclosing XML format. Usually this is an enveloping or packaging technology and usually this is happening far away in time and space from any working groups :) As such it's entirely contextual and by following a certain line of reasoning, the idea of adding such text to the Atom format spec was not accepted.

It is precisely that ability to wiggle out of saying anything that lets you know default namespaces represent an architectural problem.

Ziggurat.

We can explain the fundamental architectural issue using 3 pictures.

The first one is what most technologists have in their heads when they are thinking about a stack or a layered architecture:

stack.gif

The thing to remember here is that layering is an unquestionably good principle to hold; any interactions and couplings between layers are highly controlled or simply do not exist. Many of the problems in computing arise from not having sound first principles to base things on; layering is about as close as you can get to one.

This next one is what a stack or a layered architecture predicated on XML packaging looks like:


nest.gif

The important point to remember here is that most technologists often have the first picture in their head even when they are dealing with this second structure. In itself, that's not a problem; you can still preserve a clean layering.

When it comes to using default namespaces, this 3rd picture shows a line whose direction indicates the scope of a default namespace when applied to the outer layer:

nest-ns.gif

Now there is a problem. It should be clear that default namespaces transgress the layers. In turn this means they violate one of the few engineering principles we have in computing that could be considered a universal.

Default namespaces offer no benefits that justify breaking with the layering principle.

The rabbit's not like us.

The thing to remember is that Namespaces is, in XML terms, ancient. It goes back to 1998 and was controversial at the time. But it was designed before the rise and rise of XML as a protocol and packaging technology and it's not obvious to me anyone could have predicted the extent to which XML has pervaded protocol and interchange design since then. Thus the impact of default namespaces on system layering would not have been entirely clear over half a decade ago. However, if there are only prefix namespaces, this entire problem evaporates. Everyone can play together up and down the stack as no enclosing envelope or XML packaging technology, one that you don't know about or perhaps one that hasn't even been invented yet, can come along and break your content (only versioning is allowed to break your data like this :).

Twenty-eight days... six hours... forty-two minutes... twelve seconds

The default namespace is a bizarre construct. It's like a macro and a lexical scope rule rolled up into one, but from an alternate universe. In modern protocol construction, it conspires to produce an architectural prank of the first order. As a result of the way a single XML document tree can operate at different application layers, having a globally scoped namespace snatches defeat from the jaws of victory.

On the other hand it's 2004 and we've learned a lot since then. It has to be asked of XMPP why it takes this approach. Here's XMPP Core's rationale for using XML Namespaces:

Namespaces are used within all XMPP-compliant XML to create strict boundaries of data ownership. The basic function of namespaces is to separate different vocabularies of XML elements that are structurally mixed together. Ensuring that XMPP-compliant XML is namespace-aware enables any allowable XML to be structurally mixed with any data element within XMPP. Rules for XML namespace names and prefixes are defined in the following subsections. - 11.2 XML Namespace Names and Prefixes

"The basic function of namespaces is to separate different vocabularies of XML elements that are structurally mixed together". Well, jabber:client and jabber:server through their use of default namespaces subvert that rationale and ensure that that ownership can cross boundaries. I'm surprised this passes muster with the draft-hollenbeck recommendations on XML markup for use in IETF protocols tho' I haven't looked at it in a good while - if it does, that document is also in need of attention.

Fix.

The immediate solution is to wrap markup with an xmlns="" declaration before letting it loose on the world. If you don't, you are committing what might be termed a fallacy (in the Peter Deutsch sense) of XML - that the root element in front of you is always the root element. It would help if we could get this bootstrapped into protocol and format specifications via non-invasive text, especially in formats such as Atom, SOAP and XMPP that are taking responsibility for carrying arbitrary content.
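A sketch of the workaround, again with xml.etree.ElementTree and made-up note content. The embed helper is hypothetical and simply pastes a string into a jabber:client message, which is the naive case the workaround guards against.

```python
import xml.etree.ElementTree as ET

CONTENT_BARE = "<note><title>hello</title></note>"
CONTENT_SHROUDED = '<note xmlns=""><title>hello</title></note>'


def embed(content):
    """Paste content, as a string, into a jabber:client message (hypothetical helper)."""
    return '<message xmlns="jabber:client">%s</message>' % content


def content_tags(content):
    """Names the embedded content ends up with, skipping the <message> wrapper."""
    root = ET.fromstring(embed(content))
    return [el.tag for el in root.iter()][1:]


print(content_tags(CONTENT_BARE))      # picks up the envelope's default namespace
print(content_tags(CONTENT_SHROUDED))  # xmlns="" resets the default; names stay bare
```

The shrouded version survives the trip because xmlns="" switches the default namespace off for the note element and everything beneath it.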

Ultimately however, the xmlns="" declaration is a workaround - the proper solution is to deprecate and then eliminate the default namespace from acceptable XML usage. I would love to see a future edition of the namespaces spec start that ball rolling.


September 5, 2004 07:46 PM

Comments

Elliotte Rusty Harold
(September 6, 2004 11:19 AM #)

Am I missing something here? I just don't see the problem. You say, "The immediate solution is to wrap markup with an xmlns='' declaration before letting it loose on the world. If you don't you are committing what might be termed a fallacy (in the Peter Deutsch sense) of XML - that the root element in front of you is always the root element." so you obviously know this and considered this. Why isn't this a complete fix? And why is this any different for the default namespace than prefixed namespaces? Namespace conflicts can exist between prefixes too.

If I embed document A in document B, it's my responsibility to make sure the namespaces work out. Any decent API like XOM will make this happen automatically. If you're just cutting and pasting text strings, maybe it won't work; but a lot of other things will break too. Cutting and pasting is rarely a robust solution.

There is no need to "wrap markup with an xmlns='' declaration before letting it loose on the world" as you suggest. Such a document is correctly described. If someone else takes that markup and embeds it, then they do need to make sure they add xmlns="" where necessary; but again: all XML-aware tools should do this automatically. Only copy and paste has a problem.

The same issues arise when processing document fragments as opposed to complete documents. Here however, you really see that prefixes are no different. You need to add namespace declarations in all sorts of places to fix up everything. The default namespace is hardly alone.

I just don't see why this is such a big problem.

Amy!
(September 6, 2004 03:47 PM #)

Hmm. This problem arises when well-formed XML that doesn't care about (or is hostile to) namespaces is embedded in something else, yes? That is, if you have xhtml, *without* an xmlns="[xhtml namespace]" attribute/namespace declaration, then, when that content is embedded, it inherits the default namespace of whatever it's embedded in (unless there is none).

But it really is a problem of namespace-positive versus namespace-negative XML, isn't it? The default namespace declaration on a piece of XML, even if it is merely well-formed rather than valid-according-to-joe-random-schema, will be there, for namespace-positive tools. DTD based tools tend to be namespace-negative, and since you can't define entities without DTDs, there are a number of applications that won't use anything else.

Still, the default namespace, even when absent, is no greater a problem than the XML declaration (when present) or the doctype declaration, since the presence of either of the latter makes a document effectively non-embeddable without munging (and the use of an internal subset tends to make a document non-embeddable altogether, as it requires that that internal subset get promoted, *somehow*, at which point it loses its specificity).

Bill de hÓra
(September 6, 2004 11:22 PM #)

"If someone else takes that markup and embeds it, then they do need make sure they add xmlns="" where necessary; but again: all XML-aware tools should do this automatically."

1. Namespaces are a breaking change with XML (they don't have to be). 2. The right behaviour around default namespaces is unspecified by namespaces. 3. Default namespaces are redundant.

The fact that a library like XOM takes certain measures or that the right thing is 'obvious' doesn't change that.

Elliotte Rusty Harold
(September 7, 2004 01:13 AM #)

Replying to Amy,

1. There is no such thing as XHTML without an xmlns declaration. The XHTML spec is quite clear that an xmlns declaration is required on the root element.

2. If a well-formed HTML (but not XHTML) document does not have an xmlns declaration, then as soon as it's embedded in something else it will be added. Anything more XML-aware than copy-and-paste handles this automatically. There just isn't a problem here.

3. DTDs can be made namespace aware. It's tricky, but doable.

4. You're right that it's no more a problem than the DOCTYPE declaration or the XML declaration; and these are handled the same way. Manual copy and paste requires manual fixup. Anything smarter than that handles it automatically.

Elliotte Rusty Harold
(September 7, 2004 01:24 AM #)

Replying to Bill:


What spec doesn't mandate this? It's handled in XOM, XPath, XSLT, XQuery, JDOM, XInclude, SAX, W3C XML Schema Language, and others. I can't think of a single namespace aware technology that doesn't handle this. And yes, DOM Level 2 handles it. DOM Level 2 does screw up some aspects of namespaces (including the namespace URI of xmlns attributes and the distinction between namespace declarations and attributes) but this one point it gets right. If you import a DocumentFragment from a default namespace or no namespace into another Document object that maps the default namespace differently, the original namespace URIs are preserved. The only technology I can think of that doesn't behave this way is DOM Level 1, because that predates namespaces.

You're seeing mountains where there aren't even molehills. I challenge you to come up with any modern, post-namespace example where this is actually a problem using anything more XML-savvy than copy and paste or the equivalent. Sure you can have problems if you're using non-XML aware tools like regular expressions; but if you're doing that, you've got bigger problems than this. XML tools, APIs, and languages simply don't have this problem.
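For anyone who wants to test this claim against one tool, here is a minimal sketch with Python's xml.etree.ElementTree, embedding at the tree level rather than by pasting strings. The note content is made up, and this only shows what the standard library serializer does; it says nothing about other APIs.

```python
import xml.etree.ElementTree as ET

# Un-namespaced content, parsed on its own.
content = ET.fromstring("<note><title>hello</title></note>")

# Embed it as a tree node under an element in the jabber:client namespace.
message = ET.Element("{jabber:client}message")
message.append(content)

# Serialize, then parse the result again to see what names survive.
serialized = ET.tostring(message, encoding="unicode")
reparsed = ET.fromstring(serialized)
roundtrip_tags = [el.tag for el in reparsed.iter()]

print(serialized)
print(roundtrip_tags)
```

In this case the embedded note and title come back in no namespace after the round trip, so the tree-level route keeps the original names where string pasting would not.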

Amy!
(September 8, 2004 04:32 AM #)

A minor challenge to Rusty:

"3. DTDs can be made namespace aware. It's tricky, but doable."

I don't believe that it can be done, while maintaining the decoupling between the namespace prefix and the namespace URI. Parameterized DTDs can provide namespace compliance, but they aren't at all namespace-positive, and can't handle changes to the DTD's namespace-to-prefix mapping.

DTDs can be made namespaces 1.0 (or 1.1) compliant, given restrictions on prefix usage. I don't believe that they can be made namespace *aware*.

Amy!

Elliotte Rusty Harold
(September 10, 2004 11:20 AM #)

As far as I know the techniques for making DTDs namespace aware were invented by the MathML working group for MathML 1.0. I've explained them in both XML in a Nutshell (Chapter 4) and the XML Bible.

Trackback Pings

TrackBack URL for this entry:
http://www.dehora.net/mt/mt-tb.cgi/1413

Listed below are links to weblogs that reference Die, default namespaces, die:

» xmlns="" from Hugh's ramblings
Bill de hÓra, about parts of the XMPP (draft) spec: The default namespace is a bizarre construct. It's like a macro and a lexical scope rule rolled up into one, but from an alternate universe. In modern protocol construction,... [Read More]

Tracked on September 6, 2004 03:22 AM

» Default namespaces design choices from Jabber Architecture
I'm surprised to see an XML application envelope insisting on a default namespace on the document element. I don't know anything at all about the design decisions, but having seen some namespaced weirdness in my time, my first reaction... [Read More]

Tracked on September 9, 2004 07:37 AM
