JunkLetterQueue: when XML envelopes go wrong

Last year we created an XML messaging hub for eGovernment here at Propylon. The hub connects a number of government agencies interested in life-events such as births, deaths and marriages, by bridging various transport and application protocols (which we dub Channels). The original purpose of the system was as a proof of concept for a larger inter-agency hub, but also as an interim solution to provide sufficient connectivity until the main hub was built out. As is the way with these things, the hub has evolved through a couple of iterations, services have been added as needed, and it is ticking away nicely.

There is a standard XML envelope that all parties agree to. We didn't use SOAP for this - it was simpler to define an envelope in plain XML (in much the same way RSS is simpler by not being SOAP). This envelope is independent of the details of any particular life-event. One aspect of the envelope is that it provides an identity for each message sent through the system, along with the ability to associate messages as being part of a conversation. As long as the envelope identity set is carried through or referenced into backend systems and business processes, there's a fighting chance that the message can be tracked across arbitrary network and organizational boundaries (auditing, reconciliation and tracking are a tough nut even when you have the luxury of a homogeneous network and a single network owner).

One of the problems in the protocol bridging scenario is what to do when a malformed XML envelope arrives at your front door. Bridging Channels is at heart asynchronous; internally there may be a number of hops across processes, any of which can result in corrupted markup. You can bet that when you get junk XML there will not be a chain of processes blocked happily waiting for you to return control - managing referential integrity and call stacks across networks is too hard (this is why many people won't recommend RPC and distributed OO outside a cluster). You might know the Channel it came in from, but you won't always be able to query the Channel in question, and even if you could, what can you ask it? If you can't parse the message to find out who it's from, there's no easy way to pull out the minimal information to make the query. And importantly (in our case) there's no way to pull out the identity set to log the audit.

While there has been plenty of handwringing about whether XML is a good carrier format (compared to say, multipart MIME or BEEP frames) I haven't seen much discussion about what to do or how to fail on bad XML. It does happen that markup gets corrupted over a network hop or between two processes, but it does not happen very often and you have to weigh the risk of it happening against the engineering cost of handling it when it does. You also have to take the SLA involved into account - some messages must get there no matter what, some can fall by the wayside.

Given all that, our approach for the hub and messaging endpoints was simple - follow the XML spec and give up. This means avoiding heuristics, avoiding regexen, avoiding excess engineering, avoiding distributed transactions, avoiding cleverness. If the message doesn't parse (or something goes wrong) we:

  • trap the exception
  • log that there's a problem
  • dump the message received to disk. This is the alluded-to JunkLetterQueue. We chose disk over a database because it makes minimal assumptions about what's running on the server (one less process to worry about).
  • email someone with the message in the body
  • log that you're sending an email to someone
  • exit

For example, in Java you might write something like this:

  public void execute( String in ) throws JunkEnvelopeException {
    boolean possibleLogFailure = false;
    Document doc = null;
    String decodedReachEnvelope ="";
    try  {
      // Parse the incoming envelope; any failure here means junk.
      doc = DocumentConverter.readString(in);
      // ...
    }
    catch(Exception e)   {
      // Flag the failure so the finally block dumps and reports the message.
      possibleLogFailure = true;
      throw new JunkEnvelopeException(e);
    }
    finally  {
      if(possibleLogFailure)  {
        String location = "";
        try  {
          Log.EnvValidationLog(LOGNAME + " processing of incoming envelope failed");
          // Dump the raw message to disk: this is the JunkLetterQueue.
          location = writeMessageToDisk(in);
          Log.EnvValidationLog(LOGNAME + " writing envelope to [" + location + "]");
        }
        catch(Exception e)  {
          // Note: thrown from inside finally, this replaces the exception
          // raised by the outer catch block.
          throw new JunkEnvelopeException(e);
        }
        finally   {
          // Send the email even if the dump to disk failed.
          sendWarningEmail(in, location);
          Log.EnvValidationLog(LOGNAME + " sending warning email to [" + getEmailTo() + "]");
        }
      }
    }
  }

(as an aside: the above is an example of a rare case when catching and acting on an exception is a useful or even optimal option)
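
For readers who want something they can compile, here is a rough, self-contained sketch of the same steps using only the JDK. Everything in it is an assumption made for illustration rather than the hub's actual code: the class name, the junk directory, the nested JunkEnvelopeException stand-in, and the notifier callback that takes the place of the email step (sending real mail would need a mail library).

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Consumer;
import java.util.logging.Logger;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical sketch: names are illustrative, not the hub's real classes.
class JunkLetterQueue {
    // Stand-in for the exception type used in the example above.
    static class JunkEnvelopeException extends Exception {
        JunkEnvelopeException(Throwable cause) { super(cause); }
    }

    private static final Logger LOG = Logger.getLogger("JunkLetterQueue");
    private final Path junkDir;            // where unparseable messages are dumped
    private final Consumer<Path> notifier; // stands in for "email someone"

    JunkLetterQueue(Path junkDir, Consumer<Path> notifier) {
        this.junkDir = junkDir;
        this.notifier = notifier;
    }

    /** Parses the envelope; on failure dumps it to disk, notifies, and gives up. */
    Document parse(String in) throws JunkEnvelopeException {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                          .parse(new InputSource(new StringReader(in)));
        } catch (ParserConfigurationException e) {
            // A broken parser setup is our problem, not a junk message.
            throw new IllegalStateException(e);
        } catch (SAXException | IOException e) {
            JunkEnvelopeException junk = new JunkEnvelopeException(e);
            LOG.warning("processing of incoming envelope failed");
            try {
                Path location = dump(in);
                LOG.warning("wrote envelope to [" + location + "]");
                notifier.accept(location);
                LOG.warning("sent warning for [" + location + "]");
            } catch (IOException dumpFailure) {
                // Keep the parse failure as the primary error.
                junk.addSuppressed(dumpFailure);
            }
            throw junk;
        }
    }

    // Writes the raw message, untouched, to a uniquely named file.
    private Path dump(String in) throws IOException {
        Files.createDirectories(junkDir);
        Path file = Files.createTempFile(junkDir, "junk-", ".xml");
        Files.write(file, in.getBytes(StandardCharsets.UTF_8));
        return file;
    }
}
```

One deliberate difference from the example above: a failure while dumping to disk is attached to the parse failure as a suppressed exception instead of replacing it, so the caller still sees why the message was junk in the first place.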

If the machines can't parse the XML, a person can still fire up a text editor and derive enough information to inform the sender that there was a problem with message XXX. Perhaps they can fix up the message and push it through again, perhaps it will be resent, but in this case (citizen data) you don't ask software to judge what's best. I feel this fail-fast approach also applies to SOAP messages travelling across heterogeneous networks and of course to application protocols that use self-describing messages as mandated by REST (such as combining HTTP+XML/RDF sans cookies). It will also apply in the future when RSS/Atom feeds start to be used beyond their target domains of blog and news feeds and instead for enterprise critical data (well-formedness and appropriate aggregator behaviour for malformed feeds is an ongoing argument in the Atom community).

I thought about using "DeadLetterQueue" for this entry, but that's commonly used for messages that can't be delivered and have expired, which indicates a connectivity, protocol or addressing problem rather than a data integrity problem, hence the moniker JunkLetterQueue. The EIP site makes a similar distinction and calls it InvalidMessageChannel but doesn't discuss it - however in the XML/Webservices world an invalid message is quite different from a malformed or junk one. Most important is remembering this constraint: there is no sound way to process or act on a malformed XML envelope. If you can't parse, don't process.
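
To make the invalid versus malformed distinction concrete, here is a small sketch using the JDK's own parser and schema validator. The toy schema (an envelope element holding a single id), the class name and the Result enum are all made up for the example; they are not the hub's real envelope. The point is only that an invalid message still parses, so its identity can be read, while a malformed one never yields a document at all.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical sketch: the schema and names are invented for illustration.
class EnvelopeCheck {
    enum Result { OK, INVALID, MALFORMED }

    // Toy schema: an <envelope> containing exactly one <id> string.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='envelope'><xs:complexType><xs:sequence>"
        + "<xs:element name='id' type='xs:string'/>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    static Result classify(String in) {
        // Step 1: well-formedness. Failure here means there is no document.
        Document doc;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            doc = factory.newDocumentBuilder()
                         .parse(new InputSource(new StringReader(in)));
        } catch (SAXException e) {
            return Result.MALFORMED;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }

        // Compile the schema separately so a broken schema is not
        // mistaken for an invalid message.
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                                  .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }

        // Step 2: validity. The document parsed, but may not match the schema.
        try {
            schema.newValidator().validate(new DOMSource(doc));
            return Result.OK;
        } catch (SAXException e) {
            return Result.INVALID;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Under these assumptions, `<envelope><other/></envelope>` classifies as INVALID and could go to something like an InvalidMessageChannel with its identity intact, whereas `<envelope><id>42</envelope>` is MALFORMED and belongs in the JunkLetterQueue.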


January 17, 2004 04:45 PM

Comments

DEC HL
(January 25, 2004 01:30 AM #)

Right on - if you aren't 100% sure of an incoming piece of data, ignore it. If the app is designed to use acknowledgements, it will right itself. Better to let the app recover from the lack of an acknowledgement, than to act incorrectly on what you think the bad message might mean.

Of course, this also highlights one of the ways a disadvantage of XML can become an advantage. If, instead of XML, the message was specified in a binary protocol, then there's every chance the mangled incoming message could have another (semantically valid) meaning. But with XML, there's very little chance that a mangled envelope could happen to form another valid envelope.

Patrick Logan
(January 29, 2004 03:30 AM #)

I like this approach for either text or binary messages.

I don't see this as a text vs. binary issue. A well designed binary format should have some minimal indicator that it is invalid.
