
HTTPLR draft-01 published. Some background. Some futures.

HTTPLR is an application protocol for reliable messaging using HTTP. The protocol specifies reliable delivery for upload and download of messages. A new revision, draft-httplr-01, is now available and supersedes draft-httplr-00.

Feedback on draft-httplr-00

I've had great comments and observations so far, which have fed directly into the 01 draft. A big, big thank-you to everyone who took some time to comment.

To be honest I wasn't sure what to expect - deafening silence came to mind, and maybe some heat from WS and REST advocates, but so far it's been positive [1] and quite insightful. There were some thinkos and things I had straight-out missed in draft-00 that were graciously pointed out and I think draft-01 is a better document as a result of that.

Some people asked about code - there will be an RI (see the Roadmap below). As far as 'does it work?' goes, the answer is yes. I've mentioned before that an earlier draft of the upload protocol has been running in production for over a year - the essential protocol logic is the same but the public versions are more detailed with respect to things like headers and status codes. In work, we're currently implementing the download protocol.

Changes from draft-00

The revision history is in the spec, but I'll pull out the highlights here:

  • 8: Removed Allow: from "Sending a message to the exchange URL" figure (Allow only applies to the created exchange URL when PUT is used to create the resource).
  • 8.2: Clarified PUT/POST/DELETE negotiation/interoperation for message delivery
  • 8.2.1: Moved to 8.3 (as a special case)
  • 8.2: Changed response code for informing a client the message has been reconciled from 405 to 410
  • 8.2: Highlighted Behaviour where PUT delivery is not supported (as section 8.2.1)
  • 8.2: Highlighted Behaviour where delivery is repeated (as section 8.2.2)
  • 8.2: Added DELETE to list of allowed methods in figure 2
  • 8.3: Highlighted Behaviour where DELETE reconciliation is not supported (as section 8.3.1)
  • 8.3: Highlighted Behaviour for Rejection of out of order DELETE requests (as section 8.3.2)
  • 8.3: Use 410 Gone to respond to repeated reconciliation requests
  • 9.1: Changed example to use atom05
  • 9.2.1: moved to 9.3 (as a special case)
  • 9.3: Removed inconsistent text around client behaviour when sending repeat reconciliations.
  • 10: Clarified dictionary attack as pertaining to both digest and basic auth.
  • A: added 501 to the list of status codes.
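To make the status code changes concrete, here is a minimal sketch of one upload exchange as a server might track it. It is a toy and not the reference implementation: the class and state names are hypothetical, the 200 success code and the 409 used to reject an out-of-order reconciliation are assumptions (the draft defines the real codes), and so is keeping the first copy of a message when delivery is repeated. The only behaviour taken from the change list is that delivery to an already reconciled exchange, and repeated reconciliation, both answer 410 Gone.

```python
from enum import Enum


class ExchangeState(Enum):
    CREATED = "created"        # exchange URL exists, nothing delivered yet
    DELIVERED = "delivered"    # message received, not yet reconciled
    RECONCILED = "reconciled"  # client has confirmed; exchange is finished


class Exchange:
    """Toy model of a single upload exchange (hypothetical, for illustration)."""

    def __init__(self):
        self.state = ExchangeState.CREATED
        self.message = None

    def deliver(self, body):
        """Handle a delivery request (PUT or POST); return an HTTP status code."""
        if self.state is ExchangeState.RECONCILED:
            # From the change list: 410, not 405, once the message is reconciled.
            return 410
        if self.state is ExchangeState.CREATED:
            self.message = body
            self.state = ExchangeState.DELIVERED
        # ASSUMPTION: a repeated delivery keeps the first copy and succeeds.
        return 200  # ASSUMPTION: success code chosen for the sketch

    def reconcile(self):
        """Handle a reconciliation request (DELETE or POST); return a status code."""
        if self.state is ExchangeState.CREATED:
            return 409  # ASSUMPTION: code for rejecting out-of-order reconciliation
        if self.state is ExchangeState.RECONCILED:
            return 410  # From the change list: repeated reconciliation gets 410 Gone
        self.state = ExchangeState.RECONCILED
        return 200  # ASSUMPTION: success code chosen for the sketch
```

Driving one exchange through deliver, deliver again, reconcile, reconcile again walks every branch above.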

Backgrounder and design notes

Some people have asked me about the design rationale and background of the spec. I'll take a stab at that here.

History

I suspect that reliable-over-HTTP has a longer history than I'm aware of, but has been one of those things that no-one bothered to write down. I would guess the basic 2-step send and reconcile model has been around for years, possibly going back to the Nineties. With the possible exception of trying to stay within the confines of HTTP (no URI deconstruction, no peeking in message bodies) HTTPLR isn't all that innovative, and is in one sense merely documentation.

I've been working on reliable HTTP in one form or another since early 2002, which is when reliable messaging over HTTP started to pique my interest. In 2001 in the UK, I started to see scenarios under which failed content delivery unbeknownst to the client could cause problems.

It was Paul Prescod's Reliable delivery in HTTP that initially inspired me to write the process down formally and that was cemented by some experiences during my time so far with Propylon. The earliest articulation of a HTTPLR protocol document dates from 2002 and the IETF style document started in summer of 2003. The protocol had gone through several revisions and implementation experience before releasing draft-00. There's even a document on the web dating from August 2003, but I never linked to it as I didn't think it was baked enough at the time (whereas at around the same time something like "Click Submit Only Once" was).

At Propylon I had direct exposure to problem spaces where documents had to arrive once and only once, and eventually that resulted in an opportunity along with Praveg Arkadi (a colleague) in late 2003 to implement a solution against an early draft of the protocol. Two senior work colleagues, Sean McGrath and Conor O'Reilly, had also been looking at messaging along much broader perspectives for eGovernment; that was influential in us taking a protocol oriented approach.

Paul in his document claimed it was so simple it was hardly worth bothering with from a protocol perspective, and if he were bothered he would use a new verb. I think he's right that it is, at heart, simple, but wrong that a new verb is the way to go, purely for traction purposes (technically I have no argument against new HTTP verbs). IBM's HTTPR had new verbs and I think failed to catch fire in large part due to that (although there is other incidental complexity in that spec that prevented adoption). WebDAV created new verbs, and despite being really useful and extremely well thought-out, continues to have a slow (but steady) adoption rate - that alone should give anyone pause about verb proliferation on the Web. But if that's not enough, HTTP itself has had problems - the deployment reality is that GET|POST is what most people are working to.

So, if you are after adoption on the Web, you will tend to avoid creating new verbs [2]. The problem is that the restricted verb set complicates the protocol and makes implementors go through more hoops than is strictly necessary (more on this below).

Why Client/Server? Why HTTP?

Most WS RM specifications are what an early version of HTTPLR termed 'heavyweight'. They all assume both parties have a web server exposed and function by correlating IDs between pairs of clients and servers. In principle there's nothing wrong with this and you can build high-throughput systems by emulating full duplex with a pair of HTTP connections (which is why I don't use the potentially pejorative 'heavyweight' anymore). I have a good bit of experience for example with the BizTalk Server series which uses connection pairs in this way and it's a perfectly fine way to realize reliable messaging.

In more restricted scenarios however, replacing a HTTP client with a Server in a DMZ and doing the consequent integration to the backend is not always an option - some users just need to send messages and don't want to invest a lot of infrastructure in having a server just for that purpose. Having a 'half-duplex' client that (in theory) they can run from a desktop is an attractive proposition. Aside from system integrations, there are also consumer-oriented scenarios where reliability using pairs of servers isn't an option, but client/server is (J2ME mobile devices come to mind).

Why HTTP? Because it's there.

PUT v POST, DELETE v POST

Much of the feedback I've had so far has been around the choice of verbs. Opinions conflicted on this. Some people thought that POST was noise - use PUT|DELETE. Some people thought PUT|DELETE was non-interoperable - use POST.

I'm not surprised about this - one of the reasons HTTPLR has a long history is this tension between HTTP as specified and HTTP as deployed. Clearly PUT and DELETE are the right choices for specific operations. But the fact is that most of the deployed web is configured for GET|POST. After a lot of thought, the best thing to do is to keep as many options open as possible and that means allowing users to choose PUT or POST, DELETE or POST.

Feedback on draft-00 has helped clarify a means of interoperation amongst the options [3]. Nonetheless I'm not happy about it. It adds complexity to the spec and to implementations.
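As a rough illustration of what "PUT or POST, DELETE or POST" means for a client, here is a small sketch that picks a delivery method and a reconciliation method from the list of methods a server advertises. The function name is hypothetical, and reading the choice off an Allow header value is an assumption made for the example; the draft defines the actual interoperation rules.

```python
def choose_methods(allow_header):
    """Pick (delivery, reconciliation) methods from an Allow header value.

    Prefers PUT for delivery and DELETE for reconciliation, and falls back
    to POST for either one when the preferred verb is not advertised.
    """
    allowed = {m.strip().upper() for m in allow_header.split(",") if m.strip()}
    delivery = "PUT" if "PUT" in allowed else "POST"
    reconciliation = "DELETE" if "DELETE" in allowed else "POST"
    return delivery, reconciliation
```

Even this tiny function shows where the complexity comes from: two independent choices give four method combinations that both sides have to handle.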

The main technical problem with using sequential POSTs is that you need something else to switch on to figure out what state in the exchange you're in, in case you need to right yourself (what Paul Prescod dubbed a "confused client"). This doesn't matter too much for a person working the state machine via forms, but it sucks for automated clients. Figuring out a control code to switch on that isn't in a URI or in a served representation requires a lot of fiddling with response codes and headers. HTTPLR is chock-full of it.

The beauty of PUT|DELETE or GET|DELETE is that you can use the methods as the control set. Both the specification and code for state transitions in HTTPLR get a lot simpler. The problem is that PUT|DELETE deployment is sketchy, so you're automatically excluding a swathe of people from using HTTPLR aware toolkits, be they using J2ME enabled phones, or Apache servers with no administration options.
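The difference can be sketched in a few lines. This is a toy with hypothetical state and function names, not how HTTPLR resolves the problem: with distinct verbs the method alone selects the transition, whereas with POST only the same request means different things depending on where the exchange is.

```python
# With distinct verbs, the HTTP method itself is the control code.
VERB_TRANSITIONS = {
    ("created", "PUT"): "delivered",
    ("delivered", "DELETE"): "reconciled",
}


def next_state_by_verb(state, method):
    """Return the next state, or None if the method is not a transition here."""
    return VERB_TRANSITIONS.get((state, method))


# With POST only, the method carries no information: the step has to be
# inferred from the current state alone.
POST_ONLY_TRANSITIONS = {
    "created": "delivered",
    "delivered": "reconciled",
}


def next_state_post_only(state, method):
    """Return the next state for a POST, inferred purely from the state."""
    if method != "POST":
        return None
    return POST_ONLY_TRANSITIONS.get(state)
```

A client that is unsure whether its delivery arrived and sends it again shows the gap. In the verb version a second PUT against a "delivered" exchange is not a transition, so it can be recognised as a repeat. In the POST-only version the same retry is indistinguishable from a reconciliation and would move the exchange to "reconciled", which is why something extra in the response codes and headers is needed to tell the two apart.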

[In the Web protocol space, I believe Joe Gregorio is due a lot of credit for popularizing verb transitions and demonstrating them as a workable idiom. I also think he has much the same feelings as I do about the current state of affairs with GET|POST.]

How did we get here? That's easy - HTML forms. Be in no doubt, the subsetting of HTTP by the W3C in its HTML specs is a bad design decision that has affected consequent REST and WS efforts. XForms redresses this situation, but without DELETE (I don't know why). It bothers me that the W3C TAG can agonize over http-range-14, but hasn't yet (to my knowledge) considered the consequences of HTML subsetting HTTP by design - HTML still being the premier format on the Web.

MEST and processThis() [4] advocates should read HTTPLR just to see what happens when you don't provide sufficient verbs to communicate. The MEST approach seems to favour an adverb/adjective approach to communications and is being well thought out - I think they will deal with this by constraining the content model beyond SOAP. Mark Baker's explorations with RDF Forms are another approach to constrained content. Anyway, I reckon ~30% of the protocol document is given over to dealing with GET|POST reality - in other words a lot of verbiage and conditional complexity could be removed from HTTPLR given two more verbs. In my experience ~30% is also roughly the code blowup that will result when you have insufficient verbs to work with. The operations have to go somewhere and the empirical evidence is that CRUD represents a useful minimal verb set unless all you want to do is grunt rather than speak. That or you could wind up with two protocols interacting across architectural layers.

[By the way, with HTTPLR, URLs are under the server's control and no URIs are being created via PUT requests. But, we are left with tunneling PUT|DELETE over POST nonetheless. That's not a good situation. If I sound annoyed about this, well, I am.]

Futures

Roadmap

I expect the spec to go through a few iterations this year. I haven't put it on any kind of release cycle yet, but I reckon the next draft will appear no earlier than June 2005 with maybe two more drafts before the end of the year.

The long term intent is that it end up in the IETF, but I would like to have running code in the wild before that.

Source code

There will be a reference implementation (RI) of HTTPLR, client and server libraries. The server RI will be Servlets/Jython with plugin persistence support for ActiveMQ, Joram and MySQL backends. The client RI will be Python, but I would like to see .NET client as well.

The main thing holding back an RI is, as ever, finding the time to get code out the door. Other niggly things that are holding me back are public hosting (I want Subversion, not CVS, so Sourceforge is out) and choice of licence - LGPL v ASF (I'm leaning towards ASF at this point).

If you are thinking about implementing the spec, that would be cool. Ping me about it.

Relationship to WS

Just in case anyone thinks that HTTPLR is meant to be a counter-proposal or a foil to the various WS reliable messaging specs, it's not. It's only meant to be a guide for doing something specific with HTTP as deployed, that HTTP does not provide for "out of the box".

The WS RM specs are fine by me, the problem is there's too many of them. I for one would like to see the industry coalesce around one spec and have that available in as many stacks as possible [5].

I think it's good to have a raw HTTP protocol option especially for scenarios where pairs of servers aren't an option. And I see that Tim Bray says this area is important. You know, he's as good an arrow shooter as any :)




[1] Robert Sayre even had an exclamation mark in his blog entry, so I guess he must have liked it ;)

[2] In time this might not be true; the signs are that people are waking up to the GET/POST situation. But I think it will be years before the full HTTP+WebDAV verb set is ubiquitous.

[3] Sean McGrath, my CTO, has reminded me many times that optionality can be a curse. I believe Sam Ruby holds a similar, hard-earned, viewpoint.

[4] I'm prone to calling processThis() 'NOOP' - it helps me remember where the real action is at ;) Jim Webber, it turns out, is a Thoughtworker - that's news to me (those guys are everywhere).

[5] Increasingly this looks like WS-ReliableMessaging.


March 14, 2005 01:44 AM

Comments

Robert Sayre
(March 14, 2005 02:14 PM #)

"The long term intent is that it end up in the IETF, but I would like to have running code in the wild before that."

You could submit them as I-Ds. No WG BS required, but it gets your protocol in front of more eyes.

Bill de hOra
(March 14, 2005 02:42 PM #)

"You could submit them as I-Ds. No WG BS required, but it gets your protocol in front of more eyes."

Robert, hadn't thought of that (doh). But yes, I think if there was available running code an ID would be good to have (it's more credible in the IETF world when there's code to back things up).

Aristotle Pagaltzis
(January 5, 2006 12:00 AM #)

Bill: whatever happened to this? It looks promising, but seems to have gone dormant?
