
HTTP-LR: Lightweight Reliable delivery over HTTP
draft-httplr-1

Author: Bill de hÓra
Date: 26 July 2003
URL: http://www.dehora.net/doc/draft-httplr-1.html


Status of this document

This document is a draft and may be updated or altered at any time. It should be referenced as non-normative material by other documents.

Copyright notice and distribution

Copyright © Bill de hÓra (2003). This document is released under the Creative Commons Attribution-ShareAlike 1.0 licence.

Distribution of this document is unlimited. Please send comments to

Abstract

This document describes a protocol for reliable transmission of messages over HTTP, something that HTTP does not guarantee. It is not concerned with endpoint availability, robustness of components, or details of persistent storage. The technique provides a measure of reliability within the client server model of HTTP. Reliable variants of HTTP often require a peer to peer model, where both communicators are HTTP servers. These peer to peer models are termed heavyweight.




1. Introduction

This document describes an application protocol for guaranteed once and only once transmission of messages using HTTP, something that HTTP alone does not guarantee. It is not concerned with endpoint availability, robustness of components, or details of persistent storage. It is not concerned with message order.

The technique described provides a measure of reliability within the client server model of HTTP. Reliable variants of HTTP often require a peer to peer model, where both communicators are HTTP servers. These peer to peer models are termed heavyweight.

The first published description of a reliable protocol using an HTTP client and server is attributed to Paul Prescod [Prescod]. That document, 'Reliable delivery in HTTP', along with the author's experiences implementing messaging systems with HTTP, was the basis for this protocol.

1.1 Terminology

The words "must", "must not", "required", "shall", "shall not", "should", "should not", "recommended", "may", and "optional" in this document must be interpreted as described in [RFC 2119].

A client in this discussion is whoever begins the message exchange via an HTTP request and a server is whoever responds to the request. A message is the entity body sent by a PUT request to a server from a client.

1.3 Intellectual Property Notice

This document is released under the Creative Commons Attribution-ShareAlike 1.0 licence.



2. Requirements and assumptions

The key requirement for designing a reliable delivery protocol is agreement. The two parties involved in a delivery must agree that a message has been delivered.

The protocol must not result in duplicated messages. This is achieved by enforcing certain constraints on the client and server. Specifically, we will require them to hold minimal state during the exchange, using an identifier as a shared key, until they come to an agreement that the message has indeed been delivered.

2.1 Eventual arrival

The protocol makes one important assumption: an infinite number of requests will result in an infinite number of responses. This assumption is known as eventual arrival - it is seen in formal models of distributed systems, and is implicit in most deployed reliable protocols. It allows us to disregard arbitrary (often called 'Byzantine') failure modes, for which a reliable protocol can never be modelled.

2.2 Partial failure

An interesting characteristic of distributed systems is that senders and receivers of messages can't know with certainty either what went wrong or with which part - in fact, without catering for agreement, they might not know whether anything went wrong with a transmission at all. Our primary concern is dealing with partial failure amidst eventual arrival. Partial failure is where one component in the system fails while the others continue. The protocol must meet its requirements under partial failure. The HTTP client-server model has three parts that can fail: the client, the network, and the server. For example, if the network fails mid-transmission, a request might arrive at the server but the response might not reach the client. Or, if the server's firewall rules are misconfigured, requests are rejected out of hand. Counter-intuitive as it may sound, a total end-to-end system failure is both safe and acceptable, since the system can't move to an unstable state - aside from being unlikely, if nothing is happening nothing bad can happen with a transmission.



3. HTTP Methods and message delivery

One way to make sure a message went through in HTTP is to resend until the client gets some acknowledgment from the server. Some HTTP actions (GET, PUT, DELETE) allow this to happen safely, because they are idempotent - repeating the action does not alter the result of the first successful action.

However, this assumes applications are modelled precisely as HTTP intended, that is, each item of interest is given its own URL, URLs are not recycled, and idempotent methods are just that. Not all HTTP applications are modelled this way - for example GET can return different values over time, URL recycling is common enough, and a sequence of idempotent messages by more than one client can result in a non-idempotent outcome. [Naturally, it is not possible to ensure that a server does not generate internal side-effects as a result of performing an idempotent request; the important operational distinction here is that the client user, and very possibly the owner of the server, did not request the side-effects and therefore cannot be held accountable for them.]

The HTTP POST method is popular in web services for message transmission. POST is close in meaning to a file write or an SQL UPDATE. Since a repeated POST is not sure to be idempotent, a message sending strategy based purely on client retries is not guaranteed to be safe.
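The difference can be illustrated with a toy in-memory store. This is only a sketch: `ToyStore`, its `put` and `post` methods, and the `/orders/1` URL are hypothetical names for illustration and are not part of HTTP-LR. The client retries each request three times, as it would if acknowledgments were being lost.

```python
class ToyStore:
    """Hypothetical in-memory stand-in for a server; not part of HTTP-LR."""

    def __init__(self):
        self.by_url = {}   # PUT semantics: one entity per URL
        self.log = []      # POST semantics assumed here: append to a collection

    def put(self, url, body):
        # Storing the same body at the same URL again changes nothing.
        self.by_url[url] = body

    def post(self, body):
        # Each POST is processed as a new submission.
        self.log.append(body)


store = ToyStore()
for _ in range(3):  # the client retries because it saw no acknowledgment
    store.put("/orders/1", "order-42")
    store.post("order-42")

print(len(store.by_url))  # one entity, however many retries
print(len(store.log))     # one entry per retry
```

Blind retries are harmless for the PUT but duplicate the message for the POST, which is the problem the rest of this document addresses.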



4. Identification of exchanges

One way to avoid overwriting data with POST is to put a message ID in a header or the message body. This is not required by HTTP. HTTP is a stateless protocol, and ID tracking requires (some) mandatory state on the server.

Assigning identifiers to message exchanges is a standard networking idiom (for example it is used in TCP). [Lynch] describes the general process as the FivePacketHandshakeProtocol, and its formal properties are well-understood. This document's protocol is based on the FivePacketHandshakeProtocol.

4.1 Who generates identifiers?

Normally we would like to entrust this to the server, particularly where there are multiple clients. That requires only one generator, reducing the likelihood of error. Appendix C reformulates the protocol for client side supply of identifiers. It should be noted that the algorithms for generating highly unique identifiers like GUIDs are complex and difficult to get right. A published algorithm and implementation for GUIDs unique until 3400AD is described in [Leach and Salz].
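As an illustration only, a server does not necessarily have to implement such an algorithm itself where a library generator is available. The sketch below uses Python's standard `uuid` module. The function name `new_exchange_url` is hypothetical, the base URL is taken from the examples in this document, and the random (version 4) variant is an assumption made here rather than the time-based scheme described by Leach and Salz.

```python
import uuid

BASE_URL = "http://www.example.org/rmservice"  # assumed, from this document's examples


def new_exchange_url(base_url=BASE_URL):
    """Return a fresh exchange URL with a random UUID as the id parameter."""
    exchange_id = str(uuid.uuid4()).upper()
    return f"{base_url}?id={exchange_id}"


print(new_exchange_url())
```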



5. The Protocol

This section describes the basic protocol exchange. Section 6 discusses it in detail.

5.1  Establish identifier

A request-response exchange between the client and server establishes a message identifier and a place to send the message to, described in the Location header:

  -->
  POST /rmservice HTTP/1.1
  host: http://www.example.org

  <--
  HTTP/1.1 201 Created
  Location:http://www.example.org/rmservice?id=249D6557-CA00-4f71-92B6-BB05302E940A

The client may repeat the request for the identifier/location pair until it receives a response. The server must supply a specific identifier (a URL). [Appendix C shows how a client may supply an identifier].

5.2  Send message

The client delivers the message to the specified URL, and the server returns a response (agreement that the message arrived). When the client receives the server's acknowledgment, the client can reach agreement on delivery. The server response may contain a Location header to indicate the URL the client must use for the next request. Usually this will be the same URL to which the message is sent. If the Location header is not present, the client must assume the URL is the same as the one the message was sent to:

  -->
  PUT /rmservice?id=249D6557-CA00-4f71-92B6-BB05302E940A HTTP/1.1
  host: http://www.example.org
 
  <message body/>

  <--
  HTTP/1.1 202 Accepted

To deal with partial failure, the message may be PUT repeatedly without incurring side effects, as the URL is unique to the client. This implies the server must record the state of the message as identified to ensure idempotency. It also implies the client must record state to 'remember' the state of the exchange. 'Record' is understood to imply persistent storage outside working memory.
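One way a server might record this state is sketched below, assuming SQLite as the persistent store. The table name `exchange` and the functions `open_store` and `handle_put` are hypothetical. The sketch also assumes that any URL presented is one the server issued, a check that a fuller implementation would need to make.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the exchange store; pass a file path for storage that survives restarts."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS exchange (url TEXT PRIMARY KEY, body TEXT NOT NULL)"
    )
    return conn


def handle_put(conn, url, body):
    """Record the message once; return (status, first_delivery)."""
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO exchange (url, body) VALUES (?, ?)", (url, body)
        )
    first_delivery = cur.rowcount == 1  # 0 means the row already existed
    return 202, first_delivery


conn = open_store()
url = "/rmservice?id=249D6557-CA00-4f71-92B6-BB05302E940A"
print(handle_put(conn, url, "<message body/>"))  # first PUT: message is recorded
print(handle_put(conn, url, "<message body/>"))  # retry: acknowledged, not re-recorded
```

The `first_delivery` flag is what a server would consult before triggering any back end work, so that a retried PUT is acknowledged without being acted on twice.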

5.3  Reconciliation

The client indicates to the server that it agrees the message was sent successfully. Up to this point the server knows the message was sent to it, but does not know whether the client agrees it has been sent (as it does not know for certain that the client received the response). This exchange lets the server know that the client agrees the message was delivered:

  -->
  POST /rmservice?id=249D6557-CA00-4f71-92B6-BB05302E940A HTTP/1.1
  host: http://www.example.org

  <--
  HTTP/1.1 200 Ok
  Location: http://www.example.org/rmservice?id=249D6557-CA00-4f71-92B6-BB05302E940A 

The indication may be delivered multiple times if necessary. When the client receives the server response, it may release any state it's holding since both parties are in agreement that the message was delivered. As before, the server response may contain a Location header to indicate the URL the client must use for the next request. Usually this will be the same URL to which the message is sent.

If the Location header is not present, the client must assume the URL is the same as the one the message was PUT to.
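The Location rule used in steps 2 and 3 can be sketched as a small client-side helper. The name `next_url` is hypothetical, and the response headers are assumed to be available as a plain dictionary of header name to value.

```python
def next_url(request_url, response_headers):
    """Return the URL the client should use for its next request in the exchange."""
    for name, value in response_headers.items():
        if name.lower() == "location":   # header names are case-insensitive
            return value.strip()
    return request_url                    # no Location header: keep the same URL


current = "http://www.example.org/rmservice?id=249D6557-CA00-4f71-92B6-BB05302E940A"
print(next_url(current, {}) == current)
```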

5.4  Method constraints

Two constraints govern which HTTP methods are acceptable at each stage of the exchange.

Constraint 1: After it provides the message delivery location, and until a message has been delivered there, the server must reject any POST to that location, since agreement cannot be reached before a message has been sent:

  -->
  POST /rmservice?id=249D6557-CA00-4f71-92B6-BB05302E940A HTTP/1.1
  host: http://www.example.org

The server responds with an error, 405:

  <--
  HTTP/1.1 405 Method Not Allowed
  Allow: PUT, GET, HEAD, OPTIONS
  Location: http://www.example.org/rmservice?id=249D6557-CA00-4f71-92B6-BB05302E940A 

Constraint 2: After it receives the reconciliation POST request, the server must reject any PUT to that URL, since the message cannot be redelivered after that point in the exchange:

  -->
  PUT /rmservice?id=249D6557-CA00-4f71-92B6-BB05302E940A HTTP/1.1
  host: http://www.example.org
 
  <message body/>

The server responds with an error, 405:

  <--
  HTTP/1.1 405 Method Not Allowed
  Allow: POST, GET, HEAD, OPTIONS
  Location: http://www.example.org/rmservice?id=249D6557-CA00-4f71-92B6-BB05302E940A 

The server should at all times send the Allow header in its responses.
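The two constraints can be read as a small state table. The sketch below is one possible encoding, not a prescribed design: the state names and the `check_method` function are hypothetical, and the Allow values are taken from the example responses above.

```python
from enum import Enum


class State(Enum):
    ISSUED = "identifier issued, no message yet"
    DELIVERED = "message received, not yet reconciled"
    RECONCILED = "client has confirmed delivery"


READ_ONLY = ("GET", "HEAD", "OPTIONS")

# Methods accepted in each state, following constraints 1 and 2.
ALLOWED = {
    State.ISSUED: ("PUT",) + READ_ONLY,             # constraint 1: no POST yet
    State.DELIVERED: ("PUT", "POST") + READ_ONLY,   # PUT retries and reconciliation
    State.RECONCILED: ("POST",) + READ_ONLY,        # constraint 2: no further PUT
}


def check_method(state, method):
    """Return (accepted, allow_header) for a request method in the given state."""
    allowed = ALLOWED[state]
    return method in allowed, ", ".join(allowed)


print(check_method(State.ISSUED, "POST"))
print(check_method(State.RECONCILED, "PUT"))
```

When `accepted` is false the server would answer 405 and send the second element as its Allow header.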



6. Protocol walk through

6.1  Establish identifier

First the client requests a place to send the message, by asking the server for a message id:

  -->
  POST /rmservice HTTP/1.1
  host: http://www.example.org

The server response includes the Location: header, telling the client where to send the message:

  <--
  HTTP/1.1 201 Created
  Location: http://www.example.org/rmservice?id=249D6557-CA00-4f71-92B6-BB05302E940A

The use of query strings is optional for the Location; a path scheme would do just as well. The main differences are that some web proxies will not cache a URL with a query string and that most HTTP APIs assume that query strings are designed to be inspected and will decompose the query part on demand - path based URLs usually need to be decomposed by other means. Overall, the structure of the URLs used is incidental to the protocol.
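The difference in decomposition can be seen on the server side with Python's standard `urllib.parse` module. This is a sketch only: the path-style layout `/rmservice/id/{identifier}` is an assumed alternative and is not mandated by this document.

```python
from urllib.parse import urlsplit, parse_qs

query_style = "http://www.example.org/rmservice?id=249D6557-CA00-4f71-92B6-BB05302E940A"
path_style = "http://www.example.org/rmservice/id/249D6557-CA00-4f71-92B6-BB05302E940A"

# A query string decomposes with a library call.
query_id = parse_qs(urlsplit(query_style).query)["id"][0]

# Path segments are split by hand, and the layout must be known in advance.
path_id = urlsplit(path_style).path.rsplit("/", 1)[-1]

print(query_id == path_id)
```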

6.2  Send message

Now that both client and server are sharing a single message ID, the client can send the message to the location specified using PUT:

  -->
  PUT /rmservice?id=249D6557-CA00-4f71-92B6-BB05302E940A HTTP/1.1
  host: http://www.example.org
 
  <message body/>

The server acknowledges successful delivery with 202:

  <--
  HTTP/1.1 202 Accepted
  Location: http://www.example.org/rmservice?id=249D6557-CA00-4f71-92B6-BB05302E940A

Repeated PUTs have no side effects, which covers the case where the server sent its acknowledgment but the client did not receive it. The server must ensure that any further PUTs made to the URL after the first successful PUT are idempotent (for example, any back end activities or onward message routing are not to be re-enacted), though they can still be answered with a 202. The simplest way to achieve this is for the server to record the state of the message exchange using the message ID as a key. In turn the client must agree that PUTs will cease once a 202 has been received.

6.3  Reconciliation

On receipt of the 202 response from the PUT, only one party, the client, knows the exchange was successful. Since HTTP is a pure client-server protocol, the server cannot know with certainty that the client received its 202 Accepted, because it cannot ask the client. So while the client is reconciled, the server is not, and will be maintaining state (and therefore resources associated with the exchange). Further PUTs to the URL from the client provide it with no further information. Eventually, the server has to be able to release resources associated with the message or perform any local reconciliation it needs to (for example, the server may need to forward the message on to a third party or, somewhat incidentally, the server back end may simply want to periodically flush a database).

To provide information to the server, the client must send a second POST request:

  -->
  POST /rmservice?id=249D6557-CA00-4f71-92B6-BB05302E940A HTTP/1.1
  host: http://www.example.org

and the server can acknowledge in turn.

  <--
  HTTP/1.1 200 Ok

6.4  Interleaving PUT and POST

We use a POST because we have already agreed that further PUTs are to be ignored (so, were the client to use PUT here, the server would not know whether the client was retrying to send a message or signing off). Once the client receives the 200 for the POST, both parties are reconciled and both know that the other knows the delivery has completed successfully.

6.5  POST and idempotency

As mentioned before, HTTP POST is not idempotent. The server must ensure that repeated POSTs are idempotent. This is an inexpensive burden - for a POST request it does not recognise, or has already responded to, all the server has to do is keep returning 200 Ok. After the server responds to the reconciliation POST, it can reconcile itself and release any resources it's holding onto for the message exchange. The client can safely continue to POST until it receives a response. After the client receives a response, it knows the server is in agreement and it must not send further POSTs.
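A server-side handler with this behaviour might look like the sketch below. `ReconcilingServer` and its attributes are hypothetical names. The sketch releases the message on the first reconciliation POST and keeps answering 200 afterwards, while still applying constraint 1 to exchanges that have no message yet.

```python
class ReconcilingServer:
    """Hypothetical server state for the reconciliation step."""

    def __init__(self):
        self.pending = set()      # identifier issued, no message yet
        self.delivered = {}       # url -> message body awaiting reconciliation
        self.release_count = 0    # how many times resources were released

    def handle_post(self, url):
        """Return the HTTP status for a POST to an exchange URL."""
        if url in self.pending:
            return 405                  # constraint 1: nothing delivered yet
        if url in self.delivered:
            del self.delivered[url]     # reconcile: release state exactly once
            self.release_count += 1
        return 200                      # unrecognised or already reconciled: still 200


server = ReconcilingServer()
server.delivered["/rmservice?id=1"] = "<message body/>"
print(server.handle_post("/rmservice?id=1"))  # first reconciliation POST
print(server.handle_post("/rmservice?id=1"))  # repeated POST, same answer
```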

6.6  URL recycling

Note that in this protocol, the ID component of the URL cannot be recycled, so a GUID program must be used as generator. Otherwise the server might reassign an ID to another client before the original client has received a response to the reconciliation POST. Remember the server has no idea whether the client received a response to this POST request and it must allow for multiple POST requests over an arbitrary length of time. We could add a fourth request-response pair, but once a GUID-class generator is used, IDs are so cheap and readily disposable that it is arguably a redundant exchange.

6.7  URLs and GUIDs

The protocol makes use of both URLs and GUIDs, and it may seem ambivalent as to which can be used to identify the message during the exchange. In fact the protocol does not identify the message. It identifies the message exchange. The most important resource to model in this protocol is the exchange, not the message. Therefore client and server must use the URL returned in the Location header as the canonical identifier for the exchange - this affords the loosest coupling between the two while in principle allowing permitted intermediaries to inspect the state of an exchange by performing GET against the URL. The GUID embedded in the URL merely allows servers and clients to have unique identifiers. The GUID may be inspected or even used as an implementation detail, but implementers must note that 'understanding' the URL structure might couple client and server more tightly than is necessary, and this understanding very possibly will not transfer across implementations, as URL details may differ from server to server.

6.8  Regression

Theoretically, as HTTP is an asymmetric protocol, we keep needing the client to send a request acknowledging receipt of the last response, so we regress infinitely. However, given the assumption of eventual arrival, we can ignore such regression.
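The three steps of the walk through can be exercised end to end with a small simulation. Everything in this sketch is an assumption made for illustration: the `Server` class, the `lossy_call`, `retry` and `deliver` functions, the 50% loss rate, and the use of a counter in place of a GUID (unique within one run and never recycled). The channel may drop either a request or a response, and the client simply retries each step until a response arrives.

```python
import random


class Server:
    def __init__(self):
        self.state = {}       # exchange url -> "issued" | "delivered" | "reconciled"
        self.processed = []   # messages handed to the back end
        self.next_id = 0

    def handle(self, method, url, body=None):
        if method == "POST" and url == "/rmservice":
            self.next_id += 1
            new_url = f"/rmservice?id={self.next_id}"
            self.state[new_url] = "issued"
            return 201, new_url
        current = self.state.get(url)
        if method == "PUT" and current in ("issued", "delivered"):
            if current == "issued":             # first successful PUT only
                self.processed.append(body)
                self.state[url] = "delivered"
            return 202, url
        if method == "POST" and current in ("delivered", "reconciled"):
            self.state[url] = "reconciled"
            return 200, url
        return 405, url


def lossy_call(server, rng, method, url, body=None, loss=0.5):
    """Deliver a request over a channel that may drop the request or the response."""
    if rng.random() < loss:
        return None                     # request lost before reaching the server
    response = server.handle(method, url, body)
    if rng.random() < loss:
        return None                     # server acted, but the response was lost
    return response


def retry(server, rng, method, url, body=None):
    """Repeat a request until some response arrives (relies on eventual arrival)."""
    while True:
        response = lossy_call(server, rng, method, url, body)
        if response is not None:
            return response


def deliver(server, rng, message):
    _, url = retry(server, rng, "POST", "/rmservice")        # establish identifier
    status, url = retry(server, rng, "PUT", url, message)    # send message
    assert status == 202
    status, url = retry(server, rng, "POST", url)            # reconciliation
    return status


rng = random.Random(2003)   # fixed seed so the run is repeatable
server = Server()
for n in range(5):
    deliver(server, rng, f"message-{n}")
print(server.processed)     # each message appears once despite the retries
```

Exchanges left in the "issued" state at the end of a run correspond to identifiers whose 201 response was lost, the phantom messages discussed in Appendix B.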



7. Protocol characteristics

7.1  What's to like?

We can identify a number of desirable properties of this protocol:

7.2  Objections

We can identify a number of objections to the protocol:



8.  Termination and retries

8.1  Terminating an exchange

To terminate an exchange the client may send a DELETE request to the URL:

  -->
  DELETE /rmservice?id=249D6557-CA00-4f71-92B6-BB05302E940A HTTP/1.1
  host: http://www.example.org

and the server can acknowledge in turn.

  <--
  HTTP/1.1 200 Ok

This terminates the exchange. Any further requests sent against the URL must be treated as if the exchange were completed.

8.2  Retries and timeouts

Under failures in requests and responses, the onus to continue the exchange is placed on the client. The number of times a client retries to send a request in order to get back a response and the duration between retries is not defined here.
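Since the retry policy is left open, the following is only one possible client policy, sketched under stated assumptions: a bounded number of attempts with exponentially growing delays. The function name `retry_request`, its parameters and its defaults are hypothetical, and `send` is assumed to return None or raise OSError when no response arrives.

```python
import time


def retry_request(send, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call send() until it returns a response or the attempts run out."""
    for attempt in range(max_attempts):
        try:
            response = send()
        except OSError:                  # treated here as "no response arrived"
            response = None
        if response is not None:
            return response
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)   # delay doubles after each failure
    return None   # the caller keeps its recorded state and may resume later
```

Because the client records the state of the exchange, giving up after `max_attempts` does not lose the message. The client can pick up the same step again later against the same URL.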

8.3  State management and duration

Once an exchange begins, servers and clients must hold exchange state until one of the following happens:



Appendix A: extensions

Open exchanges: the server may provide a list of current message exchanges that have not reached mutual agreement:

  -->
  GET /rmservice/sessions HTTP/1.1
  host: http://www.example.org

The server responds with a list of message ids and their state:

  <--
  HTTP/1.1 200 Ok 

  <list message ids and urls/>

The extension offers global visibility - third-party monitoring and reporting tools may determine the status of a message delivery by simply making requests on URLs. Potentially, it can allow reconciliation of phantom messages (Appendix B).

Appendix B: Phantom Messages

If the client requests an identifier but does not receive the response from the server, the server may be holding onto an identifier for a message the client will never send (the client will simply ask for another identifier). These are phantom messages. If the client were allowed to examine a list of incomplete exchanges, it could identify phantom messages and terminate them (see section 8).

As the existence of phantom messages is not actively harmful, HTTP-LR does not describe an exchange to remove phantom messages. We mention the possibility for completeness. Following feedback from implementations, future HTTP-LR versions may provide an exchange pattern as an optimisation to allow the server to release resources.

Appendix C: Client provides identifier (Normative variation)

If we are happy to let clients present identifiers to the server, we can adjust the protocol's first step as follows:

Establish identifier

A two way exchange between the client and server establishes a message identifier and a location to send the message to:

  -->
  POST /rmservice?cid=249D6557-CA00-4f71-92B6-BB05302E940A HTTP/1.1
  host: http://www.example.org

  <--
  HTTP/1.1 201 Created
  Location:
  http://www.example.org/rmservice?cid=249D6557-CA00-4f71-92B6-BB05302E940A?sid=2238244

Here the server uses the client identifier along with a session-style identifier of its own. The session identifier does not have to be a GUID, though use of a GUID is encouraged. The client request for the identifier/location pair can be made multiple times. Steps 2 and 3 are as before and constraints 1 and 2 still apply. There is no more state being carried than before, so this should not harm the server's ability to cluster or load balance requests.

Note: in this case URL structure is important. Servers must provide a base URL to which clients can post identifiers. Clients must append a query string with this syntax:

  {base-url}?cid={identifier}

Strictly speaking the server-sided identifier is unnecessary, but it serves to protect the server from clients creating clashing identifiers by placing the client in a namespace.

Where used, the server identifier must not be the same as the identifier provided by the client, and servers must augment the query string using this syntax:

  {base-url}?cid={identifier}?sid={identifier}

Implementing this variation is optional. It requires client and server have a shared understanding of the URL structure. Servers that do not implement this variation should return 501, Not Implemented.
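The two URL templates of this variation can be sketched as simple string builders. The function names `client_url` and `server_location` are hypothetical. The second `?` follows the syntax given in this appendix. Note that generic query-string parsers, which split parameters on `&`, would not separate `cid` and `sid` in such a URL, so both sides would need to split it themselves.

```python
def client_url(base_url, cid):
    """Client side: append the client identifier to the server's base URL."""
    return f"{base_url}?cid={cid}"


def server_location(base_url, cid, sid):
    """Server side: augment the query string with a distinct server identifier."""
    if sid == cid:
        raise ValueError("server identifier must differ from the client identifier")
    return f"{client_url(base_url, cid)}?sid={sid}"


base = "http://www.example.org/rmservice"
cid = "249D6557-CA00-4f71-92B6-BB05302E940A"
print(client_url(base, cid))
print(server_location(base, cid, "2238244"))
```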



References

Leach and Salz: UUIDs and GUIDs, http://www.dehora.net/doc/draft-leach-uuids-guids-01.txt

Prescod, Paul: Reliable delivery in HTTP, http://www.prescod.net/reliable_http.html

Fielding et al: Hypertext Transfer Protocol -- HTTP/1.1 , http://www.ietf.org/rfc/rfc2616.txt

Bradner, Scott: Key words for use in RFCs to Indicate Requirement Levels, http://www.ietf.org/rfc/rfc2119.txt

Lynch, Nancy: Distributed Algorithms, ISBN 1-55860-348-4