James Snell: "Let’s set a few ground rules. First, assuming that we need batch updates at all, we need a format that can work with more than just Atom entries. Second, we need a format that does not require us to bastardize Atom or use it for purposes for which it was never intended and is not well suited. Third, we need an approach that does not duplicate the disadvantages of the WS-* model by circumventing key elements of HTTP and REST.
Consider the following:
PATCH /my/atompub/collection HTTP/1.1
Host: example.org
Content-Type: multipart/mixed; boundary=batch

--batch
Content-Id: <batch-1>
Batch-Operation: POST /my/atompub/collection
Host: example.org
Content-Type: application/atom+xml;type=entry

<?xml version="1.0"?>
<entry xmlns="…">…</entry>
--batch
Content-Id: <batch-2>
Batch-Operation: DELETE /my/atompub/collection/entries/2
Host: example.org
If-Match: "ABC123XYZ"
--batch--
The PATCH operation is telling the server that the request entity contains a set of instructions for how the server resources are to be modified. The Content-Type value “multipart/mixed” tells the server that the ordering of the parts is significant. Each part represents a single batched HTTP request, complete with headers. The Batch-Operation header represents the HTTP request method and URI. The Content-Id header provides the identifier for each batched request. Notice that each batched request can target a distinct resource."
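To make the shape of that request entity concrete, here is a minimal Python sketch that serializes a list of batched operations into a multipart/mixed body like the one quoted above. The function name `build_batch_body` and the dict layout of each operation are my own assumptions for illustration; nothing here comes from James's proposal beyond the header names and the wire format.

```python
CRLF = "\r\n"


def build_batch_body(operations, boundary="batch"):
    """Serialize batched operations into a multipart/mixed body.

    Hypothetical helper: each operation is a dict with 'content_id',
    'operation' (method and URI), optional 'headers', optional 'body'.
    """
    lines = []
    for op in operations:
        lines.append("--" + boundary)
        lines.append("Content-Id: <%s>" % op["content_id"])
        lines.append("Batch-Operation: " + op["operation"])
        for name, value in op.get("headers", {}).items():
            lines.append("%s: %s" % (name, value))
        lines.append("")  # blank line ends the part headers
        if op.get("body"):
            lines.append(op["body"])
    lines.append("--" + boundary + "--")  # closing delimiter
    return CRLF.join(lines) + CRLF


# The two operations from the quoted example, in order.
operations = [
    {
        "content_id": "batch-1",
        "operation": "POST /my/atompub/collection",
        "headers": {
            "Host": "example.org",
            "Content-Type": "application/atom+xml;type=entry",
        },
        "body": '<?xml version="1.0"?>\r\n<entry xmlns="…">…</entry>',
    },
    {
        "content_id": "batch-2",
        "operation": "DELETE /my/atompub/collection/entries/2",
        "headers": {"Host": "example.org", "If-Match": '"ABC123XYZ"'},
    },
]

body = build_batch_body(operations)
print(body)
```

Because the parts are emitted in list order, the ordering that multipart/mixed is supposed to preserve falls out of the data structure for free.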
James is trying to provide a better option than Google's batch uploading protocol. His proposal is better, but it's still problematic.
I've done enough messaging over HTTP and bulk publishing work at this point to believe that web batching
- Is complicated, because HTTP and REST do not support batching (or contrapositively, iterators).
- Requires its own uniform semantics independent of any other method.
- Should be avoided, if possible.
I think if it's to be done at all, a BATCH method is required. It really is so different from everything else, that overloading any existing method will be broken. Some specifics:
- PATCH. PATCH is incredibly useful, and I'm glad to see it being revived, but not like this. PATCH is suited for fine-grained manipulations of individual resources, the exact details of which vary with the media type. Using PATCH here does consider the media type (multipart/mixed; boundary=batch), but it fundamentally switches on the resource as well, making the resource 'typed'. You have to know that the resource is some kind of master container for surrogate resources. It's the wrong tool for the job.
- Response codes. I think a batching operation has to return 201, 400, 401, 403, 404, 5xx and nothing else. Your batch op is accepted (even if nothing in the payload was), sucked, unauthorised, refused, or the server/upstream bailed. In any case, the response codes for this need serious thought because of the way they interact with the nested responses. This alone is good enough for me to say we need a different uniform method.
- multipart/mixed; boundary=batch. Perhaps we need a new media type as well. I think I'd like to see each MIME part contain a full and standard HTTP request instead of the Batch-Operation header. Batch-Operation is another clue that we need a new method.
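As a rough illustration of that last point, here is a Python sketch of what a server might do if each part carried a complete HTTP request rather than a Batch-Operation header. I'm assuming each part is labelled `Content-Type: message/http`; the helper names are hypothetical, and the delimiter handling is deliberately naive (it would break if the boundary string appeared inside a body).

```python
def split_parts(body, boundary):
    """Return the raw text of each part in a multipart body (naive split)."""
    delimiter = "--" + boundary
    parts = []
    for chunk in body.split(delimiter)[1:]:
        if chunk.startswith("--"):  # closing delimiter reached
            break
        parts.append(chunk.strip("\r\n"))
    return parts


def parse_http_request(raw):
    """Parse a raw HTTP request into (method, target, headers, body)."""
    head, _, payload = raw.partition("\r\n\r\n")
    request_line, *header_lines = head.split("\r\n")
    method, target, _version = request_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, target, headers, payload


def parse_part(raw):
    """Drop the part's own MIME headers, then parse the embedded request."""
    _mime_headers, _, message = raw.partition("\r\n\r\n")
    return parse_http_request(message)


body = "\r\n".join([
    "--batch",
    "Content-Type: message/http",
    "",
    "POST /my/atompub/collection HTTP/1.1",
    "Host: example.org",
    "Content-Type: application/atom+xml;type=entry",
    "",
    "<entry>...</entry>",
    "--batch",
    "Content-Type: message/http",
    "",
    "DELETE /my/atompub/collection/entries/2 HTTP/1.1",
    "Host: example.org",
    'If-Match: "ABC123XYZ"',
    "",
    "--batch--",
    "",
])

requests = [parse_part(part) for part in split_parts(body, "batch")]
for method, target, headers, payload in requests:
    print(method, target, headers, repr(payload))
```

The appeal is that the inner requests need no new vocabulary: anything that can already read an HTTP request line and headers can read a part.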
8 Comments
How about `multipart/mixed` with `message/http` parts? And returning 207?
Yeah, I agree, having a separate BATCH method would be best. I'm still stuck on the "do we really need this" issue tho. There seem to be quite a few people who are convinced it is necessary but I've yet to see a really compelling use case beyond offline synchronization (e.g. an offline client queues operations that are sync'd with the server all at once when the client reconnects).
It really seems as though any sort of Batch and Patch operation should stay application specific. It makes more sense to POST a set of operations as an XML file that is read and acted upon, instead of placing this within HTTP. Patching also seems like a very tough candidate simply because patching something like XML or an image is not trivial. This might be my own misunderstanding, but it seems like specifying how to patch different files based on their content types would be nothing but a mess in the long run, with little to no gain.
That said, I may be misunderstanding the problem BATCH and PATCH solve. I followed along the discussions within AtomPub regarding how a server should handle updates and felt the solution that the server is in control makes the most sense because it means clients should expect to learn what the server will do. I think this subtle point is exceptionally important in that it means if client implementors do not want to be surprised (and surprise users), they must use the server's definition of operations as their guide. This means that the server can document how PUT operations update Atom entries.
The imperfection is that you lose some interoperability, in the sense that not every AtomPub client can work effectively with every AtomPub server. Again, the AtomPub WG suggested profiles as a solution, which seems to be a great way to add a slightly more detailed layer on top of AtomPub without convoluting the core protocols.
Eric, one of the primary characteristics of the PATCH draft is that it explicitly states that *how* the patch is applied is dependent on the patch format and the type of resource being patched. The method is defined only so that the intent of the request can be expressed explicitly in the typical HTTP-Way, and so that details such as response caching can be dealt with consistently. The definition for BATCH should follow the same philosophy. The method indicates that a BATCH operation is being requested; it's up to the server implementation to determine whether it supports the data format used to describe the batch and whether the resource identified by the request URI supports BATCH operations.
James, I think I understand what you mean. It sounds like the patch file itself defines how the patch is applied. I do like the idea of response caching and that other HTTP benefits get applied. I still don't know if it is totally necessary, but the concept does make more sense to me now. Thanks for the explanation!
I'm looking into this as well right now; this post motivated me to put my thoughts online. Yes, I think we do need this.
http://www.abstractioneer.org/2008/02...
I suggest just using POST with a well known URI, which is responsible for orchestrating the batch operations. If we had BATCH, that'd be great, but we don't. And in this case a unique URI actually makes sense, as the batch could be applied to any set of resources (even, potentially, resources not under that server's control... if the right authorization data is passed along).
Regarding James Snell's comment on use cases, yes, there are good use cases. A good example is a client uploading the email addresses of 100 people to a server managing an address book. Due to some environmental constraints, the client can't do 100 GETs, update those address book entries in memory, and do 100 (conditional) PUTs (too slow). There are also cases where the server would like to execute the batch atomically, because it may be more efficient to do 100 database updates in one go than to do the same 100 updates one after the other, which, by the way, James Snell's post does not address.
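For what it's worth, the atomic case can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the store is a plain in-memory dict standing in for the address book, the operation tuples and the name `apply_batch_atomically` are made up, and a real server would more likely lean on a database transaction.

```python
def apply_batch_atomically(store, operations):
    """Apply every operation or none of them.

    Hypothetical shapes: `store` maps entry ids to values; each operation
    is ("put", key, value) or ("delete", key).
    """
    working = dict(store)  # work on a copy so a failure leaves `store` untouched
    for op in operations:
        if op[0] == "put":
            working[op[1]] = op[2]
        elif op[0] == "delete":
            if op[1] not in working:
                raise KeyError("cannot delete missing entry: %r" % (op[1],))
            del working[op[1]]
        else:
            raise ValueError("unknown operation: %r" % (op[0],))
    # Every operation succeeded, so publish the result in one step.
    store.clear()
    store.update(working)


address_book = {"alice": "alice@example.org"}

# A batch that succeeds as a whole.
apply_batch_atomically(address_book, [
    ("put", "bob", "bob@example.org"),
    ("delete", "alice"),
])

# A batch whose second operation fails: the first must not stick either.
try:
    apply_batch_atomically(address_book, [
        ("put", "carol", "carol@example.org"),
        ("delete", "nobody"),
    ])
except KeyError:
    pass

print(address_book)
```

The point is that all-or-nothing is a property of the batch as a whole, which a per-part status code cannot express by itself.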
Interesting, we have been debating how to do a similar thing with CouchDB for batch document updates. Something to think about.