Web resource mapping criteria for frameworks

Here's an axis for evaluating a web framework - how well does it do resource mapping?

Updating web resources

Let's take editing some resource, like a document, and let's look at browsers and HTML forms in particular, which don't do a good job of allowing you to cleanly affect resource state. What you would like to do in this suboptimal environment is provide an "edit-uri" of some kind. There are basically 5 options for this; here they are, going from most to least desirable:

  1. Uniform method. Alter the state by sending a PUT to the document's URL. The edit-uri is the resource URL. URL format: http://example.org/document/xyz
  2. Function passing. Allow the document resource to accept a function as an argument. URL format: http://example.org/document/xyz?f=edit
  3. Surrogate. Create another resource that will accept edits on behalf of the document. URL format: http://example.org/document/xyz/edit
  4. CGI/RPC explicit: send a POST to an "edit-document" script passing the id of the document as an argument. URL format: http://example.org/edit-document?id=xyz
  5. CGI/RPC stateful: send a POST to an "edit-document" script and fetch the id of the document from server state, or a cookie. URL format: http://example.org/edit-document

The first option, uniform method, isn't really available to us, unless we intercept all POST HTML form actions with JavaScript and map them to PUT.

The second option, function passing, we can have if we can name a resource and the code handler for that resource can in turn handle parameters on a POST request (or in the POST body). We have to be careful that GET and HEAD don't result in edits, but typically they'll result in the form being served. Visually, it's nice and clean; we've identified an action surrogate URL that you can use with web technology which doesn't support the full complement of HTTP methods but will send via POST. With function passing, the resources are still first class citizens, not the methods.
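To make that concrete, here's a minimal sketch of a function passing handler in Python. Everything named here (`handle_document`, the `DOCUMENTS` store, the form markup) is made up for illustration and isn't taken from any framework; the only thing borrowed from above is the `?f=edit` URL shape.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical in-memory store standing in for real documents.
DOCUMENTS = {"xyz": "original text"}


def handle_document(method, url, body=""):
    """Handle a request against /document/<id>, optionally with ?f=edit.

    Returns a (status, body) pair. A real server would also drop the
    body for HEAD; that detail is left out of this sketch.
    """
    parts = urlsplit(url)
    doc_id = parts.path.rsplit("/", 1)[-1]
    function = parse_qs(parts.query).get("f", [None])[0]

    if doc_id not in DOCUMENTS:
        return 404, "no such document"

    if method in ("GET", "HEAD"):
        # Safe methods never change state; f=edit only serves the form.
        if function == "edit":
            return 200, "<form method='post'>...</form>"
        return 200, DOCUMENTS[doc_id]

    if method == "POST" and function == "edit":
        # The function is an argument; the document is still the resource.
        DOCUMENTS[doc_id] = body
        return 200, DOCUMENTS[doc_id]

    return 405, "method not allowed"
```

A GET on http://example.org/document/xyz?f=edit serves the form and leaves the document alone; a POST to the same URL replaces its state.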

The third option, surrogate, we can have if we can pick apart the URL so we know it's an edit operation on a specific document. We have to be careful that GET and HEAD don't result in edits, but typically they'll result in the form being served. Even though we've increased our URI space as a function of functions, this will do fine as a way to let HTML clients POST new state.
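Picking apart a surrogate URL might look something like the sketch below. The function name `resolve_surrogate` and the `ACTIONS` set are assumptions for illustration; the point is only that the trailing segment is peeled off to recover the document's own URL path.

```python
from urllib.parse import urlsplit

# Hypothetical set of surrogate suffixes this application understands.
ACTIONS = {"edit"}


def resolve_surrogate(url):
    """Split a surrogate URL into (resource_path, action).

    The action is None when the URL names the resource itself.
    """
    path = urlsplit(url).path.rstrip("/")
    head, _, tail = path.rpartition("/")
    if tail in ACTIONS and head:
        return head, tail
    return path, None
```

Note the cost: a document whose id happens to be "edit" is now ambiguous, which is one way the surrogate style grows the URI space.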

The fourth option, CGI/RPC explicit, names the method instead of the resource. It's non-uniform RPC. You also see web APIs follow this pattern. Sometimes that's because Web APIs get built out separately so the API functions get hived off from the actual resources and it can seem hard to see how to integrate a web site with a bolted-on publishing engine. I suspect it's often because people think Web APIs are supposed to be like actual APIs.

The last option, CGI/RPC stateful, is non-scalable and violates any number of design tenets; we won't discuss it any further here. It's rare enough anyway (unless you're doing SOAP RPC-Encoded, but then I can't help you).

Surrogates and function passing solve in particular the problem of HTML forms technology, namely that it subsets the available HTTP methods. The intent of either is to provide a sane workaround for clients that don't or aren't allowed to support PUT/DELETE directly to resources because HTML itself doesn't allow PUT/DELETE in forms and a lot of deployed web infrastructure has crystallised around that. Both are superior to the CGI style, and light years away from gateway/rpc programming in terms of intent.

Naming

There's an important point and it's this - you can only support uniform, surrogate and function passing styles if you can supply all of your documents with URLs. The two CGI options only require that you can name CGI scripts. The rest of this post is about how important it is that your framework supports that per-resource naming feature.

I've noticed that, according to some people, function passing and surrogate techniques are architecturally the same as the CGI explicit technique. So these,

 http://example.org/document/xyz?f=edit
 http://example.org/document/xyz/edit

present the same design value as this,

 http://example.org/edit-document?id=xyz 

because it's all just URLs. And you don't get hung up on URLs. They're opaque and the form of them is overrated. Right? At some carpet bombing level of web architecture, that might be. Other places where you might get away with that position are weblog comments, technical op-eds, or a drunken party where no-one's really paying attention. Otherwise, no. The CGI option is deficient. It promotes a non-uniform method to a resource at the expense of leaving your actual resources unnamed. Your world of resources, that is. Think of it this way - if I made every domain object in your Java middleware non-accessible, and all CRUD ops had to be done by passing HashMaps into Manager or Container classes, you'd scream bloody murder.

Framework restrictions

I suspect people do this CGI/RPC thing because their framework makes them do it, by making resource naming a PITA. All they have, logically speaking, is CGI, or raw Servlets/ASP.

Older frameworks on the whole do a poor job of resource mapping. That's because most of these frameworks derive from CGI. What's CGI anyway? CGI is a gateway technology to hook "not web" stuff into the web. CGI was never for REST-centric identification of resources; that's why it's a gateway interface. CGI is a hole left in web architecture, for web dark matter. You know how in Stargate, SG-1 travel to other planets using a wormhole portal left by the ancients and kick weekly ass? A CGI script is like that wormhole.

Let's look at that CGI URL again:

 http://example.org/edit-document?id=xyz

The *actual* resource name is

 http://example.org/edit-document

Anything after a ? in a URL is an /argument/ for the resource, in this case a key for a document in our domain. What's going on here is that the function has been promoted to a resource. It's now a first class thing in our web application. The resource is the script, not the Document. That's a fundamental shift away from REST-centric development back towards RPC style.
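You can see the split using nothing but Python's standard `urllib.parse`; the variable names are mine, the URL is the one above.

```python
from urllib.parse import urlsplit, parse_qs

parts = urlsplit("http://example.org/edit-document?id=xyz")

# What the URL actually names: the script.
resource = f"{parts.scheme}://{parts.netloc}{parts.path}"

# What gets handed to the script: the document key, as an argument.
arguments = parse_qs(parts.query)

print(resource)   # http://example.org/edit-document
print(arguments)  # {'id': ['xyz']}
```

The document never shows up in the resource name; it only appears as a value passed to the script.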

If you want to do REST-centric programming in a CGI derived framework, you'll have to emulate resource identification by hacking around common gateways. That's like using a Stargate to pass through *everything*. Getting into your car would involve a Stargate. Going to the bathroom? Use a Stargate. That's what programmers call pointless indirection. If your default programming model for the web is like CGI (Stargates for the Web), the second you want to go REST-centric you'll be way off in terms of your solution space, because you shouldn't have started "from there" in the problem space. The right problem to start with is "how do I program to a world of resources?".

Resource mapping checklist

The first thing you have to be able to do to program a world of resources is name them. That way they can all have URLs. Then you'll want to map a finite amount of handler code onto requests against them. What you want from a web framework in terms of resource mapping is a language expressive enough to do the following:

  1. Quantify over the set of your resource names
  2. Match subsets of your resource names to particular code
  3. Allow passing of named functions into resource URLs
  4. Allow deconstruction of URLs to determine an actual resource from a surrogate.
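As a rough illustration of what that checklist asks for, here's a small regex-based route table in Python. It's a sketch, not how any particular framework does it; the `ROUTES` table, `dispatch`, the handler names and the `/user/` pattern are all hypothetical.

```python
import re
from urllib.parse import urlsplit, parse_qs


# Hypothetical handlers; a real application would do actual work here.
def show_document(key):
    return f"show document {key}"


def edit_document(key):
    return f"edit document {key}"


def show_user(key):
    return f"show user {key}"


# Items 1 and 2: each pattern quantifies over a set of resource names and
# maps that set to handler code. The inner dict maps a function name
# (None meaning "the resource itself") to a handler.
ROUTES = [
    (re.compile(r"^/document/(?P<key>[^/]+)$"),
     {None: show_document, "edit": edit_document}),
    (re.compile(r"^/user/(?P<key>[^/]+)$"),
     {None: show_user}),
]


def dispatch(url):
    """Return the handler result for url, or None if nothing matches."""
    parts = urlsplit(url)
    # Item 3: a named function passed into the resource URL (?f=edit).
    function = parse_qs(parts.query).get("f", [None])[0]
    # Item 4: a trailing segment may be a surrogate for the real resource.
    head, _, tail = parts.path.rpartition("/")
    for pattern, handlers in ROUTES:
        candidates = [(parts.path, function)]
        if tail in handlers:
            candidates.append((head, tail))
        for path, name in candidates:
            match = pattern.match(path)
            if match and name in handlers:
                return handlers[name](match.group("key"))
    return None
```

With this table, http://example.org/document/xyz?f=edit and http://example.org/document/xyz/edit both land on `edit_document` with the key `xyz`, while http://example.org/edit-document?id=xyz matches nothing, because no resource pattern names a script.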

If you're choosing a web framework and it can't provide a non-CGI URL for every resource in your domain, can't pattern match meaningful subsets of URLs to internal code, and won't let you either pass functions to resources or remap surrogates to your domain, then it's technically inadequate. It's failed its mandate.

I call these criteria out because I've heard arguments in the past that said CGIs and actions are necessary when you have a rich domain, or just a lot of objects. Here's how that goes - for each action you want to support you need to expose a script to handle that action, taking a key as an argument. Perhaps the number of scripts is multiplied by the number of types in your application domain (eg edit-user, edit-blog, delete-user, delete-blog), or maybe you'll pass in a type parameter as well. Unfortunately exposing scripts like this for state manipulation has nothing to do with your rich domain - it's a sign your framework is deficient and has led you down the garden path.

The worst case scenario for web programming is that you can't name your resources at all (some frameworks will not let you name them, Struts being the example I'm most aware of). You're hosed unless you write a micro-framework on top that can distribute names. It's heartbreaking when a web framework won't let you do the right thing and forces you to expose CGIs instead of simply passing state.


August 11, 2007 09:45 PM