
Hey, I'm back

The GWA (Google Web Accelerator) is back and following GET links again. No big surprise there - the web is going to be rife with this kind of automation before the decade is out.


The technology itself is interesting insofar as we are going to see more and more highly automated robots on the web over the next few years, especially now that there is more out there for them to consume than scraped HTML. Even more interesting is the kind of outrage being voiced in places like Signal v Noise:

"Google is wreaking havoc by assuming everyone is doing things 'right'."

It's hard to know where to begin with a statement like that. One wonders: what should Google assume? Perhaps the person who said that doesn't appreciate that what makes the Internet and the Web work at all is the collective body of shared assumptions, embedded in software, that allows systems to function together. Take those away and you have no commons, no platform for innovation, nothing that makes even the creation of buggy, spec-unaware applications possible. Here's the deal - if you are in the business of allowing the use of GET for something like 'delete.php', the GWA has you more or less bang to rights.
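To make the 'delete.php' point concrete, here is a minimal sketch in Python, assuming a hypothetical in-memory store and a hypothetical `dispatch` function rather than any real framework. The delete action is only honoured on POST; a GET to the same URL gets 405 Method Not Allowed with an `Allow: POST` header, so a prefetching robot that issues GET for every link on the page leaves the data alone.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical in-memory data store standing in for a real database.
ITEMS = {"1": "first note", "2": "second note"}


def dispatch(method, url, store):
    """Return (status, headers) for a request; illustrative only."""
    parts = urlsplit(url)
    if parts.path == "/delete.php":
        if method != "POST":
            # GET is meant to be retrieval only, so refuse to act on it.
            return 405, {"Allow": "POST"}
        # The id is read from the query string for brevity; a real
        # form would normally send it in the POST body.
        item_id = parse_qs(parts.query).get("id", [None])[0]
        if item_id in store:
            del store[item_id]
            return 303, {"Location": "/list.php"}
        return 404, {}
    if parts.path == "/list.php" and method in ("GET", "HEAD"):
        return 200, {}
    return 404, {}


def prefetch(links, store):
    """Simulate an accelerator that issues GET for every link on a page."""
    return [dispatch("GET", link, store)[0] for link in links]


if __name__ == "__main__":
    page_links = ["/list.php", "/delete.php?id=1", "/delete.php?id=2"]
    print(prefetch(page_links, ITEMS))  # [200, 405, 405]
    print(sorted(ITEMS))                # ['1', '2']: nothing deleted
    print(dispatch("POST", "/delete.php?id=1", ITEMS)[0])  # 303
    print(sorted(ITEMS))                # ['2']
```

The design choice is simply to make the method check the first thing the destructive handler does; the 303 redirect after a successful POST is a common convention, not a requirement.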

Or how about this:

"For you knee-jerkers who like to go on about how it is the fault of the application designer and cite the HTTP 1.1 Spec (RFC 2616) where it says '[…] GET and HEAD methods SHOULD NOT have the significance of taking an action other than retrieval.', you might also want to familiarize yourselves with RFC 2119, 'Key words for use in RFCs to Indicate Requirement Levels'. You will see that SHOULD NOT is distinctly different from MUST NOT, ..."

I talked about just this issue the last time the GWA was let loose. Yes, we need to familiarise ourselves with those specs. Yes, SHOULD NOT is different from MUST NOT. But it's not different in a way that allows us to abdicate responsibility or engage in misdirection. In the IETF and W3C, the directive "SHOULD" is a much harder and more stringent requirement than it sounds, or than its usage in the commercial sector would suggest. Instead of hand-waving about RFC 2119, let's read what the spec actually says:

"SHOULD NOT This phrase, or the phrase "NOT RECOMMENDED" mean that there may exist valid reasons in particular circumstances when the particular behavior is acceptable or even useful, but the full implications should be understood and the case carefully weighed before implementing any behavior described with this label."

I think that's quite clear, meaning roughly "unless you have an exceptional case, you MUST NOT do this". It is not meant to provide plausible deniability. It's not the soft 'P3'-class requirement you see in commercial requirements documents - SHOULD does not mean optional, or nice to have. It will be interesting to hear, whenever someone tries to play this card, just what exceptional case led them to use GET instead of POST. If this were a security issue (and do note that under some interpretations it *is* a security issue), the action would be clear - fix it. It's a bug. I tend to fall on the same side as Sam Ruby on this one - web developers need to exercise reasonable care.

Round up the usual suspects

So not only do we have people that do not read web specs when they need to, we seem to have people that read them at the wrong time to attempt to retroactively establish plausible deniability.


Perhaps it's too inconvenient to follow Internet specs. Perhaps it's easier to pretend to live in some kind of realtechnik pseudo-reality where specs don't matter than to fix your work. I'm shocked, just shocked :)

October 24, 2005 10:17 PM


(October 26, 2005 04:15 AM)

As a stop-gap, what's your opinion on adding rel="nofollow" to these sorts of links, which I've seen a few web apps do?

Bill de hOra
(October 27, 2005 02:25 AM)


nofollow: it (might) work if robots obeyed it. Is it uniform though?
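What "obeying" nofollow means is entirely up to each robot. A minimal sketch, assuming a hypothetical `LinkCollector` class built on Python's standard `html.parser` module, of a link-following robot that chooses to skip `rel="nofollow"` anchors. Flip one flag and the same page yields the delete link again, which is the uniformity problem in a nutshell.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect hrefs from <a> tags, optionally honouring rel="nofollow"."""

    def __init__(self, honour_nofollow=True):
        super().__init__()
        self.honour_nofollow = honour_nofollow
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if href is None:
            return
        # rel may hold several space-separated tokens, e.g. "external nofollow".
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if self.honour_nofollow and "nofollow" in rel_tokens:
            return
        self.links.append(href)


def links_to_follow(html, honour_nofollow=True):
    collector = LinkCollector(honour_nofollow)
    collector.feed(html)
    collector.close()
    return collector.links


PAGE = """
<a href="/list.php">List</a>
<a href="/delete.php?id=1" rel="nofollow">Delete</a>
"""

print(links_to_follow(PAGE))                         # ['/list.php']
print(links_to_follow(PAGE, honour_nofollow=False))  # ['/list.php', '/delete.php?id=1']
```

Nothing in the page itself enforces the hint; the only thing standing between the robot and `/delete.php` is a single `if` in the robot's own code.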

But then might not people add nofollow and go home with a fix that has less standing than SHOULD NOT (for any definition of SHOULD NOT)? And... 6, 12, 18, 24 months from now we'll be back here listening to people cry foul over some other accelerator that doesn't respect nofollow.

Scenario: someone writes one of these robots with the malicious intent of trashing data, as opposed to coding to spec. Then it'll be recast as a security problem, and people *will* fix their apps and frameworks.

(October 27, 2005 03:07 AM)

Yeah, nofollow is only a step on the road to a proper fix.

It's a pity that the W3C won't be doing any fixing of HTML to make it easier to avoid this idiom. Maybe WHAT-WG will get some traction with nested forms etc.