Bill de hÓra: October 2005 Archives


October 27, 2005

To quote Will Smith

I have got to get me one of these.

October 24, 2005

Hey, I'm back

The GWA (Google Web Accelerator) is back and following GET links again. No big surprise there - the web is going to be rife with this kind of automation before the decade is out.


The technology itself is interesting insofar as we are going to see more and more highly automated robots enter the web over the next few years, especially now that there is more out there for them to consume than scraped HTML. Even more interesting is the kind of outrage holding forth in places like Signal v Noise:

"Google is wreaking havoc by assuming everyone is doing things 'right'."

It's hard to know where to begin with a statement like that. One wonders - what should Google assume? Perhaps the person who said that doesn't appreciate that what makes the Internet and Web work at all is the collective set of shared assumptions, embedded in software, that allows systems to function together. Take those away and you have no commons, no platform for innovation, nothing that makes the creation of buggy spec-unaware applications possible at all. Here's the deal - if you are in the business of allowing the use of GET for something like 'delete.php', the GWA has you more or less bang to rights.

Or how about this:

"For you knee-jerkers who like to go on about how it is the fault of the application designer and cite the HTTP 1.1 Spec (RFC 2616) where it says '[…] GET and HEAD methods SHOULD NOT have the significance of taking an action other than retrieval.', you might also want to familiarize yourselves with RFC 2119, 'Key words for use in RFCs to Indicate Requirement Levels'. You will see that SHOULD NOT is distinctly different from MUST NOT, ..."

I talked about just this issue the last time GWA was let loose. Yes, we need to familiarise ourselves with those specs. Yes, SHOULD NOT is different from MUST NOT. But it's not different in a way that allows us to abdicate responsibility or engage in misdirection. In the IETF and W3C, the directive "SHOULD" is a much harder and more stringent requirement than it sounds, or than it is in the commercial sector. Instead of hand-waving about RFC 2119, let's read what the spec actually says:

"SHOULD NOT This phrase, or the phrase "NOT RECOMMENDED" mean that there may exist valid reasons in particular circumstances when the particular behavior is acceptable or even useful, but the full implications should be understood and the case carefully weighed before implementing any behavior described with this label."

I think that's quite clear, meaning roughly, "unless you have an exceptional case, you MUST NOT do this". It is not meant to provide plausible deniability. It's not the soft 'P3'-class requirement you see in commercial requirements documents - SHOULD does not mean optional, or nice to have. It will be interesting to hear, whenever someone tries to play this card, just what exceptional case led them to use GET instead of POST. If this were a security issue (and do note that under some interpretations it *is* a security issue), the action would be clear - fix it. It's a bug. I'm tending to fall on the same side as Sam Ruby on this one - web developers need to provide reasonable care.
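Fixing it is usually a small change. Here is a minimal sketch in Python using plain WSGI; the `ITEMS` store, `delete_app`, the `/items` redirect target and the `call` helper are all hypothetical, not taken from any real delete.php. The destructive action only runs on POST, while a GET (such as one issued by a prefetching robot following a link) gets a 405 response and changes nothing.

```python
import io
from urllib.parse import parse_qs
from wsgiref.util import setup_testing_defaults

# Hypothetical in-memory store standing in for whatever 'delete.php' deletes.
ITEMS = {"1": "first post", "2": "second post"}


def delete_app(environ, start_response):
    """WSGI app for a destructive action that only acts on POST."""
    if environ["REQUEST_METHOD"] != "POST":
        # GET and HEAD stay safe: a prefetcher that follows this URL
        # gets a 405 instead of deleting anything.
        start_response("405 Method Not Allowed",
                       [("Allow", "POST"), ("Content-Type", "text/plain")])
        return [b"Use POST to delete.\n"]
    length = int(environ.get("CONTENT_LENGTH") or 0)
    form = parse_qs(environ["wsgi.input"].read(length).decode("utf-8"))
    ITEMS.pop(form.get("id", [""])[0], None)
    # Redirect after POST so a browser refresh doesn't resubmit the form.
    start_response("303 See Other",
                   [("Location", "/items"), ("Content-Type", "text/plain")])
    return [b"Deleted.\n"]


def call(app, method, body=b""):
    """Drive a WSGI app in-process and return its status line."""
    environ = {"REQUEST_METHOD": method,
               "CONTENT_LENGTH": str(len(body)),
               "wsgi.input": io.BytesIO(body)}
    setup_testing_defaults(environ)  # fills in SERVER_NAME, PATH_INFO, etc.
    statuses = []

    def start_response(status, headers, exc_info=None):
        statuses.append(status)

    b"".join(app(environ, start_response))
    return statuses[0]
```

Calling `call(delete_app, "GET")` returns the 405 status and leaves `ITEMS` untouched; `call(delete_app, "POST", b"id=1")` removes item 1 and redirects. On the HTML side, the delete link becomes a small form with `method="post"`.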

Round up the usual suspects

So not only do we have people that do not read web specs when they need to, we seem to have people that read them at the wrong time to attempt to retroactively establish plausible deniability.


Perhaps it's too inconvenient to follow Internet specs. Perhaps it's easier to pretend to live in some kind of realtechnik pseudo-reality where specs don't matter than to fix your work. I'm shocked, just shocked :)

October 19, 2005

Subversion repository merging

I knew that you could split a repository in two with svndumpfilter, but I didn't know that svnadmin's load command could be used to merge repositories together. Mike Mason explains how. Nice.
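A sketch of the usual recipe, assuming `svnadmin` and `svn` are on the PATH; this is my reading of the general technique, not necessarily the exact steps in Mike Mason's write-up, and `Step`, `merge_plan`, `run` and every path are hypothetical. Each source repository is dumped, a directory is created for it in the new repository, and the dump is replayed under that directory with `svnadmin load --parent-dir`.

```python
import subprocess
from contextlib import ExitStack
from pathlib import Path
from typing import NamedTuple, Optional


class Step(NamedTuple):
    argv: list
    stdin: Optional[str] = None   # file fed to the command's stdin
    stdout: Optional[str] = None  # file the command's stdout is written to


def merge_plan(target: str, sources: dict) -> list:
    """Commands that merge each source repo into its own top-level
    directory of a new repository at `target` (hypothetical helper)."""
    target_path = Path(target).resolve()
    steps = [Step(["svnadmin", "create", str(target_path)])]
    for subdir, source in sources.items():
        dump_file = f"{subdir}.dump"
        # Full history of the source repository, as a portable dump stream.
        steps.append(Step(["svnadmin", "dump", "--quiet", source], stdout=dump_file))
        # --parent-dir needs its directory to exist in the target first.
        steps.append(Step(["svn", "mkdir", "-m", f"Add {subdir}",
                           f"{target_path.as_uri()}/{subdir}"]))
        # Replay the dumped revisions, rooted under /subdir.
        steps.append(Step(["svnadmin", "load", "--parent-dir", subdir,
                           str(target_path)], stdin=dump_file))
    return steps


def run(steps: list) -> None:
    """Execute the plan, stopping at the first failing command."""
    for step in steps:
        with ExitStack() as stack:
            stdin = stack.enter_context(open(step.stdin, "rb")) if step.stdin else None
            stdout = stack.enter_context(open(step.stdout, "wb")) if step.stdout else None
            subprocess.run(step.argv, stdin=stdin, stdout=stdout, check=True)
```

Something like `run(merge_plan("merged", {"projA": "/repos/a", "projB": "/repos/b"}))` would do the whole job. Loaded revisions are appended after the target's current youngest revision, so the merged history runs one project after another rather than being interleaved by date.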

Mike's also the author of "Pragmatic Version Control using Subversion". We have a few copies in the office - good book.

October 17, 2005

Everything is pretty well fixed now, I think*

Michael Coté: "In so many cases in the past decade, the real risk was doing nothing at all."

* 17 October

October 15, 2005

On the impossibility of interoperation in 21stC Ireland

so: Stupid gravy granules

me: What?

so: I said - "Stupid gravy granules".

me: Huh?

so: Gravy granules. The recipe says three quarters of an ounce of gravy granules. The gravy granules thing is metric. And we only have a teaspoon.

me: That's what we call systems integration in work.

so: >sigh< Just get me a spoon, willya.

October 03, 2005

Enterprise Blogging in Practice

Good read from Michael Coté that goes beyond the usual "what to do to not get fired" advice about corporate weblogs: Enterprise Blogging in Practice, Notes. I especially like the parts about maintaining these kinds of things after you deploy them, where the expectations move from "wouldn't it be cool to have a..." to "it's down! oh! my! god!" If the service is remotely useful, that can happen very quickly.

Real programmers manage petabytes with Java and Python

Spotted on FoRK:

"I'll just mention that Fermilab uses a home-grown python package called Enstore to manage their data store of 3 Petabytes of physics data, growing at 1PB/year. The transfers of ~25TB/day to and from that system is what keeps me busy.
There's also a Java-based front-end called Dcache for caching and grid access. Part of that system just got a pile of raid units. 42 of them. They each hold 42 disk drives. Of 400GB. That's ~705 TB. " - Wayne Baisley

A pdf about it here
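The figures in the quote check out. A quick calculation, using decimal units (1 TB = 1,000 GB) and counting raw disk capacity before any RAID overhead:

```python
# Raw capacity of the Dcache RAID purchase quoted above (decimal units).
units = 42
drives_per_unit = 42
gb_per_drive = 400
raw_tb = units * drives_per_unit * gb_per_drive / 1000
print(raw_tb)  # 705.6

# Average rate implied by moving ~25 TB/day to and from Enstore.
tb_per_day = 25
mb_per_second = tb_per_day * 1_000_000 / (24 * 60 * 60)
print(round(mb_per_second))  # 289
```

So the new RAID units alone add roughly 0.7 PB raw, and the daily transfer volume works out to a sustained average of about 289 MB/s, around the clock.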

October 02, 2005

Serving up XML feeds from TextDrive

According to the Feed Validator, my feeds as generated by Movable Type are being served as ASCII, even though they are encoded as UTF-8.

Here's what to do. In the folder where your feeds live, create a .htaccess file and populate it with suitable AddCharset and AddType directives. For example, I serve index.xml for RSS1.0 (which is RDF), index.rdf (RSS2.0) and atom.xml (Atom), along with an rsd.xml file for the RSD spec. Here's the .htaccess file:

  <Files "atom.xml">
    AddCharset UTF-8 .xml
    AddType 'application/atom+xml; charset=UTF-8' .xml
  </Files>
  <Files "index.xml">
    AddCharset UTF-8 .xml
    AddType 'application/rdf+xml; charset=UTF-8' .xml
  </Files>
  <Files "index.rdf">
    AddCharset UTF-8 .rdf
    AddType 'application/xml; charset=UTF-8' .rdf
  </Files>
  <Files "rsd.xml">
    AddCharset UTF-8 .xml
    AddType 'application/xml; charset=UTF-8' .xml
  </Files>
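A likely cause of the "ASCII" report, assuming the server was sending `text/xml` with no charset parameter: a text/* type without a charset defaults to us-ascii (RFC 3023 spells this out for text/xml) regardless of the XML declaration, whereas application/*+xml types defer to the document's own encoding declaration. A small Python sketch for checking what the server actually claims; `describe_content_type` and `check_feed` are hypothetical names, and the feed URL is whatever you pass in.

```python
from email.message import Message
from urllib.request import Request, urlopen


def describe_content_type(header_value: str) -> tuple:
    """Split a Content-Type value into (media type, effective charset)."""
    msg = Message()
    msg["Content-Type"] = header_value
    media_type = msg.get_content_type()      # lowercased, parameters removed
    charset = msg.get_content_charset()      # lowercased, or None if absent
    if charset is None:
        # text/* with no charset defaults to us-ascii; application/*
        # XML types fall back to the document's own encoding declaration.
        charset = "us-ascii" if media_type.startswith("text/") else "from-document"
    return media_type, charset


def check_feed(url: str) -> tuple:
    """HEAD the feed URL and report what the server claims it is."""
    with urlopen(Request(url, method="HEAD")) as resp:
        return describe_content_type(resp.headers.get("Content-Type", ""))
```

With the .htaccess above in place, `check_feed` on the Atom feed should report `('application/atom+xml', 'utf-8')`; a bare `text/xml` response would come back as us-ascii.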

Incidentally, it looks like JavaBlogs is barfing on Atom1.0 feeds (I imagine that's down to Rome only supporting Atom as far as 0.3). I've repointed JavaBlogs at an RSS1.0 Java category feed for the time being (no doubt the last 20-something posts will show up over there).

Integrating Ivy with AntAnt to manage jar files

I've been using an Ant script to generate Ant scripts and build structures for over 4 years now. Given that all anyone does with Ant files is cut and paste from the last one they used, it seemed that a scripted approach would help. I've been planning to put it on a public Subversion server and open source it since forever, but never seem to have had the time. The current incarnation of this tool is imaginatively called AntAnt. In no way is AntAnt comparable to a rich build framework like Maven.

AntAnt is a cross between what Michael Feathers calls a stuntwork and what David Heinemeier Hansson calls opinionated software. It does a single job, gives you precious few options, and then gets out of your way. The single job is to generate a consistent build setup for your Java projects and let you get on with coding instead of working out what the last guy did. It doesn't have options for laying things out - you get what you get. Aside from time, that lack of options is the other reason I've never gotten round to open sourcing this; I always figured that everyone else would need a raft of configuration options, and I frankly couldn't be bothered doing the work when the feedback came in. Personally, after 8 years of Java programming, the options I need for build setups are baked into AntAnt. I haven't changed the basic functionality and layout for a few years now. The last chunk of work was upgrading to Ant 1.6, which cut a lot of duplication out of the generated build files, and making the project layout play nicely with Eclipse. Options and flexibility just seem to cause trouble in build systems.
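AntAnt hasn't been open sourced and is itself an Ant script, so what follows is only a hypothetical Python miniature of the same idea: one function, one fixed layout, no configuration options. `generate_build_xml`, the directory names and the target names are my assumptions, not AntAnt's output; the elements it emits (`property`, `path`, `fileset`, `mkdir`, `javac`, `jar`, `delete`) are standard Ant tasks and types.

```python
import xml.etree.ElementTree as ET

# One fixed layout, deliberately with no options (hypothetical paths).
LAYOUT = {"src.dir": "src", "lib.dir": "lib",
          "build.dir": "build", "dist.dir": "dist"}


def generate_build_xml(project: str) -> str:
    """Return an Ant build.xml for `project` using the fixed LAYOUT."""
    root = ET.Element("project", name=project, default="jar", basedir=".")
    for name, location in LAYOUT.items():
        ET.SubElement(root, "property", name=name, location=location)

    # Every jar in lib/ goes on the compile classpath.
    classpath = ET.SubElement(root, "path", id="compile.classpath")
    ET.SubElement(classpath, "fileset", dir="${lib.dir}", includes="**/*.jar")

    init = ET.SubElement(root, "target", name="init")
    ET.SubElement(init, "mkdir", dir="${build.dir}/classes")
    ET.SubElement(init, "mkdir", dir="${dist.dir}")

    compile_target = ET.SubElement(root, "target", name="compile", depends="init")
    ET.SubElement(compile_target, "javac", srcdir="${src.dir}",
                  destdir="${build.dir}/classes", classpathref="compile.classpath")

    jar = ET.SubElement(root, "target", name="jar", depends="compile")
    ET.SubElement(jar, "jar", destfile=f"${{dist.dir}}/{project}.jar",
                  basedir="${build.dir}/classes")

    clean = ET.SubElement(root, "target", name="clean")
    ET.SubElement(clean, "delete", dir="${build.dir}")
    ET.SubElement(clean, "delete", dir="${dist.dir}")

    ET.indent(root)  # pretty-print (Python 3.9+)
    return ET.tostring(root, encoding="unicode")
```

Writing `generate_build_xml("demo")` out as build.xml gives `ant` a default `jar` target that compiles src/ against every jar in lib/ and produces dist/demo.jar. The absence of parameters is the point: every project generated this way builds the same way.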

The one thing it doesn't do, and I have always wanted it to do, is jar management - not so much dependency management as an answer to the question "should jar files go into source control or not?". Putting them in source control is great because you can check out and build. Putting them outside source control is great because your repository will suddenly be 20% of the size it was. It's pretty nasty when you check out a project and find you can't build it because a sysadmin took away the web server hosting your jar files. It's pretty nasty when you check out a build and find you can build it, but it breaks in production because the project depends on the head of another project in development and on version 2.6 of that project in production. My observation is as follows: developers want jar dependencies in source control and everyone else wants them anywhere but source control. One compromise for large multi-project repositories is to check the jars into a global lib dir and point the sub-projects at it. At least that way you'll only have one copy of servlet.jar checked in instead of six. AntAnt allows you to set up a global lib area like this if you need it, along with the usual lib folder for a project.

Looking around, it seems that Ivy might have an answer to this question. Ivy's claim to fame is primarily transitive dependency management in Java. But it also lets you declare a local file repository and/or a list of servers (e.g. ibiblio) to pull jars down from into its repository cache. It works like Maven, but is easier to set up for the local file system case (imo) and easier to get going with Ant (again imo, and yes, I know about the Maven Ant tasks).

So I integrated AntAnt with Ivy this afternoon, following the documentation and the way the WebWork2 code base uses it. AntAnt now gives you an ivy.xml and an ivyconf.xml to set up, and automatically treats any global lib area as a file-based repository for Ivy. It seems to work ok and will cut out some duplication. Conclusion: Ivy is a great tool and I should have done this a year ago. [Business people would get much better software if we had 3-day weekends.]
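The post doesn't show the files AntAnt generates, so here is a hypothetical Python sketch of the two files described, assuming the Ivy 1.x formats of the time (Ivy 2 later renamed ivyconf.xml to ivysettings.xml). The resolver chain looks in the global lib area first, treated as a flat filesystem repository, and falls back to ibiblio. The function names, the `global-lib` resolver name and the artifact pattern are assumptions, not AntAnt's actual output.

```python
import xml.etree.ElementTree as ET


def generate_ivyconf(global_lib: str) -> str:
    """ivyconf.xml: check a shared lib directory first, then ibiblio."""
    root = ET.Element("ivyconf")
    ET.SubElement(root, "conf", defaultResolver="default")
    resolvers = ET.SubElement(root, "resolvers")
    chain = ET.SubElement(resolvers, "chain", name="default")
    # The global lib area, treated as a flat file-based repository.
    local = ET.SubElement(chain, "filesystem", name="global-lib")
    ET.SubElement(local, "artifact",
                  pattern=f"{global_lib}/[artifact]-[revision].[ext]")
    # Fall back to the public ibiblio Maven repository.
    ET.SubElement(chain, "ibiblio", name="ibiblio")
    ET.indent(root)
    return ET.tostring(root, encoding="unicode")


def generate_ivy_xml(org: str, module: str, deps: list) -> str:
    """ivy.xml declaring `module` and its (org, name, rev) dependencies."""
    root = ET.Element("ivy-module", version="1.0")
    ET.SubElement(root, "info", organisation=org, module=module)
    dependencies = ET.SubElement(root, "dependencies")
    for dep_org, name, rev in deps:
        ET.SubElement(dependencies, "dependency", org=dep_org, name=name, rev=rev)
    ET.indent(root)
    return ET.tostring(root, encoding="unicode")
```

Because the global lib resolver comes first in the chain, a jar that is already checked in is used as-is, and only missing ones are fetched from ibiblio into Ivy's cache.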

I'll be going with Ivy+AntAnt, at least until Maven2 is wrapped up. Maven2 is looking much, much better than Maven (here's hoping they throw out XML scripting entirely for the final release). But it's still a lot of overhead. About the only thing I might do is get AntAnt to generate the standard Maven project layout instead of my own, so people have an upgrade path to Maven (in Subversion this isn't so bad, but in CVS doing surgery on folder layouts is a nightmare). Irrespective of my opinions on Maven the software, Maven the collection of idioms will probably dominate Java development from here on out.