" /> Bill de hÓra: March 2007 Archives


March 24, 2007

links for 2007-03-24

March 22, 2007

Cockburn wins Jolt Award

Books General: Agile Software Development by Alistair Cockburn

That's great news. Alistair Cockburn is the agile method world's best kept secret.

Btw, some people thought I was referring to his book in my list. It was meant to be Bob Martin's, which has the same title.

Get Some Interop

Kohsuke Kawaguchi on a bug with Hudson/JIRA integration

I believe my WSDL came from the latest 3.7.4. And so it sounds like JIRA is indeed breaking web interface compatibility between dot releases, which is bad. This kind of errors is exactly why people should be using JAX-WS and JAXB, which is much more robust in the face of changing service definitions! With JAX-WS this change would have worked no problem. Presumably people aren't updating JIRA installations that frequently?

I just came across this problem today, with Hudson 1.87 and JIRA 3.5-#131.

So here's a better idea - JIRA stops using this stuff altogether and replaces it with Atom+POST, so client apps like Hudson can function properly and beyond some other system's release cycle. The root problem here isn't library selection. It's WS-* based proxy service generation pretending not to be an RMI call that requires binary compatibility. And a binary compatible wire call is still a binary compatible wire call, no matter how much XML you put on it.
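To make the contrast concrete, here's a minimal sketch in Python of a client that reads an Atom entry by element name instead of going through a generated proxy. The entry content, the `ext` namespace and its `priority` element are all invented for illustration; this is not JIRA's actual format. The only point is that a reader which picks out the elements it understands carries on working when a newer server adds ones it doesn't.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical entry; the 'ext' namespace and its element are made up
# to stand in for something a newer server release might add.
ENTRY = """\
<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:ext="http://example.org/ns/ext">
  <id>urn:example:issue:42</id>
  <title>Build 87 failed</title>
  <updated>2007-03-22T10:00:00Z</updated>
  <ext:priority>major</ext:priority>
</entry>
"""


def read_entry(xml_text):
    """Pick out only the elements this client understands; ignore the rest."""
    entry = ET.fromstring(xml_text)
    return {
        "id": entry.findtext(ATOM + "id"),
        "title": entry.findtext(ATOM + "title"),
        "updated": entry.findtext(ATOM + "updated"),
    }


if __name__ == "__main__":
    print(read_entry(ENTRY))
```

Removing the `ext:priority` line from the entry gives the same result, which is the kind of compatibility a generated stub tends not to offer.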

Closing thought: "information can be utterly, utterly application-specific and still 100% XML compliant."

March 21, 2007

links for 2007-03-21

March 17, 2007

links for 2007-03-17

March 16, 2007


Buko Obele: "What's a bit strange is the logic of replacing one container with another "simpler" container. There's a hidden assumption operating here, I think, that the problem is a specific container. But I wonder if the problem is the very notion of a "container" itself."

Got Grid?

Steve Loughran left an insightful comment on Dare Obasanjo's post, "Amazon S3 & EC2: What's the Endgame?":

"I think S3 is designed to make it easier to move to EC2...EC2 forces you to use S3 for persistence because when a VM dies, it dies completely. This gives you a clean split between transient machine state and persistent data, and is one of the things you need if you want the ability to deploy 50 servers under peak load, but roll back to 3 servers in times of idleness.

however, it does force you to rewrite every single application, or at the very least every database, because saving state to the filesystem doesn't cut it any more. And EC2 is not a file system, even though there is a user-mode linux filesys driver for it for sale. This is one reason why I'm more likely to use a 'classic' single host Xen hosting service in the near future. "

I agree with Steve, and would go one further. Amazon's logistic prowess (both digital and physical), along with exposing their internal services as commercial options, makes them able to drive better deals with partner suppliers in the future. It's not just about chasing startups and web companies that want to scale without provisioning truckloads of hardware. If you're Amazon you'd be nuts to depend on SUNW or GOOG for compute and data grids as and when they come online. The Google angle is interesting; we know they're buying up fibre and power, but Google see their internal IT as a competitive advantage and are notoriously secretive about it. Hence they might prefer people use Google apps, not utility BigTables. Amazon can get to market first because their internal IT is the cost of doing business, and unbundling their compute and data grids doesn't present the same kind of competitive risk. Sun are interesting because of their service nous. They might be where you go to get grid when the others' SLAs and unreliability do you in. Otherwise, HP have grid services, which I haven't looked at, and eventually you'd expect MSFT to offer something.

links for 2007-03-16

Future proofing

"dude, where's my phone charger"

When Skynet comes online, that will be a useful link.

Could this be a rails app?

Thank goodness for url opacity - I wouldn't want to be able to see a url with "statuses" in it.

March 15, 2007

links for 2007-03-15

March 14, 2007

Ten books for the working programmer

  1. Agile Software Development
  2. Working Effectively With Legacy Code
  3. Refactoring
  4. The Pragmatic Programmer
  5. Code Complete, 2nd ed
  6. Patterns of Enterprise Application Architecture
  7. Mastering Regular Expressions
  8. Pragmatic Project Automation
  9. The Algorithm Design Manual
  10. The Art of Project Management

Colophon: I first wrote this list out for myself in 2004. Since then I added McConnell at 5, and Berkun at 10, removing Getting Things Done and UML Distilled. The intent of the list is to be language and platform neutral; these books should stay with you across projects and jobs. This isn't a "thou shalt" - I'm just saying I've found these books very, very useful. I suspect the first 5 have been indispensable to me since I joined the industry.

ps: Gunnar makes an important point - nothing on security.

links for 2007-03-14

Resource and Representation

written in 2003


We'll need a basic grasp of a Resource.

In HTTP parlance (or REST, the mandated architecture of the Web), resources are things that are denoted by URIs. Each URI implies a resource. In a positivistic view of things, this could suggest you are creating resources by creating URIs. In real reality new things are not coming into existence, but in web reality, yes they probably are. This should make you suspicious and uncomfortable, in the same way people are suspicious and uncomfortable about dark matter, superstrings or quantum mechanics. Try to find comfort in realizing it's only a model of the web, and only an architectural model at that; one that happens to require the presence of Resources in the same way Plato's cave required things outside the cave to make shadows in the cave.

It's notable that resources are not on the web - you can't reach them or interact with them directly. It's also notable that many many people have built perfectly good websites without ever hearing of, or running into, a Resource.

For the most part, Resources are different from the things people create and manipulate when getting something done on the web; things like web browsers, web servers, web pages, web applications, web sessions and web controllers. I'm saying this because a lot of people you come across working with HTTP aren't thinking about Resources. They're thinking about things like web browsers, web servers, web pages, web applications, web sessions and web controllers. Of course you could assign all of these things URIs and they would be thusly Resources, and 'on the Web'. Sometimes people do just that; for example to provide a management console to a web server, a firewall, or a running web application.


A Representation is what is transmitted when you "dereference a Resource using a URI"; you know, clicking on a link. Representations are real, more or less. You can send them over the network, fill up a hard drive with them or use them to attack people's computers; that sort of thing. The current state of the Resource is represented by the Representation (these things are well named). REST, mentioned earlier, stands for REpresentational State Transfer.

In this model, state results when the application traverses a succession of URIs embedded in Representations; you know, surfing the web. Somebody once said "hypertext is the engine of application state", which is an opaque way of describing "surfing the web". But it is why some people, myself included, are wound a bit too tight about calling HTTP a transport protocol, which is very common. There's no application state in a transport protocol, there's just data and control.
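Here's a toy sketch in Python of that traversal, assuming an in-memory dictionary standing in for the web. The URIs, the `state` and `links` fields and the link names are all hypothetical; a real client would issue HTTP GETs and parse whatever media type came back. What it tries to show is that the client starts with one URI and a set of link names, and discovers every other URI from the Representation it just received.

```python
# An in-memory stand-in for a web server: URI -> Representation.
# Each representation carries some state plus links to further URIs.
# All URIs and field names here are hypothetical.
WEB = {
    "/orders": {"state": "listing", "links": {"first": "/orders/1"}},
    "/orders/1": {"state": "open", "links": {"payment": "/orders/1/payment"}},
    "/orders/1/payment": {"state": "paid", "links": {}},
}


def dereference(uri):
    """Return the representation for a URI (what a GET would hand back)."""
    return WEB[uri]


def surf(start_uri, rels):
    """Follow the named links in order, starting from start_uri.

    The client only knows the entry URI and the link names; every other
    URI is discovered from the representation it just received.
    """
    uri = start_uri
    visited = [uri]
    for rel in rels:
        representation = dereference(uri)
        uri = representation["links"][rel]
        visited.append(uri)
    return visited, dereference(uri)["state"]


if __name__ == "__main__":
    path, state = surf("/orders", ["first", "payment"])
    print(path, state)
```

The application state lives in where the client has got to, not in anything the transport is holding on its behalf.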

How it works

Now let's get to the Web's dirty secret - nobody builds applications according to the designated model.

When I say nobody, I mean nobody statistically - the number of web application developers informed by web architectural theory is tiny, really tiny. It's growing as the theory (REST) gets exposure and becomes popular due to push back against Web Services complexity and the fact that some big websites have admitted they use it. All that progress requires significant evangelism (read: being annoying on mailing lists, standards groups and blogs).

So. We have a constellation of Resources, named with URIs and clients using URIs over HTTP to get Representations which contain further URIs. Ok, enough with the theory - does that work? I think it does, and I think it can work better than idiomatic use of the web, but it's a really different way of doing things. The question is like asking whether continuations, or events, or futures, or functional programming or capability based security work. Yes they do, but there are vast amounts of knowledge and infrastructure vested in the idiomatic alternatives. You'll be starting from scratch and doing a lot of unlearning. Except for the ones you build yourself, the tools will not save you - they're dealing with technical idiom, not technical excellence.

March 13, 2007

The play's the thing

This has been sitting in my drafts since 2003: "Web services: Rosencrantz and Guildenstern are not dead". The idea of protocol transport and protocol transfer being very different beasts isn't controversial these days. I figure I might as well publish it, even if it's longwinded.

Operational Language

"A message is overdue when it has not arrived at the destination within the expected time of 24 hrs."


"A message is overdue when its state is 'open' and it has not been sent to the destination within the expected time of 24 hrs"


"A message is overdue if it was received more than 24hrs ago and is still marked as 'open'"

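The third wording is the only one of the three that can be checked directly against data the system already holds: a received timestamp and a state flag. A minimal sketch in Python, where the function and parameter names are my own assumptions and "more than 24hrs" is read as strictly greater than:

```python
from datetime import datetime, timedelta

OVERDUE_AFTER = timedelta(hours=24)


def is_overdue(received_at, state, now):
    """Overdue: received more than 24 hours ago and still marked 'open'."""
    return state == "open" and (now - received_at) > OVERDUE_AFTER


if __name__ == "__main__":
    received = datetime(2007, 3, 12, 9, 0)
    print(is_overdue(received, "open", datetime(2007, 3, 13, 9, 1)))
```

Passing `now` in, instead of reading the clock inside the function, keeps the rule dependent only on its inputs.
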
March 11, 2007

What would Guha do?

New York Times: "The idea of a centralized database storing all of the world’s digital information is a fundamental shift away from today’s World Wide Web, which is akin to a library of linked digital documents stored separately on millions of computers where search engines serve as the equivalent of a card catalog."

The NYT reporter doesn't know that search engines are centralized databases of the web's digital content? I'm shocked.

One part of me thinks that when the AI guys can get startup cash, there's too much money chasing too few investments. The other part of me is wondering, what would Guha do? Freebase seems like a rebranded TAP. Mind you, he's at GOOG right now.

I see Danny is being too polite - Tim O'Reilly is arguably willful in his misunderstanding of what the W3C are trying to do with the semantic web. In part that's down to DARPA's contributions - while technically valuable, they are mired in industrial ontology, KR and Agent speak, which rarely sounds convincing, and comes with a lot of baggage.

I still think that whoever can figure out peered search can win big. It's a genuine disruption to what the incumbent engines do - they all download the web into a database. The form of the data - unstructured, structured, semi-structured - isn't the only issue.

links for 2007-03-11

  • Or, how we learned nothing from clippy. What an appalling waste of bandwidth and cpu cycles.
    (tags: search)


Steve Loughran: "once you have your own branch to support, you lose a lot of the value of OSS software."

March 10, 2007

Test First

Google Testing Blog: TotT: Better Stubbing in Python

def Foo(path, **kwargs):
    # path_checker, DoSomething and DoSomethingElse are assumed to be defined elsewhere.
    if path_checker(path):
        return DoSomething()
    return DoSomethingElse()

This looks like a throwback to #ifdef. The problem with this approach is that instrumenting methods to do something based on the caller's context makes for context dependent code. It also makes it harder to read, which is no small concern. Reasoning about code that depends on something other than its inputs is arguably the operational meaning of 'complexity' in software. I'm not sure we want to be promoting one best practice, testing, via some other worst practices. "I'm in the test context" seems like a problem where attributes or aspect annotation would be a more appropriate solution.
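One way to keep the test context out of the production code, sketched here in Python using the standard library's `unittest.mock`. This is my own illustration, not what the Google post proposes, and `describe` and `describe_with_stub` are hypothetical names. The production function has no test hook at all; the stub is applied from outside for the duration of one call.

```python
import os.path
from unittest import mock


def describe(path):
    """Production code: depends only on its input and on os.path.exists."""
    if os.path.exists(path):
        return "present"
    return "missing"


def describe_with_stub(path, exists):
    """Test helper: swap os.path.exists for a stub for the duration of one call."""
    with mock.patch.object(os.path, "exists", return_value=exists):
        return describe(path)


if __name__ == "__main__":
    print(describe_with_stub("/no/such/file", True))
```

`mock.patch.object` can also be applied as a decorator on a test function, which is about as close as Python gets to annotating "I'm in the test context" without the code under test knowing about it.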


Cringely: "Server power is easy if we embrace peer to peer"

Interesting. What Amazon is doing right now for consumer storage and compute cycles with EC2 and S3 is relevant. It's akin to Walmart building their own power stations and selling back the excess. I've not quite understood why Amazon would get into consumer level grid applications. I understand how they can do it (given their in house team and their CTO), but not really why. Maybe it's so they can never be messed around by Internet infrastructure and platform providers when it comes to negotiating deals on these kinds of services.

Why you can safely ignore the ACM

"Of these, I've heard of seven and read five"*

I've felt, for some time now, that the ACM does not speak to me, as a practitioner. I haven't found it relevant since I did basic research in college. Which is to say, it ceased to be relevant when I entered the world of software work. The ACM seems to be stuck in some pre-Web timewarp, where things like object orientation, components and CORBA are exciting new ground.

I have the sense that others feel the same, given the poll's results.

* why don't people put their names on their weblogs?

The Magic Numbers

2, 3, 5, 7 are optimal sizes for software project teams. A team of 11 will eventually fragment into 2 or more smaller teams of prime size.

links for 2007-03-10

March 08, 2007

links for 2007-03-08

March 07, 2007

links for 2007-03-07

March 06, 2007

links for 2007-03-06


Rys David McCusker on LtU: "Eventually everyone has to figure out how asynchronous concurrent behavior gets coordinated."

March 05, 2007

links for 2007-03-05

March 04, 2007

links for 2007-03-04

March 03, 2007

Money Down

Programming Erlang

Drum 'n Bass helps programmer to concentrate shocker

If there's better music to put me into the zone than drum 'n bass, I don't know what it is. It definitely helps me concentrate. What works for you?

March 01, 2007

Python: don't return from __init__()

Returning a value from __init__() will cause an exception in Python:

In [1]: class Foo:
   ...:     def __init__(self): return self

In [2]: f = Foo()
Traceback (most recent call last)
TypeError: __init__() should return None

That's because __init__() is not the call to cons in Python; it's an initializer called after object instantiation, though most people (myself included) will intuitively treat __init__() as the construction point, 'cos that's what it looks like and it logically takes the same arguments as the constructor would. It's rare enough that you see anyone returning from __init__, as the code will fail fast (not swallowing exceptions is a Python tenet). Usually it happens when a programmer has conditional logic in the initializer or is doing factory type stuff, not all combinations of which got executed during development and testing.

The construction method in python is __new__(). There's a bit of detail here - python basic customizations - and the cookbook has an example of overriding __new__().
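A short sketch of the split between the two, written for Python 3 (where every class is new-style and the error message carries a little more detail than the session above). The class names and the `constructed` attribute are just for illustration.

```python
class Foo:
    def __new__(cls, *args, **kwargs):
        # Construction: create the instance and return it.
        instance = super().__new__(cls)
        instance.constructed = True
        return instance

    def __init__(self, value):
        # Initialization: runs on the already-created instance
        # and implicitly returns None, as required.
        self.value = value


class Bad:
    def __init__(self):
        return self  # not None, so instantiation raises TypeError


def try_bad():
    """Return the name of the exception raised by Bad(), or None."""
    try:
        Bad()
    except TypeError as exc:
        return type(exc).__name__
    return None


if __name__ == "__main__":
    f = Foo(5)
    print(f.constructed, f.value, try_bad())
```

If you really do need factory behaviour, __new__() is the place that is allowed to return something; __init__() only gets to fill in what it is handed.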

Operational mindset

Mike Cannon-Brookes on how Atlassian update Confluence in-house:

"It's two commands I believe and takes about 20 minutes to run, including doing a full backup of the data"

They need to release that job - the last time I upgraded confluence, it took a chunk out of a weekend. But it was worth it!

"The longer answer is that because we know we're going to update twice a week, we try to keep our... code clean at all times. "

I couldn't agree more. For that kind of thing, operational experience trumps a process manual or guidelines every time. It's like exercise - there's not much point talking about doing exercise. On the last major project I worked on, we built frequent upgrades into the development cycle. It was frustrating initially ("Upgrading? Again? Why!?"), but the project relied on open source and with open source you need to be able to stay near the latest and greatest. The other upside is that we and ops and support know how to upgrade that system because we have upgraded that system over and over. It's not guesswork. The impact of particular changes is well understood (risk) and, as importantly, an upgrade can be estimated (schedule). There are other good things that fall out of frequent releases and upgrades. First, you'll end up with a decent build and configuration system (out of necessity). Second, your code will tend to be non-monolithic (to support upgrading the minimal amount of code).

[Not unrelated: I was playing with a bamboo evaluation today with a coworker - very impressive for a 1.0. ]