Bill de hÓra: February 2006 Archives


February 27, 2006

Edge Case

Martin Fowler:

"Another good warning sign of trouble is the Data Class - a class that has only fields and accessors. That's almost always a sign of trouble because it's devoid of behavior."

This is good OO design advice inside the application boundary, for local cases. But when it comes to working at the application edges - at the network boundary, over HTTP, between applications, intra-app messaging - well, "I have a doubt." For example Data Classes are a really really good thing, if they happen to be Atom served over HTTP or across an MQ. The advice about Data Classes is not actually wrong tho', it's just highly context sensitive. It's surprising that Martin Fowler doesn't qualify his advice, as pointing out edge cases is something I've come to expect in his writing.
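To make the edge case concrete, here is a minimal sketch of a data class whose whole job is to cross a boundary as Atom. It is written in modern Python (dataclasses need 3.7 or later), and the `Entry` class, the `to_atom` helper and the sample values are all hypothetical. The output is a bare fragment, not a complete valid Atom entry.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


@dataclass
class Entry:
    """A plain data class: fields only, no behaviour (hypothetical example)."""
    id: str
    title: str
    updated: str  # RFC 3339 timestamp, kept as a string for simplicity


def to_atom(entry: Entry) -> str:
    """Serialise the fields as child elements of an Atom entry element."""
    root = ET.Element(f"{{{ATOM_NS}}}entry")
    for name in ("id", "title", "updated"):
        child = ET.SubElement(root, f"{{{ATOM_NS}}}{name}")
        child.text = getattr(entry, name)
    return ET.tostring(root, encoding="unicode")


entry = Entry(id="tag:example.org,2006:1", title="Edge Case",
              updated="2006-02-27T00:00:00Z")
xml_text = to_atom(entry)
```

The class has no behaviour at all, and at the boundary that is the point: the receiver only ever sees the document, never the object.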

Anyway. There are a lot of system edges these days, and agreeing where the edges are is hard. Arguing that data classes are bad design is the kind of advice that can result in problems when working with HTTP or SOAP, or as Steve Loughran put it, "The doomed attempt to seamlessly map from local structures and method calls into XML documents". Object-centricity and one-size-fits-all approaches to modelling are the sort of thing that stops me really liking books like Domain Driven Design, which is also down on Data Classes as well as Service Layers.

update: Tony Coates goes into a bit more detail on this.

February 24, 2006

ROME article

Mark Woodman has written a good introductory piece on the ROME Java library for Atom/RSS on xml.com :
"ROME in a Day."

via Dave.


"My general experience has been if you have a shift-reduce, or worse yet a reduce-reduce, conflict in your LALR(1) grammar, your users will have bugs in their code. " - Brian Hurt

Unintended Consequences

A cautionary tale from Stefan Tilkov: "It was brilliant. It was hard to implement, but two very smart guys did it. It rocked. It was also totally unusable."

February 23, 2006


"The only people that can make a living from whining are standup comedians."- James Governor

Small Is The New Big. Or Something.

Mike Champion:

"I've yet to see compelling, detailed, practical examples where real business are solving the problems that WS-* addresses with only HTTP XML (unless one ignores the bazillion person-hours and immense amounts of code that industrial-strength web sites have so far had to deploy). And I'm still waiting for pragmatic guidance on exactly how to put these ideas in practice for an organization that needs something more complex than a stock quote service."

What are the complex problems that WS addresses? If WS-* is solving them why are people complaining about WS-* instead of exalting them? What is it about WS-* that would reduce code and bazillions of man-hours for industrial strength websites?

I have this sneaking suspicion that some complex problems are complex because there are intrinsically difficult issues, but that some complex problems are complex because of arbitrary and self-created matters, brought about by the technology, the process, bogus requirements. I'm not disputing at all there are classes of systems that are as Mike says they are, just wondering if perhaps they are edge cases, and how and why WS-* helps there, in a way where you could point to a solution and say WS-* was critical to success.

As for XSD it just does not seem to be helpful. I think that has to do with not only XSD being complicated (due to a requirements creep toward genericity that resulted in something of a monster spec). It's also quite low-level, and in practical terms tends to expose a platform's type system, which is invariably implementation specific. Which is to say there's an encapsulation problem when using XSD - too much how, not enough what. It would be cleaner to expose domain level structures like "Customer" that obey a RELAX or RDF schema, than an assemblage of somebody else's x86 CPU optimized primitives. The applications can map the domain object any way they see fit, rather than telling each other how to do it.

There is one nice feature about the REST or XML-over-HTTP approach - you can definitely scale down. Scaling down is important because if you can't scale down you presumably have to start big, which is risky for any project, unless you believe big working systems derive from big working systems.

Elsewhere I liked Steve Loughran's just use XMPP suggestion. XMPP rocks. Publicly the fuss around XMPP will be in the commercial sector - about Google Talk and Voice Over IM. I think it could quietly become the "other" protocol for the get-it-done school of systems integrations, mainly in situations where push or timeliness is a serious need, and people are talking crazy talk, like long haul JMS integrations. A while back Cote' and I had an email exchange about messaging and system monitoring. XMPP came up as something that has nice scale characteristics for recording events in a messaging system. This isn't just the usual scale handwaving - it's important because for every message that passes through a system or cluster of systems you'll have more than one interesting event to record. My experience seems to average out at 10 events per message. If so, the eventing/monitoring system has to scale at least an order of magnitude greater than the messaging system. [Here's an example of a SOA that melted down for exactly that reason.] And if you have an event backchannel that scales at 10x the messaging system, at that point you may wonder whether you couldn't use the backchannel altogether.
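The arithmetic behind that claim is trivial, but worth writing down. This is a back-of-envelope sketch only: the 500 messages per second figure is made up for illustration, and the 10 events per message is the rough average quoted above, not a measured constant.

```python
def event_rate(messages_per_second: float, events_per_message: float = 10.0) -> float:
    """Events the monitoring channel must absorb for a given message rate."""
    return messages_per_second * events_per_message


# Hypothetical load: a bus handling 500 messages per second.
rate = event_rate(500)
print(rate)
```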

February 18, 2006


Steve Loughran: "As a result, every OSS project that depends on JUnit has suddenly stopped building on Gump; That is, 57% of all projects, 441 teams."

Sounds bad. But doesn't that mean lots of apache projects were linking to the JUnit CVS MAIN rather than a label like 3.8.2, or via a versioned maven/ivy binary dependency?

Update from Steve: Gump builds against the head of everything. but:

"What junit did was not only break everything on gump, but provide no warning of it by not updating their public SVN repo. If they had done that, the problem would have been found and worked around a long time ago."

RSS2.0 - have it your way

On the Web there is a simplicity cult of people.

Adam Green: "And the simple fact is that RSS sucks. Anyone who works with it knows that there are huge holes and weaknesses in the spec and the current implementations"

Just when you thought RSS2.0 was frozen. Not at RSS2.0, but RSS2.0.1. RSS2.0.1 has in turn seen 6 revisions since 2003. There's now an RSS2.0.2 draft spec. There's a school of thought that says changes to the RSS2.0 spec should leave the version at 2.0 since all the changes are really just clarifications and filling out things that were once unsaid. As if we lived in some imaginary place where spec clarifications don't break software. The spec, mind you, is "frozen" at 2.0.1. To be precise, it has been frozen for the last 6 revisions of 2.0.1. Although there might be "clarifications" resulting in future minor versions - not errata - clarifications. And each minor version can presumably have revisions (in the same way RSS2.0.1 has had six revisions).

I believe I need new working definitions of "frozen", "versions", "revisions" and major.minor.patch numbering schemes if I'm to understand what's going on with RSS2.0. Overall governance of the spec is unclear.

And there's me worrying about Atom confusing the market.

February 14, 2006


Sean McGrath: "The problem of identification is ultimately a problem of language."

February 13, 2006



February 12, 2006

Subversion tips: working with branches

update: Ian Bicking has a good followup on branching practices

Subversion is great software, essentially a major upgrade of CVS. Its branch support is stellar, for a few reasons:

  • Visibility: Branches are physical copies, you can see all branches, stored by convention in the /branches folder. This is unlike CVS (or VSS) where branches are placed in the time dimension and are invisible, hidden "behind" the CVS HEAD revision.
  • Efficiency: Branches are calculated as deltas and are not full physical copies, so they are efficient and cheap to create.
  • Global revisioning: the entire repository gets versioned on every change. As a result merging can be applied as the merging of two source trees; this is much easier to think about and execute than merging between two sets of files, as is the case with CVS.

Nonetheless there are some things you still need to take care of. As well as that, many developers have learned to dread branches, either because of poor practices or weak tools, or both. With that, here are twelve tips for working with branches in Subversion.

Update before email. Updating against the repository should be the first thing you do in the morning, even before reading your email. This tip isn't specific to branching, but it's so central to having a good working practice with any form of source control, I'll mention it here. Some of the biggest development issues with source control can be traced directly to not updating frequently. Do it until it becomes muscle memory. Email is a terrible way to start the day anyway.

Put the branch revision number in the comment. When you create the branch from the trunk, make a note of the revision number the branch was created from. For example:

svn copy http://svn.example.org/foo/trunk \
http://svn.example.org/branches/foo/mybranch \
-m"Created foo/mybranch branch from rev [20] of foo/trunk"

Subversion does not constrain the scope of a merge to a branch, so you have to tell it to only merge changes on the branch that have happened since the branch was created. Otherwise you'll get everything that happened before the branch brought across, which screws up the changeset. Treating branches specially is something that might get added to Subversion in the future, but for now you'll have to do it yourself.

Backport to the branch. Come the glorious day when you merge your changes back into the trunk, things will go much easier for you if you have tried to keep up with the changes on the trunk. The easiest way to do that is to keep merging changes on the trunk onto your branch as frequently as possible - aka "backporting". Here's an example of merging trunk changes onto the branch that was created from revision 20 above, while the repository has moved on to revision 25 due to changes on the trunk:

  cd /branches/foo/mybranch
  svn merge -r20:25 http://svn.example.org/foo/trunk . 
  svn ci . -m "foo/mybranch: merged to [25]"   

The smaller the changeset the easier it is to manage issues - so prefer lots of little updates to one big bang integration that could take days, or just not be possible to complete at all.

Put the revision number in the backport. This is for the same reasons as putting the revision number in the initial branch comment. You only want the changes on the trunk made since the last backport merge. Suppose the repository has moved on to revision 30. Because we made a comment on the last backport telling us what revision we merged to (25) we know we only need to merge from r25 onwards:

  cd /branches/foo/mybranch
  svn merge -r25:30 http://svn.example.org/foo/trunk . 
  svn ci . -m "foo/mybranch: merged to [30]"
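One payoff of keeping the revision number in the comment is that the next merge range can be worked out mechanically. The sketch below assumes the two comment formats used in the examples above ("branch from rev [N]" and "merged to [N]"). The function names are hypothetical, and it only builds the command as a list of arguments, it does not run Subversion.

```python
import re

# Patterns for the comment conventions used in the examples above.
MERGED_TO = re.compile(r"merged to \[(\d+)\]")
CREATED_FROM = re.compile(r"branch from rev \[(\d+)\]")


def last_merged_revision(log_messages):
    """Return the highest trunk revision the branch has already taken in."""
    revisions = []
    for message in log_messages:
        for pattern in (MERGED_TO, CREATED_FROM):
            match = pattern.search(message)
            if match:
                revisions.append(int(match.group(1)))
    if not revisions:
        raise ValueError("no branch or merge comment found")
    return max(revisions)


def next_merge_command(log_messages, head_revision, trunk_url):
    """Build the svn merge arguments for the next backport."""
    start = last_merged_revision(log_messages)
    return ["svn", "merge", f"-r{start}:{head_revision}", trunk_url, "."]


messages = [
    "Created foo/mybranch branch from rev [20] of foo/trunk",
    "foo/mybranch: merged to [25]",
]
command = next_merge_command(messages, 30, "http://svn.example.org/foo/trunk")
```

Without the bracketed numbers in the log there is nothing to parse, and you are back to guessing where the last merge stopped.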

Take a merge for a dry run. The merge command has a flag called "--dry-run". This allows you to see what the result of the merge will be without actually applying it to the target. It's useful if you have any doubts that the merge will succeed or what it's going to apply to. On this front, if the merge goes to hell you can always run the revert command to clean up your working copy.
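As a rough sketch, the dry run and the cleanup look like this when the command lines are built in Python. The `merge_command` helper is hypothetical and only assembles arguments. The lists could be handed to `subprocess.run` from inside the branch working copy, but nothing here actually invokes Subversion.

```python
def merge_command(start, end, source_url, dry_run=False):
    """Build svn merge arguments, optionally as a preview-only dry run."""
    command = ["svn", "merge"]
    if dry_run:
        command.append("--dry-run")  # report what would change, apply nothing
    command += [f"-r{start}:{end}", source_url, "."]
    return command


# Throws away uncommitted local changes in the working copy, recursively.
REVERT_COMMAND = ["svn", "revert", "--recursive", "."]

preview = merge_command(25, 30, "http://svn.example.org/foo/trunk", dry_run=True)
real = merge_command(25, 30, "http://svn.example.org/foo/trunk")
```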

Don't forget to commit a merge. Merging only applies changes to the working copy. You have to check those changes into the repository with the "commit" command.

Prefix branch comments with the branch name. This makes scanning the log history easy. Those that come after will thank you. Here's an example of the right thing from the Django magic-removal branch:


Merge from the target context. One thing that can be confusing with merging is making sure you don't get your merge sources (what you're merging) and targets (where you're merging to) mixed up. It's much easier to get this right if you get into the habit of running a merge from the target. That way you can think of it as taking in merges from somewhere else.

Never check into a tag. The convention for tags is to place them in a /tags folder in the repository. Tags are meant to be read only snapshots of your code. It's tempting sometimes to check little fixes into tags. Try not to do this - someday you will forget to put that change into the trunk as well and the next tag will be hosed in a way that is difficult to track down. And those little changes will get bigger and bigger over time. In subversion creating new tags is a cheap operation (both time and space). Instead, check the change into the trunk/branch, retag and release. Aside from code management another problem is confidence - seeing commits into the /tags folder lowers confidence in the integrity of the codebase. Nobody wants to think about tags that are actually branches.

Minimize the number of active branches. Branches can be useful, but too many of them is indicative of problems, typically of poor communication amongst developers or an inability not to break each others' code. Branches should be created only when necessary - they're not a good default approach. If you really want to work by having individuals merge changesets, you've probably been following kernel-dev too closely, but you should look at tools that support this model, such as SVK (based on Subversion), Darcs or Bazaar-NG. Subversion is a centralised revision control system, there's not much point fighting it. Ian Bicking's "Distributed vs. Centralized Version Control" is a good overview of the two approaches.

Prune back dead branches. Branches that are no longer active or required should be deleted aggressively. Developer and experimental branches typically fall into this category, but it's more or less true of any branch that has been merged back to the trunk. Get rid of it and focus on the active lines.

Never branch a branch. Branching a branch is sometimes called "Staircasing" since a drawing of branching branches looks like a staircase. In general staircases happen because active development drifts away from the trunk and onto a branch; in turn that usually happens because merging back onto the trunk was too hard to do, and in turn that happens because backporting wasn't done. Crazy as it sounds, branching off branches can happen in CVS almost by accident. This is because CVS records branches in the time dimension, so you can't see them as you could when branches are physical copies. In Subversion, as branches are copies, this problem should be alleviated, but it's still something to be watchful of. Regression merging from branch to branch is a nightmare to manage and is understood to be a revision control worst practice - any configuration manager worth their salt will go a long way to make sure it doesn't happen on their watch. This is what a staircased repository eventually looks like:

[image: Escher staircase]

So you see, I wasn't kidding about updating first thing in the morning.

Happy branching!

I think I figured out the list comprehensions thing...

Warning - this one is about programming language esoterica.

For the life of me I've never been able to understand why anyone would prefer:

  print [x*x for x in range(1, 10)]

to:

  print map(lambda x:x*x, range(1,10))

and so much so, that if lambda, map() and their ilk ever got dumped from Python, I'd probably be looking at Ruby or Lisp as my upgrade path from Python 2.4. Which is to say, it would *suck* if the day Python turns mainstream is the same day I stop using it outside work. So I turned it around - what is it I like about lambda, map(), et al, beyond some questionable handwaving about functional programming? At least that way, I don't have to go around pestering people about language trivia.
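For anyone reading on a newer Python, here is a minimal sketch of the same pair in Python 3 syntax. Two things changed after 2.4: print became a function, and map() returns a lazy iterator, so it needs wrapping in list() to get a list back.

```python
# The list comprehension reads much as it did in Python 2.
squares_listcomp = [x * x for x in range(1, 10)]

# map() is lazy in Python 3, hence the list() call.
squares_map = list(map(lambda x: x * x, range(1, 10)))

print(squares_listcomp)  # [1, 4, 9, 16, 25, 36, 49, 64, 81]
print(squares_map)       # [1, 4, 9, 16, 25, 36, 49, 64, 81]
```

Both spellings produce the same list; the argument is only about which one you'd rather read and write.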

I asked myself, is lambda a cargo cult?


I think I have my preference figured out - it's the "for x in y" bit within the listcomp. That exposes an implementation detail - namely the mechanics of processing a list. I'd much prefer to arrange the function, the sequence and the data structure in concert and let the computer have it, instead of telling the computer how to work the list. Seeing a "for" inside my list comp is like seeing ".cfm?entry=12" in my URL. Yup, it's pretty thin justification. At one level, it *just doesn't matter*. Listcomps are seriously useful. At another one, it matters a lot - how do I || a listcomp? That said, it's nice to know for 99% of the work I happen to do, using list comprehensions over anonymous functions won't matter. Which is to say, I'm truly being a fusspot about this.
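One way to read the "||" question, sketched in modern Python: because map() takes only the function and the sequence, the thing doing the iterating can be swapped out without touching either. The example uses `concurrent.futures` from the standard library, which arrived long after Python 2.4. Treat it as an illustration of the shape, not a speed claim, since threads in CPython won't make CPU-bound work like squaring any faster.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x):
    return x * x


numbers = range(1, 10)

# Sequential: the built-in map walks the sequence itself.
sequential = list(map(square, numbers))

# Parallel: same function, same sequence, different "map".
# Executor.map returns results in input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(square, numbers))
```

The listcomp has no equivalent seam: the "for" is baked into the expression, so there is nowhere to plug in a different way of working the list.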

If you are scratching your head now going wtf, that's not your fault, it's mine - this is all quite esoteric, like discussing the relative earthiness of red wines, without the side effect of drinking the stuff.

February 09, 2006


"Switching from 'I hope I can hack this up' to 'I desire reproducible results' mode has sped things along." - International Man of Transparency

SPARQL v XPath v XQuery. It's on!

Joshua Tauberer: "I don't want to talk about SPARQL in this article. I just wanted to show that the types of questions we can ask can easily grow in complexity and 'interestingness' using RDF. No XPath or XQuery query is going to be nearly so concise for those questions."

That's quite a claim.

February 08, 2006

Cote joins Redmonk

Coté joins Redmonk, and finds his true calling. His new analyst stuff weblog is over at People over Process. It'll be a must have subscription for anyone interested in the intersection of agile processes, enterprise computing, systems-management, and zombie flicks. Congratulations all round!

A shift in the mindset

Steve Loughran: "Let's rephrase that. Some teams are happy checking in broken code, and are not prepared to fix this behaviour. That's why I gave it 4/10. Tech good; process bad. If the team is checking in broken code it means that you can never check out good code. Which means that you can never be sure in the morning whether or not the stuff in the SCM repo is any good or not, So you branch for weeks at a time, have integration hell at the end, no stable images in SCM."

February 07, 2006

Design Sketch

"My current experimental system consists of a MoinMoin wiki as the editor/CMS, combined with a Django frontend for rendering."- Fredrik Lundh

Check out the design sketch.

February 05, 2006

Web frameworks reloaded. Just use...

Last year I was dismal about the state of web frameworks, or more accurately the sheer number of them. Things are looking up. I'm down to four stacks!


Powered by Plone..

Ok, so Plone strictly isn't a web framework. But it does a bunch of things that you often end up needing from a framework. User management, document templates, rich editing, i18n, search, live editing, extensibility, portlets, wiki, forum, accessibility, skinning. Stuff like that. All there. In terms of an ootb experience in getting something useful done with content, Plone is unmatched. Plone's actual framework is Zope2 (via a layer called the CMF), which is complicated and not always liked by developers (best explained here by Jason Huggins), but Plone itself is focused on users, not developers. Developers have to suck it up, especially developers who are used to working with database backed sites, ie where the answer to everything is an RDBMS. So be prepared to roll up your sleeves if you want to integrate Plone with another system.


Powered by Struts..

update: turned this one around; StrutsTi means that WebWork becomes a safe bet too.

The recent unification of WebWork and Struts roadmaps surprised me, along with a lot of others. Up to then WebWork and Struts were competing frameworks, where WebWork won the technical battle, and Struts the adoption one. What organisations like about Struts isn't technical, it's social. Most enterprise Java devs know their way around Struts, thus it's less of a risk for an enterprise to commit to it. Aside from some things I just don't like about Struts, it was starting to show its age, and with the vendors pushing JSF as the de jure JEE framework over the de facto Struts, you had to wonder about its long term prospects. What Ti means is that Struts ancien becomes a solid bet all round, since the community will be forced to figure out how to migrate onto Ti when the time comes and Struts is less likely to be abandoned. WebWork is a good engine to work with, better than Struts, so Struts devs will naturally upskill to Ti rather than migrate to something like Tapestry, SpringMVC or whatever the JBoss crowd are doing this month. All WebWork really needs is first class integration with Jython so it can dispatch actions to Jython scripts. And to turn it around - if you're thinking about deploying on WebWork, the Ti announcement is nothing but good news.


Powered by Django..

update: contextualised the effbot's django quote to mention Turbo Gears.

In describing his recent experience with Django Fredrik Lundh has more or less echoed my own. I installed it, skimmed the tutorial, started writing a webapp, and had something up and running in just about the most enjoyable first half an hour of learning a new technology that I can remember. Django is good software. Really good. The community is in rude health. Even the BDFL is coming around. Python, going mainstream, is in need of a major cull on the web frameworks front, and Django seems to be just the thing to help that process along. As Lundh puts it "At this point, I'd say Django *is* the winner in the LAMP-as-in-Python space, with TurboGears as a "worth keeping an eye on" second -- it's not quite there yet, as a quick scan of the mailing list headers shows.".


Powered by Rails..

Rails is the project I think that convinced other communities, especially Java and Python, that accommodating lots of "me too" frameworks doing essentially the same thing is really really dumb. [James Governor is more articulate in describing this as "web framework consolidation" - perfect.] Rails is great fun to use, you get to learn Ruby as a side-effect, and producing decent sites is straightforward. Rails bundles with good Ajax support. One scenario for Rails is that it becomes the Lisp of web frameworks as bigger language communities learn and clone its approach rather than switch. Nonetheless Rails would still be a good framework whatever other communities choose to do.

Naturally plenty of people are going to disagree with these choices and will have good things to say about other stacks. Personally tho', I'm done seeking.

Colophon: why no PHP/Perl stuff? Simply ignorance - I just don't know or use either language for webapps.

February 04, 2006

The Old Fashioned

  • A big lo-ball tumbler, one of the ones that are wider at the top.
  • 1 cube or a teaspoon of brown sugar (not muscovado).
  • Generous dash of Angostura bitters.
  • Lashings of bourbon, I like Makers or Woodford* for this. Wild Turkey fans might need a bit more sugar.
  • Lots of ice.
  • A slice of orange peel, about the size and width of your thumb. Run the peel around the rim of the glass before dropping it in.

Keep stirring until the ice starts to round out and melt (it's easier to stir ice when the spoon's upside down). Then sip. Then stir. Sip. Stir. Add more bourbon. More ice. Stir. Some sugar. Sip. Stir...

This is about the best thing you can do with bourbon. Truly relaxing. What's great about the Old Fashioned is that since it's a true mixed drink rather than a shaken or blended item, it's very forgiving - some cocktails require high precision in the ratio of ingredients, they have to be just so, but not this one. Just add a bit of this and that - eventually it'll come right for ya. The secret to a soothing drink is patience - an Old Fashioned has to be stirred for a few minutes, or it will be too hard. Those who have the tools can crush the sugar with a mortar and use the base of a long cocktail spoon for stirring. You can safely ignore people who suggest using a lemon slice or a cherry by the way.

* Not that I've seen Woodford in Dublin. Any pointers?

February 01, 2006

Confluence niggle

update 2006-02-01: backbutton stuff.

I like Confluence. But using underscores for emphasis and asterisks for bold instead of quotes is a nuisance (on the basis that I shouldn't need to use a shift key to do basic formatting). And then underscore gets mapped onto the plus sign, which doesn't make sense, whereas underscore would make sense for... underscore. Using hX for calling out headings is nice tho'.

Confluence seems to have some nasty back-button behaviour that lets you lose content. When you edit a page you can flip back between the edit and preview modes using the tabs - that works fine. But if you go to the back button on preview to "get back" to the edit, you don't go back to the edit, you go back to the page itself - clicking forward takes you to the edit screen but with the content as it was at the last save point, since nothing was sent to the server as you were flipping between preview and edit. I've lost content 3 days in a row with this - it sucks. Optimizing server roundtrips at the risk of losing content doesn't seem like a good tradeoff. And "just use rich text editing" isn't a good enough answer to this. update: Charles Miller from the comments - "I'm surprised we don't save a draft when you flip between preview and edit, though. I've filed this to be fixed for the next release: http://jira.atlassian.com/browse/CONF-5366". Sweet!

Plone feed enclosure

"The current portal_syndication tool only provides feeds in the RSS 1.0 format, which doesn't have support for enclosures (RSS 2.0 and Atom do)."