
April 19, 2007

Patched svndumpfilter2

update: Simon Tatham has applied a better patch that deals with quotes in paths, as well as whitespace (quotes in svn paths - who knew!?). It's available as rev r7468 of svndumpfilter2.

Sometimes you want to export part of a Subversion repository, leaving the rest behind while keeping the repository history and metadata. The tool for this job is svndumpfilter, which operates on Subversion dumpfiles. But svndumpfilter has a serious flaw: if a file or path was copied from a path you're filtering out to one you're filtering in, svndumpfilter won't be able to fill out the history and the job will fail. Simon Tatham's svndumpfilter2 cleverly fixes this by looking up paths against the source repository the dumpfile was taken from, using svnlook.

In turn svndumpfilter2 has a tiny bug: if a repository path being checked has whitespace in its name* and is passed to svnlook as is, svnlook will only read up to the first whitespace, which results in a "path not found" error. A simple fix, placing all arguments to svnlook in the script inside quotes, does the trick. As is often the case, most of the work was running down why svndumpfilter doesn't work with copies, why svndumpfilter2 was reporting bad paths to begin with, and documenting what was done (this post). Otherwise svndumpfilter2 comes highly recommended. The repository I was working against is on the large side: "du -sh" on its repo folder comes in at 2.5GB (the checkout is much bigger), with nearly 20,000 commits.
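
For the curious, here is a rough shell sketch of why the quotes matter. It is not the svndumpfilter2 code: count_args is a made-up stand-in for svnlook that just reports how many arguments it was handed, and the path is invented. It assumes the command line ends up being split by a shell, which is what the symptoms suggested.

```shell
#!/bin/sh
# count_args is a hypothetical stand-in for svnlook: it prints how many
# separate arguments it received.
count_args() {
    echo "$#"
}

# Hypothetical repository path containing a space.
path="trunk/My Docs/readme.txt"

# Unquoted: the shell splits the path at the space, so the command sees
# two arguments and would only treat "trunk/My" as the path.
unquoted=$(count_args $path)

# Quoted: the whole path arrives as one argument.
quoted=$(count_args "$path")

echo "unquoted=$unquoted quoted=$quoted"
```

The unquoted call sees two arguments and the quoted call sees one, which is the whole bug in miniature.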

You can get a patched file here - svndumpfilter2. Alternatively "hg clone" the mercurial repository from http://www.dehora.net/hg/tools/ **.




* for this and many other reasons, avoiding whitespace in file names tends to be a good policy.

** Students of irony are welcome to savour the notion of keeping a subversion tool inside a mercurial repository.


November 09, 2006

Switch Blocks

A while back, I said,

"Put it this way - if I can't get down to the Burlo to hang out in the bar with Steve Loughran, I don't have time to change OSes. "

Well, I finally got around to it, and spottily documented how it went in a post. That post got picked up by Digg; as a result there are nearly 40 comments, many of which deserve a response. I'll do that as another post real soon now.

Anyway, back then, I asked what Ubuntu has to cover these off:

  1. FeedDemon
  2. Visio (as clumsy as Visio is, Dia isn't at the races)
  3. Word screen split (this is what stops me from using Oo all the time)
  4. Copernic

The answers turned out to be:

  1. FeedDemon: Bloglines
  2. Visio: Dual boot into Windows, be accepting of reality
  3. Word split screen: Write shorter, better organised documents, have the discipline to write exec summaries and rollups at the end
  4. Copernic: Beagle

Eclipse is now my Python IDE of choice on Ubuntu thanks to PyDev. Using a Java app platform to write Python apps makes me quite the wit ;), although a few colleagues had been telling me to get onto PyDev for a while. Now I wonder if XULRunner could be packaged as an OSGi plugin for Eclipse - that would be interesting.

Finally - what's up with Debian stable, shipping with Subversion 1.1.4? 1.1.4 is about 18 months old; in the meantime Ubuntu is running 1.3.x. I wanted to use viewvc in work but couldn't as it requires a higher Subversion rev than Debian stable allows for.

February 12, 2006

Subversion tips: working with branches

update: Ian Bicking has a good followup on branching practices

Subversion is great software, essentially a major upgrade of CVS. Its branch support is stellar, for a few reasons:

  • Visibility: Branches are physical copies, so you can see all branches, stored by convention in the /branches folder. This is unlike CVS (or VSS) where branches are placed in the time dimension and are invisible, hidden "behind" the CVS HEAD revision.
  • Efficiency: Branches are stored as deltas rather than full physical copies, so they are efficient and cheap to create.
  • Global revisioning: the entire repository gets versioned on every change. As a result merging can be applied as the merging of two source trees; this is much easier to think about and execute than merging between two sets of files, as is the case with CVS.

Nonetheless there are some things you still need to take care of. On top of that, many developers have learned to dread branches, either because of poor practices or weak tools, or both. With that, here are twelve tips for working with branches in Subversion.

Update before email. Updating against the repository should be the first thing you do in the morning, even before reading your email. This tip isn't specific to branching, but it's so central to having a good working practice with any form of source control, I'll mention it here. Some of the biggest development issues with source control can be traced directly to not updating frequently. Do it until it becomes muscle memory. Email is a terrible way to start the day anyway.

Put the branch revision number in the comment. When you create the branch from the trunk, make a note of the revision number the branch was created from. For example:

  svn copy http://svn.example.org/foo/trunk \
    http://svn.example.org/branches/foo/mybranch \
    -m "Created foo/mybranch branch from rev [20] of foo/trunk"

Subversion does not constrain the scope of a merge to a branch, so you have to tell it to merge only the changes that have happened since the branch was created. Otherwise everything that happened before the branch was created gets brought across as well, which screws up the changeset. Treating branches specially is something that might get added to Subversion in the future, but for now you'll have to do it yourself.

Backport to the branch. Come the glorious day when you merge your changes back into the trunk, things will go much easier for you if you have tried to keep up with the changes on the trunk. The easiest way to do that is to keep merging changes on the trunk onto your branch as frequently as possible - aka "backporting". Here's an example of merging trunk changes into the branch that was created from revision 20 above, while the repository has moved on to revision 25 due to changes on the trunk:

  cd /branches/foo/mybranch
  svn merge -r20:25 http://svn.example.org/foo/trunk . 
  svn ci . -m "foo/mybranch: merged to [25]"

The smaller the changeset the easier it is to manage issues - so prefer lots of little updates to one big bang integration that could take days, or might not be possible to complete at all.

Put the revision number in the backport. This is for the same reasons as putting the revision number in the initial branch comment. You only want the changes on the trunk made since the last backport merge. Suppose the repository has moved on to revision 30. Because we made a comment on the last backport telling us what revision we merged to (25), we know we only need to merge from r25 onwards:

  cd /branches/foo/mybranch
  svn merge -r25:30 http://svn.example.org/foo/trunk .
  svn ci . -m "foo/mybranch: merged to [30]"
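
If you stick to the "merged to [N]" comment convention, the starting revision for the next backport can be fished out of the log mechanically. This is only a sketch: last_merged_rev is a made-up helper, the log text is canned sample data rather than real "svn log" output, and the final command is printed, not run.

```shell
#!/bin/sh
# last_merged_rev (hypothetical helper): reads log messages on stdin and
# prints the highest N found in a "merged to [N]" comment.
last_merged_rev() {
    sed -n 's/.*merged to \[\([0-9][0-9]*\)\].*/\1/p' | sort -n | tail -n 1
}

# Canned sample log messages, newest first (illustrative data only).
log='foo/mybranch: merged to [25]
foo/mybranch: fixed a typo
Created foo/mybranch branch from rev [20] of foo/trunk'

from=$(printf '%s\n' "$log" | last_merged_rev)

# Print the merge command for the next backport instead of running it.
echo "svn merge -r${from}:HEAD http://svn.example.org/foo/trunk ."
```

In real use you would pipe the branch's log into the helper instead of the canned text, and check the printed command before running it.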

Take a merge for a dry run. The merge command has a flag called "--dry-run". This allows you to see what the result of the merge will be without actually applying it to the target. It's useful if you have any doubts that the merge will succeed or what it's going to apply to. On this front, if the merge goes to hell you can always run the revert command to clean up your working copy.
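
As a sketch, the preview and the cleanup look something like this. The run wrapper is my own invention for illustration: it only prints the commands unless you set RUN=yes, so the snippet is safe to paste anywhere. The revision range and URL are the example values used in the earlier tips.

```shell
#!/bin/sh
# run (hypothetical wrapper): prints the command by default; set RUN=yes
# in the environment to actually execute it.
run() {
    if [ "${RUN:-no}" = yes ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

trunk=http://svn.example.org/foo/trunk

# 1. Preview what the merge would touch; the working copy is not changed.
preview=$(run svn merge --dry-run -r25:30 "$trunk" .)
echo "$preview"

# 2. If a real merge goes wrong, discard the local changes recursively.
cleanup=$(run svn revert -R .)
echo "$cleanup"
```

Run both from the root of the branch working copy, since "." is the merge target.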

Don't forget to commit a merge. Merging only applies changes to the working copy. You have to check those changes into the repository with the "commit" command.

Prefix branch comments with the branch name. This makes scanning the log history easy. Those that come after will thank you. Here's an example of the right thing from the Django magic-removal branch:

django-magic.gif

Merge from the target context. One thing that can be confusing with merging is making sure you don't get your merge sources (what you're merging) and targets (where you're merging to) mixed up. It's much easier to get this right if you get into the habit of running a merge from the target. That way you can think of it as taking in merges from somewhere else.

Never check into a tag. The convention for tags is to place them in a /tags folder in the repository. Tags are meant to be read-only snapshots of your code. It's tempting sometimes to check little fixes into tags. Try not to do this - someday you will forget to put that change into the trunk as well and the next tag will be hosed in a way that is difficult to track down. And those little changes will get bigger and bigger over time. In Subversion creating new tags is a cheap operation (both time and space). Instead, check the change into the trunk/branch, retag and release. Aside from code management another problem is confidence - seeing commits into the /tags folder lowers confidence in the integrity of the codebase. Nobody wants to think about tags that are actually branches.

Minimize the number of active branches. Branches can be useful, but too many of them is indicative of problems, typically poor communication amongst developers or an inability not to break each other's code. Branches should be created only when necessary - they're not a good default approach. If you really want to work by having individuals merge changesets, you've probably been following kernel-dev too closely, but you should look at tools that support this model, such as SVK (based on Subversion), Darcs or Bazaar-NG. Subversion is a centralised revision control system; there's not much point fighting it. Ian Bicking's "Distributed vs. Centralized Version Control" is a good overview of the two approaches.

Prune back dead branches. Branches that are no longer active or required should be deleted aggressively. Developer and experimental branches typically fall into this category, but it's more or less true of any branch that has been merged back to the trunk. Get rid of it and focus on the active lines.

Never branch a branch. Branching a branch is sometimes called "staircasing", since a drawing of branching branches looks like a staircase. In general staircases happen because active development drifts away from the trunk and onto a branch; that in turn usually happens because merging back onto the trunk was too hard to do, and that in turn happens because backporting wasn't done. Crazy as it sounds, branching off branches can happen in CVS almost by accident. This is because CVS records branches in the time dimension, so you can't see them as you could when branches are physical copies. In Subversion, as branches are copies, this problem should be alleviated, but it's still something to be watchful of. Regression merging from branch to branch is a nightmare to manage and is understood to be a revision control worst practice - any configuration manager worth their salt will go a long way to make sure it doesn't happen on their watch. This is what a staircased repository eventually looks like:

escher staircase

So you see, I wasn't kidding about updating first thing in the morning.


Happy branching!

September 10, 2005

Moving host, contact details, outsourcing is bidirectional

After 4 years hosted at UKSolutions, I moved to TextDrive.

Weblog migration was straightforward, except for one gotcha (which in truth was to do with upgrading and not migration). Most of my other online stuff is static and/or dumpable. As TextDrive supports DAV and Subversion, I'll be uploading Subversion repositories and calendars this weekend.

It all seems to have gone well.

And then there's DNS. UKSolutions, the registrar for my domains, will let you set wildcard DNS and that sort of thing, but can't by default allow the domain to have an authoritative nameserver that isn't theirs. In fairness they hinted they might be able to set that up, but I decided instead to transfer registrars to domainsmadeeasy, and I'll eventually run the DNS records from dnsmadeeasy. The records are frigged for now so that http://dehora.net and http://www.dehora.net will point to TextDrive's server, but mail is still going to the UKSolutions mailserver for the time being. So the site was down for a bit yesterday while the records propagated. Neither registrar is dragging its feet; the transfer was begun the same day.

Contact: If you're trying to contact me by email, there's a chance some of my mail accounts - bill@dehora.net and lists@dehora.net - will go off the air at some point when I change mailservers. Instead try dehora@eircom.net (I have a Gmail account dehora@gmail.com which I never read - don't use that).

No, I'm not moving to the US. So, why the switch? I have nothing but good things to say about UKSolutions. However I live in Dublin, and the Euro/Sterling exchange rate hurts, and shows no sign of hurting less in the future. I never found a suitable Irish ISP in the 4 years I've been here, and had been looking at US options for a while, given the Euro/Dollar rate. But it's not all about the bottom line; the host has to be up for it (such as intangibles like being able to diagnose issues with a UKSolutions salesperson and overall competence). Now that I found a host in TextDrive that Definitely Gets It, the move made sense. The only pity is they're not domain registrars.

Armchair outsourcing pundits and globalization wonks take note. I just outsourced my IT services to the United States.

September 11, 2004

Setting up Cruisecontrol...

...took most of the day. It's a good tool, but finicky to set up. Also there isn't much info to be found out there on how to use it with Subversion. Here's what I threw together to port a largish modularized project over to it.

Fair Warning: At the end of the day I have no idea whether this is an idiomatic way to use Cruisecontrol - so - this howto might not be good for you. I also ought to say that I wouldn't have been able to set it up in a reasonable amount of time without having the source code to look at (and possibly, that's not the point of having the source code to look at). Once it's up and running tho' it takes care of itself.

Documentation

The documentation is... lacking. It makes you wish every open source project was like Hibernate or MySQL. Some of the attributes and element names are wrong or in camel case when they shouldn't be; some steps are left out; this is annoying but nothing you can't get past after flailing about for a bit and typing random stuff at the console until it works.

Layout

First, download Cruisecontrol and symlink it to /usr/java/cruisecontrol, then create a workspace under a user account's home folder:

    [propylon@cvsdub2 propylon]$ mkdir /home/propylon/builds
    [propylon@cvsdub2 propylon]$ mkdir /home/propylon/builds/work
    [propylon@cvsdub2 propylon]$ mkdir /home/propylon/builds/logs

work is where Cruisecontrol will find your projects; logs is where it will store log data.

    drwxr-xr-x  3 propylon propylon  4096 Sep 10 22:23 work
    drwxr-xr-x  3 propylon propylon  4096 Sep 10 15:00 logs

Build

Building is the easiest part of the process. Here the documentation actually works out. To build a war file for web reporting, add a file called override.properties to cruisecontrol/reporting/jsp pointing the properties at the logs directory you just set up,

    user.log.dir=/home/propylon/builds/logs
    user.build.status.file=currentbuildstatus.txt
    cruise.build.artifacts.dir=/home/propylon/builds/logs

and run the build script by passing a 'war' argument to it. Drop that war into a servlet container, and you're done.

Config file (almost, first a scripting diversion)

The Cruisecontrol config is clunky. But before we get to that, let's point out that Cruisecontrol at heart wants to fire up a JVM per project. Maybe there is some drop-dead simple way around this, but I couldn't figure it out beyond patching the code itself. To allow for multiple projects, we create the following artefacts, for each project:

  • run script: this is the file you'll use to boot Cruisecontrol for the project.
  • config: this is the Cruisecontrol config file for the project
  • ant file: this is a buildfile that lives outside the project structure. It does three things:
    1. blows away the last checkout under the 'work' folder
    2. runs a full checkout of the project
    3. calls a top level ant file in the project

The run file looks like this:

    #!/bin/sh
    export ANT_HOME=/usr/java/ant
    export JAVA_HOME=/usr/java/jdk14
    # put autobuild junk into a dummy folder
    export TOMCAT_HOME=/home/propylon/builds/HOME/TOMCAT_HOME
    # if you're reading a blog entry, skip these two
    export PROPELXBI_HOME=/home/propylon/builds/work/iams
    export IAMS_HOME=/home/propylon/builds/work/iams
    export PATH=$PATH:$ANT_HOME/bin
    ccmain=/usr/java/cruisecontrol/main/bin/cruisecontrol.sh
    $ccmain -projectname iams  -configfile cc-iams-config.xml &

and the build file looks like this:

    <?xml version="1.0"?>
    <project name="cc-iams" basedir="." default="build">
      <property file="cc-svn.properties" />
      <path id="project.classpath">
        <pathelement location="${svnjavahl.jar}" />
        <pathelement location="${svnant.jar}" />
        <pathelement location="${svnClientAdapter.jar}" />
      </path>
      <taskdef resource="svntask.properties" classpathref="project.classpath"/>
      <target name="build">
        <delete dir="work/iams"/>
        <svn  username="xxxxx" password="xxxxx">
          <checkout url="http://cvsdub2/svn/iams/trunk" revision="HEAD" destPath="work/iams" />
        </svn>
        <ant antfile="build.xml" target="build" dir="work/iams"/>
      </target>
      <target name="cp" description="print the build classpath">
        <property name="cp" refid="project.classpath" />
        <echo>${cp}</echo>
      </target>
    </project>

All these files go under that builds folder we just made. Here's what things look like so far:

    -rwxr-xr-x  1 propylon propylon   393 Sep 10 21:25 cc-iams-build.sh
    -rw-r--r--  1 propylon propylon  1599 Sep 10 22:28 cc-iams-config.xml
    -rw-r--r--  1 propylon propylon   905 Sep 10 12:27 cc-iams.xml
    drwxr-xr-x  3 propylon propylon  4096 Sep 10 22:23 work
    drwxr-xr-x  3 propylon propylon  4096 Sep 10 15:00 logs

You could just call straight into the project's ant file and skip the new build file, but I like the idea of a separation between automated and project build systems (must... use... more... indirection). Incidentally, the project itself has ten or so standalone modularized builds that can be run from the master build referenced above.

Config file (almost, really, some subversion first)

It can be seen from the build file above that the project is in Subversion. That means we need to install the svnant libraries. This is easy to do: just unpack the distribution into /usr/java/svnant. The cc-svn.properties file can be reused across all projects and looks like this:

    svnant.version=0.9.1
    lib.dir=/usr/java/svnant/lib
    svnjavahl.jar=${lib.dir}/svnjavahl.jar
    svnant.jar=${lib.dir}/svnant.jar
    svnClientAdapter.jar=${lib.dir}/svnClientAdapter.jar

(it's lifted directly from the example provided by svnant)

So here's where we're at:

    -rwxr-xr-x  1 propylon propylon   393 Sep 10 21:25 cc-iams-build.sh
    -rw-r--r--  1 propylon propylon  1599 Sep 10 22:28 cc-iams-config.xml
    -rw-r--r--  1 propylon propylon   905 Sep 10 12:27 cc-iams.xml
    -rw-r--r--  1 propylon propylon   608 Sep 10 14:28 cc-svn.properties
    drwxr-xr-x  3 propylon propylon  4096 Sep 10 22:23 work
    drwxrwxr-x  5 propylon propylon  4096 Sep 10 18:17 HOME
    drwxr-xr-x  3 propylon propylon  4096 Sep 10 15:00 logs

Ant 1.6 (config file is next, I swear)

Cruisecontrol ships with Ant 1.5.3. I found this out when it couldn't run an ant script with an import task, but for the sake of this howto I'm going to pretend I knew that upfront. The way to get around this without changing its classpath in the startup script is to use the "antscript" attribute from the ant element to point to a script instead, ie:

    <schedule interval="21600">
      <ant  antscript="/home/propylon/builds/ant16.sh"  
          buildfile="cc-iams.xml" target="build" />
    </schedule>

The script in turn points at your own ant distribution:

    #! /bin/sh
    export ANT_HOME=/usr/java/ant
    antmain=${ANT_HOME}/bin/ant
    $antmain "$@"

Let's add that to the builds folder:

    -rwxr-xr-x  1 propylon propylon    77 Sep 10 22:18 ant16.sh
    -rwxr-xr-x  1 propylon propylon   393 Sep 10 21:25 cc-iams-build.sh
    -rw-r--r--  1 propylon propylon  1599 Sep 10 22:28 cc-iams-config.xml
    -rw-r--r--  1 propylon propylon   905 Sep 10 12:27 cc-iams.xml
    -rw-r--r--  1 propylon propylon   608 Sep 10 14:28 cc-svn.properties
    drwxr-xr-x  3 propylon propylon  4096 Sep 10 22:23 work
    drwxrwxr-x  5 propylon propylon  4096 Sep 10 18:17 HOME
    drwxr-xr-x  3 propylon propylon  4096 Sep 10 15:00 logs

Config file (at last!)

Here's a basic config file:

    <cruisecontrol>
      <project name="iams" buildafterfailed="false">
        <bootstrappers>
          <currentbuildstatusbootstrapper file="logs/iams/currentbuildstatus.txt"/>
        </bootstrappers>
        <modificationset quietperiod="60">
          <svn
            LocalWorkingCopy="work/iams"
            username="xxxxx"
            password="xxxxx"></svn>
        </modificationset>
        <!-- 6 hours -->
        <schedule interval="21600">
          <ant antscript="/home/propylon/builds/ant16.sh"  
              buildfile="cc-iams.xml" target="build" />
        </schedule>
        <log dir="logs/iams" encoding="UTF-8">
        </log>
        <publishers>
          <currentbuildstatuspublisher file="logs/currentbuildstatus.txt"/>
          <htmlemail
              mailhost="mail.propylon.com"
              returnaddress="noreply-cruisecontrol-at-propylon.com"
              defaultsuffix="-at-propylon.com"
              buildresultsurl="http://cvsdub2:8080/cruisecontrol/buildresults/iams"
              css="/usr/java/cruisecontrol/reporting/jsp/css/cruisecontrol.css"
              xsldir="/usr/java/cruisecontrol/reporting/jsp/xsl"
              logdir="logs/iams"
              subjectprefix="[build-nanny] ">
            <map alias="list" address="S0070-at-724.ie"/>
            <map alias="tommy.lindberg" address="tommy.lindberg-at-propylon.com"/>
            <map alias="bill.dehora" address="bill.dehora-at-propylon.com"/>
            <always address="list"/>
            <failure address="tommy.lindberg" reportWhenFixed="true"/>
            <failure address="bill.dehora" reportWhenFixed="true"/>
          </htmlemail>
        </publishers>
      </project>
    </cruisecontrol>

The first thing to say about this is that the Subversion task's name is 'svn' not 'Subversion' (did I say the documentation was lacking?). Anyway, the above will do roughly the following:

  • Schedule a build for a project called 'iams'
  • Stop trying to build after a failure, unless there is a change in the repository (buildafterfailed="false")
  • Keep track of whether the project's being built (file="logs/iams/currentbuildstatus.txt")
  • Look for project changes via Subversion (LocalWorkingCopy="work/iams")
  • Attempt to run a build every 6 hours (interval="21600")
  • Only run a build if something changed
  • Use a shell script to invoke the buildfile (antscript="/home/propylon/builds/ant16.sh" buildfile="cc-iams.xml" )
  • Log everything (dir="logs/iams")
  • Annoy people with the build results ("htmlemail"); some people will get annoyed at every build ("always"); some only when there's a problem ("failure")
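
The interval value reads as seconds, going by the "6 hours" comment in the config. A throwaway sketch for working out other schedules follows; hours_to_seconds is a made-up helper, not anything Cruisecontrol ships.

```shell
#!/bin/sh
# hours_to_seconds (hypothetical helper): convert whole hours to seconds
# for the schedule interval attribute, assuming it is counted in seconds.
hours_to_seconds() {
    echo $(( $1 * 60 * 60 ))
}

interval=$(hours_to_seconds 6)

# Print the matching schedule element opening tag.
echo "<schedule interval=\"$interval\">"
```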

By the looks of things, configuration has options for a number of other features, but this setup is fine.

Ant extensions

Cruisecontrol wasn't picking up jdepend or tomcat tasks during builds even though these were installed into Ant's lib folder and are referenced as such by the individual build files (totally different classpaths of course, doh). This broke perfectly good builds. The run scripts (in cruisecontrol/main/bin) have a classpath variable called CRUISE_PATH which includes Ant. Dropping the jars in question into Cruisecontrol's lib folder and hacking the paths onto the end of the CRUISE_PATH variable in the script solved that problem (I'll have to remember to put them somewhere else or a Cruisecontrol upgrade will break the build). NB: I did this before discovering I needed to bung an Ant 1.6.x shell script into the process - pointing at another Ant might end up having things work for you without making modifications; if not, fix up the Cruisecontrol scripts.
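
The hack amounts to appending entries to a colon-separated classpath. Here's a sketch of doing that without adding the same jar twice. The append_path helper, the starting value and the jar locations are all placeholders of mine; check what your copy of the Cruisecontrol script actually puts in CRUISE_PATH.

```shell
#!/bin/sh
# append_path (hypothetical helper): add an entry to a colon-separated
# path list unless it is already present.
append_path() {
    # $1 = current list, $2 = entry to add
    case ":$1:" in
        *":$2:"*) echo "$1" ;;
        *) echo "${1:+$1:}$2" ;;
    esac
}

# Placeholder starting value and jar location, for illustration only.
CRUISE_PATH="/usr/java/cruisecontrol/main/lib/ant.jar"
jar="/usr/java/cruisecontrol/main/lib/jdepend.jar"

CRUISE_PATH=$(append_path "$CRUISE_PATH" "$jar")
# Appending the same jar again leaves the list unchanged.
CRUISE_PATH=$(append_path "$CRUISE_PATH" "$jar")

echo "$CRUISE_PATH"
```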

Do one checkout

Cruisecontrol does not seem to check the project out the first time it's run; you have to do this yourself (again, maybe this is possible to set up). So cd to the work folder and:

    [propylon@cvsdub2 builds]$ cd work/
    [propylon@cvsdub2 work]$ mkdir iams
    [propylon@cvsdub2 work]$ svn co http://cvsdub2/svn/iams/trunk iams

Murray Walker Moment: Go Go Go

Ok, we're done. Start up Cruisecontrol using the cc-iams-build.sh script:

    [propylon@cvsdub2 builds]$ ./cc-iams-build.sh

End

Clearly, there are a number of alternate ways to do this; much of it will come down to how you like to organize specifics, i.e., whether you put things like ANT_HOME in a .profile or in a script, or whether you want a separate buildfile for Cruisecontrol use. I've also left out a lot of details, like setting the executable bit on the shell scripts, testing the cc-iams.xml ant file is working, checking that the user has permissions to write into certain folders and can run certain things, and so on. The essential setup described here will also work on Windows, once you fix up the paths and use .bat files instead.

Cruisecontrol has some nice touches: html mail, an indicator near the top mentioning (in red) that "this project doesn't have any tests", a web console, pie charts indicating the proportion of busted builds, build only if something changed, a pause if the project is being checked into, a list of changes made since the last build, a list of deployed artefacts, and test results. It seems once you have one project going, Cruisecontrol just works, and you can cut and paste the config file and various scripts for the next one. But I'm starting to understand why certain Thoughtworks and Atlassian open source bots are hacking out Damagecontrol.

August 01, 2004

I miss Lisp

Spotted in the Subversion 1.0.6 source: merge.c
          /* I miss Lisp. */

          SVN_ERR (svn_io_open_unique_file (&lcopy_f,
                                            &left_copy,
                                            merge_target,
                                            left_label,
                                            FALSE,
                                            pool));
          SVN_ERR (svn_io_file_close (lcopy_f, pool));

          /* Have I mentioned how much I miss Lisp? */

          SVN_ERR (svn_io_open_unique_file (&rcopy_f,
                                            &right_copy,
                                            merge_target,
                                            right_label,
                                            FALSE,
                                            pool));
          SVN_ERR (svn_io_file_close (rcopy_f, pool));