using mercurial with perforce (or some other centralised vcs)

There's probably a better way, but this post is about how I mix mercurial (aka hg) with perforce (aka p4) to work offline and cut personal branches.

The why. Isn't one VCS enough? Not quite. I've found mercurial a nice complement to perforce. Perforce has a "fast pipe always on" assumption that doesn't always pan out when you're not connected. Then there's the check-out-to-edit model, which cracks me up. So yes, I find Perforce fiddly and counterintuitive at times.  That said, Perforce has some great features, notably how it deals with branch metadata, large codebases, and when you are connected, it's way fast (precisely because it makes you tell it what you're editing). I'm aware that MSFT have used a p4 variant for years, and Google use p4 too - this is what Google have going on:

"Google's main Perforce server is very large, very active and requires 24x7 uptime. It has more
than 4000 active users, 200 GB of metadata, and performs more than 2 million commands a day.
In addition to this main server we run a number of smaller servers, read-only replicas, and proxies
worldwide. We use an extensive layer of tools built on top of Perforce, and make heavy use of
triggers. Perforce at Google is used by IT and other departments in addition to Engineering. Our
operating platform consists of an HP DL585 with 4 dual-core Opterons and 128 GB of RAM
running Red Hat 2.6 Linux and Perforce 2006.1. The depot is on a NetApp filer, and the
database and journal are on RAID-10 local disk using the ext3 file system.

We do a number of things right: we set MaxScanrows and MaxResults to limit large queries; we
aggressively clean up inactive clients, labels, and users; we make use of proxies and read-only
replicas; and we upgrade to new versions of Perforce within three or four months of their release
to take advantage of performance enhancements.

Unfortunately, we also do a number of things which are less than ideal from a performance
standpoint: we have a very complex protect table, our users run many automated scripts, we
have many large (30,000+ file) clients and branches, and we make heavy use of temporary
clients. We are working to change some of this, but much of it arises from Google's culture:
within limits, any engineer is free to use Perforce in any way he or she sees fit."


Those pesky engineers! That paper, by the way, is a goldmine of smart advice on perforce, especially around locking. Perforce is probably the most important scale-up platform Google run, so enjoy the notion of a company founded on scale-out architectures depending on a traditional N=1 setup to manage their code. You can find more goodness like that from the 2007 perforce conference (the EA paper in there is also a good read).

The how. If the code is already in perforce, I'll pull down a copy and hg init a local repository; otherwise I'll init using mercurial and check into perforce later (typically this is for a spike). Then I'll do some work, committing to mercurial as I go. My perforce commit tends to be a rollup of smaller hg commits; alternatively, if I want to track changes specifically, I'll check in the individual hg commits by rolling forward through the history using "hg update" (some more on this below). Don't commit the .hg folder into perforce, by the way. It can interfere with mercurial-based teamwork: suddenly you and another person are committing on the same hg repository instead of pushing and pulling changesets or queuing patches. Anyway, it's too weird to put a repository inside another repository.
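As a rough sketch of the first step, this is roughly what wrapping an already-synced perforce workspace in a local hg repository looks like. The function name and the commit message are mine, purely for illustration, and it assumes the workspace directory already holds the files pulled down from perforce.

```shell
# Sketch only: put a local hg repository on top of a synced p4 workspace.
# hg_wrap_workspace is a hypothetical helper name; $1 is the workspace directory.
hg_wrap_workspace() {
    local workspace="$1"
    hg init "$workspace" &&                      # creates .hg/ next to the p4-managed files
    hg --cwd "$workspace" addremove &&           # start tracking everything currently on disk
    hg --cwd "$workspace" commit -m "Import from perforce"
}
```

Something like "hg_wrap_workspace ~/p4/myproject" gives a baseline changeset to commit against while offline. The .hg folder it creates is exactly the thing to keep out of perforce.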

Managing tags. In some cases I tag the hg repository directly and hand over a clone for deployment. Deploying a hg clone instead of just a tarball is close to becoming a best practice for me; for example, my weblog deployment process is simply to clone the mercurial repo. But sometimes I'll do some work and realise I didn't tag upstream in perforce. Here's what to do to rectify that:

p4 edit; checkout the entire project folder from perforce
hg tags; find the tag I want and the current rev (revnumber), which I'll need shortly
hg update <tagname>; this sets the mercurial repo to that version
p4 revert -a -c default; this reverts unchanged files
p4 submit -i; check my stuff into perforce
p4 label/p4 tag; create the label in perforce and tag the fileset

At this point I've switched my hg repo to <tagname>, checked that into perforce and labelled in perforce. When I'm done I can update back to the active revision and get to work on the new stuff:

hg update <revnumber>; this can either be the local revision number or the global hash
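The steps above can be strung together into one helper. This is a sketch under a few assumptions, not something I'd run blind: the function name and the depot path argument are hypothetical, I've used "p4 submit -d" (description on the command line) in place of "p4 submit -i" so it doesn't need a changelist spec on stdin, and the perforce label is assumed to share the hg tag's name. Files added or deleted between the two revisions would still need p4 add or p4 delete, which this ignores.

```shell
# Sketch only: replay an hg tag into perforce, label it, then return to current work.
# $1 = hg tag name, $2 = depot path of the project, e.g. //depot/myproject/...
p4_label_from_hg_tag() {
    local tag="$1" depot_path="$2" current
    # Remember the changeset I'm working on (full hash) so I can come back to it.
    current="$(hg log -r . --template '{node}')" || return 1
    p4 edit "$depot_path" &&                      # open the whole project for edit
    hg update "$tag" &&                           # switch the working files to the tagged version
    p4 revert -a -c default "$depot_path" &&      # drop files that did not actually change
    p4 submit -d "Sync to hg tag $tag" &&         # check the tagged state into perforce
    p4 tag -l "$tag" "$depot_path" &&             # label the fileset in perforce
    hg update "$current"                          # back to the active revision
}
```

Because every step is chained with &&, a failure part way through leaves the hg repo sitting on the tag, so it's worth checking where you are before carrying on.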

Being able to switch quickly back and forth between tags and the main line is a good reason to use mercurial. You can "roll forward" to an old version, patch it, clone a new release and switch back to your main work, all in place, with little overhead. It's also good for working with code that is configuration heavy (like server side Java), where you don't want to check in developer configurations but do need to manage them.

Managing fixes. Here's Bryan O'Sullivan explaining how fix propagation can work, from the mercurial book (which I look forward to seeing in print):

"In the simplest instance, all you need to do is pull changes from your maintenance branch into your local clone of the target branch.

$ cd ..
$ hg clone myproject myproject-merge
3 files updated, 0 files merged, 0 files removed, 0 files unresolved
$ cd myproject-merge
$ hg pull ../myproject-1.0.1
pulling from ../myproject-1.0.1
searching for changes
adding changesets
adding manifests
adding file changes
added 1 changesets with 1 changes to 1 files (+1 heads)
(run 'hg heads' to see heads, 'hg merge' to merge)

You'll then need to merge the heads of the two branches, and push back to the main branch.

$ hg merge
1 files updated, 0 files merged, 0 files removed, 0 files unresolved
(branch merge, don't forget to commit)
$ hg commit -m 'Merge bugfix from 1.0.1 branch'
$ hg push
pushing to /tmp/branch-repo9eWu3d/myproject
searching for changes
adding changesets
adding manifests
adding file changes
added 2 changesets with 1 changes to 1 files "


For surgical work, i.e. bringing specific changes back to the mainline from an alternate tree, you can export patches and use mercurial queues (based on quilt) to bring them in. That's a more complex approach, but it reflects the essential complexity of patch management.
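A minimal sketch of that export-and-queue route, assuming the mq extension is enabled in your hgrc and that your mercurial is recent enough to have "hg qfinish". The helper name, the "fix-<rev>.patch" naming and the argument order are all made up for illustration.

```shell
# Sketch only: carry one changeset from an alternate tree into the mainline via mq.
# $1 = source repo, $2 = revision to carry over, $3 = target (mainline) repo
hg_carry_patch() {
    local src="$1" rev="$2" dest="$3" patch rc
    patch="$(mktemp)" || return 1
    hg -R "$src" export "$rev" > "$patch" &&                  # write the changeset out as a patch
    hg -R "$dest" qimport -n "fix-$rev.patch" "$patch" &&     # copy it into the target's patch queue
    hg -R "$dest" qpush &&                                    # apply it on top of the mainline
    hg -R "$dest" qfinish -a                                  # turn applied patches into normal changesets
    rc=$?
    rm -f "$patch"                                            # qimport kept its own copy
    return $rc
}
```

For example, "hg_carry_patch ../myproject-1.0.1 42 ." would bring revision 42 of the maintenance clone into the repo in the current directory. If the patch doesn't apply cleanly, qpush stops and you resolve it by hand, which is where the essential complexity shows up.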

Intermission: on branching.

Working across branches in centralised repositories (even with things like subversion switch or perforce integrate commands) is tough by comparison, tough enough that branching becomes a massive PITA and companies end up investing a lot of process and ceremony overhead to deal with branches, code lines, fix propagation, even hiring people just to manage that work.

The larger point is that branching is always taking place and avoiding it can be painful, in the same way avoiding frequent integration is painful. Kent Beck in Extreme Programming says branching should be avoided, and it's one of the few things I disagree with in the entire book. It is however industry standard practice to treat branches with trepidation. But here's the thing: branching isn't just a process or code duplication matter to avoid, it's inevitable. As soon as you check out code or lock a file, you've branched, and checking back in *is* a merge operation. Distributed VCSs drive branching costs to zero; they are to branching as xUnit is to testing, or CI is to integration :)

Risky business. Isn't messing about with local repositories somehow irresponsible, or faintly cowboy? Not really, but I can see how it might look that way to people who take code and release management seriously enough to have defined useful processes. The overall point is fourfold:

  • you're productive even when offline/disconnected,
  • all code is versioned all the time,
  • there's less high drama around branching,
  • local versioning encourages spike solutions

The last one is worth expanding on. A spike solution is agile-speak for a quick prototype to validate an architecture or design idea and/or get a handle on the amount of work involved in implementing a feature. Prior to agile, spikes were bigger, "write one to throw away" in Brooks' terms, but what a spike really implies to my mind is innovation, experimentation and real management of technical risk, all things to be encouraged.

Applicability. This is not just a model for working with Perforce. I've found myself doing this on other projects - django, nutch, plone, sitemesh, the atompub spec (which I had write access to). Reading around, it seems to be becoming a lesser idiom to check out from a VCS and work locally using a DVCS. A tool to automate the sync process would be great (I've tried tailor and failed). That said, I'm not at all suggesting dropping tools like subversion and perforce, only that a DVCS like mercurial or git, or even the local history feature in IDEA/Eclipse, can be a complement to classic version control practices. And there's no free lunch - you do have to make sure the changes you want to ship get into the main line, wherever it resides. In a DVCS case that means taking responsibility for your local versions and being disciplined when you can reconnect to the main repository.
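Short of a real sync tool, the "get it back into the main line" step for perforce can at least be scripted. Here's a sketch of the rollup case, where several local hg commits go upstream as one perforce submit. It assumes linear local history, a workspace still synced to the revision I started from, and that I know the first hg revision that hasn't been submitted yet; the function name and arguments are hypothetical.

```shell
# Sketch only: submit the current working files to perforce as one rollup change,
# using the first line of each local hg commit message as the description.
# $1 = first hg revision not yet in perforce, $2 = depot path, e.g. //depot/myproject/...
p4_rollup_submit() {
    local first="$1" depot_path="$2" summary
    summary="$(hg log -r "$first:." --template '- {desc|firstline}\n')" || return 1
    p4 edit "$depot_path" &&                      # open the project now that I'm reconnected
    p4 revert -a -c default "$depot_path" &&      # keep only files that really changed
    p4 submit -d "Rollup of hg commits:
$summary"
}
```

Tagging the hg repo after each successful submit is one way to remember where the next rollup should start.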

Update (June 3, 2008): Here's some detail on setting up Mercurial specifically for a personal branch workflow.

3 Comments


    Good article!

    Your comment on branching and working in a workspace I totally agree with. Whether you "sync" someone else's work and get a conflict, or branch and pull changes to get a conflict is effectively the same thing. If you are either on a private branch (Perforce or SVN), or using Mercurial (or other DVCS), the advantage (of a private branch or your own local repository) is the extra control you have.

    For a related discussion, see also http://www.cmcrossroads.com/articles/...

    Robert


    The link from the previous post changed to:

    http://www.cmcrossroads.com/component...


    Sorry, the right link is:

    http://www.cmcrossroads.com/content/v...