There's probably a better way, but this post is about how I mix
mercurial (aka hg) with perforce (aka p4) to work offline and cut
personal branches.
The why. Isn't one VCS enough? Not quite. I've
found mercurial a nice complement to perforce. Perforce has a "fast pipe always on" assumption that doesn't always pan out when you're not connected. Then there's the check-out-to-edit model, which cracks me up. So yes, I find Perforce fiddly and counterintuitive at times. That said, Perforce has some great
features, notably how it deals with branch metadata and large codebases,
and when you are connected it's way fast (precisely because it makes
you tell it what you're editing). I'm aware that MSFT have used a p4
variant for years, and Google use p4 too. This is what Google have
going on:
"Google's main Perforce server is very large, very active and requires 24x7 uptime. It has more
than 4000 active users, 200 GB of metadata, and performs more than 2 million commands a day.
In addition to this main server we run a number of smaller servers, read-only replicas, and proxies
worldwide. We use an extensive layer of tools built on top of Perforce, and make heavy use of
triggers. Perforce at Google is used by IT and other departments in addition to Engineering. Our
operating platform consists of an HP DL585 with 4 dual-core Opterons and 128 GB of RAM
running Red Hat 2.6 Linux and Perforce 2006.1. The depot is on a NetApp filer, and the
database and journal are on RAID-10 local disk using the ext3 file system.
We do a number of things right: we set MaxScanrows and MaxResults to limit large queries; we
aggressively clean up inactive clients, labels, and users; we make use of proxies and read-only
replicas; and we upgrade to new versions of Perforce within three or four months of their release
to take advantage of performance enhancements.
Unfortunately, we also do a number of things which are less than ideal from a performance
standpoint: we have a very complex protect table, our users run many automated scripts, we
have many large (30,000+ file) clients and branches, and we make heavy use of temporary
clients. We are working to change some of this, but much of it arises from Google's culture:
within limits, any engineer is free to use Perforce in any way he or she sees fit."
Those
pesky engineers! That paper, by the way, is a
goldmine of smart advice on perforce, especially around locking. Perforce is probably the most important scale-up platform
Google run, so enjoy the notion of a
company founded on scale-out architectures depending on a traditional
N=1 setup to manage their code. You
can find more goodness like that from the 2007 perforce conference (the EA paper in there is also a good read).
The how. If the code is already in perforce, I'll pull down a copy and hg
init a local repository; otherwise I'll init using mercurial and check into perforce later (typically this is for a spike). Then I'll do some work, committing as I go to
mercurial. My perforce commit tends to be a rollup of smaller hg
commits; alternatively, if I want to track changes specifically, I'll
check in the individual hg commits by rolling forward the history using "hg update" (some more on this below). Don't
commit the .hg folder into perforce, by the way. It can interfere with mercurial based teamwork: suddenly you and another
person are committing on the same hg repository instead of pushing and pulling changesets or
queuing patches. Anyway, it's too weird
to put a repository inside another repository.
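For the first case (code already in perforce), the setup can be sketched roughly as below. This is a minimal sketch, not a definitive recipe: init_local_repo and the run wrapper are hypothetical helper names, the commit message is made up, and DRY_RUN defaults to 1 so the commands are only printed unless you opt in to running them.

```shell
#!/bin/sh
# Sketch: put a local mercurial repository on top of a synced perforce workspace.
# DRY_RUN defaults to 1, so commands are printed rather than executed.
DRY_RUN="${DRY_RUN:-1}"

# run is a hypothetical wrapper: print the command in dry-run mode, else execute it.
# Note that the printed form loses shell quoting around arguments.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# init_local_repo is a hypothetical helper name, not part of hg or p4.
# It assumes the current directory is the root of the project in the workspace.
init_local_repo() {
    run p4 sync                                  # bring the workspace up to date
    run hg init                                  # creates the .hg folder here
    run hg addremove                             # track the files already present
    run hg commit -m "Baseline from perforce"    # first local commit
}

init_local_repo
```

Run it with DRY_RUN=0 from the workspace folder to actually execute the commands. Nothing here adds .hg to perforce, which is the point: it stays a purely local folder.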
Managing tags. In some cases
I tag the hg repository directly and hand over a clone for deployment.
Deploying a hg clone instead of just a tarball is close to becoming a
best practice for me; for example, my weblog deployment process is to clone the
mercurial repo. But sometimes I'll do some work and realise I didn't tag upstream in
perforce. Here's what I do to rectify that:
p4 edit; checkout the entire project folder from perforce
hg tags; find the tag I want and the current rev (revnumber), which I'll need shortly
hg update <tagname>; this sets the mercurial repo to that version
p4 revert -a -c default; this reverts unchanged files
p4 submit -i; check my stuff into perforce
p4 label/p4 tag; create the label in perforce and tag the fileset
At this point I've switched my hg repo to <tagname>, checked that into perforce and labelled in perforce. When I'm done I can update back to the active revision and get to work on the new stuff:
hg update <revnumber>; this can either be the local revision number or the global hash
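Strung together, the steps above might look something like the sketch below. It is an illustration under assumptions: label_tag_in_p4 and run are hypothetical helper names, the tag name and revision number in the call are made up, and I've swapped in "p4 submit -d" (description on the command line) so the sketch needs no interactive form. Picking the tag and noting the current revision with "hg tags" is left to the caller. DRY_RUN defaults to 1, so the commands are printed, not executed.

```shell
#!/bin/sh
# Sketch: check a tagged hg version into perforce, label it, then switch back.
DRY_RUN="${DRY_RUN:-1}"

# run is a hypothetical wrapper: print the command in dry-run mode, else execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# label_tag_in_p4 TAG RETURN_REV (hypothetical helper, not an hg or p4 command)
#   TAG         the hg tag to check into perforce; reused as the perforce label name
#   RETURN_REV  the hg revision to go back to afterwards
label_tag_in_p4() {
    tag="$1"
    back_to="$2"
    run p4 edit ...                          # open the whole project folder for edit
    run hg update "$tag"                     # set the working copy to the tagged version
    run p4 revert -a -c default              # revert files that did not actually change
    run p4 submit -d "Sync to hg tag $tag"   # check the tagged state into perforce
    run p4 tag -l "$tag" ...                 # label the submitted fileset
    run hg update "$back_to"                 # return to the active revision
}

# Example values only: the tag and revision here are hypothetical.
label_tag_in_p4 release-1.0 42
```

One thing the sketch does not handle: files added or deleted between the tag and the current perforce state would also need a p4 add or p4 delete before the submit.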
Being able
to switch quickly back and forth between tags and the main line is a good reason to use
mercurial. You can "roll forward" to an old version, patch it, clone a
new release and switch back to your main work, all in place, with little overhead. It's also good for
working with code that is configuration heavy (like server side Java),
where you don't want to check in developer configurations but do need
to manage them.
Managing fixes. Here's Bryan O'Sullivan explaining how fix propagation can work, from the mercurial book (which I look forward to seeing in print):
"In
the simplest instance, all you need to do is pull changes from your
maintenance branch into your local clone of the target branch.
$ cd ..
$ hg clone myproject myproject-merge
3 files updated, 0 files merged, 0 files removed, 0 files unresolved
$ cd myproject-merge
$ hg pull ../myproject-1.0.1
pulling from ../myproject-1.0.1
searching for changes
adding changesets
adding manifests
adding file changes
added 1 changesets with 1 changes to 1 files (+1 heads)
(run 'hg heads' to see heads, 'hg merge' to merge)
You'll then need to merge the heads of the two branches, and push back to the main branch.
$ hg merge
1 files updated, 0 files merged, 0 files removed, 0 files unresolved
(branch merge, don't forget to commit)
$ hg commit -m 'Merge bugfix from 1.0.1 branch'
$ hg push
pushing to /tmp/branch-repo9eWu3d/myproject
searching for changes
adding changesets
adding manifests
adding file changes
added 2 changesets with 1 changes to 1 files "
For
surgical working, i.e. bringing specific changes back to the mainline
from an alternate tree, you can export patches and use mercurial queues (based on quilt) to bring them in. That's a more complex approach, but
it reflects essential complexity in patch management.
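A rough sketch of that patch route, for one changeset, is below. Treat it as an assumption-laden illustration: carry_fix and run are hypothetical helper names, the revision id in the call is made up, the repository paths are borrowed from the book excerpt above, and the mq extension has to be enabled in your hgrc for the q-commands to exist. DRY_RUN defaults to 1, so the commands are printed rather than run.

```shell
#!/bin/sh
# Sketch: carry a single changeset from one hg repository to another as a patch.
DRY_RUN="${DRY_RUN:-1}"

# run is a hypothetical wrapper: print the command in dry-run mode, else execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# carry_fix REV FROM_REPO TO_REPO (hypothetical helper, not an hg command)
carry_fix() {
    rev="$1"
    from="$2"
    to="$3"
    patch="${TMPDIR:-/tmp}/fix-$rev.patch"        # absolute path for the patch file
    run hg -R "$from" export -o "$patch" "$rev"   # write the changeset out as a patch
    run hg -R "$to" qimport "$patch"              # add it to the target's patch queue
    run hg -R "$to" qpush                         # apply it on top of the mainline
    run hg -R "$to" qfinish -a                    # turn applied patches into regular changesets
}

# Example values only: abc123 is a hypothetical revision id.
carry_fix abc123 ../myproject-1.0.1 ../myproject
```

If the patch doesn't apply cleanly, qpush stops there and you fix it up by hand before finishing, which is where the essential complexity shows up.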
Intermission: on branching.
Working
across branches in centralised repositories (even with things like
subversion switch or perforce integrate commands)
is tough by comparison, tough enough that branching becomes a massive
PITA and companies end up investing a lot of process and ceremony
overhead to deal with branches, code lines, fix propagation, even
hiring people just to manage that work.
The larger point is that branching is always taking place and avoiding it can be
painful, in the same way avoiding frequent integration is painful. Kent
Beck in Extreme Programming says branching should be avoided and it's
one of the few things I disagree with in the entire book. It is
however industry standard practice to treat branches with
trepidation. But here's the thing: branching isn't just a process or code duplication
matter to avoid, it's inevitable. As soon as you check out code or lock a
file, you've branched, and checking back in *is* a merge operation. Using
a distributed VCS drives branching costs to zero; DVCSs are to branching
as xUnit is to testing, or CI is to integration :)
Risky business. Isn't
messing about with local repositories somehow irresponsible, or faintly
cowboy? Not really, but I can see how it might look to people who take
code and release management seriously enough to have defined useful
processes. The overall point is fourfold:
- you're productive even when offline/disconnected,
- all code is versioned all the time,
- there's less high drama around branching,
- local versioning encourages spike solutions
The last one is worth expanding on. A spike solution is agile-speak for a quick prototype to validate an architecture or design idea and/or get a handle on the amount of work involved in implementing a feature. Prior to agile, spikes were bigger, "write one to throw away" in Brooks' terms, but what a spike really implies to my mind is innovation, experimentation and real management of technical risk, all things to be encouraged.
Applicability. This is not just a model for working with Perforce. I've found myself doing this on other projects - django, nutch, plone, sitemesh, the atompub spec (which I had write access to). Reading around, it seems to be becoming a lesser idiom to check out from a VCS and work locally using a DVCS. A tool to automate the sync process would be great (I've tried tailor and failed). That said, I'm not at all suggesting dropping tools like subversion and perforce, only that DVCSs like mercurial and git, or even the local history feature in IDEA/Eclipse, can be a complement to classic version control practices. And there's no free lunch - you do have to make sure the changes you want to ship get into the main line, wherever it resides. In a DVCS case that means taking responsibility for your local versions and being disciplined when you can reconnect to the main repository.
3 Comments
Good article!
Your comment on branching and working in a workspace I totally agree with. Whether you "sync" someone else's work and get a conflict, or branch and pull changes to get a conflict is effectively the same thing. If you are either on a private branch (Perforce or SVN), or using Mercurial (or other DVCS), the advantage (of a private branch or your own local repository) is the extra control you have.
For a related discussion, see also http://www.cmcrossroads.com/articles/...
Robert
The link from the previous post changed to:
http://www.cmcrossroads.com/component...
Sorry, the right link is:
http://www.cmcrossroads.com/content/v...