
Mercurial, Part I

What is it? Mercurial is a distributed version control system (DVCS), written in Python and released under the GPLv2. This post collects my initial impressions and notes after playing with it for a few days - please don't construe it as a recommendation pro or con, or in any way a complete overview or assessment of what Mercurial can or can't do. [calling this a "review" would be unfortunate.]

Why look at a DVCS? For version control I've used Subversion almost exclusively for over 3 years; before that it was CVS. I'm interested in the distributed VCSes pragmatically for 4 reasons. First, to allow offline or disconnected development. Second, to deal with OSS codebases and codestreams that I either depend on, have to patch, or have to upgrade, and thus end up having something else to manage - this issue gets bigger for me each year. Third, as a way of distributing code without having to manage or worry about forks or long running branches - this issue for me is small, but growing. Fourth, as a potential publishing tool for content management and distributed authoring. And personally I'm plain interested in distributed computing systems.

Another theme with DVCSes is scaling for the code and committer base. The Linux kernel is famously distributed. OpenSolaris runs on Mercurial, and by the looks of things the JDK will too, primarily, it seems, for scaling reasons. A list of projects using hg is on the Mercurial wiki (my surprise inclusion was MoinMoin). Not everyone has this need, but Mercurial is claimed to be able to scale down to work in the small as well, which is interesting.

Installation and setup. I upgraded to Ubuntu Edgy so I could run Mercurial 0.91. Ubuntu Dapper universe defaults to 0.7x *, and I wouldn't recommend using anything below 0.9 because of the improved revlog and HTTP push support. Installation was breezy. Command line usage is like CVS/SVN and is highly idiomatic. The executable is called 'hg'. Subversion users will recognize many of the commands. Creating a repository is easy - the repository folder is yours and hg adds a .hg subfolder that contains the metadata and version history.

To be clear - even though I used apt to install it, Mercurial is self-hosting, which is a must imo before looking at any kind of VCS, and impressive given the project is barely two years old.

Concepts. In Mercurial you don't check into a central server, you work locally. To be clear, the repository is created on your file system - you work on top of it and not in a checked out sandbox (this is somewhat like RCS as I remember it).

To coordinate changes you either push changes to another repository or publish your repository for others to pull from. This means you are passing around sets of changes between repositories that must be integrated ("merged") locally.
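One way to picture this exchange is as set arithmetic over changeset ids. What follows is a toy sketch in Python, not Mercurial's wire protocol or API; the Repo class and its commit, pull and push methods are hypothetical names invented for illustration, and the short ids like "a1" stand in for real changeset ids.

```python
class Repo:
    """Toy repository: a mapping from changeset id to a description."""

    def __init__(self, name):
        self.name = name
        self.changesets = {}  # id -> description (stand-in for real content)

    def commit(self, cset_id, description):
        self.changesets[cset_id] = description

    def missing_from(self, other):
        """Ids that `other` has and this repository lacks."""
        return set(other.changesets) - set(self.changesets)

    def pull(self, other):
        """Copy over whatever `other` has that we do not; return the ids fetched."""
        fetched = self.missing_from(other)
        for cset_id in fetched:
            self.changesets[cset_id] = other.changesets[cset_id]
        return fetched

    def push(self, other):
        """Pushing to `other` is modelled as `other` pulling from us."""
        return other.pull(self)


alice = Repo("alice")
bob = Repo("bob")

alice.commit("a1", "initial import")
bob.pull(alice)                 # bob starts from alice's history

alice.commit("a2", "fix typo")  # the two repositories now diverge
bob.commit("b1", "add tests")

alice.pull(bob)                 # alice fetches b1
alice.push(bob)                 # bob receives a2
```

After the exchange both repositories hold the same three changesets. The sketch stops there on purpose: it only moves changes around, and integrating the two lines of work would still need a merge.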

Each commit to a repository results in a new version of the repository; the commit itself is recorded as a set of changes, called a changeset, or cset. Changesets have globally unique ids, and can be given symbolic names, called tags; this is not the same concept as found in Subversion or CVS, but in Mercurial tags are useful as a vernacular for sharing changes across repositories.
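To make "globally unique ids" a bit more concrete, here is a toy sketch in Python. It assumes an id is derived by hashing a changeset's content together with the ids of its parents, so that no central server is needed to hand out numbers; Mercurial's real on-disk format and hashing layout differ in detail. The changeset_id function, NULL_ID constant and tag names are hypothetical, made up for this illustration.

```python
import hashlib

NULL_ID = "0" * 40  # placeholder id for "no parent" (an assumption of this sketch)


def changeset_id(content, parent1=NULL_ID, parent2=NULL_ID):
    """Derive an id from the content and both parent ids (toy scheme)."""
    parents = sorted([parent1, parent2])  # parent order should not change the id
    digest = hashlib.sha1()
    for part in parents + [content]:
        digest.update(part.encode("utf-8"))
    return digest.hexdigest()


root = changeset_id("initial import")
fix = changeset_id("fix typo", parent1=root)

# Tags are symbolic names that point at changeset ids.
tags = {"release-0.1": root}
tags["release-0.2"] = fix
```

Because the id depends only on content and ancestry, two people who start from the same parent and record the same change compute the same id independently, while the same change on top of a different parent gets a different one.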

Mercurial's branching/merging model itself is conceptually simple - a branch results in two child repositories coming from a common parent - a merge creates a new repository that is the child of two parent repositories. New repositories are created by cloning an existing one.
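The parent/child picture can be sketched as a small graph. This is a toy model in Python under my own assumptions, not how Mercurial stores history: each node stands for one repository state, and the Changeset class and ancestors function are hypothetical names invented for the example.

```python
class Changeset:
    """Toy history node with zero, one, or two parents."""

    def __init__(self, name, *parents):
        self.name = name
        self.parents = parents


def ancestors(cset):
    """Names of `cset` and of everything reachable through parent links."""
    seen = set()
    stack = [cset]
    while stack:
        node = stack.pop()
        if node.name not in seen:
            seen.add(node.name)
            stack.extend(node.parents)
    return seen


base = Changeset("base")

# A branch: two children coming from a common parent.
feature = Changeset("feature", base)
bugfix = Changeset("bugfix", base)

# A merge: one child with two parents.
merged = Changeset("merged", feature, bugfix)

# The history the two branches share is what a merge can start from.
common = ancestors(feature) & ancestors(bugfix)
```

Here common contains only "base", and the merged node can reach all of the others through its two parents, which is the sense in which the merge is recorded in history.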

You can branch and merge locally or publish changes to others over a network. This is not the same as Subversion, where a branch is an internal copy of a subtree; and compared to Mercurial, Subversion doesn't have any notion of tracking merges.

The thing to figure out, coming from a centralised repository background, will be idiomatic use. For example: when it comes to release management, whether to use named branches or repository clones seems to be an "it depends" matter.

Bryan O'Sullivan gives a good overview of the concepts in this video.

Best bits. Cloning+update or upstream push provides implicit backup solutions. The source is in Python, so I can read it without bleating about C being hard. RSS feeds are available by default over the Web UIs. However the most appealing feature is being able to work disconnected with the entire history available. That's huge, assuming you and your development methodology can get past the non-central model. My sense is that a distributed VCS requires more individual discipline in the development process and more inter-developer communications than the 'command and control' policy implied by a central server.

Definitely, tools like Mercurial are not for those who don't see version control and patch management as a critical part of the development process, or think that a VCS is a glorified backup server. There is an implied way of working with this kind of tool that not everyone will need or want.

A distributed VCS potentially helps solve a real problem - forking. Forking is not just an OSS thing. I take a more general view of it as being cut and paste in the large that results in duplication and (often) unwitting adoption of codebases. This kind of code adoption is especially hurtful to commercial projects. Here's an all too common pattern - check out a 3rd party codebase from one repository, check it into another. Customize, extend or fix the 3rd party code. Don't send back the upgrades, perhaps because you don't have time, perhaps because the adopted code is welded into the new software, but fundamentally because you've diverged sufficiently far away from the original (and now changed) code that you've got no easy way to rationalize the code you have to manage. Service and fixed price engagements exacerbate this, by not being optimized financially for long term code maintenance, support and reuse. Each individual engagement thus costs more than it should and scale opportunities are lost (local v global risks are traded off). Anything that ameliorates this is worth looking at imo.

Worst bits. I found hg push and repository publishing clumsy to set up. Coming from Subversion, one of the first things I wanted to do was publish a repository and be able to push and pull from it. Setting up for push/publishing is a nuisance. Tunneling over SSH is, as ever with all things, a pain. I gave up after a couple of hours of fooling around with authorized_keys, ssh-agent and friends. Error messages were not helpful. Instead I set it up to run multiple repositories behind a single Apache conf (which is how I generally set up Subversion). That took an hour due to a control character in the config file that stopped it from being read (arrgh). Eventually I'll get that behind SSL+Basic; for now I have a working master server hosting multiple repositories. In fairness to Mercurial, getting the administration setup right is a usability thing that can be fixed, and not intrinsic to the VCS itself.

I suspect Mercurial might not be IDE friendly, as each new clone will need a new project set up for it. How important this is will depend on how you work. Those on Linux will end up using symlinks to dupe the IDE and emulate Subversion's switch command. This isn't to do with exposing metadata or SPI hooks to IDE tools; it's fundamental to how Mercurial branches work as standalone repositories.

The key message from the Mercurial community seems to be an emphasis on speed - in other words, it's like other DVCSes but niftier. My personal thing with any VCS is stability over the data, not speed - fast is good, but safe is better. Still, there's no rush; I evaluated Subversion for the best part of a year before moving my personal work to it, and hg won't be done for a while yet.

Conclusion. I liked it, despite some pre-1.0 rough edges and some conceptual hurdles of my own I'll have to clear. I love the fact that I can work offline while subscribing to others' RSS feeds to pick up their changes. The distributed patch and branch management support seems to be extremely powerful (as in, I haven't entirely 'gotten' what's possible yet, and am sure to blow at least one foot off). The ability to manage 3rd party codestreams is given first class treatment in Mercurial whereas in Subversion you work with idioms like vendor branches. I hope they get renaming sorted out. It's fun to use; I'm going to move one non-critical project to Mercurial and continue to play with it over the next 6 months.


January 9, 2007 06:30 PM

Comments

Ryan Tomayko
(January 9, 2007 07:25 PM #)

This is a great way to start in on your New Year's resolution :)

Nicola Larosa
(January 11, 2007 09:39 PM #)

Mercurial is smaller, simpler and faster than Bazaar-NG:

Bazaar vs. Mercurial: An unscientific comparison
http://sayspy.blogspot.com/2006/11/bazaar-vs-mercurial-unscientific.html

bzr/hg performance comparison with newest versions
http://sayspy.blogspot.com/2007/01/bzrhg-performance-comparison-with.html

Bazaar-NG is created by Canonical (the Ubuntu maker), and it seems to be gaining mindshare faster than Mercurial; I surmise that this is not due to technical merit, but instead against it.

John D. Mitchell
(January 12, 2007 09:04 PM #)

Re: stability/reliability and speed

You might want to read the design docs for Mercurial. Hg is much better designed in terms of reliability than e.g., Subversion.

(January 13, 2007 02:06 AM #)

What about darcs?