What a DVCS gets you (maybe)

Czajnik responding to Jeff Atwood's howto on setting up Subversion on Windows: "Have you tried any distributed source control? I've recently switched from Subversion to Mercurial, and I'm very happy about the change. The most important reason was the ability to clone the repository to my laptop, do some checkins there (without network access, on a plane, train, etc.) and resync with just one command when I'm back home. Distributed model seems cool even for a single developer :)"

Jeff Atwood responding to Czajnik: "What is the difference between what you describe, and working traditionally offline, then checking in when you get back into the office? If a "checkin" occurs on your local machine, and nobody else in the world knows about it.. did it really happen? Maybe I'm not understanding the distinction here. I still need to watch the rest of Linus Torvalds' presentation on this topic ( http://www.youtube.com/watch?v=4XpnKHJAok8 )"

I don't know that Torvalds' presentation on Git is going to explain much (Randal Schwartz's Git tutorial is much better). I agree with Jeff when he says elsewhere that it will take years for distributed version control systems (DVCS) to be generally adopted. But having switched to a DVCS (Mercurial) a while back, coming from Subversion and before that CVS, let me lay out three things I think you get when you use a distributed model:

  1. Better branching, thus control over code
  2. Speed, thus ease of use
  3. Better IDEs (potentially)

But the short version is this: distributed version control is the general case.

Better branching

This one needs some justification when you think of the costs associated with branching and the folklore around it: the industry consensus is that branching is a bad thing, a necessary evil.

A while back I said, "branching isn't just a process or code duplication matter to avoid - it's inevitable - as soon as you check out code or lock a file, you've branched - checking back in *is* a merge operation." That was in the context of using Mercurial as an offline supplement to Perforce.

Saying that a local sandbox is, operationally, a branch means a few things:

  • update is a merge operation from another codeline,
  • checkin is a merge operation to another codeline,
  • you want to update and checkin frequently to avoid drifting.
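
The three points above can be sketched as a toy model. This is only an illustration, not how any real VCS stores history: the `Codeline` class, its `branch` and `merge_from` methods, and the change labels are all hypothetical names invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Codeline:
    """A toy codeline: a name plus an ordered list of change labels."""
    name: str
    changes: list = field(default_factory=list)

    def branch(self, name):
        # A checkout is modelled as cutting a branch: copy the history so far.
        return Codeline(name, list(self.changes))

    def merge_from(self, other):
        # Bring in every change the other codeline has that this one lacks.
        incoming = [c for c in other.changes if c not in self.changes]
        self.changes.extend(incoming)
        return incoming


trunk = Codeline("trunk", ["c1", "c2"])
sandbox = trunk.branch("sandbox")        # "checkout" is a branch
sandbox.changes.append("my-fix")         # local work in the sandbox
trunk.changes.append("c3")               # meanwhile, someone else checks in

updated = sandbox.merge_from(trunk)      # "update" merges trunk into the sandbox
checked_in = trunk.merge_from(sandbox)   # "checkin" merges the sandbox into trunk
```

Seen this way, update and checkin are the same operation pointed in opposite directions, and the longer the gap between them, the more changes each merge has to absorb.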

Centralised systems like Perforce, CVS and Subversion do not treat checkouts as branching operations. As a result they support branching only in a limited sense, which is often why branching is seen as a bad thing. But once you accept this model of having your sandbox under version control, a lot of the pain (and fear) of dealing with branches evaporates. Passing around changesets and patches becomes normal and logical.

Centralised VCS also results in a bias towards the server as the single point of truth: your local sandbox can get messed up via conflicts, but a centralised model never allows you to check in conflicted files. If the local merge after an update fails, you have to clean up the conflicts manually. This points to a limitation in centralised version control systems: the developer's local history of changes is not preserved. It is as though you have a maintenance/dev branch where every time you commit to the branch, the checkin is routed to the codeline the branch was taken from. That means no branch history is kept, ever. The information is thrown away. And if your version of the file prior to the merge is never versioned, that in turn means any post-facto work or cleanup of mistakes has to be dealt with manually. You can't go back through the history.
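
To make the "information is thrown away" point concrete, here is a deliberately simplified sketch. The function names and the history labels are hypothetical, and real tools are far more involved; the only thing modelled is which states end up recorded.

```python
def centralised_checkin(server_history, local_states):
    """Only the final, merged state reaches the server.

    The intermediate local states were never versioned, so they are lost.
    """
    return server_history + [local_states[-1]]


def distributed_push(server_history, local_commits):
    """Every local commit travels with the push, so the branch history survives."""
    return server_history + list(local_commits)


# Hypothetical sequence of work done in a sandbox.
local = ["wip: parser", "wip: fix parser", "merged with trunk"]

central = centralised_checkin(["r1"], local)
distributed = distributed_push(["r1"], local)
```

In the centralised case there is nothing to go back to if the merge went wrong; in the distributed case the pre-merge states are still in the history.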

I have more than once seen a developer effectively stranded, where they can't check in because they can't integrate locally without a lot of pain, and it's too late to cut a branch after the fact. Their sandbox is going to get hosed before they're able to check in. I've seen it enough that I'm inclined to think it's not a training/skill problem, it's a tooling problem. I also suspect it kills innovation and experimentation on codelines: when branching is that heavyweight and problematic, what's the incentive?

 

Speed

This one is easy. Once you start using a DVCS for local work, going back to a centralised model feels slow, as in your mind wanders and you lose flow, which is the worst kind of slow. Sometimes when something is made much faster, it becomes a matter of improved usability rather than technology: think broadband compared to dialup, or being able to run your unit tests in 20 seconds. If the basic versioning operations all become sub-second, this has the potential to change your workflow for the better. The speed point takes some wind out of Perforce, which is the speed king as far as centralised models go (although that speed comes with an ugly tradeoff: you have to tell the server which files you're working on).

Better IDEs (potentially)

Jeff also has a comment on using the IDE to do versioning: "This is sort of a religious issue. Some developers believe source control should *never* be done inside the IDE, and I've started to see their point after dealing with the many, many bugs in the Team Explorer Visual Studio integration point of Team System."

Long bet: all IDE-local versioning tooling will come to use a DVCS internally, probably one that supports rename operations (for refactoring support). Using a real VCS instead of a private library is likely to be a good thing, as it opens up the toolchain, in much the same way that all Java IDEs eventually supporting Ant/Maven directly did.

11 Comments


    The one thing I will say about svn here is that you can copy a working tree, with modifications, directly to a branch. But there's still no local checkins, unless you use something like svk or git-svn.


    I'll admit I was secretly hoping you'd post a clarifying response, this helps immensely.

    I definitely am starting to see the *social* implications of DVCS as compelling. Even if Linus' presentation on Git wasn't the best, I totally got that part. I've seen so many teams struggle with branching and merging. If DVCS makes branching and merging operations simple and almost mindlessly easy, that is a huge step forward in quality of life for developers -- not to mention software engineering as, well, engineering.


    Agreed that Randal Schwartz's talk may be better if you want to learn about Git. But the Linus talk is my all-time favorite Tech Talk. It's more about a paradigm shift in how we think about individuals' interactions with the network (i.e. it's not so simple as "the network is the computer"), and a complete turning-upside-down of the idea of a canonical "version" of ANY set of data. The full impact of that is going to take years to play out (and in areas far afield of source code management). So needless to say, my recommendation is to check out the Linus tech talk, too....


    I don't know if it was ever said, but the point of committing code is not so that others can see it, it's so you have incremental backups of what you were working on. Say you screw something up, you can easily go back and start working it in a different way. I don't know how often Jeff actually codes if he doesn't see the potential there. So many people I've worked with use a weird copy-and-paste scheme when they are working on files "just in case." DRCS does this for you!

    I've written a similar article, being a Mercurial user, about the downsides of Hg:
    http://humani.st/shortcomings-of-merc...

    as well as a collection of tech talks and resources for learning DRCSs here:
    http://humani.st/learning-distributed...

    In the second link, you may want to note that I heard in person from an SVN developer that they are planning to add Hg-like distributed features to SVN. That should be a catalyst for moving people along.


    Hi Bill,

    Interesting article. I'm starting to get more and more interested in DVCS, but for a different reason than the ones you mention. I'm still a big fan of continuous integration where possible. And a bit surprised to see you enthusiastic about postponing merges. Do I misunderstand you, or have you changed your mind on that?

    What I find really interesting, apart from the nifty technical tricks with chains of SHA-1 hashes, is the fact that DVCS are peer-to-peer systems, not client-server systems.

    As social networks of programmers become more and more important, this too becomes more important. As an example, right now I'm collaborating on a distributed project with a couple of friends, and I'm hosting a central svn server. Which is down at the moment, so I'm going to have to stop reading blogs and go fix that server :-) Ideally, there wouldn't be a central server, with a single superuser, but a cluster of peers.

    I don't think it is a coincidence dvcs were invented (?) by the open source community.


    Um, if you think DVCS were invented by the open source community, think again.

    The first usable one was BitKeeper. We designed and built it for Linus, the kernel used it for 5 years, and MySQL started using it 8 years ago and is still in it:

    http://mysql.bkbits.net:8080/mysql-6....


    Yeah, agreed. Also a DVCS makes it way, way easier to just try something out. You just make a clone of the repo and commit as you go along making your change. If you're happy with what you've done, push it back up, otherwise just scrap it. With a centralized VCS, you either check out your working copy and then make your own backups as you go along implementing or you have to create a remote branch, provided you have the permissions, which is a big pain in the ass.


    I've yet to be convinced. I've just done a listing of free Git and Mercurial hosting services (http://www.straw-dogs.co.uk/04/06/git...) at my site and I'm planning on giving them a whirl. But for locally managed teams of 3-4 developers I think it's overkill. Subversion all the way.


    Does anybody know of any unbiased comparison of the *current* versions of git, GNU Bazaar and Mercurial?

    All I've seen is rather old (DVCS are young and changing quickly).
    Bazaar is now pretty fast, git is becoming easy to use and crossplatform, Mercurial has reached v1.0...


    Too bad that the new DVCS systems don't use most of the conventions that non-DVCS users are already accustomed to. For example, git-revert does not really revert your local changes ... it reverts in the repo. And all that "staging" business of git, and adding changes to the same file multiple times, no known version control system does that.

    Git just feels like a hacked-up patch-management system, not a version control tool. It reinvents so many conventions, it's disgusting. It's so damn frustrating to expect it to work in a similar way to cvs/svn/accurev/perforce/whatever ... and just discover that its learning curve is so damn steep.

    And what's up with all those perl scripts running behind the scenes? And git-svn trying to suck in the whole svn repository just so that it could make a half-assed guess of what was branched and where, and no way to just disable this stupid behavior.

    I am quite tired to see the BEST git feature being "ability to use a VCS when not connected to the network" ... just install svn on your localhost and be done with it, this is not THE feature I'd expect from a superior version control system.

    Something like the pending-changelist in perforce is a real feature, something that really can make a difference. But for some strange reason it's not included ... not in svn and not in git and not about anywhere (excl. synergy tasks and clearcase activities)

    Anyways, it's all too young yet to be taken seriously. Even though the paradigm may be convincing, the products mostly suck.


    >But for locally managed teams of 3-4 developers
    >I think it's overkill

    In fact I think it is the opposite: git does not need a central server nor a complex infrastructure, it's so lightweight that I use it even for one-man projects.

    >I am quite tired to see the BEST git feature being
    >"ability to use a VCS when not connected to the network"

    Most people who say that are not able to change the central repository they are using. When distributed VCSs are widespread, this kind of use will become less and less common.