How to Choose a Distributed Version Control System

So the time has come for your team to pick a new version control system. Perhaps your team has outgrown its CVS server, or maybe someone lost the keys to your Vault server. Or maybe someone has wised up to the idea that backing up software by creating zip files and storing them on a network fileshare (yes, I’ve seen this; oh, the horror) doesn’t work so well.

In any case, you’ve decided you need to update your method of source control. But what to choose? In this post I’ll offer some guidance for figuring out which distributed version control system is a good fit for your team.

The steps described below may seem quite obvious, but time and time again I see people asking, “Which one is best?” The point here is that there is no “best”: from a technical perspective, Git, Mercurial, and Bazaar are all very, very similar and provide most (if not all) of the same functionality. The difference is in how that functionality is implemented, packaged together, and provided to you for use. So the question is not “Which one is best?” but “Which one best suits my team’s needs?”

NOTE: I’m assuming here that you’ve already weighed the pros and cons of centralized vs. distributed version control and have decided you want to give distributed version control a try. The differences between centralized and distributed version control deserve a blog post of their own.

Step 1: Don’t Just Go With What You Heard Was Good

How well a version control system works in a particular setting can completely make or break a team’s productivity. Assuming you’re responsible for a team of developers who will be living with the consequences of your decision day in and day out, you can’t afford to just go with what you heard was good. Sure, gathering others’ opinions on the tools they’ve used and why they feel like the tools do or don’t work well is important (you’d be silly to make any kind of decision about tooling without doing this), but you have to figure out what will work for you. The process for doing this is so simple it should be obvious:

Figure out what requirements your team has.
Figure out how well the available options meet those requirements.
Draw a logical conclusion, putting personal preferences aside (remember: this is about your entire development team, not just you).

Step 2: Brainstorm Your Team’s Requirements

This is where you get some people on your team together and brainstorm the list of requirements you have. Only you know about any special needs you have within your own development environment, but here are some issues you probably want to consider:

What version control system (if any) is your team using now?
Do you require good GUI tools, or do you expect to be a command-line only shop?
What operating systems do your developers use?
How well suited is your codebase to distributed version control? Is your repository entirely full of text files, or do you have a lot of binary files mixed in? How big are these binary files (if they exist) and how often do they change?
What is your team’s development model? Are you hoping to choose a tool that will allow you to adopt a new development model?
How do you want hosting to be handled? Do you want, or need, to self-host or do you want to purchase hosting and let someone else deal with it for you?
What range of interest and technical expertise do you need to support? Are you a small group of people who all like to geek out on version control — and therefore are maybe interested in something more exotic — or are you a large group of people who don’t actually care how version control works at all and just want the damn system to work?
Do you require that the version control system is easily extensible and customizable?
What type of performance do you require? Is it more important to you to have a very well-compressed repository so clone sizes are as small as possible, or do you care more about how fast changes can be streamed through the pipe when doing clones and pulls?
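For the question about binary files, it can help to measure rather than guess. Here is a minimal Python sketch that walks a checkout and reports how many files look binary and how big they are. The function names (`looks_binary`, `survey_tree`) are my own inventions for illustration, and the "contains a NUL byte in the first few kilobytes" test is only a rough heuristic, so treat the output as a starting point for discussion. It also says nothing about how often those files change; for that you would need to dig through the history of whatever system you use today.

```python
import os
import sys


def looks_binary(path, sample_size=8192):
    """Heuristic: call a file binary if its first bytes contain a NUL byte."""
    with open(path, "rb") as handle:
        return b"\0" in handle.read(sample_size)


def survey_tree(root, skip_dirs=(".git", ".hg", ".bzr", ".svn", "CVS")):
    """Return (text_file_count, binary_files) for everything under root.

    binary_files is a list of (size_in_bytes, path) tuples, largest first.
    Version control metadata directories are skipped.
    """
    text_file_count = 0
    binary_files = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune metadata directories in place so os.walk does not descend.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path) or os.path.islink(path):
                continue  # skip symlinks, sockets, and other special files
            try:
                if looks_binary(path):
                    binary_files.append((os.path.getsize(path), path))
                else:
                    text_file_count += 1
            except OSError:
                continue  # unreadable file: leave it out of the survey
    binary_files.sort(reverse=True)
    return text_file_count, binary_files


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    text_count, binaries = survey_tree(root)
    total = sum(size for size, _ in binaries)
    print(f"{text_count} text files, {len(binaries)} binary files ({total} bytes)")
    # Show the 20 largest binary files, which are the likeliest trouble spots.
    for size, path in binaries[:20]:
        print(f"{size:>12}  {path}")
```

Run it against a working copy of your current codebase (the path is the only argument). A handful of small icons is unlikely to matter; a pile of large, frequently regenerated assets is something to bring up explicitly when you evaluate each system.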

Step 3: Gather Some Testers

The next stage is going to involve some experimentation, so it’s important that you include at least a few of the developers on your team in this. An easy way to do it is simply to ask the team as a whole who would like to participate. Asking for volunteers means you’re likely to get feedback from people who are passionate enough to actively participate in the testing, without bombarding everyone who doesn’t actually care with a bunch of noise.

Step 4: Figure Out How Well the Available Options Fit Your Requirements

This is the part where you get your hands dirty. Here I’ll cover some details about Git, Mercurial, and Bazaar, the three most widely used distributed version control systems.

Setting Up Temporary Hosting

In order to test, you’re going to need somewhere to host your repositories. What you want to use for this probably depends a lot on whether you will require self-hosting or external hosting in your production environment. If you will need a self-hosted solution in production, you don’t want to test with external hosting only to find out later how annoying the system is to host locally (*ahem* not to imply that yours truly has made this mistake or anything…).

External Hosting

If you can use external hosting, there are a few options (check with each source regarding what the cost will be for your particular organization). These options all offer collaborative code reviews, pull requests, project wiki, and a ticketing system in addition to basic code hosting:

GitHub – With the tagline of “Social Coding”, GitHub quickly became the de facto standard for open-source Git projects (and arguably GitHub itself, through the timing and success of its initial release, is heavily responsible for the insane amount of popularity Git has had as a version control system).
BitBucket – Having received a facelift and a whole host of improvements since its acquisition by Atlassian, BitBucket provides hosting for both Mercurial and Git repositories.
Launchpad – If you are going to try Bazaar, Launchpad is (as far as I know) your only option for external hosting.
Kiln on Demand – Hosting solution for Mercurial repositories from Fog Creek Software.

Self-Hosted Solutions

Self-hosting is a bit more tricky. Here are some options I know of:

RhodeCode – We use this at Unity because we need to self-host and, although it has some rough edges, it has worked reasonably well for our needs. RhodeCode supports hosting for both Mercurial and Git repositories and includes repository/permissions management and integration with LDAP authentication. Collaborative code reviews and pull requests are currently a work in progress but are being actively developed.
GitHub Enterprise – If you’re willing to fork over the (not insignificant amount of) cash, a self-hosted GitHub solution could be an option for you.
Atlassian Stash – Another option for self-hosting Git repositories.
Plain Mercurial + Apache – An option for more of a bare-bones setup for publishing Mercurial repositories.
Sloecode – For self-hosting Bazaar repositories.

Configuring Clients

Next, you need to get your guinea pigs set up locally. I highly recommend making sure users test both the command-line interface and a selection of available GUI tools, since almost certainly your team will have a mixture of people who prefer the command-line and people who prefer GUI tools.

Here are just some of the GUI tools that I know are popular for different version control systems:

SourceTree – Supports both Mercurial and Git on Mac OS X and is being ported to Windows at the time of this writing. This tool is used heavily internally at Unity Technologies and is highly recommended.
TortoiseHg – GUI frontend for Mercurial. Popular on Windows, though technically it does run on OS X and Linux as well.
GitX – GUI frontend for Git on OS X.
Bazaar Explorer – Cross-platform GUI frontend for Bazaar.

Each client will need to be configured differently, depending on the client and the version control system. Information on how to do that is not covered here.

Test, Test, Test!

Finally, start testing the various systems you’ve put into place. It’s important to have your developers try to do real work in the different version control systems. There will be some additional overhead doing this because they’ll have to pull these changes out and incorporate them into your “real” codebase after they are done, but the experience gained while testing for real work will be invaluable. There is no way to know how the tools will suit your needs for real work without actually trying them with real work — real codebases, real users, real commits. My experience as a build engineer has shown me how common it is for tools and applications to seem like they will work fine when being tested with a contrived test case, but fall down miserably when put to real use.

It’s a good idea to keep a running log of ongoing, constantly updated input from your guinea pigs, covering:

How easy each system/tool is to become comfortable with (read: when you have an entire team of developers to support, learning curve is important)
What they liked and didn’t like (chances are if your guinea pigs all dislike feature X of system Y, more developers on your team will as well)
Problems encountered and how easy they were to solve (If all of your developers are encountering problems with a particular system or GUI tool, then it’s obviously a red flag)
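If a shared text file feels too loose for this log, the three items above map naturally onto a small record type. The sketch below is one way to do it in Python; the `LogEntry` fields, the 1-to-5 scale, and the `red_flags` helper are all assumptions of mine, not part of any tool mentioned in this post. The helper implements the red-flag rule from the last item: it lists any system where every tester who tried it reported at least one problem.

```python
from dataclasses import dataclass, field


@dataclass
class LogEntry:
    """One tester's notes about one system (or GUI tool)."""
    tester: str
    system: str
    ease_of_learning: int  # assumed scale: 1 (painful) to 5 (effortless)
    liked: list = field(default_factory=list)
    disliked: list = field(default_factory=list)
    problems: list = field(default_factory=list)


def red_flags(entries):
    """Return systems where every tester who tried them hit a problem."""
    testers = {}        # system -> set of testers who tried it
    with_problems = {}  # system -> set of testers who reported a problem
    for entry in entries:
        testers.setdefault(entry.system, set()).add(entry.tester)
        if entry.problems:
            with_problems.setdefault(entry.system, set()).add(entry.tester)
    return sorted(
        system
        for system, who in testers.items()
        if with_problems.get(system, set()) == who
    )
```

A tester can add several entries for the same system over time; the helper only cares whether each tester has reported a problem at least once. How hard each problem was to solve still needs a human to read the notes.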

Step 5: Tally Up the Results

At this point it makes sense to have a group discussion to share thoughts and opinions, then have each guinea pig (including yourself) rank each system in terms of how well it meets the requirements you outlined in Step 2. Based on these results, you probably have an overall winner.
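One simple way to turn those rankings into a result is a weighted score: give each requirement from Step 2 a weight, have each tester score each system against each requirement, and average the weighted totals per system. The Python sketch below assumes exactly that scheme; the function name `rank_systems`, the nested-dictionary layout, and the idea of numeric weights are my own choices for illustration, and your team may well prefer a different way of combining opinions.

```python
def rank_systems(weights, scores):
    """Rank systems by average weighted score, best first.

    weights: {requirement: weight}
    scores:  {tester: {system: {requirement: score}}}
    Returns a list of (system, average_weighted_score) tuples.
    """
    totals = {}  # system -> list of per-tester weighted totals
    for by_system in scores.values():
        for system, by_requirement in by_system.items():
            weighted = sum(
                weights[requirement] * score
                for requirement, score in by_requirement.items()
            )
            totals.setdefault(system, []).append(weighted)
    averages = {
        system: sum(values) / len(values) for system, values in totals.items()
    }
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Placeholder data only: the requirement names, testers, and numbers
    # below are made up to show the input shape, not real results.
    weights = {"gui_tools": 2, "clone_speed": 1}
    scores = {
        "tester_1": {
            "system_a": {"gui_tools": 3, "clone_speed": 5},
            "system_b": {"gui_tools": 5, "clone_speed": 4},
        },
        "tester_2": {
            "system_a": {"gui_tools": 4, "clone_speed": 5},
            "system_b": {"gui_tools": 4, "clone_speed": 4},
        },
    }
    for system, average in rank_systems(weights, scores):
        print(f"{system}: {average:.1f}")
```

Treat the number as a conversation aid and not a verdict. If the scores are close, the group discussion and the notes about problems encountered should carry more weight than a fraction of a point.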

At this point, you should be ready to begin planning your migration — how to actually migrate a team to distributed version control probably deserves a blog post in itself.

Good Luck, and Happy Coding!