Subversion was one of those things I really, really wanted to like. After using CVS, and being frustrated with its limitations, the idea of Subversion was great: Build a better CVS. Unfortunately, while it has partially accomplished that goal, it has failed in other ways. In some areas, it is no better than CVS, and in others, it is worse. That’s a shame.
Where svn gets it right:
1. Not versioning file-by-file
In CVS, everything (tags, branches, revisions) applies on a file-by-file basis. A “branch” as a collection of files exists only insofar as you have a bunch of files that all carry the same branch identifier. This design is inherited from the old RCS file format. It’s surprisingly robust for all of that, but it does make fixing things annoyingly difficult at times. Subversion versions the whole tree instead, so a commit, a branch, or a tag applies to a set of files at once.
2. Branches are lazy copies
This is definitely the right way to do branching, as Perforce pretty clearly showed. It makes it easy to create a branch without taking up a lot of disk space, but it also makes a branch a collection of files, which is, of course, how most people think of them.
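To make the “lazy copy” idea concrete, here is a minimal toy model in Python. It is only a sketch of the concept and not Subversion’s or Perforce’s actual storage format; the class name Repo and the paths are made up for illustration. The point is that creating a branch adds references to existing content rather than new copies of it.

```python
# Toy model of lazy-copy branching. Hypothetical names throughout;
# this does not reflect any real repository format.

class Repo:
    def __init__(self):
        self.blobs = []   # stored file contents, one entry per distinct write
        self.tree = {}    # path -> index into self.blobs

    def add(self, path, content):
        """Store new content and point the path at it."""
        self.blobs.append(content)
        self.tree[path] = len(self.blobs) - 1

    def read(self, path):
        return self.blobs[self.tree[path]]

    def copy(self, src, dst):
        """Branch src to dst by reusing blob references; no content is stored."""
        for path, blob_index in list(self.tree.items()):
            if path.startswith(src + "/"):
                self.tree[dst + path[len(src):]] = blob_index


repo = Repo()
repo.add("trunk/main.c", "int main(void) { return 0; }")
repo.add("trunk/README", "hello")

blobs_before = len(repo.blobs)
repo.copy("trunk", "branches/feature1")
blobs_after_copy = len(repo.blobs)   # unchanged: the branch cost no file data

# Changing a file on the branch stores exactly one new blob.
repo.add("branches/feature1/README", "hello from feature1")
```

The branch behaves like a full collection of files, yet only the file that was actually changed takes up additional space.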
What svn gets wrong:
1. The whole branches/tags/trunk standard
Branches should not be in a separate namespace; this just complicates things and makes “trunk” seem more special than it really is. Granted, this is just a convention, but it’s so strongly adhered to that many supporting tools break if you violate it. (Also, raise your hand if someone has ever created “branches/trunk” in your repo. Yeah, I thought so.)
2. No tag support
No, really: The “tags” in the branches/tags/trunk standard are not tags as they are usually defined in a version control system. A tag is, semantically, a human-readable reference to a revision number, and should be substitutable any place a revision number is. The developers of Subversion seem to have missed this crucial point. As a result, in Subversion, you cannot, for example, say something like
svn log -r"tag1":"tag2"
… you have to do something more complicated. Nor can you do something like
svn checkout -r"tag1" svn://repo/branches/feature1
You have to say
svn checkout svn://repo/tags/feature1
This is not just a matter of taste; there is no association between the tag and what it tagged in Subversion.
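As a sketch of what “substitutable any place a revision number is” would mean, here is a small hypothetical model in Python. Nothing in it is a Subversion API: the tags table, the revision numbers, and the function names are all invented for illustration.

```python
# Hypothetical model: a tag is nothing more than a name for a revision number.
# The tag names and revision numbers below are made-up example values.
tags = {"tag1": 1200, "tag2": 1350}


def resolve(rev):
    """Accept a revision number, a numeric string, or a tag name."""
    if isinstance(rev, int):
        return rev
    if rev.isdigit():
        return int(rev)
    return tags[rev]


def parse_range(spec):
    """Parse a 'START:END' range where either side may be a tag or a number."""
    start, end = spec.split(":")
    return resolve(start), resolve(end)
```

With tags modeled this way, parse_range("tag1:tag2") and parse_range("1200:tag2") resolve to the same pair of revisions, which is exactly the association that a copy under tags/ does not give you.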
3. Merging is hard
I never actually adopted Subversion by choice because it had no merge tracking. The lack of merge tracking is the number one thing that made branching and merging in CVS difficult, and the number one reason I used Perforce at several employers. I was forced to use Subversion at my current employer, and we looked forward eagerly to the promised merge tracking in 1.5. Unfortunately, when it arrived, it was not possible to “import” our previous merge history into the new merge tracking mechanism, which made it more or less impossible to start using it effectively. Even in cases where we were starting fresh, it still suffered from being a major kludge: the designers of Subversion didn’t build it in a way that made adding merge tracking easy. The result is not pretty; bolting wings onto a car doesn’t make it an airplane.
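For readers who haven’t used it, merge tracking boils down to remembering which source revisions have already been merged into a branch. Subversion 1.5 records this in an svn:mergeinfo property whose lines look like /trunk:100-110,115. The Python below is a simplified sketch of that bookkeeping, assuming only that basic line format; the function names and the revision numbers are made up, and the real property has more cases than this handles.

```python
# Simplified sketch of merge-tracking bookkeeping. It assumes mergeinfo lines
# of the form "/path:REV,REV-REV,..." and ignores everything else about the
# real property. All revision numbers below are invented examples.

def parse_mergeinfo(text):
    """Return {source_path: set of revisions already merged from it}."""
    merged = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        path, _, ranges = line.rpartition(":")
        revs = set()
        for part in ranges.split(","):
            part = part.rstrip("*")   # a trailing '*' marks a non-inheritable range
            if "-" in part:
                low, high = part.split("-")
                revs.update(range(int(low), int(high) + 1))
            else:
                revs.add(int(part))
        merged[path] = revs
    return merged


def eligible(source_revs, merged_revs):
    """Revisions on the source that have not been merged yet."""
    return sorted(set(source_revs) - merged_revs)


trunk_changes = [105, 112, 115, 120]          # revisions that touched trunk
recorded = parse_mergeinfo("/trunk:100-110,115")

still_to_merge = eligible(trunk_changes, recorded["/trunk"])   # [112, 120]

# With no recorded history, every revision looks unmerged.
without_history = eligible(trunk_changes, set())
```

The last line is the situation described above: merges done before the history was recorded are invisible to the mechanism, so it offers to merge them all over again.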
4. Remote use is awkward
It’s not as bad as Perforce (caveat: I haven’t used the latest version of Perforce, which has some new features related to remote use), but still, there’s no getting around the fact that Subversion is fundamentally tied to a single central server and there is no way to commit or examine history unless you are connected to the server.
5. The repo format is poorly documented and there are no recovery tools…
6. …unless you use the BDB backend, which has its own problems
The default new repo format is FSFS, which is a format created by the Subversion developers. There is some documentation for it, but it’s not the clearest stuff in the world. (Of course, you could always read the code, if the code weren’t impenetrable.) This would be fine if you never encountered any kind of repository corruption, or if you could always restore from backups. However, backups are slow (see below), so you may have a significant gap between them. If your repository gets corrupted, you look hopefully into the manual and see this:
svnadmin recover REPOS_PATH
Yay! You are saved!
Well, no, you aren’t: That only works on BDB repos. Which you didn’t use because, well, the Subversion folks recommend against using BDB because it has problems with NFS (this is actually a good recommendation). It turns out that there are no recovery commands for FSFS. So you email the svn list for help and discover that apparently this can’t happen, because no one ever responds to you except several people asking you if you ever found a solution. So you dig into the code to see if you can fix it yourself, only to find that
7. The code is inscrutable
With all due respect to the Subversion team, the svn code is some of the most godawful code I have ever tried to read. While it’s true that version control is not exactly trivial, I find the code for git much, much easier to read, to the point that I can actually figure out what it’s doing. Even the old CVS code was clearer.
8. One revision number for the whole repository
This is actually not a big deal if the repository contains the code for a single unit of development, e.g., a library or an application. In such a case the revision number basically corresponds to a change set. The problem is that many Subversion repositories contain multiple projects that are only loosely related to each other, with the result that it’s not always entirely clear what, exactly, revision NNNNN refers to. While this is partly a problem with how people use Subversion rather than Subversion itself, it’s also the case that Subversion encourages this sort of use because of the way the svn server is tied to the repository.
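A tiny illustration of the problem, as a Python sketch with invented project names and revision numbers: when several loosely related projects share one counter, most revision numbers mean nothing to any given project.

```python
# Made-up commit history for a repository that holds three unrelated projects.
# Each entry is (repository-wide revision number, project that was changed).
commits = [
    (1, "libfoo"),
    (2, "webapp"),
    (3, "webapp"),
    (4, "docs"),
    (5, "libfoo"),
]


def revisions_touching(project):
    """Repository revisions in which the given project actually changed."""
    return [rev for rev, changed in commits if changed == project]


def last_change_at(project, rev):
    """Most recent revision at or before rev that changed the project, if any."""
    touching = [r for r in revisions_touching(project) if r <= rev]
    return touching[-1] if touching else None
```

Here revisions_touching("libfoo") is [1, 5], so “libfoo at revision 4” is really libfoo as of revision 1, and nothing in the number itself tells you that.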
9. There are no logs
If you come to svn from Perforce, this is surprising. Subversion doesn’t keep any sort of logs, because the repository is separate from the programs that interact with it. That is, the svn server is not the only thing that can make changes to the repo. Knowing what program made a change to the repository, and when, can be very useful in figuring out what caused data corruption, for example.
1. It’s slow. I realize that for some people, this is a major annoyance.
2. Backups are slow. If you have remote users worldwide, you don’t want to take the server offline to do the backups, so you do hot backups, but even those degrade performance while they run.
3. Repositories are large. Our main Subversion repository at my current workplace occupied about 5.3 GB. The git import was less than 2 GB. This is fairly typical.
Overall, I think that Subversion only partially accomplished its original goal, to be a better CVS. By failing to design the software with branching and merging in mind, the developers created a system where branching is easy but merging is difficult. Branches are useless without good merging, and CVS’s biggest failing was in its lack of merge tracking. Unfortunately, this is the one area that Subversion got seriously wrong, and I believe that fixing it properly would require a major rewrite.
In future articles, I’ll talk about the pros and cons of various other version control systems, as well as continuous integration servers, bug-tracking systems, build systems (make, maven, ant, etc.), code review systems, and so forth.