Migrating from SVN to Git

By: Jason McCreary on 1/30/2014

Tl;dr – All our migration scripts have been open sourced for you to use to convert your SVN repositories to Git. Fork them from our GitHub repository.

Here at VIA Studio we recently switched our source control from SVN to Git. This is something we wanted to do for a while. But with over 100 SVN repositories, the switch seemed daunting.

About a year ago we began using Git for a few new projects. While this allowed us to become familiar with Git, it made the development process more complex. Which projects were Git? Which projects were SVN? We knew we wanted to use Git. It was time to rip the Band-Aid off and convert everything to Git.

Convert SVN to Git

Our only requirement for converting from SVN to Git was to keep history. We wanted all of our commit information in the new Git repository. If you don’t care about history, you can simply run git init inside your SVN working copy and remove the .svn directories.
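For the history-free route, a minimal sketch might look like the following. The function name strip_svn_metadata and the path in the usage comment are hypothetical, and this is not part of the migration scripts described in this post.

```shell
#!/bin/sh
# Remove every .svn metadata directory under the given working copy.
# Older SVN clients keep one per directory; 1.7 and later keep a single one at the root.
strip_svn_metadata() {
    # -prune stops find from descending into a .svn directory it is about to delete.
    find "$1" -type d -name .svn -prune -exec rm -rf {} +
}

# Usage (all SVN history is discarded):
#   strip_svn_metadata path/to/working-copy
#   git init path/to/working-copy
```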

I did a lot of Googling and found a few invaluable posts on converting SVN repositories to Git repositories. While I didn’t follow its steps, one post helped me understand the overall process. Converting SVN to Git had also been asked about on Stack Overflow. Most of the answers used git svn. Others used svn2git.

git svn vs svn2git

Git comes with the tool git svn. It allows Git actions against an SVN repository. One of these is clone, which converts history and has additional options, such as converting tags and branches. While very powerful, git svn has a steep learning curve.

svn2git focuses solely on converting an SVN repository to Git. So the configuration options were more straightforward. And since svn2git wrapped git svn, I could pass any of the git svn options.

At its simplest, pass svn2git the location of the SVN repository and a file with author information:

svn2git svn://somerepo --authors authors-file.txt

You may need to tweak the options based on your repository. In our case, repositories were in standard SVN layout. However, everything existed under the dev branch and we did not use tags. So we passed the --notrunk and --notags options to svn2git:

svn2git svn://somerepo --authors authors-file.txt --notags --notrunk

Generating the authors file

The authors file was the most important piece. Each line must map an SVN username to a Git author in the format svn-username = Jason McCreary <jmac@viastudio.com> for Git to properly convert history. I ran the following script against a few of our older, larger SVN repositories in an effort to get all of our author information.

svn log -q svn://somerepo | awk -F '|' '/^r/ {sub("^ ", "", $2); sub(" $", "", $2); print $2" = "$2" <"$2"@viastudio.com>"}' | sort -u >> all-authors.txt
svn log -q svn://someotherrepo | awk -F '|' '/^r/ {sub("^ ", "", $2); sub(" $", "", $2); print $2" = "$2" <"$2"@viastudio.com>"}' | sort -u >> all-authors.txt
sort -u all-authors.txt > authors-file.txt
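To see what the awk step produces, here is the same program wrapped in a function and fed a hypothetical sample of svn log -q output. The function name authors_from_log and the username jdoe are made up for illustration.

```shell
#!/bin/sh
# Turn `svn log -q` output (read from stdin) into authors-file lines.
# Same awk program as above; the email domain is the one used in this post.
authors_from_log() {
    awk -F '|' '/^r/ {
        sub("^ ", "", $2); sub(" $", "", $2)   # trim the padding around the username
        print $2 " = " $2 " <" $2 "@viastudio.com>"
    }' | sort -u
}

# Hypothetical sample: two revisions by "jmac", one by "jdoe".
authors_from_log <<'EOF'
------------------------------------------------------------------------
r3 | jmac | 2014-01-15 10:12:01 -0500 (Wed, 15 Jan 2014)
------------------------------------------------------------------------
r2 | jdoe | 2014-01-14 09:30:45 -0500 (Tue, 14 Jan 2014)
------------------------------------------------------------------------
r1 | jmac | 2014-01-13 16:02:10 -0500 (Mon, 13 Jan 2014)
------------------------------------------------------------------------
EOF
# prints:
#   jdoe = jdoe <jdoe@viastudio.com>
#   jmac = jmac <jmac@viastudio.com>
```

Each username appears once, with the username standing in for the real name until the file is corrected by hand.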

I then went through the file and corrected the names and emails. For former employees, I linked to their GitHub account. For those who did not have a GitHub account, I entered their name and used a placeholder for their email.

Nearly all of the errors I encountered during the conversion process were due to missing author information. If you pass the option --verbose to svn2git it will provide more detail – in this case what author information is missing. You can add it manually to your authors file or keep this script handy to rerun.
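One way to catch missing authors before a conversion fails is to compare a repository's log against the authors file up front. This is only a sketch and not one of the published scripts: the function name missing_authors is hypothetical, and it assumes the authors file uses the username = Name <email> layout generated above.

```shell
#!/bin/sh
# Print SVN usernames found in `svn log -q` output (stdin) that have no
# "username = Name <email>" line in the authors file given as $1.
missing_authors() {
    awk -v authors="$1" '
        BEGIN {
            # Load the usernames that are already mapped.
            while ((getline line < authors) > 0) {
                split(line, parts, " = ")
                known[parts[1]]
            }
            FS = "|"
        }
        /^r[0-9]+ / {
            user = $2
            gsub(/^ +| +$/, "", user)          # trim the padding around the username
            if (!(user in known) && !(user in seen)) {
                seen[user]                     # report each missing user only once
                print user
            }
        }
    '
}

# Usage:
#   svn log -q svn://somerepo | missing_authors authors-file.txt
```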

Automating the conversion

After successfully converting a few of our larger repos, it was time to automate the conversion for all of our SVN repositories.

I wrote a few scripts to convert each repository from SVN to Git and push the code to GitHub. The main script accepts a file listing all the repositories to convert. I shared these scripts on GitHub. Additional details can be found in the project README. I encourage you to use this script as a base and adjust the conversion process as needed. Once ready, you can convert any number of repositories by running:

sh svn2git-migration.sh repositories.txt

It took roughly 4 hours to convert 100 local SVN repositories of varying size and history.
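The general shape of such a batch conversion can be sketched as a loop over the repository list. To be clear, this is a hypothetical illustration and not the svn2git-migration.sh from our GitHub project: the function name convert_repos and the DRY_RUN variable are invented here, it assumes one SVN URL per line, and it leaves out the push to GitHub.

```shell
#!/bin/sh
# Hypothetical sketch; NOT the published svn2git-migration.sh.
# Converts each SVN URL listed in a file into its own directory.
# DRY_RUN defaults to 1, which only prints what would be run.
convert_repos() {
    list=$1      # file with one SVN URL per line; blank lines and #-comments are skipped
    authors=$2   # absolute path to the authors file
    while IFS= read -r url || [ -n "$url" ]; do
        case $url in ''|'#'*) continue ;; esac
        name=$(basename "$url")
        if [ "${DRY_RUN:-1}" = 1 ]; then
            printf '%s: svn2git %s --authors %s --notags --notrunk\n' "$name" "$url" "$authors"
        else
            # svn2git converts into the current directory, so give each repo its own.
            # stdin is redirected so svn2git cannot swallow the rest of the list.
            mkdir -p "$name" &&
                ( cd "$name" &&
                  svn2git "$url" --authors "$authors" --notags --notrunk < /dev/null ) ||
                echo "FAILED: $url" >&2
        fi
    done < "$list"
}

# Usage:
#   convert_repos repositories.txt "$PWD/authors-file.txt"             # preview only
#   DRY_RUN=0 convert_repos repositories.txt "$PWD/authors-file.txt"   # actually convert
```

Reporting failures and moving on, rather than stopping at the first one, lets a long batch finish so the failed repositories can be retried afterwards.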

Cleanup

After the projects were converted, we noticed a few more things we wanted to change, such as adding .gitignore and README.md files and removing SVN-related files. We used svn2git-migration.sh as a base to create the other scripts (also available on GitHub).

.gitignore vs svn:ignore

Most of our projects use a similar codebase (WordPress). As such, we took the opportunity to standardize the .gitignore file by combining those provided by GitHub’s gitignore project. For some SVN repositories, we did export the svn:ignore properties into .gitignore with the following command, modified from a Stack Overflow answer:

svn pg -R svn:ignore svn://somerepo | sed 's#svn://somerepo##g' | sed 's# - #/#g' | sed '/^$/d' > .gitignore
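To illustrate what the three sed steps do, here they are combined into one sed call and run on a hypothetical sample. The function name ignore_from_propget and the sample paths are made up, and the sample assumes the usual path - value layout of recursive propget output.

```shell
#!/bin/sh
# Convert `svn propget -R svn:ignore` output (stdin) into .gitignore lines.
# $1 is the repository URL to strip; it must not contain a "#" character,
# because "#" is used as the sed delimiter.
ignore_from_propget() {
    sed -e "s#$1##g" -e 's# - #/#g' -e '/^$/d'
}

# Hypothetical sample: two ignore patterns on wp-content, one on the repository root.
ignore_from_propget svn://somerepo <<'EOF'
svn://somerepo/wp-content - cache
uploads

svn://somerepo - .DS_Store

EOF
# prints:
#   /wp-content/cache
#   uploads
#   /.DS_Store
```

In this sample the second pattern from wp-content comes out without its directory prefix, so the generated .gitignore is worth reviewing by hand.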

Shared SVN repositories

We had a few repositories that did not follow the standard SVN layout, specifically repositories with multiple projects as subdirectories. I tried svn2git using the nested project example from the README. It did not work. I ended up using git svn directly.

git svn clone svn://somerepo/project/dir/ --no-metadata --authors-file=../authors-file.txt --no-minimize-url

Why Git?

Taking a step back, I’d like to review our decision to convert from SVN to Git. SVN and Git are both excellent source control management tools. I do not wish to take sides on which is better. For VIA Studio the bottom line is SVN had become painful. Checksum errors and merge conflicts were a daily hindrance to the team. The promise of Git finally drew us in.

The tougher question might be why GitHub? We chose to back our Git repositories with GitHub. GitHub provides a central location and backup. But since our repositories are private, we required an expensive plan. There are dozens of other cheaper services out there (Bitbucket, GitLab). So why pay GitHub’s higher pricing?

The answer was features and popularity. GitHub has the most features: an excellent user interface, project wikis, organizational controls, service hooks. GitHub is also incredibly popular. Not only was most of our team familiar with GitHub, but so was the rest of the development community.

Git also gave us the opportunity to evolve our development process. We adopted more complex branching models (currently using git-flow). We added code reviews during Pull Requests. We improved our continuous integration with service hooks and Jenkins.

It’s been about a month now and we haven’t looked back. From a technical perspective, this is the largest migration we’ve done in years. From a business perspective, Git opened new doors and gave the team a fresh outlook. That’s invaluable for our future.
