Migrating from SVN to Git
By: Jason McCreary on 1/30/2014
TL;DR: We have open sourced all of our migration scripts so you can use them to convert your own SVN repositories to Git. Fork them from our GitHub repository.
Here at VIA Studio we recently switched our source control from SVN to Git. This is something we had wanted to do for a while, but with over 100 SVN repositories, the switch seemed daunting.
About a year ago we began using Git for a few new projects. While this let us become familiar with Git, it made the development process more complex. Which projects were in Git? Which were in SVN? We knew we wanted to use Git, so it was time to rip the Band-Aid off and convert everything.
Convert SVN to Git
Our only requirement for converting from SVN to Git was to keep history: we wanted all of our commit information in the new Git repositories. If you don’t care about history, you can simply run git init inside your SVN working copy and remove the .svn directories.
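For the no-history route, a minimal sketch might look like the following. It assumes a POSIX shell with find and git on the PATH, and the function names are hypothetical. Removing every .svn directory (not just the top-level one) also covers working copies created before SVN 1.7, which kept a .svn directory in every folder.

```shell
# Delete every .svn metadata directory under the given working copy.
# -prune stops find from descending into directories it is about to delete.
remove_svn_metadata() {
  find "$1" -type d -name .svn -prune -exec rm -rf {} +
}

# Strip SVN metadata, then create a Git repository and stage all files.
# History is NOT preserved; commit afterwards with your own message.
convert_without_history() {
  remove_svn_metadata "$1" &&
    git -C "$1" init -q &&
    git -C "$1" add -A
}
```

Run convert_without_history path/to/working-copy, then create the first commit with git commit -m "Initial import from SVN" inside that directory.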
I did a lot of Googling and found a few invaluable posts on converting SVN repositories to Git. While I didn’t follow its author’s steps, one post in particular helped me understand the process overall. The question had also been asked on StackOverflow, where most of the answers used git svn and others used svn2git.
git svn vs svn2git
Git ships with the git svn tool, which lets you run Git operations against an SVN repository. One of these is clone, which converts history and offers additional options, such as converting tags and branches. While very powerful, git svn had a steep learning curve.
svn2git focuses solely on converting an SVN repository to Git, so its configuration options were more straightforward. And since svn2git wraps git svn, I could still pass any of the git svn options.
At its simplest, pass svn2git the location of the SVN repository and a file with author information:
svn2git svn://somerepo --authors authors-file.txt
You may need to tweak the options based on your repository. In our case, repositories used the standard SVN layout. However, everything lived under the dev branch and we did not use tags, so we passed the --notrunk and --notags options to svn2git:
svn2git svn://somerepo --authors authors-file.txt --notags --notrunk
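When repositories differ in layout, a small heuristic could suggest which of svn2git’s --notrunk, --nobranches and --notags flags to pass by reading the top-level listing from svn ls. This is a hypothetical helper, not part of svn2git. It only checks whether trunk/, branches/ and tags/ exist, so it cannot tell that everything lives under a single branch, as it did for us; treat its output as a starting point.

```shell
# Hypothetical helper: read `svn ls URL` output on stdin (one entry per
# line, directories end in "/") and print a --no<dir> flag for each
# standard directory that is missing from the listing.
layout_flags() {
  listing=$(cat)
  flags=""
  for dir in trunk branches tags; do
    # grep -x requires the whole line to match, e.g. "trunk/"
    if ! printf '%s\n' "$listing" | grep -qx "$dir/"; then
      flags="$flags --no$dir"
    fi
  done
  printf '%s\n' "${flags# }"   # drop the leading space
}
```

Usage: svn2git svn://somerepo --authors authors-file.txt $(svn ls svn://somerepo | layout_flags). The command substitution is left unquoted on purpose so each flag becomes a separate argument.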
Generating the authors file
The authors file was the most important piece. Each line maps an SVN username to a Git author and must follow the format username = Jason McCreary <jmac@viastudio.com> for Git to properly convert history. I ran the following script against a few of our older, larger SVN repositories to collect all of our author information.
svn log -q svn://somerepo | awk -F '|' '/^r/ {sub("^ ", "", $2); sub(" $", "", $2); print $2" = "$2" <"$2"@viastudio.com>"}' | sort -u >> all-authors.txt
svn log -q svn://someotherrepo | awk -F '|' '/^r/ {sub("^ ", "", $2); sub(" $", "", $2); print $2" = "$2" <"$2"@viastudio.com>"}' | sort -u >> all-authors.txt
sort -u all-authors.txt > authors-file.txt
I then went through the file and corrected the names and emails. For former employees, I linked to their GitHub account. For those that did not have a GitHub account, I entered their name and used a placeholder for their email.
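Because the same pipeline is repeated once per repository, it could be wrapped in a function that accepts any number of repository URLs. This sketch makes the same assumptions as the commands above (svn on the PATH, and a placeholder email built from the SVN username plus a domain you supply); the function names are hypothetical, and the result still needs the same manual cleanup of names and emails.

```shell
# Turn `svn log -q` output (on stdin) into "user = user <user@DOMAIN>" lines.
# Revision lines look like "r123 | user | date"; field 2 is the author.
authors_from_log() {
  awk -F '|' -v domain="$1" '/^r[0-9]/ {
    user = $2
    sub(/^ +/, "", user); sub(/ +$/, "", user)
    print user " = " user " <" user "@" domain ">"
  }'
}

# Usage: collect_authors DOMAIN REPO_URL...
# Writes one deduplicated, sorted entry per author to stdout.
collect_authors() {
  domain=$1; shift
  for repo in "$@"; do
    svn log -q "$repo" | authors_from_log "$domain"
  done | sort -u
}
```

Usage: collect_authors viastudio.com svn://somerepo svn://someotherrepo > authors-file.txt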
Nearly all of the errors I encountered during the conversion were due to missing author information. If you pass the --verbose option to svn2git, it provides more detail, in this case which author information is missing. You can add it to your authors file manually or keep the script above handy to rerun.
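Rather than waiting for a conversion to fail, a check along these lines could list the SVN usernames in a repository’s history that have no entry in the authors file yet. It is a hypothetical helper that assumes the username = Name <email> line format described above and reads svn log -q output on stdin.

```shell
# Usage: svn log -q URL | missing_authors authors-file.txt
# Prints each SVN username from the log that is not mapped in the file.
missing_authors() {
  awk -F '|' '
    # First input (the authors file): remember each mapped username,
    # i.e. the text before " = ".
    FILENAME == ARGV[1] { split($0, parts, " = "); known[parts[1]] = 1; next }
    # Remaining input (stdin): revision lines hold the author in field 2.
    /^r[0-9]/ {
      user = $2
      sub(/^ +/, "", user); sub(/ +$/, "", user)
      if (!(user in known) && !(user in seen)) { seen[user] = 1; print user }
    }
  ' "$1" -
}
```

An empty result means every author in that history is mapped, so svn2git should not stop on a missing author.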
Automating the conversion
After successfully converting a few of our larger repos, it was time to automate the conversion for all of our SVN repositories.
I wrote a few scripts to convert each repository from SVN to Git and push the code to GitHub. The main script accepts a file listing all the repositories to convert. I shared these scripts on GitHub; additional details can be found in the project README. I encourage you to use them as a base and adjust the conversion process as needed. Once ready, you can convert any number of repositories by running:
sh svn2git-migration.sh repositories.txt
It took roughly 4 hours to convert 100 local SVN repositories of varying size and history.
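The loop below is only a hypothetical sketch of the same batch idea, not the actual svn2git-migration.sh; see the project README for what that script really does. It assumes repositories.txt holds one SVN URL per line (blank lines and lines starting with # are skipped), converts each repository into a directory named after the last URL segment, and leaves pushing to GitHub out.

```shell
# Hypothetical default: an authors file in the directory the script runs from.
AUTHORS_FILE=${AUTHORS_FILE:-"$PWD/authors-file.txt"}

# Usage: migrate_all repositories.txt
migrate_all() {
  while IFS= read -r url || [ -n "$url" ]; do
    case $url in
      '' | '#'*) continue ;;          # skip blank lines and comments
    esac
    name=$(basename "$url")          # svn://host/site -> site
    mkdir -p "$name" || return 1
    # Convert inside the new directory. stdin comes from /dev/null so the
    # conversion cannot swallow the remaining lines of the list file.
    (cd "$name" && svn2git "$url" --authors "$AUTHORS_FILE" </dev/null) ||
      echo "FAILED: $url" >&2
  done < "$1"
}
```

Logging failures and moving on, rather than aborting, keeps one broken repository from stopping a multi-hour batch; the FAILED lines can then be retried individually.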
Cleanup
After the projects were converted, we noticed a few other things we wanted to change, such as adding .gitignore and README.md files and removing SVN-related files. We used svn2git-migration.sh as a base to create these other scripts (also available on GitHub).
.gitignore vs svn:ignore
Most of our projects use a similar codebase (WordPress), so we took the opportunity to standardize the .gitignore file by combining templates from GitHub’s gitignore project. For some SVN repositories, we also exported the svn:ignore properties into .gitignore with the following command, adapted from a StackOverflow answer.
svn pg -R svn:ignore svn://somerepo | sed -e 's#svn://somerepo##g' -e 's# - #/#g' -e '/^$/d' > .gitignore
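One limitation of the sed approach is that when a directory’s svn:ignore holds several patterns, only the first line of that property carries the directory prefix, so the remaining patterns come out unanchored. The awk sketch below prefixes every pattern with its directory instead. It assumes svn propget -R prints each property as a PATH - PATTERN line followed by one further pattern per line; the function name is hypothetical.

```shell
# Usage: svn pg -R svn:ignore svn://somerepo | svn_ignore_to_gitignore svn://somerepo
svn_ignore_to_gitignore() {
  awk -v base="$1" '
    {
      sep = index($0, " - ")
      if (sep > 0) {
        # New property: "PATH - PATTERN". Strip the repository URL from PATH.
        dir = substr($0, 1, sep - 1)
        if (index(dir, base) == 1) dir = substr(dir, length(base) + 1)
        pattern = substr($0, sep + 3)
      } else {
        # Continuation line: another pattern for the same directory.
        pattern = $0
      }
      if (pattern != "") print dir "/" pattern
    }
  '
}
```

Anchoring each pattern with a leading / mirrors svn:ignore semantics, since those patterns apply only to the directory that carries the property rather than to its subdirectories.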
Shared SVN repositories
We had a few repositories that did not follow the standard SVN layout, specifically repositories holding multiple projects as subdirectories. I tried svn2git using the nested project example from its README, but it did not work, so I ended up using git svn directly:
git svn clone svn://somerepo/project/dir/ --no-metadata --authors-file=../authors-file.txt --no-minimize-url
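When a shared repository holds several projects, the same command can be repeated per project. The sketch below is hypothetical: it takes the repository URL, an authors file and a list of project subdirectory names, and clones each project into a directory of the same name using the flags shown above. Passing an absolute path for the authors file avoids depending on the relative ../ path.

```shell
# Usage: clone_projects svn://somerepo /abs/path/authors-file.txt projectA projectB
# projectA/projectB are placeholder subdirectory names.
clone_projects() {
  repo=$1; authors=$2; shift 2
  for project in "$@"; do
    git svn clone --no-metadata --authors-file="$authors" --no-minimize-url \
      "$repo/$project/" "$project" ||
      echo "FAILED: $project" >&2
  done
}
```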
Why Git?
Taking a step back, I’d like to review our decision to convert from SVN to Git. SVN and Git are both excellent source control management tools, and I do not wish to take sides on which is better. For VIA Studio, the bottom line was that SVN had become painful: checksum errors and merge conflicts were a daily hindrance to the team. The promise of Git finally drew us in.
The tougher question might be: why GitHub? We chose to host our Git repositories on GitHub, which provides a central location and backup. But since our repositories are private, we required an expensive plan. There are dozens of cheaper services out there (Bitbucket, GitLab), so why pay GitHub’s higher prices?
The answer was features and popularity. GitHub has the most features: an excellent user interface, project wikis, organizational controls, and service hooks. GitHub is also incredibly popular. Not only was most of our team already familiar with GitHub, but so is much of the wider development community.
Git also gave us the opportunity to evolve our development process. We adopted a more complex branching model (currently git-flow), added code reviews through pull requests, and improved our continuous integration with service hooks and Jenkins.
It’s been about a month now and we haven’t looked back. From a technical perspective, this is the largest migration we’ve done in years. From a business perspective, Git opened new doors and gave the team a fresh outlook. That’s invaluable for our future.