SVN to Git migration, or: how to organize a big repo with different languages


I know this topic has come up here a few times, but never in a way that quite matches my situation.

We want to migrate from SVN to Git using Atlassian Stash/Bitbucket Server. Our SVN repo is a "god object" repo, and it is not small: about 9 years of history and 20,000 commits.

For historical reasons (...) the structure looks like this:

svn:/url/trunk1
-| C# Libs (Visual Studio solutions + dll files)
--| Hardware related
---| Products
----| a lot of different solutions here...
---| non products
----| various other solutions here
-| Equipment (VS solutions, no Libs)
--| Equipment A
---| Project A1 
----| trunk
----| branches
----| tags
---| Project A2
----| trunk
--| Equipment B
---| Project B1
----| trunk
----| branches
----| tags
---| Project B2
---| no trunk but VS solution directly
-| Products
--| Product group A
---| Product A1
----| trunk
-----| matlab code
-----| documentation
-----| embedded c code
----| branches
----| tags
---| Product A2
----| trunk
-----| matlab code
-----| documentation
-----| embedded c code
----| branches
----| tags
-| Schematic/pcb
--| again some subfolders followed by products 

So it's a mess. There is a giant trunk1, and we have mixed MATLAB and C code in the Products -> Product folders (each with its own trunk/branches). Then there are completely independent folders for PCB schematics and C# DLL libs, though there is some connection to the Products folder.

Two big questions: 1) Would you even try to migrate this whole thing to a (lot of) Git repos, or would you reorganize it? 2) How do you organize repos with different languages like MATLAB code, embedded C firmware code and (loosely coupled) C# code? Do you split it up by language, and what do you do when a certain project needs parts of other repos?

Thank you very much!

Best answer

How you handle this situation will ultimately require some sort of consensus among the stakeholders. The good news is that every team that migrates to git has to deal with this sort of thing. There is no one right answer, but here are the practical considerations and advice I walk through when I'm helping teams with this process.

1. Git is all or nothing

Unlike Subversion, a git clone (if you want it to be usable) is an all-or-nothing proposition. While you can shallow clone or check out a subset of a repo, these features are intended for deployment purposes rather than general work.

So, going with one big repo means that every person who clones the repo will need to download 9 years of history for every project you've ever versioned.

While git is amazingly fast compared to older version control systems, it does suffer from performance degradation if you throw a few million objects into its database.

This means that some form of breakup is going to have to happen just to allow proper usage.

2. Where do you draw the lines?

First consider where you draw the lines now. It looks like you've already divided the repo into smaller pieces, as I'm seeing multiple trunk and branches folders just in what you're showing us.

This is probably a good place to start. Go around to various members of the team and see what they actually have checked out, which will be very revealing.
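As a rough illustration of using the existing trunk folders as a starting point, the sketch below scans a list of SVN directory paths and proposes one candidate repository per folder that directly contains a trunk. It assumes you already have such a list (for example from a recursive SVN listing). The function names, the sample paths and the example.invalid URL are made up for illustration. `--stdlayout` is the git-svn option for the standard trunk/branches/tags layout. The commands are only built, not executed.

```python
from typing import Iterable, List
from urllib.parse import quote


def find_project_roots(dirs: Iterable[str]) -> List[str]:
    """Return the directories that directly contain a 'trunk' folder."""
    roots = set()
    for d in dirs:
        parts = d.strip("/").split("/")
        if len(parts) > 1 and parts[-1] == "trunk":
            roots.add("/".join(parts[:-1]))
    return sorted(roots)


def git_svn_clone_command(base_url: str, root: str) -> List[str]:
    """Build (but do not run) a git-svn command for one candidate repo."""
    # quote() percent-encodes spaces in folder names and keeps '/' as-is.
    url = base_url.rstrip("/") + "/" + quote(root)
    return ["git", "svn", "clone", "--stdlayout", url]


# Hypothetical listing modelled on the structure in the question.
dirs = [
    "Equipment/Equipment A/Project A1/trunk",
    "Equipment/Equipment A/Project A1/branches",
    "Equipment/Equipment A/Project A1/tags",
    "Equipment/Equipment A/Project A2/trunk",
    "Equipment/Equipment B/Project B2",
]
roots = find_project_roots(dirs)
commands = [git_svn_clone_command("svn://example.invalid/trunk1", r) for r in roots]
```

Folders without a trunk (Project B2 in the sample) are deliberately not picked up, so they still need a manual decision. The resulting list is a conversation starter for the team, not a final layout.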

Other considerations include what you deploy together, how often certain things change, and how tightly coupled the components are to each other. Some projects might be best split up; others might be better combined.

As for organization, Bitbucket lets you create projects to organize your repositories, which behave much like your folders. You would end up with repos namespaced along these lines: bitbucket.mysite.com/equipment-a/project-a1.git
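If you want the new repo names to follow the old folder names consistently, a small helper can derive them. This is only a sketch: the slug rule (lowercase, with runs of non-alphanumeric characters collapsed to a hyphen) is an assumption for illustration, not a Bitbucket requirement, and the path pattern simply mirrors the example above.

```python
import re


def slugify(name: str) -> str:
    """Lowercase a folder name and collapse non-alphanumerics into '-'."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def repo_path(host: str, group: str, project: str) -> str:
    """Compose a namespaced repo path in the style shown above."""
    return f"{host}/{slugify(group)}/{slugify(project)}.git"


print(repo_path("bitbucket.mysite.com", "Equipment A", "Project A1"))
# prints: bitbucket.mysite.com/equipment-a/project-a1.git
```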

3. That's great for projects, but what about dependencies?

Looks like you've also got a lot of libraries and dependencies to worry about. There are a couple of ways to deal with this.

You can handle these things in the solution, by checking out the dependency repos and importing them into the projects using symlinks or importing them via the IDE. But be careful that your build process can find the dependencies later.

A slightly more modern and elegant solution is to set up internal package servers (for example, NuGet for C#) and use them to store versioned builds of the dependencies, which you can then pull into projects. This gives you easy access to the libraries in a familiar way, and in many cases it also lets you phase new versions in across projects more gradually. It also keeps all those pesky binary artifacts out of your repo.
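To make the "phase new versions in gradually" point concrete: once each project pins its own version of a shared library, you can see at a glance which projects still lag behind the latest published build. The sketch below assumes simple dotted numeric versions. The project names and version numbers are invented for illustration, and a real package manager has its own, richer versioning rules.

```python
from typing import Dict, List, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '1.10.0' into (1, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def projects_behind(pins: Dict[str, str], latest: str) -> List[str]:
    """List projects whose pinned version is older than the latest one."""
    latest_v = parse_version(latest)
    return sorted(p for p, v in pins.items() if parse_version(v) < latest_v)


# Hypothetical pins: each project upgrades the shared library on its own schedule.
pins = {"project-a1": "1.4.0", "project-a2": "1.2.3", "project-b1": "1.10.0"}
print(projects_behind(pins, "1.10.0"))
# prints: ['project-a1', 'project-a2']
```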

4. Is that too many repos?

Yes, no, maybe, and probably not. There is no magic number here. Obviously 1 is too low in your case (as is 10, probably), but "too high" is not determined by some arbitrary number, no matter how scary that number is to your boss. It's about striking a balance between real and ideal that allows you to get the job done. I had a database architect who used to say "normalize til it hurts, denormalize til it works." The same applies here: find a balance that lets you work without the process getting in the way.

I've worked on microservices stacks with over 100 repos for a single project and large monolithic eCommerce sites with only 1. Both were equally effective for their respective environments.

In other good news, the decisions you make now aren't forever. It's relatively simple, or at least practical, to split or combine repos down the line if you find that they are too big or that you've broken things apart too much. So make the best decisions you can now, with the confidence that you're not locked in for life.
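As one example of what a later split can look like, git subtree can extract the history of a single subdirectory onto its own branch, which can then be pushed to a new repository. The sketch below only builds the command. It assumes git subtree is available in your git installation, the directory and branch names are hypothetical, and this is one of several possible approaches rather than the recommended one.

```python
from typing import List


def subtree_split_command(prefix: str, branch: str) -> List[str]:
    """Build (but do not run) a command that extracts one directory's history."""
    return ["git", "subtree", "split", f"--prefix={prefix}", "-b", branch]


# Hypothetical: pull the embedded C code out of a combined product repo.
cmd = subtree_split_command("embedded-c", "split/embedded-c")
print(" ".join(cmd))
# prints: git subtree split --prefix=embedded-c -b split/embedded-c
```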

In Summary

You're already asking a lot of the right questions and seem to be intuiting a lot of the risks and rewards of the various methods, so I'm sure you'll find a good balance for your team. Most important of all, involve your team and make sure they have input into these decisions. And you might want to order an extra large tin of elbow grease.