How to deal with a large number of nested CVS projects

Never done this before, so I'm probably just being a noob... I'm trying to migrate our aged CVS repository to GitLab and I'm not sure how to handle the nested CVS projects. We have a LOT of them (about 1600 .project files dotted through the CVS repo). There are about 10 years' worth of commits, totalling about 21GB, across two CVS repository directories.

The general structure is $client/$product, but most of these contain a bunch of subprojects, often very many.

What I've tried so far:

  1. Monolithic: tried to import the smaller CVS repo - ran out of memory on pass 1 first time (solved by adding memory) and ran out of disk space on pass 5 second time (can't really add disk as vmware datastores are nearly full - don't ask!).

  2. By client: cvs2git completed on one client, and I then ran git fast-import, but I then noticed all the sub-projects. Git doesn't care about the merged history, but our coders will. I read up on git submodules, but I'm not sure they are what I need, as the entire project is normally within the same CVS repo, and I see they complicate the process of cloning the project.

  3. By project within client: using the output from (2), recursed the CVS repo depth-first with find, looking for .project files; created a subdirectory for each and did a git init --bare in each, before importing the sub-projects with git fast-import. This took ages, as I believe it has to munge the entire cvs2git blob and dump files every time, and I'm not sure I ended up with a proper git hierarchy.
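For reference, the project discovery in (3) can be sketched roughly as below. This is only an illustration, not the script that was actually run: the function names are made up, and it assumes the scan runs against the server-side repository, where CVS stores each file as an RCS file with a `,v` suffix (so `.project` appears as `.project,v`) and keeps deleted files in `Attic` directories.

```python
import os
from pathlib import Path

# Directory names that hold CVS bookkeeping rather than live project content.
SKIPPED_DIRS = {"CVSROOT", "Attic"}
# A working copy has ".project"; the server-side repository has ".project,v".
MARKER_FILES = {".project", ".project,v"}


def find_project_dirs(cvs_root):
    """Return every directory under cvs_root that holds a project marker.

    Parents are listed before their children because os.walk is top-down.
    """
    found = []
    for dirpath, dirnames, filenames in os.walk(cvs_root):
        # Prune in place so os.walk does not descend into skipped directories.
        dirnames[:] = sorted(d for d in dirnames if d not in SKIPPED_DIRS)
        if MARKER_FILES.intersection(filenames):
            found.append(Path(dirpath))
    return found


def nearest_parent_project(project, projects):
    """Return the closest enclosing project of `project`, or None if top-level."""
    parents = [p for p in projects if p != project and p in project.parents]
    return max(parents, key=lambda p: len(p.parts), default=None)
```

Knowing which projects are nested inside which others at least shows where a converted parent would also contain the files (and history) of its children.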

So... rather than floundering round any more, I thought I'd ask here as I'm sure someone else must have needed to do this kind of thing. Any pointers greatly appreciated.


[edit]: Thanks for all the suggestions and help, people. It's out of my hands now - they (the devs) have decided to migrate the CVS projects piecemeal as they work them, so the majority will probably never be moved. The old cvs will be kept round as a read-only reference, for that purpose, and projects will be checked-in to git "pristine" so for any "BG" (before git) history, they will refer to cvs, but for "AG" history, they will consult git.

As for the issue of the deeply nested projects, the explanation I was given is that it relates to Java class hierarchies, and each project equates to one class. There's something in their build process that automatically changes cvs projects into java .jar files or something like that. There's a LOT of java in there.


There are 2 best solutions below


I'm not quite sure what you're asking, but here are some comments, hopefully one or more of which will answer your question.

  • Did you want to convert each individual project to git separately? I can't really tell from your question. But if you do, you can just copy each project's directory tree and run cvs2git on it. (Or perhaps even just create symlinks to save space, so long as the nesting allows it.) Loop over them one at a time. The simplicity of CVS's server-side back-end file storage is a blessing in this case.

For example, a layout like the following. Note that you could do some sort of recursive copy rather than a symlink.

/opt/cvsrepos/CVSROOT
             /path/to/project1
                     /project2

/opt/convertrepos/CVSROOT #dummy empty directory to fool cvs2git
                 /project1 -> /opt/cvsrepos/path/to/project1
  • Can you just copy the whole cvs repository somewhere else temporarily to do the conversion, where you have more disk space and memory?
  • Whether you want to create one monolithic repository or lots of separate repositories is largely a matter of opinion, which is beyond the scope of Stack Overflow. It is also not clear to me whether these projects require each other or not. If not, then you have more flexibility in that choice.
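
A minimal sketch of the symlink layout from the first point, in Python. The function names and the `work_dir`/`git_dir` parameters are hypothetical, and the `cvs2git` options shown (`--blobfile`, `--dumpfile`, `--username`) follow the usual documented invocation; check them against your installed version. It assumes cvs2git is happy to follow the symlink, so fall back to a recursive copy if it is not.

```python
import shutil
import subprocess
from pathlib import Path


def stage_project(cvs_repo, project_rel, staging_root):
    """Create staging_root with a dummy CVSROOT and a symlink to one project."""
    source = Path(cvs_repo).resolve() / project_rel
    if not source.is_dir():
        raise FileNotFoundError(source)
    staging_root = Path(staging_root)
    # Empty CVSROOT so the staging directory looks like a CVS repository.
    (staging_root / "CVSROOT").mkdir(parents=True, exist_ok=True)
    link = staging_root / Path(project_rel).name
    if not link.is_symlink():
        link.symlink_to(source, target_is_directory=True)
    return link


def conversion_commands(staging_root, work_dir, git_dir):
    """Build the three command lines for one project, without running them."""
    blob = Path(work_dir) / "git-blob.dat"
    dump = Path(work_dir) / "git-dump.dat"
    return [
        ["cvs2git", f"--blobfile={blob}", f"--dumpfile={dump}",
         "--username=cvs2git", str(staging_root)],
        ["git", "init", "--bare", str(git_dir)],
        # Reads the blob file, then the dump file, on standard input.
        ["git", "--git-dir", str(git_dir), "fast-import"],
    ]


def run_conversion(staging_root, work_dir, git_dir):
    """Run cvs2git, create the bare repository, and feed it via fast-import."""
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    cvs2git_cmd, init_cmd, import_cmd = conversion_commands(
        staging_root, work_dir, git_dir)
    subprocess.run(cvs2git_cmd, check=True)
    subprocess.run(init_cmd, check=True)
    importer = subprocess.Popen(import_cmd, stdin=subprocess.PIPE)
    for name in ("git-blob.dat", "git-dump.dat"):
        with open(work_dir / name, "rb") as stream:
            shutil.copyfileobj(stream, importer.stdin)
    importer.stdin.close()
    if importer.wait() != 0:
        raise RuntimeError("git fast-import failed")
```

Use one staging directory per project (for example `stage_project("/opt/cvsrepos", "path/to/project1", "/opt/convertrepos")`), since cvs2git converts everything it finds under the directory it is given. Each run then only has to process that one project's files rather than the whole repository.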

Usually it is not possible to preserve all the information contained in a centralized repository, especially one as imperfect as CVS, when converting to git. So I think you should not try to at all. Preserve the original repository for historical reference, and convert to git only the projects that are currently in development. You don't even have to import the whole 10 years of their history; 2-3 years would be enough.