Using "hg transplant" to splice in the contents of another repository at a later point


The actual question is at the bottom of this post. I just felt it necessary to explain the circumstances first.

State of affairs

In our company we have, for historical reasons, several version control systems. Currently we are trying to move to any git-fast-import-compatible distributed version control system, really, but our pick is Mercurial at the moment. I say at the moment, because once you have taken that step, it's easier to migrate from one DVCS to another in most cases.

We have essentially three code bases that we want to join plus a part that has been committed into one SVN repository, which we want to separate out.

So we have:

  1. an ancient CVS repository
  2. one huge (26 GiB) SVN repository with nearly 7000 revisions containing a lot of code, some experimental code and actual junk (to be filtered out during conversion), plus the build products from various releases (which are meant to be separated out into a repository, or even just a folder structure, of their own)
  3. one SVN repository containing related code, but sharing no files with the other two (think of it as getting spliced in as a folder)

The huge repo (2.) contains snapshots of the state of the CVS repo (1.) at different points in time. Obviously none have been tagged in the CVS repo, because that'd be potentially useful. In addition, the snapshots have had patches applied to them.

That is to say, a subfolder hierarchy in (2.) corresponds roughly to (1.). However, there is no need to worry about that, as the idea is to retire one of those folders after initially splicing them in under distinct path names. So no naming clashes are to be expected here.

What I've done so far

  • After some research I picked reposurgeon as my tool of choice. This is a very powerful tool allowing, indeed, surgical operations on git-fast-import streams. I warmly recommend it to anybody tasked with similar migrations.
  • The conversion of the huge repository is fully covered by now. Files and folders have been expunged and old symbols removed. Kinks have been ironed out, and cases like a branch being closed (in SVN) and later reopened from another revision under the same name have been fixed so that the branch appears continuous. Basically all the surgical operations have been done. (The result is ~350 MiB as a git-fast-import stream, btw.)
  • The smaller SVN repository is mostly covered as well, although some minor tasks remain. However, given my experience with the huge SVN repo, I'm confident this is a matter of only a few hours.
  • Last but not least the CVS repository. I have tried a number of different tools, including the cvs-fast-export, now maintained by Eric S. Raymond, also the author of reposurgeon. I have also contemplated conversion to SVN, just to find that the toolset (cvs2svn) used to do that has been extended to export to Mercurial as well.

The problem

While the SVN conversions took a long time to get to the point where we can call it done, the CVS conversion is still in progress.

Since CVS doesn't have a repository-wide revision history all tools have to attempt to parse the RCS files and make sense of their contents to piece together the puzzle.
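To illustrate what "piecing together the puzzle" means, here is a minimal sketch of one common heuristic: per-file revisions are grouped into a changeset when they share author and log message and follow each other within a short time window. This is an assumption about how such tools typically work, not a description of any specific converter; the `FileRev` record, the `infer_changesets` name and the 300-second window are all hypothetical.

```python
from collections import namedtuple

# One per-file revision as it might be read from an RCS ",v" file.
# The record layout is hypothetical and only serves this illustration.
FileRev = namedtuple("FileRev", "path rev author log timestamp")


def infer_changesets(file_revs, fuzz_seconds=300):
    """Group per-file revisions into repository-wide changesets.

    A revision joins an existing changeset when author and log message
    match, it follows the changeset's latest revision within fuzz_seconds,
    and the changeset does not already touch the same file.
    """
    changesets = []
    open_sets = {}  # (author, log) -> most recent changeset for that key
    for fr in sorted(file_revs, key=lambda r: r.timestamp):
        key = (fr.author, fr.log)
        current = open_sets.get(key)
        if (current is not None
                and fr.timestamp - current[-1].timestamp <= fuzz_seconds
                and all(fr.path != other.path for other in current)):
            current.append(fr)
        else:
            # Start a new changeset for this author/log combination.
            current = [fr]
            open_sets[key] = current
            changesets.append(current)
    return changesets


revs = [
    FileRev("a.c", "1.2", "alice", "fix", 1000),
    FileRev("b.c", "1.5", "alice", "fix", 1010),
    FileRev("a.c", "1.3", "bob", "tweak", 1020),
    FileRev("a.c", "1.4", "alice", "fix", 5000),
]
for changeset in infer_changesets(revs):
    print([(r.path, r.rev) for r in changeset])
```

Anything that breaks these assumptions (clock skew, reused log messages, symbols applied inconsistently across files) makes the reconstruction ambiguous, which is where the tools start to disagree.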

Some of the really bad scars I was able to remove manually by literally editing the locked RCS file in an editor (after taking backups). This way some invalid revisions (RCS and CVS have a different idea of what is a valid revision number) as well as symbols that appeared as tags in some files and as branches in others have been weeded out.

I am also able to preprocess the (CVS) repository to remove a lot of the branches and tags which we do not need, namely those that predate the branches we are interested in (rcsfile.py from rcsgrep helped). Basically, prior to that point we only want the contents of MAIN/trunk/default/master, whatever you want to call it.

However, some of the tools outright fail (e.g. cvs-fast-export crashes) and others give results that are somewhat mangled.

Not too bad, one can demangle a lot by means of reposurgeon. However, half a dozen branches never even make it to the converted repository.

In all cases the reason appears to be that the tools get confused by a peculiarity of CVS that you wouldn't find in SVN, for example.

If branch tags get "moved" forcibly (cvs tag -B), then the originally allocated branch number in the RCS file gets orphaned and another new branch number will take its place. However, the old revisions remain in the file.

Now the new branch started perhaps hours, days or months after the original branching took place. This appears to be what upsets all those tools.
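The orphaning can be made concrete with a small sketch. It assumes the symbol table and revision list of a single RCS file have already been extracted into plain Python values (it does not parse ",v" files), and it relies on the CVS convention that a branch symbol is stored as a "magic" number such as 1.3.0.2, which stands for branch 1.3.2. The function names are hypothetical.

```python
def branch_of(symbol_rev):
    """Return the branch number a symbol names, or None for a plain tag."""
    parts = symbol_rev.split(".")
    if len(parts) >= 4 and len(parts) % 2 == 0 and parts[-2] == "0":
        # CVS "magic" branch number: 1.3.0.2 stands for branch 1.3.2.
        return ".".join(parts[:-2] + parts[-1:])
    if len(parts) % 2 == 1:
        # Odd number of components, e.g. a vendor branch such as 1.1.1.
        return symbol_rev
    return None


def orphaned_branches(symbols, revisions):
    """Map each branch number that no symbol refers to onto its revisions."""
    named = {branch_of(rev) for rev in symbols.values()}
    orphans = {}
    for rev in revisions:
        parts = rev.split(".")
        if len(parts) <= 2:
            continue  # trunk revision such as 1.4
        branch = ".".join(parts[:-1])
        if branch not in named:
            orphans.setdefault(branch, []).append(rev)
    return orphans


# FEATURE originally pointed at 1.3.0.2 and was then moved to 1.5.0.2.
symbols = {"REL_1": "1.2", "FEATURE": "1.5.0.2"}
revisions = ["1.1", "1.2", "1.3", "1.3.2.1", "1.3.2.2", "1.4", "1.5", "1.5.2.1"]
print(orphaned_branches(symbols, revisions))
```

In this made-up file the revisions 1.3.2.1 and 1.3.2.2 still exist, but no symbol names branch 1.3.2 any more, so a converter has nothing to attach them to.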

While it would be cool to also include the orphaned branches and mend those "wounds", it's not a priority. Most of the files treated with cvs tag -B are not source files, but files like GNUmakefile or other project files.

However, the problem remains that the CVS conversion isn't finished and will take some more time.

And managers grow impatient ...

The question

Is it possible to start out with the two SVN repositories spliced into a single Hg repository and later (when the CVS conversion is finished) splice in those changes without having to initialize yet another unrelated Hg repo?

The (CVS repo) splicing would not cause conflicting paths, I have to say up front. The other repository is meant to be spliced in via its own subdirectory, so no name clashes.

I know that pushes and pulls can introduce commits from two years ago into someone's repository today. However, does this mean that a hg transplant would be likely to succeed as well? I.e. can I expect to be able to transplant those commits from a decade ago into the joint Hg repository?

This way I could split the migration into stages.

  1. consolidate the two SVN repos into one Hg repo - basically now
  2. splice in the converted (to Hg) CVS repo in a few weeks/months from now

Is this technically feasible by means of hg transplant (or any other hg extensions for that matter)?

If it is, I'll appreciate any advice about potential caveats as well.
