As a software developer, I have a large project whose code is stored in a Subversion® repository. Over the years I have refactored the code many times and teased it into a modular architecture made up of various components. One component has matured to the extent that it really ought to be a standalone project in itself.
I want that component's code to reside in a new, standalone, Subversion® repository (to potentially be migrated into a git repository). But, I also want to retain the version history of all the files in that component so that I can read the log and commit messages that explain how and why it came to be in its current form.
I have created a dump of the existing repository and want to use svndumpfilter to purge everything but a chosen set of commits from the dump file, so that the result can be imported into a new repository (using svnadmin load).
As the penultimate paragraph of the svndumpfilter documentation advises, I intend to use the include option on svndumpfilter to list the paths that I wish to retain in my new repository.
Quote:
It is possible that at some point in the lifetime of your repository, you might have copied a file or directory from some location that svndumpfilter is excluding, to a location that it is including. To make the dump data self-sufficient, svndumpfilter needs to still show the addition of the new path (including the contents of any files created by the copy) and not represent that addition as a copy from a source that won't exist in your filtered dump data stream. ... If you suspect that you have any copies of this sort in your repository, you might want to rethink your set of included/excluded paths, perhaps including the paths that served as sources of your troublesome copy operations, too.
This means that, when we run the new project through the filter, in order to preserve their commit histories, we must not only include the project files at the current revision, but also include the paths of their ancestors.
The question is: How do we determine the paths of those ancestors?
It is possible to run the svn log command on a repository URL, which returns the commit history of that path. The --verbose option additionally lists the paths affected by each commit in that history.
We can ignore occasions when the file is merely modified. We are really interested in tracing the history back to the point when that file was added to the repository (svn add). Furthermore, if that file was added by an svn copy (or anything that is effectively an svn move), we want to trace the ancestry of that 'source file' too.
The information is all there in the output of svn log.
The --xml option prints the svn log output in XML format, which makes it easier for a machine to parse.
What I need is some tool or technique to lift the pertinent ancestry path data from the XML stream that is output by an svn log command on a given repository file.
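As a rough illustration of what such parsing might look like, here is a minimal Python sketch that pulls the copy records out of `svn log --verbose --xml` output. The sample log, its paths and revision numbers, and the function name `copy_records` are all made up for illustration; only the `copyfrom-path` and `copyfrom-rev` attributes on `path` elements come from the real XML format.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed output of `svn log --verbose --xml` (paths are invented).
SAMPLE_LOG = """<log>
<logentry revision="7">
<author>dev</author>
<paths>
<path action="M" kind="file">/trunk/Core/util.c</path>
</paths>
<msg>Fix typo</msg>
</logentry>
<logentry revision="5">
<author>dev</author>
<paths>
<path action="A" kind="dir" copyfrom-path="/trunk/app/core" copyfrom-rev="4">/trunk/Core</path>
<path action="D" kind="dir">/trunk/app/core</path>
</paths>
<msg>Move core out of app</msg>
</logentry>
</log>"""


def copy_records(log_xml):
    """Return (revision, target_path, source_path, source_rev) for every copy."""
    records = []
    for entry in ET.fromstring(log_xml).iter("logentry"):
        revision = int(entry.get("revision"))
        for node in entry.iter("path"):
            source = node.get("copyfrom-path")
            if source is None:
                continue  # plain add, modify, or delete: no ancestor to follow
            target = (node.text or "").strip()
            records.append((revision, target, source, int(node.get("copyfrom-rev"))))
    return records


print(copy_records(SAMPLE_LOG))
```

Plain modifications and deletions are skipped, so only the entries that reveal an ancestor path remain.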
If this is done for each file in the project-to-be-extracted then we can build a set of paths that need to be included in the svndumpfilter process that is run to filter the dump file.
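Once the ancestor paths are collected, turning them into an svndumpfilter invocation could look roughly like the sketch below. The helper names `minimal_prefixes` and `include_command` and the example paths are hypothetical. The sketch assumes prefixes are written without a leading slash, as in the examples in the svndumpfilter documentation, and it drops any path already covered by a shorter included prefix.

```python
import shlex


def minimal_prefixes(paths):
    """Normalise paths and drop those already covered by a shorter prefix."""
    kept = []
    # Sorting puts a parent directory before everything underneath it.
    for path in sorted({p.strip("/") for p in paths}):
        if not any(path == k or path.startswith(k + "/") for k in kept):
            kept.append(path)
    return kept


def include_command(paths):
    """Build the `svndumpfilter include ...` command line as a string."""
    quoted = " ".join(shlex.quote(p) for p in minimal_prefixes(paths))
    return "svndumpfilter include " + quoted


# Hypothetical set of paths gathered from the log of each project file.
ancestors = ["/trunk/Core", "/trunk/Core/main.c", "/trunk/app/core", "/trunk/Core-tests"]
print(include_command(ancestors))
```

The resulting command would still need the dump file on standard input and a redirect of standard output to the filtered dump before running svnadmin load.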
Does such a tool or solution already exist?
If so, I'd appreciate it if you could please let me know about it.
If a solution does not exist, I intend to write a little command line interface (CLI) program to parse the XML. See the project write-up on GitHub.
I prefer not to 'reinvent a wheel' unless it is necessary so your help is appreciated.
svn log FILENAME
Some samples with a (slightly refactored) toy repo in HEAD state
with such a short history
To restore the history of, e.g., the Core dir, you have to perform approximately the same number of operations for XML and non-XML logs (repeat recursively for every path part on the "from" side)
or, for the XML log (irrelevant part of the log trimmed)
(parse every path node for the path of interest, extract the source path from its copyfrom-path key, and repeat the logging with the newly extracted path)
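The procedure above could be sketched in Python roughly as follows. This is a variation rather than the exact steps described: because svn log follows copy history by default, the sketch assumes a single verbose XML log of the file already contains the copy entries, and it rewrites the tracked path while walking the entries from newest to oldest instead of re-running the log. The sample log, its paths, and the name `trace_ancestors` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed `svn log --verbose --xml` output for /trunk/Core/main.c.
SAMPLE_LOG = """<log>
<logentry revision="9">
<paths>
<path action="A" kind="file" copyfrom-path="/trunk/Core/start.c" copyfrom-rev="8">/trunk/Core/main.c</path>
<path action="D" kind="file">/trunk/Core/start.c</path>
</paths>
</logentry>
<logentry revision="5">
<paths>
<path action="A" kind="dir" copyfrom-path="/trunk/app/core" copyfrom-rev="4">/trunk/Core</path>
</paths>
</logentry>
<logentry revision="2">
<paths>
<path action="A" kind="file">/trunk/app/core/start.c</path>
</paths>
</logentry>
</log>"""


def trace_ancestors(log_xml, start_path):
    """Return start_path followed by each earlier location it was copied from."""
    entries = sorted(
        ET.fromstring(log_xml).iter("logentry"),
        key=lambda e: int(e.get("revision")),
        reverse=True,  # newest first, so we walk backwards in time
    )
    current = start_path
    lineage = [current]
    for entry in entries:
        best = None  # (target, source) of the most specific matching copy
        for node in entry.iter("path"):
            source = node.get("copyfrom-path")
            if source is None:
                continue
            target = (node.text or "").strip()
            # The copy matters if it created the tracked path or a parent dir.
            if current == target or current.startswith(target + "/"):
                if best is None or len(target) > len(best[0]):
                    best = (target, source)
        if best is not None:
            target, source = best
            # Swap the copied prefix for its source, keeping the remainder.
            current = source + current[len(target):]
            lineage.append(current)
    return lineage


for path in trace_ancestors(SAMPLE_LOG, "/trunk/Core/main.c"):
    print(path)
```

Matching on parent directories covers the case where a whole directory was moved rather than the file itself, which corresponds to repeating the check for every path part on the "from" side.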