svndumpfilter: filter one project out of an SVN repository dump

I have a dump file of the whole ASF (Apache Software Foundation) Subversion repository, and I am trying to use svndumpfilter to extract the hadoop project from this large dump. Here is my command:

svndumpfilter include --drop-empty-revs --skip-missing-merge-sources /hadoop < svn-asf-public-r0\:1164363 > hadoop_dumpfile1

svndumpfilter then printed progress output like this:

...
Revision 614268 skipped.
Revision 614269 skipped.
Revision 614270 skipped.
Revision 614271 skipped.
Revision 614272 skipped.
Revision 614273 skipped.
Revision 614274 skipped.
Revision 614275 committed as 614275.
Revision 614276 committed as 614276.
...

But then it stopped with this error:

Revision 614328 skipped.
svndumpfilter: E200003: Invalid copy source path '/lucene/hadoop/site'

I suspect this is caused by old move/copy operations in the repository: the original dump file is huge, and the directory layout has probably been reorganized many times over the repository's history. What should I do now?

Answer by bahrep:

Revision 614329 affects these paths:

  • hadoop/core/site/,
  • lucene/hadoop/site/.

So you have to add /lucene/hadoop/site to the list of prefixes on your svndumpfilter include command line.
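A minimal Python sketch of the corrected run, assuming `svndumpfilter` is on `PATH`; the helper names `build_filter_cmd` and `run_filter` are hypothetical. It keeps the flags from the question and lists `/lucene/hadoop/site` as a second include prefix, so it is equivalent to appending that path to the original shell command.

```python
import subprocess
from typing import List, Sequence


def build_filter_cmd(prefixes: Sequence[str]) -> List[str]:
    """Build the svndumpfilter argv used in the question, plus extra prefixes."""
    return [
        "svndumpfilter", "include",
        "--drop-empty-revs",
        "--skip-missing-merge-sources",
        *prefixes,  # every path prefix to keep, e.g. the project and its copy sources
    ]


def run_filter(dump_in: str, dump_out: str, prefixes: Sequence[str]) -> None:
    """Stream dump_in through svndumpfilter into dump_out; raise if it fails."""
    with open(dump_in, "rb") as src, open(dump_out, "wb") as dst:
        # Only stdout (the filtered dump) is redirected; stderr stays on the terminal.
        subprocess.run(build_filter_cmd(prefixes), stdin=src, stdout=dst, check=True)
```

Usage: `run_filter("svn-asf-public-r0:1164363", "hadoop_dumpfile1", ["/hadoop", "/lucene/hadoop/site"])`. The `\:` in the question's shell command only escapes the colon for the shell; Python needs no escaping. Since only stdout is redirected, the `Revision ... committed as ...` progress lines (which, as the question's redirected run shows, go to stderr) still appear on the terminal.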

Read the SVNBook! The issue you've encountered seems to be the one described in its "Filtering Repository History" section:

Also, copied paths can give you some trouble. Subversion supports copy operations in the repository, where a new path is created by copying some already existing path. It is possible that at some point in the lifetime of your repository, you might have copied a file or directory from some location that svndumpfilter is excluding, to a location that it is including. To make the dump data self-sufficient, svndumpfilter needs to still show the addition of the new path—including the contents of any files created by the copy—and not represent that addition as a copy from a source that won't exist in your filtered dump data stream. But because the Subversion repository dump format shows only what was changed in each revision, the contents of the copy source might not be readily available. If you suspect that you have any copies of this sort in your repository, you might want to rethink your set of included/excluded paths, perhaps including the paths that served as sources of your troublesome copy operations, too.