I'm trying to remove large files from our SVN repository because our non-developer users are really cool guys and why wouldn't 5GB files go into source control?
I've read that this isn't an easy task with SVN, but that it can be hacked around using svndumpfilter, so that's what I'm doing. To test my filter, I've created a new test repository that contains the following:
I've dumped that repository using this command:
sudo svnadmin dump /path/to/repo > my.dmp
and I can re-load it without problem using this:
sudo svnadmin load /path/to/repo < my.dmp
For my test, I tried filtering out just the .sql files using this command:
svndumpfilter exclude --pattern '*.sql' < my.dmp > my.dmp.filtered
The output is as expected:
bitnami@linux:~/working$ svndumpfilter exclude --pattern '*.sql' < my.dmp > my.dmp.filtered
Excluding prefix patterns:
'/*.sql'
Revision 0 committed as 0.
Revision 1 committed as 1.
Revision 2 committed as 2.
Revision 3 committed as 3.
Revision 4 committed as 4.
Revision 5 committed as 5.
Dropped 2 nodes:
'/NewFolder/09_Script_Migrate_Data_From_dbo.SalesNormalTable_To_dbo.SalesPartitionTable.sql'
'/Test/AllSql.sql'
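As a sanity check on the filtered dump itself, the node paths it records can be listed and searched. This is only a sketch: list_dump_paths is a hypothetical helper name, and it assumes that no file content inside the dump happens to contain a line starting with "Node-path: ".

```shell
# List every node path recorded in a dump file, one per line, de-duplicated.
# Dump files can embed binary file contents, so grep is told to treat the
# input as text (-a).
list_dump_paths() {
    grep -a '^Node-path: ' "$1" | sed 's/^Node-path: //' | sort -u
}

# Usage (should print nothing if the filter removed every .sql node):
#   list_dump_paths my.dmp.filtered | grep '\.sql$'
```

If the second command prints nothing, the dump no longer mentions any path ending in .sql.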
After that I cleaned the repository by deleting the files and reinitializing the repo:
sudo rm -r /path/to/repo
sudo mkdir /path/to/repo
sudo chown -R daemon:subversion /path/to/repo
sudo svnadmin create /path/to/repo
All is well at this point: the repository is empty. I then loaded the filtered dump into the repository:
sudo svnadmin load /path/to/repo < my.dmp.filtered
with this output as expected (all the SQL files are gone):
bitnami@linux:~/working$ sudo svnadmin load /opt/bitnami/repository/RedMineProduction/ < my.dmp.filtered
<<< Started new transaction, based on original revision 1
* adding path : Test ... done.
------- Committed revision 1 >>>
<<< Started new transaction, based on original revision 2
* adding path : NewFolder ... done.
------- Committed revision 2 >>>
<<< Started new transaction, based on original revision 3
* adding path : Test/.NET String Formats.pdf ... done.
------- Committed revision 3 >>>
<<< Started new transaction, based on original revision 4
------- Committed revision 4 >>>
<<< Started new transaction, based on original revision 5
------- Committed revision 5 >>>
But after this, I can still see the files in TortoiseSVN when I browse the repository. I tried cleaning my cached repositories in Tortoise, but that didn't help. I thought maybe it was a bug with Tortoise so I went back to my server and checked out the repository to a local directory:
bitnami@linux:~/working$ svn co file:///opt/bitnami/repository/RedMineProduction/
A RedMineProduction/NewFolder
A RedMineProduction/Test
A RedMineProduction/Test/.NET String Formats.pdf
Checked out revision 5.
I had a look at the directory to check whether the filtered files were there:
bitnami@linux:~/working$ ls -lash RedMineProduction/*
RedMineProduction/NewFolder:
total 8.0K
4.0K drwxrwxr-x 2 bitnami bitnami 4.0K Mar 11 15:35 .
4.0K drwxrwxr-x 5 bitnami bitnami 4.0K Mar 11 15:35 ..
RedMineProduction/Test:
total 148K
4.0K drwxrwxr-x 2 bitnami bitnami 4.0K Mar 11 15:35 .
4.0K drwxrwxr-x 5 bitnami bitnami 4.0K Mar 11 15:35 ..
140K -rw-rw-r-- 1 bitnami bitnami 138K Mar 11 15:35 .NET String Formats.pdf
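A checkout is not strictly needed for this check. As a rough sketch, svnlook can list the tree of the youngest revision straight from the repository on disk. find_in_repo is a hypothetical helper name, and the repository path is assumed to be the same directory that was passed to svnadmin load.

```shell
# Print the paths in the youngest revision of a repository that end with
# a given suffix.
#   $1 = repository directory on disk (assumed: the one given to svnadmin load)
#   $2 = literal suffix to look for, e.g. .sql
find_in_repo() {
    svnlook tree --full-paths "$1" |
        awk -v s="$2" '{
            n = length($0); m = length(s)
            if (n >= m && substr($0, n - m + 1) == s) print
        }'
}

# Usage:
#   find_in_repo /opt/bitnami/repository/RedMineProduction .sql
```

No output means the youngest revision of that on-disk repository contains no path with that suffix.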
How can I get Tortoise to refresh to the actual status of the repository? This is on a 5-revision repo... it'll be hell on our 20k+ revision repository...
Further information:
When I look at the repository in a browser, the files that I filtered out are still visible. They're corrupt and I can't download them, but their filenames and the relevant directory structure are still there.
I looked at the my.dmp.filtered file to see if there's any trace of the old structure in there, and there isn't. The revisions that would have contained the structure and files have the "this is an empty revision for padding" message instead.
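The padding revisions can also be counted rather than eyeballed. A minimal sketch, assuming the padding message appears once per padded revision and never inside real file content; count_padding_revisions is a hypothetical helper name.

```shell
# Count the lines in a dump file that carry the padding log message.
# -a treats the (possibly binary) dump as text, -i ignores case,
# -c prints the number of matching lines. grep exits non-zero when
# nothing matches, so "|| true" keeps the function's status at 0.
count_padding_revisions() {
    grep -a -i -c 'this is an empty revision for padding' "$1" || true
}

# Usage:
#   count_padding_revisions my.dmp.filtered
```

For the test dump above, the count should match the number of revisions that svnadmin load committed without adding any path.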
