Source file size increase during rsync


I back up a directory with rsync. Before starting the rsync, I checked the directory size with du -s, which reported ~1TB.

Then I started the rsync and, while it was running, checked the size of the backup directory to estimate the end time. When the backup grew much larger than 1TB, I got curious. It seems that the size of many files in the source directory increases. I ran du -s on a file in the source before and after the rsync process copied that file:

## du on source file **before** it was rsynced
# du -s file.dat
2 file.dat

## du on source file **after** it was rsynced
# du -s file.dat
4096 file.dat 

The rsync command:

rsync -av -s --relative --stats --human-readable --delete --log-file someDir/rsync.log sourceDir destinationDir/

The file system on both sides (source, destination) is BeeGFS 6.16 on RHEL 7.4, kernel 3.10.0-693

Any ideas what is happening here?




file.dat may be a sparse file. Try the --sparse option (note that the lowercase -s in your command is --protect-args, not -S/--sparse):

   -S, --sparse
          Try  to  handle  sparse  files  efficiently so they take up less
          space on the destination.  Conflicts with --inplace because it’s
          not possible to overwrite data in a sparse fashion.

Wikipedia about sparse files:

a sparse file is a type of computer file that attempts to use file system space more efficiently when the file itself is partially empty. This is achieved by writing brief information (metadata) representing the empty blocks to disk instead of the actual "empty" space which makes up the block, using less disk space.

A sparse file can be created as follows:

$ dd if=/dev/zero of=file.dat bs=1 count=0 seek=1M

Now let's examine and copy it:

$ ls -l file.dat
.... 1048576 Nov  1 20:59 file.dat
$ rsync file.dat file.dat.rs1
$ rsync --sparse file.dat file.dat.rs2
$ du -sh  file.dat*
0       file.dat
1.0M    file.dat.rs1
0       file.dat.rs2
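
To check whether a given file is sparse before copying it, one option is to compare the space actually allocated with the apparent size. The following is only a minimal sketch, assuming GNU stat (coreutils) is available; the function names classify and check_file are made up for illustration. File systems that compress data or store small files inline can also report fewer allocated bytes than the apparent size, so treat the result as a hint rather than proof.

```shell
# classify ALLOCATED_BLOCKS BLOCK_SIZE APPARENT_SIZE
# Prints "sparse" if fewer bytes are allocated than the apparent size,
# otherwise "dense". (Hypothetical helper, not part of rsync or coreutils.)
classify() {
    if [ $(( $1 * $2 )) -lt "$3" ]; then
        echo sparse
    else
        echo dense
    fi
}

# check_file FILE
# Assumes GNU stat: %b = allocated blocks, %B = bytes per block reported
# by %b, %s = apparent size in bytes.
check_file() {
    # The command substitution is intentionally unquoted so that the three
    # numbers are passed to classify as three separate arguments.
    classify $(stat -c '%b %B %s' -- "$1")
}
```

For example, check_file file.dat could be run on the source before and after the rsync. GNU du can show the same contrast directly: du --apparent-size -s file.dat reports the apparent size, while plain du -s file.dat reports the allocated space.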