How to compare a local folder to a folder in a bucket?


We are archiving our projects in a bucket (using gsutil rsync). I've been tasked with verifying each upload: after every run, the local project folder must be compared with the folder uploaded to the bucket, to ensure the local data was in fact fully uploaded.

How could I perform such a test reliably?

BEST ANSWER

The gsutil rsync command itself performs checksum validation for every uploaded file. From Checksum Validation And Failure Handling:

At the end of every upload or download, the gsutil rsync command validates that the checksum of the source file/object matches the checksum of the destination file/object. If the checksums do not match, gsutil will delete the invalid copy and print a warning message.

[snip]

The rsync command will retry when failures occur, but if enough failures happen during a particular copy or delete operation the command will fail.

If the -C option is provided, the command will instead skip the failing object and move on. At the end of the synchronization run if any failures were not successfully retried, the rsync command will report the count of failures, and exit with non-zero status. At this point you can run the rsync command again, and it will attempt any remaining needed copy and/or delete operations.

[snip]

For more details about gsutil's retry handling, please see gsutil help retries.
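As an illustration, a typical archival run plus exit-status check could look like the following (a minimal sketch; the bucket name and local path are placeholders):

    # "my-archive-bucket" and /path/to/project are placeholder names.
    # gsutil rsync exits with non-zero status if any failures remained
    # after its built-in retries, so the exit code alone is a useful
    # first-line verification.
    gsutil rsync -r /path/to/project gs://my-archive-bucket/project
    if [ $? -ne 0 ]; then
        echo "gsutil rsync reported failures" >&2
        exit 1
    fi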

So you could:

  • simply check the command's exit status and stderr for any warning indicating that a checksum failure was fatal (see the first sketch above)
  • run gsutil rsync with the -C option and use its failure reporting directly, possibly with some automatic retries in place (see the retry-loop sketch below)
  • perform an alternative (paranoid) check that just verifies the existence at the destination of every file that should have been synced; if a file still exists, its content must have passed the checksum validation (see the listing-comparison sketch below)
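For the second option, a minimal retry-loop sketch (placeholder paths and bucket name; the retry count of 3 is an arbitrary choice):

    #!/bin/bash
    # Run rsync with -C so a single failing file does not abort the
    # whole run, and retry a few times while the exit status is
    # non-zero. SRC and DST are placeholders.
    SRC=/path/to/project
    DST=gs://my-archive-bucket/project
    for attempt in 1 2 3; do
        if gsutil rsync -r -C "$SRC" "$DST"; then
            echo "sync completed cleanly on attempt $attempt"
            exit 0
        fi
        echo "attempt $attempt reported failures; retrying" >&2
    done
    echo "failures persisted after 3 attempts" >&2
    exit 1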
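And for the paranoid existence check, one way (again only a sketch with placeholder names, and assuming filenames contain no newlines) is to compare the local file list against the bucket listing:

    #!/bin/bash
    # Compare relative file paths on disk with object names in the
    # bucket; anything printed is a local file missing remotely.
    SRC=/path/to/project
    DST=gs://my-archive-bucket/project
    # Relative paths of all local files, sorted.
    (cd "$SRC" && find . -type f | sed 's|^\./||' | LC_ALL=C sort) > /tmp/local.txt
    # All object names under the destination prefix, reduced to
    # relative paths and sorted ("**" makes gsutil ls list every
    # object recursively, one URL per line).
    gsutil ls "$DST/**" | sed "s|^$DST/||" | LC_ALL=C sort > /tmp/remote.txt
    # Lines unique to local.txt are files absent from the bucket.
    comm -23 /tmp/local.txt /tmp/remote.txt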