How to download an S3 bucket, compress it on the fly, and re-upload it to another S3 bucket without storing it locally?


I want to download the contents of an S3 bucket (hosted on Wasabi, which claims to be fully S3-compatible) to my VPS, tar, gzip and gpg-encrypt it, and re-upload the archive to another S3 bucket on Wasabi.

My VPS only has 30 GB of storage and the whole bucket is about 1000 GB in size, so I need to download, archive, encrypt and re-upload all of it on the fly without storing the data locally.

The secret seems to be using the | pipe operator. But I am stuck right at the beginning, downloading a bucket into a local archive (I want to go step by step):

s3cmd sync s3://mybucket | tar cvz archive.tar.gz -

In my mind at the end I expect some code like this:

s3cmd sync s3://mybucket | tar cvz | gpg --passphrase secretpassword | s3cmd put s3://theotherbucket/archive.tar.gz.gpg

but it's not working so far!

What am I missing?


There are 3 best solutions below

4
On

The aws s3 sync command copies multiple files to the destination. It does not copy to stdout.

You could use aws s3 cp s3://mybucket/stream.txt - (including the dash at the end) to copy the contents of a single object to stdout.

From cp — AWS CLI Command Reference:

The following cp command downloads an S3 object locally as a stream to standard output. Downloading as a stream is not currently compatible with the --recursive parameter:

aws s3 cp s3://mybucket/stream.txt -

This will only work for a single file.
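
Since streaming works per object, one way to apply it to the question is to pipe each object through gzip and gpg and straight back into a second aws s3 cp that reads from stdin. The following is a minimal sketch, not a tested recipe for Wasabi: the function names stream_object and stream_keys, the variables S3_ENDPOINT and PASSFILE, and the endpoint URL are assumptions made for illustration, and the gpg options assume GnuPG 2.1 or later. It produces one encrypted object per source object rather than a single tar archive.

```shell
#!/bin/sh
# Assumed settings (not from the question): adjust to your provider and key handling.
S3_ENDPOINT="${S3_ENDPOINT:-https://s3.wasabisys.com}"
PASSFILE="${PASSFILE:-$HOME/.backup-passphrase}"

# stream_object SRC_URI DST_URI
# Download one object to stdout, gzip and encrypt it in the pipe,
# and upload the result from stdin. Nothing is written to local disk.
stream_object() {
    aws --endpoint-url "$S3_ENDPOINT" s3 cp "$1" - </dev/null \
        | gzip -c \
        | gpg --batch --pinentry-mode loopback --passphrase-file "$PASSFILE" \
              --symmetric --output - \
        | aws --endpoint-url "$S3_ENDPOINT" s3 cp - "$2"
}

# stream_keys SRC_BUCKET DST_BUCKET
# Read object keys from stdin, one per line, and stream each of them.
stream_keys() {
    while IFS= read -r key; do
        stream_object "s3://$1/$key" "s3://$2/$key.gz.gpg" || return 1
    done
}
```

A possible invocation, assuming the key list comes from the s3api listing command, would be: aws --endpoint-url "$S3_ENDPOINT" s3api list-objects-v2 --bucket mybucket --query 'Contents[].[Key]' --output text | stream_keys mybucket theotherbucket. Two caveats: the pipeline's exit status is that of the final upload only, so a failed download can go unnoticed, and the AWS CLI reference says that stdin uploads larger than 50 GB need the --expected-size option.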

0
On

You can try this tool: https://pypi.org/project/s3-tar/

For example:

This example takes all the files in the bucket my-data under the folder 2020/07/01 and saves them as a compressed tar gzip file in the same bucket, in the directory Archive:

s3-tar --source-bucket my-data --folder 2020/07/01 --filename Archive/2020-07-01.tar.gz
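
Once such an archive exists, it can be checked without downloading it to disk by streaming it through tar. This is a small sketch, assuming the AWS CLI is configured for the same bucket; the helper name list_s3_archive is made up for illustration.

```shell
# list_s3_archive S3_URI
# Stream a .tar.gz object to stdout and list its members; nothing is stored locally.
list_s3_archive() {
    aws s3 cp "$1" - | tar -tzf -
}
```

For the example above it would be called as list_s3_archive s3://my-data/Archive/2020-07-01.tar.gz.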
1
On

You could try goofys (https://github.com/kahing/goofys), which mounts an S3 bucket as a local filesystem. In your case, something like the following might work:

$ goofys source-s3-bucket-name /mnt/src
$ goofys destination-s3-bucket-name /mnt/dst
# "-f -" makes tar write the archive to stdout so it can be piped into gpg
$ tar -cvzf - /mnt/src | gpg -e -o /mnt/dst/archive.tgz.gpg
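
For a non-interactive run, the same idea can be wrapped in a small function that uses symmetric encryption with a passphrase file, as the question intends. This is a sketch under assumptions: archive_mount, PASSFILE, and the mount points are illustrative names, the gpg options assume GnuPG 2.1 or later, and for a non-AWS provider such as Wasabi goofys would additionally need its --endpoint option pointing at the provider's URL.

```shell
# Assumed location of the passphrase file (hypothetical, not from the answer).
PASSFILE="${PASSFILE:-$HOME/.backup-passphrase}"

# archive_mount SRC_DIR DST_FILE
# Tar and gzip SRC_DIR to stdout, encrypt the stream, and write it to DST_FILE.
# With both paths on goofys mounts, data flows bucket -> pipe -> bucket.
archive_mount() {
    tar -czf - -C "$1" . \
        | gpg --batch --pinentry-mode loopback --passphrase-file "$PASSFILE" \
              --symmetric --output "$2"
}
```

With the mounts from above it would be called as archive_mount /mnt/src /mnt/dst/archive.tgz.gpg. Restoring is the reverse pipe: gpg with the same passphrase options and --decrypt on the archive, piped into tar -xzf -.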