aws s3 cp EMR stdout.gz file as .txt: encoding issue


I use aws s3 cp s3://source_bucket/stdout.gz s3://target_bucket/stdout.txt on Linux to copy this EMR log file to another S3 location. If I then download the file to a folder on Windows and open it with the native Notepad application, the contents of the .txt file look exactly as they do when viewing stdout in the EMR step interface.

But when I instead sync the file to my Windows directory using this command in Windows cmd: aws s3 sync s3://target_bucket/stdout.txt c://myfolder/stdout.txt and then open it with Notepad again, I see nothing but unreadable, garbled characters.

I tried the --content-encoding gzip and --content-type text/plain options, but neither worked.
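For context, the symptom can be reproduced locally without S3 at all, assuming the object's bytes are raw gzip data that merely carries a .txt name. The file names below are illustrative, not from the original buckets:

```shell
# Create a sample log and gzip it, the way EMR stores step output.
printf 'step completed successfully\n' > stdout_plain.txt
gzip -c stdout_plain.txt > stdout.gz

# Copying (or renaming) the .gz object to a .txt key does not
# decompress it; the bytes are still gzip data, which Notepad
# renders as garbage:
cp stdout.gz stdout.txt
file stdout.txt   # still identified as gzip compressed data

# Decompressing explicitly recovers the readable log text:
gzip -dc stdout.txt
```

This is why changing Content-Type alone has no effect: the metadata changes, but the stored bytes remain compressed.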

1 Answer

Christian Loris

The aws s3 cp command uses the S3 API behind the scenes, which supports transparent decompression for a number of common compression formats such as gzip, bzip2, and zip. This is intended to make it easy to download compressed files without having to decompress them manually first.

The aws s3 sync command differs from cp in that it operates on directories (prefixes) and does not try to be helpful by decompressing the file. I would suggest syncing the whole directory, or targeting the specific file with filters, like so: aws s3 sync s3://target_bucket c:\myfolder\target --exclude '*' --include 'stdout.txt'. Note that the --exclude '*' must come first: sync includes everything by default, so an --include filter on its own has no effect.