I use aws s3 cp s3://source_bucket/stdout.gz s3://target_bucket/stdout.txt on Linux to copy this log file to another S3 location. If I then download the file to a folder on Windows and open it with the native Windows Notepad application, its content looks exactly like what you see when you view stdout in the EMR step interface.
But when I sync the file to my Windows directory with this command in Windows cmd: aws s3 sync s3://target_bucket/stdout.txt c://myfolder/stdout.txt and open it with Notepad again, I see garbled, unreadable characters.
I tried the --content-encoding gzip and --content-type text/plain options, but neither worked.
The aws s3 cp command uses the S3 API behind the scenes, which supports transparent decompression for a number of common compression types such as gzip, bzip2 and zip. This is intended to let you download compressed files without having to decompress them manually first.
The aws s3 sync command differs from cp in that it operates on directories (prefixes) and does not decompress the file. I would suggest syncing the whole directory, or targeting the specific file within the directory. Because every file is included by default, --include only narrows the selection when it follows an --exclude, and in Windows cmd the patterns need double quotes (single quotes are passed through literally): aws s3 sync s3://target_bucket c:\myfolder\target --exclude "*" --include "stdout.txt"
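If the synced file still shows unreadable characters, it may simply still be gzip data with a .txt name. As a minimal sketch under that assumption (the function names below are hypothetical and not part of the AWS CLI), you can check for the gzip magic bytes 1f 8b and decompress the file locally only when they are present:

```python
import gzip
from pathlib import Path

# Every gzip stream starts with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def is_gzip(path: Path) -> bool:
    """Return True if the file begins with the gzip magic number."""
    with path.open("rb") as f:
        return f.read(2) == GZIP_MAGIC


def ensure_plain_text(path: Path, encoding: str = "utf-8") -> str:
    """Return the file's text, gunzipping it first if it is still compressed.

    Bytes that are not valid in `encoding` are replaced rather than raising.
    """
    data = path.read_bytes()
    if is_gzip(path):
        data = gzip.decompress(data)
    return data.decode(encoding, errors="replace")


def write_plain_copy(src: Path, dst: Path) -> None:
    """Write a readable (decompressed if needed) copy of src to dst as UTF-8."""
    dst.write_text(ensure_plain_text(src), encoding="utf-8")
```

For example, write_plain_copy(Path(r"c:\myfolder\target\stdout.txt"), Path(r"c:\myfolder\target\stdout_plain.txt")) leaves the synced file untouched and writes a copy that Notepad can display.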
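To see why --include on its own does not narrow a sync, here is a rough model of the filter rules: every file starts out included, and each --exclude/--include pattern is applied in command-line order, with a later match overriding an earlier one. This is a hypothetical approximation written for illustration (the helper name is made up, and Python's fnmatch stands in for the CLI's own wildcard matching), not the CLI's actual code:

```python
from fnmatch import fnmatchcase


def is_included(relative_path: str, filters: list[tuple[str, str]]) -> bool:
    """Approximate whether a file survives a list of (kind, pattern) filters.

    kind is "exclude" or "include"; filters are applied in order, so the
    last matching filter decides. Matching here is case-sensitive.
    """
    included = True  # nothing is filtered out until a filter says so
    for kind, pattern in filters:
        if fnmatchcase(relative_path, pattern):
            included = kind == "include"
    return included


files = ["stdout.txt", "stderr.txt", "controller.gz"]

# --include "stdout.txt" alone: everything was already included.
print([f for f in files if is_included(f, [("include", "stdout.txt")])])

# --exclude "*" --include "stdout.txt": only stdout.txt remains.
print([f for f in files if is_included(f, [("exclude", "*"), ("include", "stdout.txt")])])
```

The first filter list keeps all three files, while the exclude-then-include list keeps only stdout.txt, which is why the exclude has to come first.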