I'd like to copy some files from HDFS on an EMR cluster to an S3 bucket using s3-dist-cp. I ran this command from the EMR master node:
s3-dist-cp -Dmapred.job.name=my_copy_job --src hdfs:///user/hadoop/abc --dest s3://my_bucket/my_key/
The command executes fine, but when I check the job name in the YARN ResourceManager UI, it is displayed like this:
S3DistCp hdfs:///user/hadoop/abc -> s3://my_bucket/my_key/
whereas I expected the job name to be my_copy_job.
I'd appreciate any help!
Note:
When I run hadoop distcp with the option -Dmapred.job.name=my_copy_job, the job name is displayed correctly in the YARN ResourceManager UI, but the job eventually fails.
s3-dist-cp does not support -D style properties set at runtime the way hadoop distcp does. S3DistCp accepts only a finite set of options, as listed here. In addition to the options defined by S3DistCp, it accepts the Tool interface's generic options. But JobName is not one of them. JobName is hardcoded in the S3DistCp code and cannot be overridden.
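
Since the name cannot be changed, one workaround is to reconstruct the generated name and search for it. The sketch below is only an illustration: the helper s3distcp_job_name is hypothetical, and it assumes the name always follows the "S3DistCp <src> -> <dest>" pattern shown in the question, which may differ between EMR releases.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of s3-dist-cp): rebuild the job name that
# S3DistCp appears to generate, assuming the "S3DistCp <src> -> <dest>"
# format seen in the question.
s3distcp_job_name() {
  local src="$1" dest="$2"
  printf 'S3DistCp %s -> %s' "$src" "$dest"
}

# Print the name to look for in the ResourceManager UI.
s3distcp_job_name "hdfs:///user/hadoop/abc" "s3://my_bucket/my_key/"
echo
```

On the master node, the result could then be used as a fixed-string filter, for example by piping the output of yarn application -list through grep -F with that name.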