I have noticed that s3-dist-cp sometimes takes much longer than usual because of a "slow node" issue. For Spark I have enabled speculative execution, which works fine. However, for s3-dist-cp I would like to understand the possible impact first.
For regular DistCp, the documentation says (link: https://hadoop.apache.org/docs/current/hadoop-distcp/DistCp.html#MapReduce_and_other_side-effects):
If mapreduce.map.speculative is set set-final and true, the result of the copy is undefined.
I'm aware that s3-dist-cp is a completely separate job, but I wonder whether there are any similar caveats when speculative execution is enabled for it. I wasn't able to find any related documentation.
Thanks for any suggestions!