I had some problems running the "s3-dist-cp" command from my pyspark script. I needed to move some data from S3 to HDFS to improve performance, so I am sharing what I found here.
How do I run the "s3-dist-cp" command inside the pyspark shell / a pyspark script on EMR 5.x?
2.3k Views Asked by braj At
Note: please make sure that you give the full path to s3-dist-cp, e.g. /usr/bin/s3-dist-cp.
Also, I think the subprocess module can be used to run it.
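As a rough sketch of the subprocess idea (not tested on a cluster here), something like the following might work from a pyspark script or shell on the EMR master node. The helper names `build_s3_dist_cp_command` and `run_s3_dist_cp` and the bucket/HDFS paths are made up for illustration; `--src` and `--dest` are the standard S3DistCp options, and the binary is assumed to live at /usr/bin/s3-dist-cp as noted above.

```python
import subprocess

# Full path to the binary; assumed location on the EMR master node.
S3_DIST_CP = "/usr/bin/s3-dist-cp"


def build_s3_dist_cp_command(src, dest):
    """Build the argument list for an S3 -> HDFS copy (hypothetical helper)."""
    return [S3_DIST_CP, "--src", src, "--dest", dest]


def run_s3_dist_cp(src, dest):
    """Run s3-dist-cp and return its combined stdout/stderr output.

    check_output raises subprocess.CalledProcessError if the command
    exits with a non-zero status, so a failed copy is not silently ignored.
    """
    cmd = build_s3_dist_cp_command(src, dest)
    return subprocess.check_output(cmd, stderr=subprocess.STDOUT)


# Example call (placeholder paths, replace with your own):
# run_s3_dist_cp("s3://my-bucket/input/", "hdfs:///data/input/")
```

Passing the command as a list avoids going through a shell, so paths do not need extra quoting. After the copy finishes, the data can be read from the HDFS path with the usual `spark.read` calls.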