I have data in HDFS on CDH and I want to move it to an Amazon S3 bucket so that I can run my code on AWS EMR instead of CDH. How can I move it securely and quickly?
Can I do it with the s3a connector, or is there a more efficient way?
I use Hadoop DistCp to copy data from S3 to HDFS. It also works in the other direction, so it should work in your case as well. Because it runs as a MapReduce job and copies files in parallel, it's pretty fast. I wrote a script that runs this command for an array of dates, and I run that script in the background with nohup. The syntax of the command is:
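As a rough illustration of the kind of script described above, here is a minimal sketch in Python. The HDFS path, the bucket name, the dt=<date> directory layout and the function names are all hypothetical and not taken from the answer. It also assumes S3A credentials are already configured on the cluster (for example in core-site.xml or a Hadoop credential provider) rather than passed on the command line. By default it only prints the commands it would run.

```python
import subprocess

# Hypothetical locations; replace with your own HDFS path and bucket.
HDFS_BASE = "hdfs:///data/events"
S3_BASE = "s3a://my-example-bucket/events"


def build_distcp_command(date, hdfs_base=HDFS_BASE, s3_base=S3_BASE):
    """Return the argument list that copies one date directory to S3."""
    return [
        "hadoop", "distcp",
        "-update",  # skip files that already exist unchanged at the target
        f"{hdfs_base}/dt={date}",
        f"{s3_base}/dt={date}",
    ]


def copy_dates(dates, dry_run=True):
    """Build one DistCp command per date; run them unless dry_run is True."""
    commands = [build_distcp_command(d) for d in dates]
    for cmd in commands:
        if dry_run:
            # Only show what would be executed.
            print(" ".join(cmd))
        else:
            # Stop at the first failing copy so errors are not silently skipped.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    copy_dates(["2021-01-01", "2021-01-02"], dry_run=True)
```

Saved under a hypothetical name such as copy_to_s3.py and switched to dry_run=False, it could be started in the background with something like: nohup python3 copy_to_s3.py > distcp.log 2>&1 &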