I am trying to transfer all the part* files from an HDFS directory directly to an SFTP server. The files in the HDFS folder are pretty huge, so I do not want to copy them to the local file system.
The current setup is:

hdfs dfs -text "<HDFS_DIR>/part*" > local_file
curl -u "<sftp_username>:" --key "<private_key_file_path>" --pubkey "<public_key_file_path>" \
--upload-file local_file "sftp://<SFTP_HOST>/<Upload_dir>"
How can I upload the files directly from HDFS to the SFTP server path without writing them to the local filesystem?
I considered the following options:
- Sqoop with the SFTP connector (did not find enough resources) - https://sqoop.apache.org/docs/1.99.7/user/connectors/Connector-SFTP.html
- Copying each part file to the local fs and then moving it to the SFTP server (inefficient; roughly the loop sketched after this list)
- hadoop distcp with SFTP, which doesn't work in CDH 5; I am using CDH-5.16.2
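For reference, a rough, untested sketch of that second, staging-based option (it assumes a writable /tmp and reuses the placeholder key, host and path names from above):

# Stage each part file locally, upload it, then delete the local copy.
for f in $(hdfs dfs -ls "<HDFS_DIR>/part*" | awk '/^-/ {print $NF}'); do
  name=$(basename "$f")
  hdfs dfs -copyToLocal "$f" "/tmp/$name"
  curl -u "<sftp_username>:" --key "<private_key_file_path>" --pubkey "<public_key_file_path>" \
    --upload-file "/tmp/$name" "sftp://<SFTP_HOST>/<Upload_dir>/$name"
  rm -f "/tmp/$name"
done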
Please let me know which is the best way to accomplish this. Thanks!
Maybe you can pipe hdfs's output directly to curl for upload, by using `--upload-file .` or `--upload-file -`.
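For example (untested, just a sketch: the key, host and upload-dir placeholders are taken from your question, and merged_output is a made-up name for the single remote target file):

hdfs dfs -text "<HDFS_DIR>/part*" | \
  curl -u "<sftp_username>:" --key "<private_key_file_path>" --pubkey "<public_key_file_path>" \
  --upload-file . "sftp://<SFTP_HOST>/<Upload_dir>/merged_output"

Since the data comes from stdin rather than a local file, you will likely need to name the remote file explicitly in the URL.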
About the difference between `.` and `-`, the docs say that with `.` curl reads stdin in non-blocking mode, while with `-` it does not, which sounds to me like curl may attempt to put the whole file in RAM, or at least in a stdin buffer, before starting the upload, so `.` sounds safer than `-` if you expect to deal with large files.