Is it possible to write directly to final file with distcp?


I'm trying to upload a file to S3 (via s3a) using distcp.

distcp writes to a temporary file first and then renames it to the proper filename.

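But my user is not allowed to update or delete objects, and on S3 a rename is a copy followed by a delete, so the rename fails. I end up with a file of the correct size but the wrong name: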

-rw-rw-rw-   1       3738 2021-05-24 12:04 s3a://testbucket/.distcp.tmp.attempt_1621587961870_0001_m_000000_0

on S3, and I get this error:

Error: java.io.IOException: File copy failed: file:///testfile.json --> s3a://testbucket/testfile.json

Is it possible to skip the rename and write directly to the final filename?


I found the answer in the DistCp documentation: https://hadoop.apache.org/docs/current/hadoop-distcp/DistCp.html

There is an option:

-direct

Write directly to destination paths. Useful for avoiding potentially very expensive temporary file rename operations when the destination is an object store.

Example:

hadoop distcp -direct hdfs://nn1:8020/datasets/set1 s3a://bucket/datasets/set1

Unfortunately, my distcp version is too old and doesn't have this option.
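Because `-direct` only exists in newer DistCp versions, it can help to check for it before relying on it. Below is a minimal bash sketch. It assumes that running `hadoop distcp` with no arguments prints the option list (and exits non-zero). The helper names `distcp_supports_direct` and `copy_direct` are hypothetical, not part of Hadoop.

```shell
#!/usr/bin/env bash
# Sketch: use DistCp's -direct option only when the installed version lists it.
# Helper names are hypothetical; they are not part of Hadoop.

# Succeeds (status 0) if the DistCp usage text passed as $1 lists a -direct option.
# The whitespace boundaries stop it from matching longer words such as "-directory".
distcp_supports_direct() {
  grep -qE '(^|[[:space:]])-direct([[:space:]]|$)' <<< "$1"
}

# Copies $1 to $2, writing straight to the final path when -direct is available.
copy_direct() {
  local src="$1" dst="$2" usage
  # Assumption: distcp with no arguments prints its usage and exits non-zero,
  # so the exit status is ignored here.
  usage="$(hadoop distcp 2>&1 || true)"
  if distcp_supports_direct "$usage"; then
    hadoop distcp -direct "$src" "$dst"
  else
    echo "this distcp has no -direct option; it would write a temporary file and rename it" >&2
    return 1
  fi
}
```

For the file in the question, this would be called as `copy_direct file:///testfile.json s3a://testbucket/testfile.json`. On a version without `-direct`, it prints a message and returns 1 instead of leaving a misnamed `.distcp.tmp.*` object behind.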