EMR creates 0-byte files when using HDFS's moveFromLocalFile API


I'm using EMR to move a folder from the local file system to S3 in Spark using the fs.moveFromLocalFile API. Everything works fine, except that EMRFS creates a 0-byte file named _$folder$ for EVERY folder that is uploaded.

Is there any way to move folders without this dummy file being created for every folder (other than manually deleting it)? Also, why is this dummy file created? I'm currently using the s3:// protocol recommended by the EMR team.


There are 2 best solutions below

0
On

I don't know about EMRFS, but this sounds like the same marker suffix used by the S3N client. The client strips these files out when listing or stat-ing paths.

ASF's S3A creates a marker with a "/" suffix instead.
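To illustrate what "stripped in the client" could look like, here is a minimal sketch in plain Python. It is not the S3N or EMRFS code; `split_listing` and the sample keys are made up for illustration, and only the `_$folder$` suffix comes from the question.

```python
FOLDER_SUFFIX = "_$folder$"  # marker suffix mentioned in the question


def split_listing(keys):
    """Separate real object keys from directory-marker keys (hypothetical helper)."""
    files, dirs = [], []
    for key in keys:
        if key.endswith(FOLDER_SUFFIX):
            # A marker stands in for a directory: report the path, hide the marker.
            dirs.append(key[: -len(FOLDER_SUFFIX)])
        else:
            files.append(key)
    return files, dirs


# Made-up keys, as a raw bucket listing might return them.
keys = [
    "data_$folder$",
    "data/part-00000",
    "data/sub_$folder$",
    "data/sub/part-00001",
]
files, dirs = split_listing(keys)
print(files)
print(dirs)
```

A tool that reads the bucket directly, without going through such a client, would see the marker objects as ordinary 0-byte files, which is presumably why they show up for you.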

1
On

My experience is that the mkdir() function, which is usually called for local file systems or HDFS, results in an empty S3 object being created whose name is the mkdir folder name with _$folder$ appended. In S3, there is no concept of an "empty folder", because a key (the path name) cannot exist without a value (the file).

In a perfect world, mkdir(s3://bucket/path) would be a no-op.
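The mkdir behaviour described above can be sketched with a toy model of a flat key store. This is only an assumption about the general idea, not how EMRFS is implemented; `FlatStore` and its methods are hypothetical names.

```python
FOLDER_SUFFIX = "_$folder$"  # marker suffix mentioned in the question


class FlatStore:
    """Toy stand-in for a bucket: a flat mapping of key -> bytes, no real directories."""

    def __init__(self):
        self.objects = {}

    def put(self, key, data):
        self.objects[key] = data

    def mkdir(self, path):
        # There is nothing to attach an empty directory to,
        # so a zero-byte marker object is written instead.
        self.objects[path.rstrip("/") + FOLDER_SUFFIX] = b""

    def is_dir(self, path):
        path = path.rstrip("/")
        prefix = path + "/"
        # A path counts as a directory if some key lives under it
        # or if its marker object exists.
        has_children = any(key.startswith(prefix) for key in self.objects)
        return has_children or (path + FOLDER_SUFFIX) in self.objects


store = FlatStore()
store.mkdir("out/empty")                    # empty directory: needs a marker
store.put("out/full/part-00000", b"rows")   # non-empty: implied by the key prefix
```

In this model, `out/full` is a directory purely because a key starts with `out/full/`, while `out/empty` exists only through its marker. That is why a marker-free mkdir would have to be a no-op.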