How to merge the small files on S3 generated by EMR with thousands of reducers

My Cascalog EMR job generated thousands of small files in an S3 bucket. It generates the same number of files as the number of reducers I used. Dumping all these tiny files takes minutes. Is there a way to concatenate them on S3 so that I can dump them quickly?

Thanks

Kang
