Hadoop Custom Output format, when do all reducers end?


I'm building a custom output format for Hadoop and was wondering whether there is a way, inside the output format, to know when all reducers (RecordWriters) have completed.

To know that a single RecordWriter has completed, its close method can be used, but how can I run some cleanup once all of the RecordWriters have completed?

There is 1 best solution below


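You can do the final cleanup in the driver itself instead of relying on the OutputFormat. I doubt that the OutputFormat or RecordWriter itself offers such a hook; the closest thing I'm aware of is the commitJob method of the OutputCommitter, which runs once after all tasks have completed. Relying on the finalize method would be a last resort and is not advisable at all.

The waitForCompletion method of Job returns only after the job finishes, so you can simply do this: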

boolean status = job.waitForCompletion(true);
if (status) {
    // cleanup required for successful jobs
} else {
    // cleanup required for failed jobs
}

If your cleanup does not depend on the job's success or failure, just drop the if-else part. And if you really need a method in your OutputFormat class to do the deletion, make it static, e.g.:

job.waitForCompletion(true);
CustomOutputFormat.cleanUp();
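
To make the pattern concrete, here is a minimal, self-contained sketch that runs without Hadoop on the classpath. Everything in it is an assumption made for illustration: JobRunner is a hypothetical stand-in whose single method mirrors the shape of Job.waitForCompletion(boolean), and CustomOutputFormat here is a dummy class whose static cleanUp only records what it was asked to do. The one thing it adds to the snippets above is a finally block, so the cleanup also runs if waiting for the job throws.

```java
import java.util.ArrayList;
import java.util.List;

public class DriverCleanupSketch {

    /** Hypothetical stand-in for the Hadoop Job; mirrors waitForCompletion(boolean). */
    public interface JobRunner {
        boolean waitForCompletion(boolean verbose) throws Exception;
    }

    /** Dummy stand-in for a custom OutputFormat that exposes a static cleanup hook. */
    public static class CustomOutputFormat {
        // Records which cleanup ran, so the sketch can be inspected.
        public static final List<String> log = new ArrayList<>();

        public static void cleanUp(boolean jobSucceeded) {
            // Real code would delete temp files, close shared resources, etc.
            log.add(jobSucceeded ? "cleanup after success" : "cleanup after failure");
        }
    }

    /** Waits for the job, then runs the cleanup exactly once, even on exceptions. */
    public static boolean runWithCleanup(JobRunner job) {
        boolean status = false;
        try {
            status = job.waitForCompletion(true);
            return status;
        } catch (Exception e) {
            // An exception while waiting is treated as a failed job.
            throw new RuntimeException("job did not complete", e);
        } finally {
            CustomOutputFormat.cleanUp(status);
        }
    }

    public static void main(String[] args) {
        runWithCleanup(verbose -> true);   // simulated successful job
        runWithCleanup(verbose -> false);  // simulated failed job
        // prints [cleanup after success, cleanup after failure]
        System.out.println(CustomOutputFormat.log);
    }
}
```

In a real driver you would presumably pass job::waitForCompletion in place of the lambdas; the point is only that the cleanup lives in the driver and runs once, after the whole job has finished.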

I hope this meets your need.