I'm building a custom output format for Hadoop and was wondering whether there is a way, from inside the output format, to know when all reducers (RecordWriters) are complete?
To know that one RecordWriter has completed, the close method of RecordWriter can be used, but what about executing some cleanup once all of the RecordWriters have completed?
You can use the driver itself to do the final cleanup instead of relying on the OutputFormat. I doubt that it really provides such a feature (API). The finalize method may be a last resort, but it is not advisable at all.

The waitForCompletion method of Job returns only after the job finishes, so simply run the cleanup in the driver right after that call, inside an if-else on its boolean return value. If your cleanup is irrelevant to the job's success or failure, just remove the if-else part. And if you really need a method in your OutputFormat class to do the deletion, make it static so the driver can call it directly.

I hope this suffices for your needs.
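
The driver-side pattern described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the original answer's code: CompletableJob is a hypothetical stand-in for the single Job method the driver needs (in a real driver you would call job.waitForCompletion(true) on org.apache.hadoop.mapreduce.Job), and deleteScratchDir is a hypothetical static helper that uses java.nio on a local directory, where a real job would more likely delete through Hadoop's FileSystem.delete(path, true).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

final class DriverCleanupSketch {

    /** Hypothetical stand-in for Job: only the blocking call the driver relies on. */
    interface CompletableJob {
        boolean waitForCompletion(boolean verbose) throws Exception;
    }

    /**
     * Hypothetical static cleanup helper, as it might sit in a custom OutputFormat class.
     * Being static, the driver can call it without an OutputFormat instance.
     */
    static void deleteScratchDir(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return; // nothing to clean up
        }
        try (Stream<Path> paths = Files.walk(dir)) {
            // Reverse order visits children before their parent directory.
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    /** Runs the job, then cleans up only if it succeeded. Returns a process exit code. */
    static int runThenCleanUp(CompletableJob job, Path scratchDir) throws Exception {
        // Returns only after the whole job, and so every RecordWriter, has finished.
        boolean succeeded = job.waitForCompletion(true);
        if (succeeded) {
            deleteScratchDir(scratchDir);
            return 0;
        } else {
            // Leave the scratch data in place so the failure can be inspected.
            return 1;
        }
    }

    public static void main(String[] args) throws Exception {
        Path scratch = Files.createTempDirectory("job-scratch");
        Files.createFile(scratch.resolve("part-00000"));
        int exit = runThenCleanUp(verbose -> true, scratch); // fake job that always succeeds
        System.out.println("exit=" + exit + ", scratch exists=" + Files.exists(scratch));
    }
}
```

If the cleanup should run regardless of the job's outcome, the if-else collapses to a single deleteScratchDir call placed after waitForCompletion.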