How to handle bad events in a batch job on EMR


I am running an EMR job that processes logs containing around 15-20M log events. Occasionally, a few log events contain badly formatted data that breaks my pipeline. I am looking for options to divert those log events to a file or a queue, so that I can verify them, report them to the corresponding service, and reprocess them later, possibly in a separate pipeline, since analyzing and correcting the logs would take some time.
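To make the idea concrete, here is a minimal sketch of that routing in plain Python, sometimes called a dead-letter pattern. Everything in it is an assumption for illustration: the events are taken to be JSON lines, `timestamp` stands in for whatever field the pipeline actually requires, and the names `parse_event`, `split_events`, `write_dead_letters` and the file `bad_events.jsonl` are hypothetical. On a real cluster the local file would presumably be replaced by an S3 location or a queue.

```python
import json


def parse_event(line):
    """Parse one raw log line; raise ValueError if it is malformed."""
    event = json.loads(line)  # raises json.JSONDecodeError (a ValueError subclass)
    # Assumed validation rule: every event must be an object with a timestamp.
    if not isinstance(event, dict) or "timestamp" not in event:
        raise ValueError("missing required field: timestamp")
    return event


def split_events(lines):
    """Return (good, bad): parsed events, and records describing rejected lines."""
    good, bad = [], []
    for line_no, line in enumerate(lines, start=1):
        try:
            good.append(parse_event(line))
        except ValueError as exc:
            # Keep the raw line and the reason so it can be reported and replayed.
            bad.append({"line_no": line_no, "raw": line, "error": str(exc)})
    return good, bad


def write_dead_letters(bad, path):
    """Write rejected records as JSON lines for later inspection."""
    with open(path, "w", encoding="utf-8") as fh:
        for record in bad:
            fh.write(json.dumps(record) + "\n")


if __name__ == "__main__":
    sample = [
        '{"timestamp": 1, "msg": "ok"}',
        "not json at all",
        '{"msg": "no timestamp"}',
    ]
    good, bad = split_events(sample)
    write_dead_letters(bad, "bad_events.jsonl")  # hypothetical output path
    print(len(good), "good events,", len(bad), "bad events")
```

The point of the sketch is that a bad event is caught per record instead of failing the whole batch, and is stored together with its raw text and the error so it can be corrected and reprocessed separately.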

What are the best options available, and which are widely used by companies running batch jobs?
