I have, what I would consider, a fairly simple Flink program. Sourced from a Kafka stream, filter's applied, process function applied, flat map applied, and sent to a Redis sink. Running this locally in a stand alone environment on my dev box, there is no problem. I am trying to push this into production on AWS EMR, I followed the guide for running a Flink program on EMR. After my first test, I had a GC overhead limit exceeded
error, so I made adjustments to reduce the amount of data stored. My next try the program ran for much longer, but eventually failed, not giving any indication of a type of error like it had previously.
I am unsure how to go about debugging problems that I suspect may be a side effect of running on EMR. Most of the monitoring metrics in the EMR console are useless as far as I can tell. If it matters, I am running the program as a Step in EMR, the guide i followed is here http://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-flink.html
. This program is also suppose to be an always up solution, basically it will constantly be reading from the Kafka Stream and processing the data(If that matters at all, not sure if there is a different configuration I should be using for an always up solution)
I'll be happy to provide any information needed to help me getting this into production.
Thank you