Making ES ForEachWriter sink idempotent with structured streaming in spark

539 Views Asked by At

I am experiencing the same situation as described in Spark structured steaming from kafka - last message processed again after resume from checkpoint. When I restart my spark job after a failure the last message gets processed again. One of the answers suggest that the sink has to be idempotent. I am not sure I understand this well.

Right now I write to ES sink and the 3 methods are implemented as follows:

  1. open method returns true
  2. process method does Http post to ES
  3. close method closes the connection

I would like to know how to make the ES sink idempotent and also how to use the 2 parameters partitionId and version in the open method to return false if the data has already been processed.

Thanks in advance.

0

There are 0 best solutions below