How to restart a Flink streaming application from the last snapshot-id after the application was stopped


I'm creating an AWS Flink application in Java that streams from Iceberg, and I'm wondering whether Flink has a mechanism for restarting the stream from the last snapshot-id that was successfully processed if the whole application goes down. Should I write the snapshot-id to a database myself, or is there a better solution for that?

Expected scenario:

  • the Flink job processes messages from an Iceberg table (the application is scalable and can run in many parallel processes) and stores the results in Kinesis and another Iceberg table (see the sink sketch after this list)
  • the application dies unexpectedly
  • I start the application again
  • it continues processing messages without any data loss and without any manual intervention

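For context, a rough sketch of the sink side of that pipeline, assuming the Kinesis connector's KinesisStreamsSink and Iceberg's FlinkSink; the stream name, region, table location and the string serialization of the results are placeholders rather than my real setup:

```java
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kinesis.sink.KinesisStreamsSink;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.table.data.RowData;
import org.apache.iceberg.flink.TableLoader;
import org.apache.iceberg.flink.sink.FlinkSink;

import java.util.Properties;

public class SinkSketch {

    // Attach both sinks: the results serialized as strings go to Kinesis,
    // the same results as RowData go to another Iceberg table.
    public static void attachSinks(DataStream<RowData> processed, DataStream<String> asJson) {
        Properties kinesisProps = new Properties();
        kinesisProps.setProperty("aws.region", "eu-west-1"); // placeholder region

        KinesisStreamsSink<String> kinesisSink = KinesisStreamsSink.<String>builder()
                .setKinesisClientProperties(kinesisProps)
                .setStreamName("results-stream")              // placeholder stream name
                .setSerializationSchema(new SimpleStringSchema())
                .setPartitionKeyGenerator(record -> String.valueOf(record.hashCode()))
                .build();
        asJson.sinkTo(kinesisSink);

        // Iceberg sink writing into the second table; placeholder table location.
        TableLoader outputTable = TableLoader.fromHadoopTable("s3://bucket/warehouse/db/output");
        FlinkSink.forRowData(processed)
                .tableLoader(outputTable)
                .append();
    }
}
```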
I don't use the dedicated IcebergSource; perhaps that implementation could solve it somehow. Right now I'm using FlinkSource. Both source implementations have a method for setting the snapshot-id, but it has to be set manually and stored somewhere during processing, roughly as in the sketch below. Is there any way to avoid that and use an internal Flink mechanism?
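This is the manual approach I mean, using the Iceberg Flink connector's FlinkSource builder; the table location and the lastProcessedSnapshotId value (which would have to be loaded from some external store I maintain) are placeholders:

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.data.RowData;
import org.apache.iceberg.flink.TableLoader;
import org.apache.iceberg.flink.source.FlinkSource;

public class ManualSnapshotIdSketch {

    public static DataStream<RowData> buildSource(StreamExecutionEnvironment env,
                                                  long lastProcessedSnapshotId) {
        // Placeholder table location; in practice this points at the real Iceberg table.
        TableLoader tableLoader = TableLoader.fromHadoopTable("s3://bucket/warehouse/db/input");

        // FlinkSource exposes startSnapshotId(...), but the id has to be tracked
        // and persisted by the application itself between runs.
        return FlinkSource.forRowData()
                .env(env)
                .tableLoader(tableLoader)
                .streaming(true)
                .startSnapshotId(lastProcessedSnapshotId)
                .build();
    }
}
```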

There is 1 answer below.

Answer by Martijn Visser:

Flink has a snapshotting mechanism (checkpointing and savepointing). It's used to write Flink's state to durable storage, for recovery purposes in case of an error or in situations where you want to perform an upgrade (of either your Flink cluster, or the business logic in your application). See https://nightlies.apache.org/flink/flink-docs-stable/docs/ops/state/checkpoints/ for more details.
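As a minimal sketch, this is how checkpointing could be enabled in code on a self-managed Flink deployment (on Amazon's managed Flink service it is typically configured through the application settings instead); the interval, checkpoint storage path and retention setting below are placeholder assumptions:

```java
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.CheckpointConfig.ExternalizedCheckpointCleanup;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointingSketch {

    public static StreamExecutionEnvironment configuredEnv() {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Snapshot all operator state (including what the sources have read so far)
        // every 60 seconds; the interval is a placeholder.
        env.enableCheckpointing(60_000L, CheckpointingMode.EXACTLY_ONCE);

        // Write checkpoints to durable storage so they survive the job or cluster going down.
        env.getCheckpointConfig().setCheckpointStorage("s3://bucket/flink/checkpoints"); // placeholder path

        // Keep the last checkpoint after the job is cancelled or fails,
        // so the application can later be restarted from it.
        env.getCheckpointConfig()
           .setExternalizedCheckpointCleanup(ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);

        return env;
    }
}
```

With a retained checkpoint or a savepoint available, the job can be resubmitted with `flink run -s <checkpoint-or-savepoint-path> ...` and resumes from the state captured in that snapshot.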