How to restart a Flink streaming application from the last snapshot-id after the application was stopped


I'm creating an AWS Flink application in Java that streams from Iceberg, and I'm wondering whether Flink has a mechanism for restarting the stream from the last snapshot-id that was successfully processed if the whole application goes down. Should I write the snapshot-id to a database myself, or is there a better solution for that?

Expected scenario:

  • the Flink job processes messages from an Iceberg table (the application is scalable and can run in many parallel processes) and stores the results in Kinesis and another Iceberg table (see the sink sketch after this list)
  • the application dies unexpectedly
  • I start the application again
  • it continues processing messages without any data loss and without any manual intervention

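For context, a rough sketch of the sink side of that pipeline, assuming the Kinesis connector's KinesisStreamsSink and Iceberg's FlinkSink; the stream name, region, table location and the string serialization of the results are placeholders rather than my real setup:

```java
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kinesis.sink.KinesisStreamsSink;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.table.data.RowData;
import org.apache.iceberg.flink.TableLoader;
import org.apache.iceberg.flink.sink.FlinkSink;

import java.util.Properties;

public class SinkSketch {

    // Attach both sinks: the results serialized as strings go to Kinesis,
    // the same results as RowData go to another Iceberg table.
    public static void attachSinks(DataStream<RowData> processed, DataStream<String> asJson) {
        Properties kinesisProps = new Properties();
        kinesisProps.setProperty("aws.region", "eu-west-1"); // placeholder region

        KinesisStreamsSink<String> kinesisSink = KinesisStreamsSink.<String>builder()
                .setKinesisClientProperties(kinesisProps)
                .setStreamName("results-stream")              // placeholder stream name
                .setSerializationSchema(new SimpleStringSchema())
                .setPartitionKeyGenerator(record -> String.valueOf(record.hashCode()))
                .build();
        asJson.sinkTo(kinesisSink);

        // Iceberg sink writing into the second table; placeholder table location.
        TableLoader outputTable = TableLoader.fromHadoopTable("s3://bucket/warehouse/db/output");
        FlinkSink.forRowData(processed)
                .tableLoader(outputTable)
                .append();
    }
}
```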
I don't use the dedicated IcebergSource; perhaps that implementation could solve it somehow. Right now I'm using FlinkSource. Both source implementations have a method for setting the snapshot-id, but it has to be set manually and stored somewhere during processing, roughly as in the sketch below. Is there any way to avoid that and use an internal Flink mechanism?
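This is the manual approach I mean, using the Iceberg Flink connector's FlinkSource builder; the table location and the lastProcessedSnapshotId value (which would have to be loaded from some external store I maintain) are placeholders:

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.data.RowData;
import org.apache.iceberg.flink.TableLoader;
import org.apache.iceberg.flink.source.FlinkSource;

public class ManualSnapshotIdSketch {

    public static DataStream<RowData> buildSource(StreamExecutionEnvironment env,
                                                  long lastProcessedSnapshotId) {
        // Placeholder table location; in practice this points at the real Iceberg table.
        TableLoader tableLoader = TableLoader.fromHadoopTable("s3://bucket/warehouse/db/input");

        // FlinkSource exposes startSnapshotId(...), but the id has to be tracked
        // and persisted by the application itself between runs.
        return FlinkSource.forRowData()
                .env(env)
                .tableLoader(tableLoader)
                .streaming(true)
                .startSnapshotId(lastProcessedSnapshotId)
                .build();
    }
}
```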

There is 1 answer below.

Answer by Martijn Visser:

Flink has a snapshotting mechanism (checkpointing and savepointing). It's used to write Flink's state to durable storage, for recovery purposes in case of an error or in situations where you want to perform an upgrade (of either your Flink cluster, or the business logic in your application). See https://nightlies.apache.org/flink/flink-docs-stable/docs/ops/state/checkpoints/ for more details.
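As a minimal sketch, this is how checkpointing could be enabled in code on a self-managed Flink deployment (on Amazon's managed Flink service it is typically configured through the application settings instead); the interval, checkpoint storage path and retention setting below are placeholder assumptions:

```java
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.CheckpointConfig.ExternalizedCheckpointCleanup;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointingSketch {

    public static StreamExecutionEnvironment configuredEnv() {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Snapshot all operator state (including what the sources have read so far)
        // every 60 seconds; the interval is a placeholder.
        env.enableCheckpointing(60_000L, CheckpointingMode.EXACTLY_ONCE);

        // Write checkpoints to durable storage so they survive the job or cluster going down.
        env.getCheckpointConfig().setCheckpointStorage("s3://bucket/flink/checkpoints"); // placeholder path

        // Keep the last checkpoint after the job is cancelled or fails,
        // so the application can later be restarted from it.
        env.getCheckpointConfig()
           .setExternalizedCheckpointCleanup(ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);

        return env;
    }
}
```

With a retained checkpoint or a savepoint available, the job can be resubmitted with `flink run -s <checkpoint-or-savepoint-path> ...` and resumes from the state captured in that snapshot.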