I'm fairly new to Flink and trying to understand the appropriate use cases for the DataStream API and the Table API. As part of that, I'm trying to understand:
- Like the DataStream API, does the Table API have the flexibility to choose the type of state backend it uses?
- What are all the available backends for the Table API? Does it require an external datastore such as MySQL, or any other datastore?
In short, I'm trying to understand how the backend used by the Table API works.
Per se, Flink is not a good long-term persistent store; it is more of a processing system. You'll want to keep your long-term persistent state in MySQL/Kafka/Cassandra/S3/etc.
This being said, some computations require internal state bookkeeping. When you do a streaming word count, for example, some kind of transient integer state is kept per word. Without anything more, your job wouldn't be safe from machine failures and restarts. This is why the state backend exists: it can save where the job was in the computation (e.g. Kafka offsets) as well as the values the counts were at.

So, to answer your questions:
It does. The internal Blink query planner uses the same code (save for a different application of execution-plan rules). Choosing a state backend makes a bit more sense to me in a streaming context, or for a very expensive batch job (you might want to use cheap AWS Spot instances for your Task Managers, and you would then be robust to instance preemption). This page might help you make a choice.
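To illustrate, the backend choice is typically a cluster-level setting in `flink-conf.yaml` (it can also be set per job in code). A hedged sketch, with a made-up checkpoint bucket; key names follow recent Flink releases, older ones used values like `filesystem`:

```yaml
# Sketch: selecting a state backend in flink-conf.yaml.
# 'hashmap' keeps state on the JVM heap; 'rocksdb' spills to local disk
# and supports incremental checkpoints for very large state.
state.backend: rocksdb
state.backend.incremental: true
# Where checkpoints are durably stored (placeholder bucket name):
state.checkpoints.dir: s3://my-bucket/flink-checkpoints
```

The same setting applies whether the job was written with the DataStream API or the Table API, since both run on the same runtime.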
The state backends are listed in the link I provided. Separately, you might want to read from wherever your data currently lives, and write to wherever it should go post-computation; lots of data stores are supported. With a very large grain of salt, the distinction is: some connectors can stream data (the DataStream connectors), while some cannot (the Table / SQL connectors). For example, a MySQL JDBC DataStream connector will only have a sink, while in the Table API it could be both a sink and a source.
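For example, registering a MySQL-backed table with the JDBC Table/SQL connector might look like the sketch below; the URL, table name, and credentials are placeholders. Once declared, the table can be read from (as a bounded source) and written to (as a sink) in Flink SQL:

```sql
-- Sketch: a JDBC-backed table usable as both source and sink in the Table API.
-- Connection details below are made up.
CREATE TABLE word_counts (
  word STRING,
  cnt  BIGINT,
  PRIMARY KEY (word) NOT ENFORCED
) WITH (
  'connector'  = 'jdbc',
  'url'        = 'jdbc:mysql://localhost:3306/mydb',
  'table-name' = 'word_counts',
  'username'   = 'flink',
  'password'   = 'secret'
);
```

Note that MySQL here is the durable store your results land in; it is not the state backend the job uses for its internal bookkeeping.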
As a side note: the state backend is indeed queryable, but IMHO it is better suited for debugging purposes.
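To make the word-count bookkeeping above concrete, here is a Flink-free sketch: a plain `HashMap` stands in for the per-word keyed state that a Flink state backend would otherwise hold and checkpoint. The class and method names are made up for illustration:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: in a real Flink job, each count would live in keyed state
// managed by the configured state backend, so it survives failures and
// restarts. Here the map simply lives on the JVM heap.
public class WordCountState {
    static Map<String, Integer> count(String[] words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            // One integer of "state" per distinct word: add 1, or start at 1.
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts =
            count(new String[] {"flink", "state", "flink"});
        System.out.println(counts);
    }
}
```

If the machine holding this map dies, the counts are gone; with a state backend and checkpointing, Flink could restore them and resume from the saved Kafka offsets.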