I want to configure the filesystem state backend and ZooKeeper recovery mode:
state.backend: filesystem
state.backend.fs.checkpointdir: ???
recovery.mode: zookeeper
recovery.zookeeper.storageDir: ???
As you can see, I need to specify the checkpointdir and storageDir parameters, but I don't have any of the file systems supported by Apache Flink (like HDFS or Amazon S3). However, I have a Riak CS cluster installed, and it seems to be compatible with S3.
So, can I use Riak CS together with Apache Flink? If so, how do I configure Apache Flink to work with Riak CS?
Answer: How to join Apache Flink and Riak CS?
Riak CS has an S3 (version 2) compatible interface, so it is possible to use the S3 file system adapter from Hadoop to work with Riak CS.
I don't know why, but Apache Flink includes only some of the Hadoop file system adapters in its fat JAR (lib/flink-dist_2.11-1.0.1.jar), i.e. it has the FTP file system (org.apache.hadoop.fs.ftp.FTPFileSystem) but doesn't have the S3 file system (i.e. org.apache.hadoop.fs.s3a.S3AFileSystem). So, you have 2 ways to solve this problem: use the adapters from a Hadoop installation, or put the required JARs into the <flink home>/lib directory.
So, I chose the second way because I don't want to provision Hadoop in my environment. You can copy the JARs from a Hadoop distribution or from the internet:
As you can see, I am using old versions because those are the versions used in Hadoop 2.7.2, and I use a Flink build compatible with this version of Hadoop.
FYI: This hack can cause problems if you use the latest versions of these JARs in your own flow. To avoid problems caused by version conflicts, you can relocate the packages when building the fat JAR with your flow, using something like this (I am using Gradle):
Then you should specify the path to core-site.xml in flink-conf.yaml, because Hadoop-compatible file systems use this config to load their settings:
As you can see, I just placed it in the <flink home>/conf directory. It has the following settings:
Then you should configure the Riak CS buckets in flink-conf.yaml as recommended here:
and create the buckets in Riak CS. I am using s3cmd (installed via brew in my OS X dev env):
FYI: Before using s3cmd, you should configure it with s3cmd --configure and then fix some settings in the ~/.s3cfg file:
So, that's all you need to configure to save/restore the state of a standalone HA Apache Flink cluster in Riak CS.
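As a rough sketch only, the flink-conf.yaml side of the steps above might look like the fragment below. It assumes the S3A adapter (org.apache.hadoop.fs.s3a.S3AFileSystem) is the one in use, hence the s3a:// scheme. The bucket name my-flink-bucket, the checkpoints and recovery paths, and /opt/flink standing in for <flink home> are hypothetical placeholders. The core-site.xml and s3cmd settings, as well as other HA options such as the ZooKeeper quorum, are not shown.

```yaml
# Directory that contains core-site.xml.
# /opt/flink is a hypothetical stand-in for <flink home>.
fs.hdfs.hadoopconf: /opt/flink/conf

state.backend: filesystem
# Hypothetical bucket and path; s3a:// assumes the S3A adapter JARs
# are available in <flink home>/lib.
state.backend.fs.checkpointdir: s3a://my-flink-bucket/checkpoints

recovery.mode: zookeeper
# Hypothetical bucket and path.
recovery.zookeeper.storageDir: s3a://my-flink-bucket/recovery
```

The bucket referenced here has to exist in Riak CS before the cluster starts, which is why it is created with s3cmd beforehand.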