ZooKeeper timeout when upgrading Flink from 1.14 to 1.18


I'm upgrading Flink from 1.14 to 1.18.

I have ZooKeeper HA configured in flink-conf.yaml:

flink-conf.yaml

high-availability.type: zookeeper
high-availability.storageDir: file:///opt/flink/state_dir/ha/
high-availability.zookeeper.quorum: zookeeper1:2181
high-availability.zookeeper.path.root: /flink_ns1
high-availability.cluster-id: /default_ns1
high-availability.jobmanager.port: 6123
high-availability.zookeeper.client.connection-timeout: 30000
high-availability.zookeeper.client.max-retry-attempts: 10
high-availability.zookeeper.client.retry-wait: 30000

In Flink 1.14 I had no issues with ZooKeeper, but with Flink 1.18 I get this error.

Do I need to update any settings in ZooKeeper?

I have been looking for an answer about this ZooKeeper timeout.

Error details from the JobManager:

Rion Williams

If I'm reading your stack trace correctly, the running job is referencing ZooKeeper 3-3.8.3, which may be the cause of the issue. Major Flink version upgrades often drop support for older versions of dependencies such as ZooKeeper, so you may need to upgrade those as well to stay on a supported version.

The Flink 1.15 release notes explicitly mention dropping support for older versions of ZooKeeper:

Support for using Zookeeper 3.4 for HA has been dropped. Users relying on Zookeeper need to upgrade to 3.5/3.6. By default Flink now uses a Zookeeper 3.5 client.

The Flink 1.17 release notes also mention a change in how the ZooKeeper client is bundled with the distribution. It may not be related to your current issue, but it is worth knowing about since you are jumping several versions:

The Flink distribution no longer bundles 2 different Zookeeper client jars (one in lib, one in lib/opt respectively). Instead, only 1 client will be bundled within the flink-dist jar. This has no effect on the supported Zookeeper server versions.

I suspect you'll want to look into upgrading ZooKeeper so that it runs a version compatible with the Flink version you are targeting. If it still fails after that, consider posting more detail on your specific setup, deployment, and versions.
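If it helps, one way to confirm which version the ZooKeeper server is actually running is to send it the `srvr` four-letter command and read the "Zookeeper version:" line from the reply. Below is a rough Python sketch of that check, not a definitive tool. The helper names (`fetch_srvr`, `parse_zk_version`, `is_supported`) are made up for illustration. The `(3, 5, 0)` minimum is an assumption taken from the Flink 1.15 note quoted above, so check the release notes of the Flink version you are targeting before relying on it.

```python
import re
import socket
from typing import Tuple


def fetch_srvr(host: str, port: int = 2181, timeout: float = 5.0) -> str:
    """Send ZooKeeper's `srvr` four-letter command and return the raw reply.

    The server closes the connection after replying, so we read until EOF.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(b"srvr")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_zk_version(srvr_reply: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from the 'Zookeeper version:' line."""
    match = re.search(r"Zookeeper version:\s*(\d+)\.(\d+)\.(\d+)", srvr_reply)
    if match is None:
        raise ValueError("no 'Zookeeper version:' line found in reply")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def is_supported(version: Tuple[int, int, int],
                 minimum: Tuple[int, int, int] = (3, 5, 0)) -> bool:
    """Compare against a minimum server version.

    The default minimum is an assumption based on the Flink 1.15 release
    note (ZooKeeper 3.4 dropped, 3.5/3.6 required); adjust it as needed.
    """
    return version >= minimum
```

With the quorum from the question, you would call something like `is_supported(parse_zk_version(fetch_srvr("zookeeper1", 2181)))`. If the server does not answer `srvr`, check whether it is allowed by the server's `4lw.commands.whitelist` setting.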