When using the hudi-flink-bundle.jar, our Flink SQL jobs are unable to load the s3-fs-hadoop plugin.
Details
We are using Flink 1.17 with Hudi 0.13.1 on top of S3. Following the Hudi documentation, we built our own Flink Docker image and added the hudi-flink-bundle.jar to the flink/lib directory. We also created a folder for the s3-fs-hadoop plugin under flink/plugins and copied the plugin jar into it from the flink/opt directory.
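For reference, our image setup looks roughly like this (the exact jar names and versions are from memory and may differ slightly in our actual build):

```dockerfile
FROM flink:1.17

# Hudi Flink bundle goes on the main classpath, per the Hudi docs
# (the exact bundle artifact name is an assumption)
COPY hudi-flink1.17-bundle-0.13.1.jar /opt/flink/lib/

# s3-fs-hadoop is loaded via Flink's plugin mechanism: it must sit
# in its own subdirectory under /opt/flink/plugins
RUN mkdir -p /opt/flink/plugins/s3-fs-hadoop && \
    cp /opt/flink/opt/flink-s3-fs-hadoop-*.jar /opt/flink/plugins/s3-fs-hadoop/
```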
Our job jars contain neither the hudi-flink-bundle nor the s3-fs-hadoop libraries. When we run a Flink job, we get this exception:
java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.s3a.S3AFileSystem not found
If we instead copy the s3-fs-hadoop plugin jar into the flink/lib folder, everything works, but that jar bundles many libraries whose versions conflict with the ones used in our job jar.
I've read the Flink debugging-classloading docs, but they don't explain whether (or how) a plugin jar's dependencies can be made visible to the application classloader.
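The only related knob those docs seem to offer is the parent-first pattern list, but as far as I can tell it only affects classes that are already visible on Flink's own classpath (i.e. flink/lib), not classes isolated inside a plugin classloader (this is my reading of the docs, not something confirmed):

```yaml
# flink-conf.yaml — makes user code delegate matching packages to the
# parent classloader first; presumably this cannot reach classes that
# live only inside a plugin jar
classloader.parent-first-patterns.additional: org.apache.hadoop.fs.s3a.
```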