I am running a Flink job in standalone deployment mode that uses DJL (Deep Java Library) to load a PyTorch model. The model loads successfully and I am able to cancel the job through the Flink REST API. However, when I try to launch the Flink job again, it throws:

UnsatisfiedLinkError: <pytorch>.so already loaded in another classloader

It requires a restart of the standalone deployment to load the model again. Is it possible to close the process along with the cancel-job request so that I can load the model again without restarting?
The native library can only be loaded once per JVM. In DJL, the PyTorch native library is loaded when the `Engine` class is initialized; if the native library has already been loaded in another classloader, the engine class will fail to initialize.

One workaround is to load the native library in the system `ClassLoader`, which can be shared by child classloaders. DJL allows you to inject a `NativeHelper` class to load the native library; you need to make sure your `NativeHelper` is on the system classpath and register it with DJL (a sketch follows below). You can find the test code for `NativeHelper` here. See this link for more detail.
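DJL needs to be told which helper class to use. A minimal sketch, assuming the DJL PyTorch engine's `ai.djl.pytorch.native_helper` system property and a placeholder class name `org.example.MyNativeHelper`; this must run before the engine is initialized:

```java
// Point DJL's PyTorch engine at the helper class that will load the native
// library. Set this before the first use of the DJL Engine/Criteria APIs.
System.setProperty("ai.djl.pytorch.native_helper", "org.example.MyNativeHelper");
```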
In your `MyNativeHelper` class, you only need to add the following:
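A minimal sketch of such a helper; the package and class names are placeholders, and the method simply delegates to `System.load`:

```java
package org.example;

public final class MyNativeHelper {

    // DJL calls this with the path of the PyTorch native library.
    // Because this class lives on the system classpath, the .so is bound to
    // the system ClassLoader and survives Flink's per-job classloaders.
    public static void load(String path) {
        System.load(path);
    }
}
```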
At runtime, DJL will invoke your `load(String path)` method to load the native library in your `ClassLoader`.