PyTorch DJL Java loading exception


I am running a Flink job in standalone deployment mode that uses DJL (Deep Java Library) to load a PyTorch model. The model loads successfully, and I am able to cancel the job through the Flink REST API. However, when I try to launch the Flink job again, it throws:

UnsatisfiedLinkError: <pytorch>.so already loaded in another classloader

It requires a restart of the standalone deployment before the model can be loaded again. Is it possible to unload the native library when the job is cancelled, so that I can load it again without restarting?

Answer by Frank Liu:

A native library can only be loaded once per JVM. In DJL, the PyTorch native library is loaded when the Engine class is initialized; if the native library has already been loaded in another classloader, the Engine class will fail to initialize.

One workaround is to load the native library in the system ClassLoader, which is shared by all child classloaders. DJL allows you to inject a NativeHelper class to load the native library; you need to make sure your NativeHelper is on the system classpath:

    System.setProperty("ai.djl.pytorch.native_helper", "org.examples.MyNativeHelper");

You can find the test code for NativeHelper here.

See this link for more details.

In your MyNativeHelper class, you only need to add the following:

    public static void load(String path) {
        System.load(path);
    }

At runtime, DJL will invoke your load(String path) method to load the native library via your ClassLoader.
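Putting the pieces together, a minimal sketch of such a helper might look like the following. The package placement and the idea of registering the property in a main method are assumptions for illustration; the only requirements stated above are the static load(String) method and that the class sit on the system classpath (for Flink, e.g. the lib/ directory rather than inside the job jar):

```java
// Minimal sketch of the NativeHelper described above.
// In a real deployment this class would typically live in a package
// such as org.examples, and it MUST be on the system classpath so the
// native library is bound to the system classloader and survives
// Flink job restarts (assumption: Flink's lib/ directory is used).
public class MyNativeHelper {

    // DJL invokes this method reflectively, passing the absolute path
    // of the extracted PyTorch native library (the .so file).
    public static void load(String path) {
        System.load(path);
    }

    public static void main(String[] args) {
        // Register the helper before any DJL Engine class is initialized;
        // the property value must be the fully qualified class name.
        System.setProperty("ai.djl.pytorch.native_helper",
                MyNativeHelper.class.getName());
        System.out.println(System.getProperty("ai.djl.pytorch.native_helper"));
    }
}
```

Because the system property is read when the Engine class first initializes, it must be set (here, or via -D on the JVM command line) before any DJL code runs.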