Erlang NIF crashes on unavailable resources when calling enif_thread_create without enif_thread_join


My project has a NIF that is called from the main Erlang app. It needs to calculate and train an NN model using OpenNN, so the training function is run in a new thread created through the Erlang NIF thread API:

C code excerpt:

.....
res = enif_thread_create((char*)"train_nif_proc", &(TrainNNptr->tid), trainFun, (void*) pTrainNNptr, 0);

if (res)
{
    LogError("failed to call enif_thread_create with trainFun");
    nifpp::str_atom ret_status("train_error");
    return nifpp::make(env, ret_status);
}
.....

This piece of code crashes on Raspbian Bullseye with std::system_error: Resource temporarily unavailable. The crash was fixed by adding an else branch that joins the thread:

else
{
    res = enif_thread_join(TrainNNptr->tid, exit_code);
    if (res)
    {
        LogError("failed to join with trainFun");
        nifpp::str_atom ret_status("train_error");
        return nifpp::make(env, ret_status);
    }
}
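One possible reading of the crash, which the excerpt above does not confirm: on Linux the emulator's threads are built on POSIX threads, and a joinable thread keeps its stack and bookkeeping until it is joined, even after its function has returned. If the NIF is called repeatedly, unjoined threads could pile up until some later thread creation fails with EAGAIN, whose message text is "Resource temporarily unavailable". The sketch below uses plain pthreads rather than the NIF API to show the create/join pairing that releases those resources. The names worker and run_and_join are hypothetical stand-ins, not part of the project.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for trainFun: hands its argument back as the exit value. */
static void *worker(void *arg)
{
    return arg;
}

/* Creates `count` joinable threads one after another and joins each of them.
 * Returns how many threads were both created and joined successfully. */
int run_and_join(int count)
{
    int joined = 0;
    for (int i = 0; i < count; i++) {
        pthread_t tid;
        void *exit_value = NULL;
        int token = i;

        /* pthread_create returns an error number (for example EAGAIN)
         * instead of setting errno. */
        if (pthread_create(&tid, NULL, worker, &token) != 0)
            break;

        /* Joining is what releases the finished thread's resources. */
        if (pthread_join(tid, &exit_value) != 0)
            break;

        assert(exit_value == &token);
        joined++;
    }
    return joined;
}
```

Calling run_and_join(100) should report 100 on a system with normal limits, because each thread's resources are released before the next one is created. Whether the same mechanism explains the std::system_error here depends on how often the NIF is called and on what OpenNN does internally, so treat it as a hypothesis to check.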

According to the documentation on NIF thread creation: "The driver creating the thread is responsible for joining the thread, through erl_drv_thread_join, before the driver is unloaded".

My question is, when does the NIF become unloaded?
The NIF is loaded on startup of the app and is never unloaded in code at any stage. Why then do I have to use join if the NIF is still loaded?
How does the Erlang scheduler treat the process that calls a NIF function while that function is blocked on erl_drv_thread_join?
There is supposed to be a limit on how long a NIF function may take to return.
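To make the blocking concern concrete: joining right after creating the thread makes the call wait for the whole training run. A common alternative is to let the worker mark itself finished and to join only once that mark is set, so the join returns almost immediately. The sketch below shows the pattern with plain pthreads and C11 atomics. The names train_job, job_start and job_try_finish are hypothetical, and the "training" is a placeholder assignment. In a real NIF the worker could instead notify an Erlang process with enif_send, with a later NIF call doing the join, but that wiring is an assumption and is not shown here.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bookkeeping for one background training run. */
typedef struct {
    pthread_t   tid;
    atomic_bool done;     /* set by the worker just before it returns */
    bool        started;  /* true while a thread still needs joining */
    int         result;   /* placeholder for the training outcome */
} train_job;

static void *train_worker(void *arg)
{
    train_job *job = arg;
    job->result = 42;               /* placeholder for the real training work */
    atomic_store(&job->done, true); /* publish completion */
    return NULL;
}

/* Starts the worker and returns immediately. Returns 0 on success, -1 on failure. */
int job_start(train_job *job)
{
    assert(job != NULL);
    atomic_init(&job->done, false);
    job->result = 0;
    job->started = (pthread_create(&job->tid, NULL, train_worker, job) == 0);
    return job->started ? 0 : -1;
}

/* Non-blocking check: returns 0 while the worker is still running,
 * 1 once it has finished and been joined, -1 if there is nothing to join
 * or the join failed. */
int job_try_finish(train_job *job)
{
    assert(job != NULL);
    if (!job->started)
        return -1;
    if (!atomic_load(&job->done))
        return 0;
    /* The worker has already signalled completion, so this join only waits
     * for the thread's final return. */
    if (pthread_join(job->tid, NULL) != 0)
        return -1;
    job->started = false;
    return 1;
}
```

With this shape, the call that starts training returns at once, and a later call polls job_try_finish until it reports 1. The thread is still joined exactly once, which matters for the resource question above.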
