Python program freezes when multiprocessing call follows sequential call


I encountered a strange problem with a Python program processing images with the pyvips library. Here is a simplified version of my problem.

I try to create a numpy array (coming from zeros in my simple example) several times (2 times in the example).

When I run it sequentially, everything is fine. When I use the multiprocessing module on its own, it also works as it should. But during my tests I noticed the program freezes when I perform these two runs one right after the other in the same script, hence the program below. When I run it, it hangs before the first "step 3" of the multiprocessing part.

I suspect it is linked to some garbage collection mechanism, but I am lost here.

I'm using Python 3.11.3 and pyvips 2.2.2 (with vips 8.15.1), in Spyder with its IPython console on a Linux machine (an Ubuntu 22.04-based distribution).

import multiprocessing
import numpy as np
import pyvips

def myfunction(useless):
    print("step 1")
    image = pyvips.Image.new_from_array(np.zeros((8, 8, 3)), interpretation="rgb")
    print("step 2")
    res = np.asarray(image)
    print("step 3")
    return res

def main(nbpool):
    srcrange = range(2)
    if nbpool == 0:
        res = list()
        for srcid in srcrange:
            res.append(myfunction(srcid))
        return res
    else:
        pool = multiprocessing.Pool(nbpool)
        return pool.map(myfunction, srcrange)

if __name__ == "__main__":
    res1 = main(0)
    print("Now with multiprocessing")
    res2 = main(1)
Accepted answer, by msalam:

The issue you're experiencing might be due to the fact that the multiprocessing module doesn't play well with IPython or Jupyter notebooks: these environments aren't fork-safe, and multiprocessing's default start method on Linux relies on being able to safely fork the Python interpreter.

One way to potentially solve this issue is to select the 'spawn' start method with multiprocessing.set_start_method('spawn'). This makes the multiprocessing module create a fresh Python interpreter for each child process instead of forking, which can avoid issues in environments that aren't fork-safe. Note that the call must sit under the if __name__ == "__main__": guard: with 'spawn', every child re-imports your module, and a second, module-level call to set_start_method would raise a RuntimeError in the child.

Here's how you can modify your script:

import multiprocessing
import numpy as np
import pyvips

def myfunction(useless):
    print("step 1")
    image = pyvips.Image.new_from_array(np.zeros((8, 8, 3)), interpretation="rgb")
    print("step 2")
    res = np.asarray(image)
    print("step 3")
    return res

def main(nbpool):
    srcrange = range(2)
    if nbpool == 0:
        res = list()
        for srcid in srcrange:
            res.append(myfunction(srcid))
        return res
    else:
        pool = multiprocessing.Pool(nbpool)
        return pool.map(myfunction, srcrange)

if __name__ == "__main__":
    # Set the start method for multiprocessing. This must run only in
    # the parent process, hence inside the __main__ guard.
    multiprocessing.set_start_method('spawn')
    res1 = main(0)
    print("Now with multiprocessing")
    res2 = main(1)

Please note that the 'spawn' start method can make your program slower, since a new Python interpreter has to be started for each child process. However, it avoids the forking issues in environments like IPython or Jupyter notebooks.
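As an alternative to changing the process-wide default, multiprocessing.get_context lets you request the 'spawn' method for a single pool only, which is handy when other parts of the program still rely on the default. A minimal sketch, with a toy square worker standing in for myfunction:

```python
import multiprocessing as mp

def square(x):
    # Under the spawn method this runs in a fresh interpreter,
    # so no state is inherited from the parent via fork.
    return x * x

if __name__ == "__main__":
    # get_context scopes the start method to this context's pool,
    # without touching the global default set_start_method changes.
    ctx = mp.get_context("spawn")
    with ctx.Pool(2) as pool:
        results = pool.map(square, range(3))
    print(results)  # [0, 1, 4]
```

The with block also takes care of terminating the pool's worker processes when the map is done.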