I have a Python module that uses multiprocessing. I'm executing this module from another script with runpy. However, this results in (1) the module running twice, and (2) the multiprocessing jobs never finish (the script just hangs).
In my minimal working example, I have a script runpy_test.py:
import runpy
runpy.run_module('module_test')
and a directory module_test containing an empty __init__.py and a __main__.py:
from multiprocessing import Pool
print 'start'
def f(x):
    return x*x
pool = Pool()
result = pool.map(f, [1,2,3])
print 'done'
When I run runpy_test.py, I get:
start
start
and the script hangs.
If I remove the pool.map call (or if I run __main__.py directly, including the pool.map call), I get:
start
done
I'm running this on Scientific Linux 7.6 in Python 2.7.5.
Rewrite your __main__.py so that it contains only the driver code, and then write an implementation.py (you can call this whatever you want) in which your function f is defined, and import f from it in __main__.py.

Otherwise you will have the same problem with most interfaces in multiprocessing, and independently of using runpy. As @Weeble explained, when Pool.map tries to load the function f in each sub-process it will import <your_package>.__main__, where your function is defined, but since you have executable code at module level in __main__, it will be re-executed by the sub-process.

Aside from this technical reason, this is also better design in terms of separation of concerns and testing. Now you can easily import and call the function f (including for test purposes) without running it in parallel.
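A minimal sketch of that layout, keeping the file names from the answer. The original code listings did not survive in this extract, so the exact contents below are a reconstruction of the described fix, not the answer's verbatim code:

```python
# module_test/implementation.py -- define the worker function here, so
# sub-processes can import it without re-running __main__'s top-level code.
def f(x):
    return x * x


# module_test/__main__.py -- only driver code at module level.
# In the real package the import would be:  from .implementation import f
if __name__ == '__main__':
    from multiprocessing import Pool
    print('start')
    pool = Pool()
    result = pool.map(f, [1, 2, 3])
    pool.close()
    pool.join()
    print('done')
```

With this split, each worker process imports module_test.implementation (which has no side effects) instead of re-executing __main__, so "start" prints once and pool.map returns [1, 4, 9] instead of hanging.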