Using multiprocessing with runpy

Question

Using multiprocessing with runpy

436 Views Asked by user2870171 At 08 October 2019 at 14:54

I have a Python module that uses multiprocessing. I'm executing this module from another script with runpy. However, this results in (1) the module running twice, and (2) the multiprocessing jobs never finish (the script just hangs).

In my minimal working example, I have a script runpy_test.py:

import runpy
runpy.run_module('module_test')

and a directory module_test containing an empty __init__.py and a __main__.py:

from multiprocessing import Pool

print 'start'
def f(x):
    return x*x
pool = Pool()
result = pool.map(f, [1,2,3])
print 'done'

When I run runpy_test.py, I get:

start
start

and the script hangs.

If I remove the pool.map call (or if I run __main__.py directly, including the pool.map call), I get:

start
done

I'm running this on Scientific Linux 7.6 in Python 2.7.5.

Original Q&A

There are 3 best solutions below

Weeble On 08 October 2019 at 15:06

Try defining your function f in a separate module. It needs to be serialised to be passed to the pool processes, and then those processes need to recreate it, by importing the module it occurs in. However, the __main__.py file it occurs in isn't a module, or at least, not a well-behaved one. Attempting to import it would result in the creation of another Pool and another invocation of map, which seems like a recipe for disaster.

user2870171 On 08 October 2019 at 15:55

Although not the "right" way to do it, one solution that ended up working for me was to use runpy's _run_module_as_main instead of run_module. This was ideal for me since I was working with someone else's code and required the fewest changes.

**Iguananaut** · Accepted Answer · 2019-10-08T15:15:13.967000

Rewrite your __main__.py like so:

from multiprocessing import Pool
from .implementation import f

print 'start'
pool = Pool()
result = pool.map(f, [1,2,3])
print 'done'

And then write an implementation.py (you can call this whatever you want) in which your function is defined:

def f(x):
    return x*x

Otherwise you will have the same problem with most interfaces in multiprocessing, and independently of using runpy. As @Weeble explained, when Pool.map tries to load the function f in each sub-process it will import <your_package>.__main__ where your function is defined, but since you have executable code at module-level in __main__ it will be re-executed by the sub-process.

Aside from this technical reason, this is also better design in terms of separation of concerns and testing. Now you can easily import and call (including for test purposes) the function f without running it in parallel.

Using multiprocessing with runpy

There are 3 best solutions below

Related Questions in PYTHON

Related Questions in PYTHON-MULTIPROCESSING

Related Questions in RUNPY

Trending Questions

Popular # Hahtags

Popular Questions