Executing a long-running external process in parallel


In Python, I'm writing a script that runs an external process. This external process performs the following steps:

  1. Fetch a value from a config file, taking into account other running processes.
  2. Run another process, using the value from step 1.

Step 1 can be bypassed by passing in a value to use. Using the same value in two concurrent runs is an error, but reusing it sequentially is valid (think of it as a pool of "pids", with no more than 10 available). Other processes (e.g. a user logging in) can also claim one of these "pids".

The external process takes a few hours to complete, and multiple independent copies must be run. Running them sequentially works, but takes too long.

I'm changing the script to run these processes concurrently using the multiprocessing module. A simplified version of my code is:

from multiprocessing import Pool
import subprocess

def longRunningTask(n):
    subprocess.call(["ls", "-l"])  # real code runs a process with no screen I/O

if __name__ == '__main__':
    myArray = [1, 2, 3, 4, 5]
    pool = Pool(processes=3)  # run at most 3 tasks at a time
    pool.map(longRunningTask, myArray)

This code fails because every process it starts ends up using the same "pid".

The solutions I've come up with are:

  1. If the call fails, wait a random delay and try again. This could end up busy-waiting for hours if enough "pids" are in use.
  2. Create a Queue of the available "pids", get() an item from it before starting the process, and put() it back when the process completes (see the sketch after this list). This would still have to wait when all "pids" are in use, the same as number 1.
  3. Use a Manager to hold a list of "pids" that are in use (starting empty). Before starting the process, pick a "pid", check whether it's in the list (retry if it is), add it to the list, and remove it when done.
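A rough sketch of what I mean by number 2, using a Manager-backed queue so it can be shared with the pool workers. The "pids" pidA/pidB/pidC are placeholders for the real pool, and ls -l stands in for the real external process:

from functools import partial
from multiprocessing import Manager, Pool
import subprocess

def longRunningTask(pidQueue, n):
    pid = pidQueue.get()  # blocks until a "pid" is free
    try:
        subprocess.call(["ls", "-l"])  # real code would launch the tool using `pid`
    finally:
        pidQueue.put(pid)  # return the "pid" to the pool even if the call fails

if __name__ == '__main__':
    manager = Manager()
    pidQueue = manager.Queue()
    for p in ["pidA", "pidB", "pidC"]:  # hypothetical pool of "pids"
        pidQueue.put(p)
    pool = Pool(processes=3)
    pool.map(partial(longRunningTask, pidQueue), [1, 2, 3, 4, 5])

Since get() blocks until an item is available, the workers sleep rather than poll, so this at least avoids the random-delay retries of number 1.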

Are there problems with approach 3, or is there a different way to do it?
