Is there a way to parallelize a function inside a pool in Python?


I am trying to parallelize a function inside a pool using the multiprocessing module, but I run into the error:

daemonic processes are not allowed to have children

More specifically, I am using the emcee module, which uses multiprocessing for parallelization, and I would like to parallelize my posterior function as well to speed up the calculations. Is there a way to parallelize a function inside the main Pool in this case?
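For context, here is a minimal, self-contained reproduction of that error (the function names are illustrative, not from the original code): workers created by a `Pool` are daemon processes, and a daemon process is not allowed to start children of its own, so creating a second `Pool` inside a worker fails.

```python
# Minimal reproduction of "daemonic processes are not allowed to have
# children": a Pool worker is a daemon process, and daemon processes
# may not create child processes of their own.
from multiprocessing import Pool

def inner(x):
    return x * x

def outer(x):
    try:
        # This nested Pool is constructed inside a daemonic worker,
        # so Pool() raises an AssertionError before any work runs.
        with Pool(2) as inner_pool:
            return inner_pool.map(inner, range(x))
    except AssertionError as exc:
        return str(exc)

if __name__ == "__main__":
    with Pool(2) as pool:
        # Both calls hit the error and return its message instead of a result.
        print(pool.map(outer, [3, 4]))
```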

Edit (code added):

import numpy as np
import emcee
from multiprocessing import Pool
from scipy.integrate import quad

# defines the log-posterior probability distribution
# (Integrand, ngal, nwalkers and npars are defined elsewhere)
def logp(p):

    mu_imf, mu_h0, sigma_h, b_h = p

    logpost = np.zeros(ngal, dtype='longdouble')

    for i in range(ngal):
        logpost[i] = np.log(quad(lambda m_halo: Integrand(m_halo, i, mu_imf, mu_h0, sigma_h, b_h), 11.9, 15.0107)[0])

    return np.sum(logpost)

with Pool() as pool:
    sampler = emcee.EnsembleSampler(nwalkers, npars, logp, pool=pool)

1 Answer


In general you don't want to do that. Nested parallelism makes the number of processes explode very quickly, and it is usually not much faster anyway, because spinning up each new process carries overhead.

Rather than trying to parallelise everything at once, parallelise in two steps. First generate all the emcee tasks you are interested in and parallelise that processing, as in the docs. Then collect the results and map your posterior function over them with a second pool.

That is, rather than your current setup, which parallelises a function that itself makes a parallel call:

def do_work(**params):
    some_results = do_emcee_call_in_parallel(**params)
    do_stuff_with_some_results(some_results)

split the problem into two stages and do this:

problems = get_all_problems()
results = parallel_solve_emcee_problems(problems)
# second, separate parallel stage: no pool is ever created inside a worker
with Pool() as pool:
    posteriors = pool.map(posterior, results)
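To make the pattern concrete, here is a runnable sketch of the two-stage approach with toy stand-ins (`solve_problem` and `posterior` below are hypothetical placeholders for the emcee call and the real posterior post-processing, not the library's API):

```python
# Two sequential parallel stages: each Pool lives at the top level,
# so no pool is ever created inside another pool's worker.
from multiprocessing import Pool

def solve_problem(p):
    # Stage 1: the expensive per-problem work (stands in for the emcee call).
    return p * 2

def posterior(r):
    # Stage 2: post-processing applied to each stage-1 result.
    return r + 1

if __name__ == "__main__":
    problems = range(4)
    with Pool() as pool:
        results = pool.map(solve_problem, problems)
    with Pool() as pool:
        out = pool.map(posterior, results)
    print(out)  # [1, 3, 5, 7]
```

The key design point is that the two stages run one after the other, each with its own short-lived pool, so the total process count stays bounded.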

Alternatively, if I've misunderstood what you're trying to do, feel free to show your desired inputs and outputs.