Using Python Multiprocessing Module to Download Models from the BioModels Database


I'm trying to use the Python multiprocessing module to speed up some computations. The first step is acquiring a number of models from the BioModels database. There is a Python interface for this called BioServices, which can be installed with pip install bioservices. I've managed to do this in serial, but it takes a long time and would benefit from parallelization.
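For context, the serial version is essentially the same download steps in a plain loop (a sketch, mirroring the calls used in my code below):

import bioservices

bio = bioservices.BioModels()
models = {}
for ID in bio.getAllCuratedModelsId():
    name = bio.getModelNameById(ID)          # one blocking web-service call per model...
    models[name] = bio.getModelSBMLById(ID)  # ...then another to fetch the SBML itself

My parallel attempt with multiprocessing: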

import time
import multiprocessing
import bioservices

start = time.time()

bio = bioservices.BioModels()     # initialize the class for downloading models
m = bio.getAllCuratedModelsId()   # assign the model IDs to m (a Python list)

def f(ID):
    dct = {}
    name = bio.getModelNameById(ID)       # retrieve the model name for the result dict key
    print('running {}'.format(name))      # print some information so you can see the program working
    dct[name] = bio.getModelSBMLById(ID)  # get the model and assign it as a value in dct
    time.sleep(0.5)  # pause briefly to avoid bombarding the service and being cut off
    return dct

model_dct = {}
P = multiprocessing.Pool(8)
for i in m:
    model_dct.update(P.map(f, i))  # parallelize

print('{} seconds'.format(time.time() - start))

At present this just initializes the bio class and then crashes (or at least does nothing). Could anybody suggest how to fix my code?

Thanks

1 Answer
Pool.map is meant to apply a function to every item in an iterable, so you should not write:

for i in m:
    ...P.map(f,i)

Here each i is a single ID string, so map iterates over its characters and calls f once per character, which is why the calls fail. Instead, pass the whole list in a single call:

P.map(f, m)

This will give you a list of dicts, one for each ID, which you can then merge into a single dictionary.
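Putting it together, a minimal sketch of the corrected parallel section (reusing f and m from the question; note that dict.update expects a mapping rather than a list of dicts, so the returned dicts are merged one at a time):

import multiprocessing

if __name__ == '__main__':   # guard matters if you run this on Windows
    P = multiprocessing.Pool(8)
    results = P.map(f, m)    # one dict per model ID
    P.close()
    P.join()

    model_dct = {}
    for d in results:
        model_dct.update(d)  # merge each single-entry dict into one result dict

Bear in mind that eight workers will now hit the web service concurrently, so the time.sleep in f is still worth keeping to avoid being rate-limited.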