Suppose I have 2 Python functions: funcA and funcB. funcA takes a large_data as input, which is generated in funcB, and funcA is called within funcB many times. I would like to use joblib.Parallel to speed up the calculation. My question is: if I use partial as follows, would large_data be copied to the different nodes? If yes, can you elaborate on it? If not, what is the proper way of handling large_data?
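from functools import partial
from joblib import Parallel, delayed

def funcA(large_data, key):
    ...
    return val

def funcB(list_keys, param):
    large_data = some_function(param)
    tmp_func = partial(funcA, large_data=large_data)
    # large_data is bound by keyword, so key has to be passed by keyword as well
    list_res = Parallel(n_jobs)(delayed(tmp_func)(key=key) for key in list_keys)
    return list_res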
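
A side note on the partial binding itself, separate from the copying question: because large_data is the first parameter of funcA, binding it by keyword means key can no longer be passed positionally. Below is a minimal sketch of that behavior using only functools. The toy funcA body and the small dict are hypothetical stand-ins for the real function and data, not my actual code.

```python
from functools import partial


def funcA(large_data, key):
    # Toy stand-in for the real funcA: look up one entry.
    return large_data[key]


large_data = {"a": 1, "b": 2}  # hypothetical small stand-in for the real data

# Binding by keyword leaves no free positional slot for key,
# because large_data is the first parameter of funcA.
by_keyword = partial(funcA, large_data=large_data)
try:
    by_keyword("a")  # equivalent to funcA("a", large_data=large_data)
    positional_call_failed = False
except TypeError:
    positional_call_failed = True

# Two ways around it: pass key by keyword, or bind large_data positionally.
res_keyword = by_keyword(key="a")
by_position = partial(funcA, large_data)
res_position = by_position("b")
```

Either form should work the same way inside delayed(...), since delayed only records the function and its arguments for a later call.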