Suppose I have two Python functions, funcA and funcB. funcA takes large_data as input, which is generated in funcB, and funcA is called many times within funcB. I would like to use joblib.Parallel to speed up the calculation. My question: if I use functools.partial as follows, will large_data be copied to each worker? If so, can you elaborate on how that happens? If not, what is the proper way to handle large_data?
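from functools import partial
from joblib import Parallel, delayed

def funcA(large_data, key):
    ...
    return val

def funcB(list_keys, param):
    large_data = some_function(param)
    # Bind large_data by keyword so each task only has to supply key
    tmp_func = partial(funcA, large_data=large_data)
    # key must be passed by keyword: a positional argument would collide with large_data
    list_res = Parallel(n_jobs)(delayed(tmp_func)(key=key) for key in list_keys)
    return list_res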
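
One way to probe the question without joblib at all is to look at what a partial carries when it is pickled, since process-based backends generally have to serialize the callable and its arguments to hand them to a worker. The following is a minimal sketch under stated assumptions: the body of funcA and the 1 MB bytes object standing in for large_data are hypothetical placeholders, not the real computation. It only shows that the bound data is part of the pickled partial; it does not show how often joblib actually sends it, which depends on the backend and on batching.

```python
import pickle
from functools import partial


def funcA(large_data, key):
    # Placeholder body (assumption): return one element of the data.
    return large_data[key]


# Placeholder for large_data (assumption): roughly 1 MB of zero bytes.
large_data = bytes(1_000_000)

# Same binding pattern as in the question.
tmp_func = partial(funcA, large_data=large_data)

# Serialized size of the partial versus the bare function.
payload_size = len(pickle.dumps(tmp_func))
bare_size = len(pickle.dumps(funcA))

print("pickled partial:", payload_size, "bytes")
print("pickled funcA only:", bare_size, "bytes")
print("partial still callable:", tmp_func(key=0))
```

The pickled partial is larger than large_data itself, while the bare function pickles to a short reference. As far as I understand the joblib documentation, large NumPy arrays are a special case: Parallel has max_nbytes and mmap_mode parameters that let arrays above a size threshold be dumped to disk once and memory-mapped by the workers instead of being re-sent, and a thread-based backend shares memory with no copy. Whether either applies here depends on what large_data actually is.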