Large pandas DataFrame as a global variable when parallel processing


I know a duplicate question already exists.

But after such a long time, are there any new methods to achieve the same goal?


from joblib import Parallel, delayed
import pandas as pd

def process(col):
    temp_df = df[col]  # look up one column of the global DataFrame
    return temp_df.apply(another_function)

Parallel(n_jobs=-2)(delayed(process)(col) for col in df.columns)

The DataFrame seems to be copied into each worker process, which is not feasible for a large DataFrame. Is there any method or package that avoids this?
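One workaround I have seen suggested is to pass only the column each worker needs as an argument, so joblib pickles a single Series per task instead of the function dragging the whole global frame into every worker. Below is a minimal sketch of that idea; the df contents, sizes, and another_function are hypothetical placeholders, not from my real code:

import numpy as np
import pandas as pd
from joblib import Parallel, delayed

# Hypothetical stand-ins for the real df and another_function.
df = pd.DataFrame(np.random.rand(100_000, 8),
                  columns=[f"c{i}" for i in range(8)])

def another_function(x):
    return x * 2

def process(column):
    # Each worker receives only the single Series it needs, so the
    # full DataFrame is never serialized for every task.
    return column.apply(another_function)

results = Parallel(n_jobs=-2)(delayed(process)(df[col]) for col in df.columns)

Another option might be Parallel(n_jobs=-2, prefer="threads"), which shares the DataFrame in memory without copying, though a pure-Python apply is then limited by the GIL. Would either of these (or something newer) be the recommended approach?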
