I need to apply a function to a DataFrame, and I used pandarallel to parallelize the process. However, I have an issue: I need to pass `fun_do` a batch of N rows per call so that I can use vectorization inside that function.

The following calls `fun_do` once per row. Any idea how to make a single call per batch while keeping the parallelization?
```python
def fun_do(value_col):
    return do(value_col)

df['processed_col'] = df.parallel_apply(lambda row: fun_do(row['col']), axis=1)
```
A possible solution is to create virtual groups of N rows:
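A minimal sketch of that idea, assuming `do()` accepts a whole Series (here `fun_do` just doubles its input as a stand-in, and the sample `df` is invented for illustration). Rows are assigned a virtual batch id with integer division, so each `apply` call receives up to N rows at once:

```python
import numpy as np
import pandas as pd

# stand-in for the real vectorized do(); assumption: it accepts a Series
def fun_do(value_col):
    return value_col * 2

df = pd.DataFrame({'col': range(10)})

N = 4  # rows per batch
batch = np.arange(len(df)) // N  # virtual group ids: 0,0,0,0,1,1,1,1,2,2

# each call now receives a whole batch (a Series of up to N values),
# so fun_do can operate vectorized; group_keys=False keeps the original
# index so the result aligns back onto df
df['processed_col'] = df.groupby(batch, group_keys=False).apply(
    lambda g: fun_do(g['col'])
)
```

With pandarallel initialized, replacing `.apply` with `.parallel_apply` on the grouped DataFrame should run the batches in parallel while each batch is still processed in one vectorized call.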
Note: I don't know why `groupby(...)['col'].parallel_apply(fun_do)` doesn't work. It seems `parallel_apply` is not available with `SeriesGroupBy`. This is the first time I've used pandarallel; usually I use the `multiprocessing` module.