joblib.Parallel vs concurrent.futures


I'm parallelizing the processing of 1000 columns of a pandas DataFrame in two ways: with joblib.Parallel and with concurrent.futures.

In the first case I simply set n_jobs=-1. With concurrent.futures, I split the columns into 12 batches (my machine has 12 cores) and have each worker process one batch of columns in a for loop.
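For reference, the batched concurrent.futures setup might look roughly like the sketch below. It uses only the standard library, so plain lists stand in for the DataFrame columns. `split_into_batches`, `process_column`, and `process_batch` are hypothetical names, and the per-column work (a mean) is a placeholder for the real computation, which is not shown here.

```python
from concurrent.futures import ProcessPoolExecutor
import random


def split_into_batches(items, n_batches):
    """Split items into n_batches contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), n_batches)
    batches, start = [], 0
    for i in range(n_batches):
        size = base + (1 if i < extra else 0)
        batches.append(items[start:start + size])
        start += size
    return batches


def process_column(values):
    # Hypothetical per-column work; a placeholder for the real computation.
    return sum(values) / len(values)


def process_batch(batch):
    # One task per batch: loop over its (name, values) pairs inside a single worker.
    return {name: process_column(values) for name, values in batch}


if __name__ == "__main__":
    rng = random.Random(0)
    # Stand-in data (an assumption): 1000 named columns of 100 floats each.
    columns = [(f"col_{i}", [rng.random() for _ in range(100)]) for i in range(1000)]
    batches = split_into_batches(columns, 12)

    results = {}
    with ProcessPoolExecutor(max_workers=12) as executor:
        # Each of the 12 batches is pickled, sent to a worker, and processed there.
        for partial in executor.map(process_batch, batches):
            results.update(partial)

    print(len(batches), len(results))
```

With this layout every batch is serialized once and shipped to a worker, so the cost of transferring the data is part of what gets timed alongside the computation itself.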

I have two questions:

  1. With joblib, why do I see 8 Python processes running and not 1000?

  2. Why is joblib faster than concurrent.futures with batches? From what I've read, batch processing is often better than spawning a process for each column.
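One way to look into the first question is to check how many workers joblib actually uses. With n_jobs=-1, joblib requests one worker per CPU it detects and feeds tasks to that fixed pool, rather than starting one process per task. A minimal sketch, assuming joblib is installed: `resolve_n_jobs` is a hypothetical helper that mirrors the rule in joblib's documentation for negative n_jobs, and the demo counts the distinct worker PIDs seen across 1000 trivial tasks.

```python
import os


def resolve_n_jobs(n_jobs, n_cpus):
    """Hypothetical helper mirroring joblib's documented rule for n_jobs."""
    if n_jobs == 0:
        raise ValueError("n_jobs == 0 has no meaning")
    if n_jobs < 0:
        # n_jobs=-1 means all CPUs, n_jobs=-2 all but one, and so on.
        return max(n_cpus + 1 + n_jobs, 1)
    return n_jobs


if __name__ == "__main__":
    try:
        import joblib
        from joblib import Parallel, delayed
    except ImportError:
        print("joblib is not installed; skipping the demo")
    else:
        # 1000 tiny tasks, each reporting the PID of the worker that ran it.
        pids = Parallel(n_jobs=-1)(delayed(os.getpid)() for _ in range(1000))
        print("CPUs joblib detects:", joblib.cpu_count())
        print("workers requested:", resolve_n_jobs(-1, joblib.cpu_count()))
        # At most one PID per worker; fewer may appear when tasks are very short.
        print("distinct worker processes:", len(set(pids)))
```

Comparing the CPU count joblib reports with the machine's 12 cores may help narrow down where the 8 processes come from.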
