Python multiprocess pool.map takes too long for initialization

54 Views Asked by At
%%time
import pandas as pd
from datetime import datetime
from multiprocessing import Pool


def get_id(x):
    return x['motor_id']

def parallel_load(df, get_id):
    print('parallel_load: ', datetime.now())
    return df.apply(lambda x: get_id(x), axis=1)

def parallel_df(df, func, n_cores):
    print('start: ', datetime.now())
    df_split = np.array_split(df, n_cores)
    print('splitted: ', datetime.now())
    pool = Pool(n_cores)
#     df = pd.concat(pool.map(func, df_split))
    s = datetime.now()
    print('pool start: {}'.format(s))
    data = pool.map(func, df_split)
    print('pool end: {}, {}s'.format(datetime.now(), (datetime.now() - s).seconds))
    pool.close()
    pool.join()
    print('pool join: {}s'.format((datetime.now() - s).seconds))
    
    return data

new_func = partial(
    parallel_load, get_id=get_id
)

ndf = pd.DataFrame({'motor_id': np.arange(10000)})
parallel_df(ndf, new_func, 4)

With the sample code above, I get this result

start:  2024-01-02 16:58:09.390751
splitted:  2024-01-02 16:58:09.391897
parallel_load:  parallel_load: parallel_load: parallel_load: parallel_load:  parallel_load: 2024-01-02 16:58:12.453533parallel_load:  parallel_load: 2024-01-02 16:58:12.456901parallel_load: parallel_load:   

2024-01-02 16:58:12.456053  2024-01-02 16:58:12.4578302024-01-02 16:58:12.457674   2024-01-02 16:58:12.457189

2024-01-02 16:58:12.457663

2024-01-02 16:58:12.4583492024-01-02 16:58:12.458657
2024-01-02 16:58:12.457537


pool start: 2024-01-02 16:58:12.448087
pool end: 2024-01-02 16:58:12.504469, 0s
pool join: 0s
CPU times: user 52.5 ms, sys: 3.48 s, total: 3.53 s
Wall time: 3.39 s

As you can see, it seems like it takes about 3 seconds to start the actual code for each process after pool.map was calling.

check the time between "splitted" and "parallel_load"

Is there a way to shorten this time? It seems like pool.map is not good for using small works like this; since getting to start takes too much time compare to the actual processing time.

0

There are 0 best solutions below