Python code takes longer to run with MPI (SLURM) than as a single process

I have some Python code which takes approximately 12 hours to run on my laptop (macOS, 16 GB 2133 MHz LPDDR3). The code loops over a few thousand iterations, doing some intensive processing at each step, so it makes sense to parallelise the problem with MPI. I have access to a Slurm cluster, where I have built mpi4py (for Python 2.7) against the cluster's OpenMPI implementation with mpicc. I then submit the following script with sbatch --exclusive mysub.sbatch:

#!/bin/bash
#SBATCH -p par-multi
#SBATCH -n 50
#SBATCH --mem-per-cpu=8000
#SBATCH -t 48:00:00
#SBATCH -o %j.log
#SBATCH -e %j.err

module add eb/OpenMPI/gcc/3.1.1

mpirun python ./myscript.py

which should split the tasks across 50 MPI processes, each with an 8 GB memory allocation. My code does something like the following:

import numpy as np
import pickle
from mpi4py import MPI

COMM = MPI.COMM_WORLD

def split(container, count):
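    # round-robin split, e.g. split(range(7), 3) -> [[0, 3, 6], [1, 4], [2, 5]]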
    return [container[_i::count] for _i in range(count)]
    
def read():
    #function which reads a series of pickle files from my home directory
    return data
    
def function1(inputs):
    #some process 1
    return f1

def function2(f1):
    #some process 2
    return f2

def main_function(inputs):
    #some process which also calls function1 and function2
    f1 = function1(inputs)
    f2 = function2(f1)
    result = #some more processing
    return result
    
### define global variables and read data ###
data = read()
N = 5000
#etc...

selected_variables = range(N)

if COMM.rank == 0:
    splitted_jobs = split(selected_variables, COMM.size)
else:
    splitted_jobs = None

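# scatter delivers one element of splitted_jobs to each rank (including the
# root), so the root's list must have exactly COMM.size entries, which split()
# guarantees here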
scattered_jobs = COMM.scatter(splitted_jobs, root=0)

results = []
for index in scattered_jobs:
    outputs = main_function(data[index])
    results.append(outputs)
results = COMM.gather(results, root=0)
        
if COMM.rank == 0:
    all_results = []
    for r in results:
        # r is one rank's list of results, so extend to flatten
        all_results.extend(r)
        
    f = open('result.pkl','wb')
    pickle.dump(np.array(all_results),f,protocol=2)
    f.close()
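
One thing I have not yet ruled out is whether all 50 ranks are genuinely cooperating. As far as I understand, if mpi4py is built against a different MPI library than the mpirun that launches the job, every process gets its own COMM_WORLD of size 1, so each of the 50 copies runs the entire 5000-iteration loop serially, which would match the slowdown I am seeing. A minimal check (check_ranks.py is just a placeholder name here) would be:

from mpi4py import MPI

COMM = MPI.COMM_WORLD
# with -n 50, every line should read "of 50"; if each process prints
# "rank 0 of 1", the processes are not part of a single MPI job and
# each one is running the whole workload on its own
print("rank %d of %d" % (COMM.rank, COMM.size))

launched from the same sbatch script with mpirun python ./check_ranks.py.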

The maximum run time I can allocate for my job is 48 hours, at which point the job has not even finished running. Could anyone tell me if there is something in either my submission script or my code which is likely causing this to be very slow?
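
If it helps with diagnosis, I could also time each rank's share of the loop with MPI.Wtime, something like the following sketch (not currently in my script):

t_start = MPI.Wtime()
results = []
for index in scattered_jobs:
    results.append(main_function(data[index]))
# report each rank's share of the work, which should show up any
# load imbalance or ranks doing far more than 1/50th of the items
print("rank %d processed %d items in %.1f s" % (COMM.rank, len(scattered_jobs), MPI.Wtime() - t_start))
results = COMM.gather(results, root=0)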

Thanks
