Slurm not launching multiple tasks across multiple nodes


I have to run several tasks across multiple nodes using Slurm. Example: I have 120 tasks to run on 3 nodes, each capable of running 32 tasks at a time. I created a list of the input files. I would like to start as many tasks as possible (96, in this example) and, as tasks finish, start new ones until all are done. The problem is that when I do this, only one task starts (script below).

#!/bin/bash

#SBATCH --nodes=3
#SBATCH --ntasks=96
#SBATCH --ntasks-per-node=32

files=(*.csv)

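# first wave: fill all 96 slots (3 nodes x 32 tasks) with background jobs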
for f in "${files[@]:0:96}"; do
    echo "Running $f on $(hostname)"
    mpirun --bind-to hwthread --map-by numa -np 1 --output-filename "log/${f%.*}" mpiproc "$f" 2>&1 &
done

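# refill: each time a background task finishes, start the next file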
for f in "${files[@]:96}"; do
    wait -n
    mpirun --bind-to hwthread --map-by numa -np 1 --output-filename "log/${f%.*}" mpiproc "$f" 2>&1 &
done

wait

However, if I use a single node, as below, it works...

#!/bin/bash

#SBATCH --nodes=1
#SBATCH --ntasks=32
#SBATCH --ntasks-per-node=32
#SBATCH --cpus-per-task=1

files=(*.csv)

for f in "${files[@]:0:32}"; do
    echo "Running $f on $(hostname)"
    mpirun --bind-to hwthread --map-by numa -np 1 --output-filename "log/${f%.*}" mpiproc "$f" 2>&1 &
done

for f in "${files[@]:32}"; do
    wait -n
    mpirun --bind-to hwthread --map-by numa -np 1 --output-filename "log/${f%.*}" mpiproc "$f" 2>&1 &
done

wait

I also tried adding '#SBATCH --cpus-per-task=1' to the 3-node script, with no success.
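
I've also been wondering whether each task should be launched with srun instead of mpirun, so that Slurm itself places it on a node with a free slot. A rough, untested sketch of what I mean (it assumes mpiproc can be started under srun, and that --exact is available; I understand older Slurm versions used per-step --exclusive for this):

#!/bin/bash

#SBATCH --nodes=3
#SBATCH --ntasks=96
#SBATCH --ntasks-per-node=32
#SBATCH --cpus-per-task=1

files=(*.csv)

for f in "${files[@]:0:96}"; do
    # -N1 -n1: one task on one node; Slurm picks a node with a free slot.
    # --exact limits the step to exactly the resources it requests.
    srun -N1 -n1 --exact mpiproc "$f" > "log/${f%.*}" 2>&1 &
done

for f in "${files[@]:96}"; do
    wait -n
    srun -N1 -n1 --exact mpiproc "$f" > "log/${f%.*}" 2>&1 &
done

wait

I haven't been able to verify whether this actually distributes the steps across the nodes.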

Any ideas on why this happens and how to solve the issue?

P.S.: I already tried job arrays, with no luck.
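
For reference, the array version I tried looked roughly like this (reconstructed from memory, so details may differ):

#!/bin/bash

#SBATCH --ntasks=1
#SBATCH --array=0-119%96

# each array element processes one file; %96 caps concurrent elements at 96
files=(*.csv)
f=${files[$SLURM_ARRAY_TASK_ID]}

mpirun --bind-to hwthread --map-by numa -np 1 --output-filename "log/${f%.*}" mpiproc "$f"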

I'll gladly provide more info on the problem if needed.
