Slurm script for parallel execution of independent tasks not working

I am having a problem with the Slurm script shown below:

#!/bin/bash
#
#SBATCH --job-name=parReconstructPar        # Job name
#SBATCH --output=log.parReconstructPar      # Standard output and error log
#SBATCH --partition=orbit                   # define the partition
#SBATCH -n 32
#

srun --exclusive -n1 reconstructPar -allRegions -time 0.0:0.3 &
srun --exclusive -n1 reconstructPar -allRegions -time 0.35:0.65 &
srun --exclusive -n1 reconstructPar -allRegions -time 0.7:1.0 &
srun --exclusive -n1 reconstructPar -allRegions -time 1.05:1.35 &
srun --exclusive -n1 reconstructPar -allRegions -time 1.4:1.7 &
srun --exclusive -n1 reconstructPar -allRegions -time 1.75:2.05 &
srun --exclusive -n1 reconstructPar -allRegions -time 2.1:2.4 &
srun --exclusive -n1 reconstructPar -allRegions -time 2.45:2.75 &
srun --exclusive -n1 reconstructPar -allRegions -time 2.8:3.1 &
srun --exclusive -n1 reconstructPar -allRegions -time 3.15:3.4 &
srun --exclusive -n1 reconstructPar -allRegions -time 3.45:3.7 &
srun --exclusive -n1 reconstructPar -allRegions -time 3.75:4.0 &
srun --exclusive -n1 reconstructPar -allRegions -time 4.05:4.3 &
srun --exclusive -n1 reconstructPar -allRegions -time 4.35:4.6 &
srun --exclusive -n1 reconstructPar -allRegions -time 4.65:4.9 &
srun --exclusive -n1 reconstructPar -allRegions -time 4.95:5.2 &
srun --exclusive -n1 reconstructPar -allRegions -time 5.25:5.5 &
srun --exclusive -n1 reconstructPar -allRegions -time 5.55:5.8 &
srun --exclusive -n1 reconstructPar -allRegions -time 5.85:6.1 &
srun --exclusive -n1 reconstructPar -allRegions -time 6.15:6.4 &
srun --exclusive -n1 reconstructPar -allRegions -time 6.45:6.7 &
srun --exclusive -n1 reconstructPar -allRegions -time 6.75:7.0 &
srun --exclusive -n1 reconstructPar -allRegions -time 7.05:7.3 &
srun --exclusive -n1 reconstructPar -allRegions -time 7.35:7.6 &
srun --exclusive -n1 reconstructPar -allRegions -time 7.65:7.9 &
srun --exclusive -n1 reconstructPar -allRegions -time 7.95:8.2 &
srun --exclusive -n1 reconstructPar -allRegions -time 8.25:8.5 &
srun --exclusive -n1 reconstructPar -allRegions -time 8.55:8.8 &
srun --exclusive -n1 reconstructPar -allRegions -time 8.85:9.1 &
srun --exclusive -n1 reconstructPar -allRegions -time 9.15:9.4 &
srun --exclusive -n1 reconstructPar -allRegions -time 9.45:9.7 &
srun --exclusive -n1 reconstructPar -allRegions -time 9.75:10.0 &

The script is supposed to launch several tasks that are independent of each other and should run in parallel. However, when I submit the job to the scheduler, the tasks are never launched and the job disappears immediately. The log file is completely empty.

If someone could tell me what is wrong with this, I would greatly appreciate it.

Best regards

I tried running the script without --exclusive and also with explicit memory allocation.

Best answer

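You are missing the wait command at the end of the submission script. Each srun line ends with &, so the step is started in the background and the script moves straight on to the next line. Without a final wait to block until all the backgrounded processes have finished, the batch script reaches its end and exits straight away. Once the batch script exits, Slurm treats the job as finished and terminates any job steps that are still running, which is what you have seen.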

That is, your script should be:

#!/bin/bash
#
#SBATCH --job-name=parReconstructPar        # Job name
#SBATCH --output=log.parReconstructPar      # Standard output and error log
#SBATCH --partition=orbit                   # define the partition
#SBATCH -n 32
#

srun --exclusive -n1 reconstructPar -allRegions -time 0.0:0.3 &
srun --exclusive -n1 reconstructPar -allRegions -time 0.35:0.65 &
srun --exclusive -n1 reconstructPar -allRegions -time 0.7:1.0 &
srun --exclusive -n1 reconstructPar -allRegions -time 1.05:1.35 &
srun --exclusive -n1 reconstructPar -allRegions -time 1.4:1.7 &
srun --exclusive -n1 reconstructPar -allRegions -time 1.75:2.05 &
srun --exclusive -n1 reconstructPar -allRegions -time 2.1:2.4 &
srun --exclusive -n1 reconstructPar -allRegions -time 2.45:2.75 &
srun --exclusive -n1 reconstructPar -allRegions -time 2.8:3.1 &
srun --exclusive -n1 reconstructPar -allRegions -time 3.15:3.4 &
srun --exclusive -n1 reconstructPar -allRegions -time 3.45:3.7 &
srun --exclusive -n1 reconstructPar -allRegions -time 3.75:4.0 &
srun --exclusive -n1 reconstructPar -allRegions -time 4.05:4.3 &
srun --exclusive -n1 reconstructPar -allRegions -time 4.35:4.6 &
srun --exclusive -n1 reconstructPar -allRegions -time 4.65:4.9 &
srun --exclusive -n1 reconstructPar -allRegions -time 4.95:5.2 &
srun --exclusive -n1 reconstructPar -allRegions -time 5.25:5.5 &
srun --exclusive -n1 reconstructPar -allRegions -time 5.55:5.8 &
srun --exclusive -n1 reconstructPar -allRegions -time 5.85:6.1 &
srun --exclusive -n1 reconstructPar -allRegions -time 6.15:6.4 &
srun --exclusive -n1 reconstructPar -allRegions -time 6.45:6.7 &
srun --exclusive -n1 reconstructPar -allRegions -time 6.75:7.0 &
srun --exclusive -n1 reconstructPar -allRegions -time 7.05:7.3 &
srun --exclusive -n1 reconstructPar -allRegions -time 7.35:7.6 &
srun --exclusive -n1 reconstructPar -allRegions -time 7.65:7.9 &
srun --exclusive -n1 reconstructPar -allRegions -time 7.95:8.2 &
srun --exclusive -n1 reconstructPar -allRegions -time 8.25:8.5 &
srun --exclusive -n1 reconstructPar -allRegions -time 8.55:8.8 &
srun --exclusive -n1 reconstructPar -allRegions -time 8.85:9.1 &
srun --exclusive -n1 reconstructPar -allRegions -time 9.15:9.4 &
srun --exclusive -n1 reconstructPar -allRegions -time 9.45:9.7 &
srun --exclusive -n1 reconstructPar -allRegions -time 9.75:10.0 &
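
# Block until every backgrounded srun step has finished; without this the
# batch script exits immediately and Slurm ends the job.
wait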
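If you prefer not to repeat the srun line 32 times, the same job can be written with a loop. This is a sketch assuming bash: the 32 time windows are copied unchanged from the script above into an array, one single-task step is launched per window, and each background PID is then waited on individually so that a failing step is named in the log. The array names (ranges, pids) and the failure reporting are illustrative additions, not part of the original answer; the #SBATCH and srun options are the same as above.

```shell
#!/bin/bash
#
#SBATCH --job-name=parReconstructPar        # Job name
#SBATCH --output=log.parReconstructPar      # Standard output and error log
#SBATCH --partition=orbit                   # define the partition
#SBATCH -n 32
#

# The 32 time windows from the script above, one per job step
# ("ranges" is an illustrative name).
ranges=(
  0.0:0.3   0.35:0.65 0.7:1.0   1.05:1.35 1.4:1.7   1.75:2.05 2.1:2.4   2.45:2.75
  2.8:3.1   3.15:3.4  3.45:3.7  3.75:4.0  4.05:4.3  4.35:4.6  4.65:4.9  4.95:5.2
  5.25:5.5  5.55:5.8  5.85:6.1  6.15:6.4  6.45:6.7  6.75:7.0  7.05:7.3  7.35:7.6
  7.65:7.9  7.95:8.2  8.25:8.5  8.55:8.8  8.85:9.1  9.15:9.4  9.45:9.7  9.75:10.0
)

pids=()
for r in "${ranges[@]}"; do
  # Launch one single-task step per window in the background and
  # remember its PID ($! is the PID of the last background command).
  srun --exclusive -n1 reconstructPar -allRegions -time "$r" &
  pids+=("$!")
done

# Wait on each step individually; "wait PID" returns that step's exit
# status, so failed windows can be reported in the log file.
failed=0
for i in "${!pids[@]}"; do
  if ! wait "${pids[$i]}"; then
    echo "time window ${ranges[$i]} failed" >&2
    failed=$((failed + 1))
  fi
done

if (( failed > 0 )); then
  echo "$failed of ${#pids[@]} time windows failed" >&2
fi
```

Waiting on each PID in turn still waits for all steps, just like a bare wait, but a plain wait with no arguments always returns zero and therefore cannot tell you whether any reconstructPar run failed.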