Setting a constraint in a Slurm job script for GPU compute capability


I am trying to set a constraint so that my job runs only on GPUs with compute capability 7.0 or higher.

Here is my script named torch_gpu_sanity_venv385-11.slurm:

#!/bin/bash
#SBATCH --partition=gpu-L --gres=gpu:1 --constraint="cc7.0" 
# -------------------------> ask for 1 GPU
d=$(date)
h=$(hostname)
echo $d $h env         # show CUDA related Env vars 
env|grep -i cuda
# nvidia-smi
#          actual work
/research/jalal/slurm/fashion/fashion_compatibility/torch_gpu_sanity_venv385-11.bash 

Without --constraint="cc7.0", my script runs correctly. With it, I get the following error, whether I write the quoted form --constraint="cc7.0" or the unquoted --constraint=cc7.0:

[jalal@goku fashion_compatibility]$ sbatch torch_gpu_sanity_venv385-11.slurm 
sbatch: error: Batch job submission failed: Invalid feature specification
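This error usually means that no node in the requested partition advertises a feature with that exact name. Slurm features are arbitrary labels that administrators attach to nodes (the Features= field of the node definitions in slurm.conf); Slurm does not derive them from the GPU hardware, so `cc7.0` works only if the site defined it. The names that actually exist can be listed with `sinfo -p gpu-L -o "%N %f"` or read from the `AvailableFeatures=` line of `scontrol show node <nodename>`. Some sites encode the GPU model in the GRES type instead (shown by `sinfo -o "%N %G"`). If suitable features do exist, several can be combined with `|` (logical OR). The sketch below uses hypothetical feature names; substitute the ones your cluster reports:

```shell
#!/bin/bash
# HYPOTHETICAL feature names: replace cc7.0/cc7.5/cc8.0 with the names
# that `sinfo -p gpu-L -o "%N %f"` actually reports on your cluster.
# "|" means OR: a node carrying at least one of these features qualifies.
#SBATCH --partition=gpu-L
#SBATCH --gres=gpu:1
#SBATCH --constraint="cc7.0|cc7.5|cc8.0"
```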

After removing the constraint, the job is submitted:

[jalal@goku fashion_compatibility]$ sbatch torch_gpu_sanity_venv385-11.slurm 
Submitted batch job 28398

So, how can I set the constraint so that I am only assigned GPUs with compute capability of 7 or higher?
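If the cluster does label nodes with capability features, the OR expression can be built from what the nodes actually advertise rather than typed by hand. The following is a minimal sketch that assumes the `ccMAJOR.MINOR` naming used in the question; both that naming scheme and the helper name `cc_constraint` are hypothetical. The helper reads comma-separated feature lists (as printed by `sinfo -h -o "%f"`) on stdin and prints every distinct `ccX.Y` feature whose major version meets the minimum, joined with `|`:

```shell
#!/usr/bin/env bash
# ASSUMPTION: GPU nodes carry features named like "cc7.0" or "cc8.6".
# This naming is hypothetical; check what your site actually uses.

# cc_constraint MIN_MAJOR
#   stdin:  one node's comma-separated feature list per line
#   stdout: matching features joined with "|", e.g. "cc7.0|cc8.6"
cc_constraint() {
  local min_major=$1
  tr -s ', ' '\n' |                           # one feature per line, drop padding
    grep -E '^cc[0-9]+\.[0-9]+$' |            # keep only ccX.Y-style names
    sort -u |                                 # de-duplicate across nodes
    awk -F'[c.]' -v min="$min_major" '$3 + 0 >= min + 0' |  # $3 is the major version
    paste -sd'|' -                            # join into an OR expression
}
```

For example, `cons=$(sinfo -h -p gpu-L -o "%f" | cc_constraint 7)` followed by `sbatch --constraint="$cons" torch_gpu_sanity_venv385-11.slurm`. Options given on the sbatch command line override the matching #SBATCH directives in the script. Check that `$cons` is non-empty before submitting: an empty result means no node uses this naming scheme.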

I followed this tutorial for constraint setting.
