I have a cluster with many nodes. Each node has 32 cores divided into 4 NUMA nodes (8 cores each). The cluster is managed by Slurm, so I have to use srun or sbatch to run applications. Because there are 4 NUMA nodes per compute node, I run a scientific application with 4 MPI processes per node, each with 8 cores:
srun -N <num-nodes> -n <4 * num-nodes> -c 8 <app>
To improve performance, I want to use numactl to bind each MPI process to its own NUMA node. If I were using mpiexec, I would do something like this:
mpiexec -np 1 numactl --cpubind=0 --membind=0 <app> : \
        -np 1 numactl --cpubind=1 --membind=1 <app> : \
        -np 1 numactl --cpubind=2 --membind=2 <app> : \
        -np 1 numactl --cpubind=3 --membind=3 <app>
How can I get the same per-process NUMA binding when I have to launch the job through Slurm?
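To make the mapping I am after concrete, here is a minimal, untested sketch of a wrapper script. It assumes that srun sets SLURM_LOCALID (the node-local task ID) for each task and that local task i should go to NUMA node i. The script name numa_wrap.sh and the function name numa_node_for_task are hypothetical, and I use --cpunodebind, which I understand to be the current spelling of --cpubind.

```shell
#!/bin/bash
# numa_wrap.sh (hypothetical name): bind each Slurm task to one NUMA node.
# Assumption: 4 NUMA nodes per compute node and 4 tasks per node.

NUMA_NODES_PER_NODE=4

# Print the NUMA node for the current task, derived from SLURM_LOCALID
# (falls back to 0 when the variable is unset, e.g. outside srun).
numa_node_for_task() {
    echo $(( ${SLURM_LOCALID:-0} % NUMA_NODES_PER_NODE ))
}

# When called with an application, replace this shell with the
# application running under numactl on the chosen NUMA node.
if [ "$#" -gt 0 ]; then
    node=$(numa_node_for_task)
    exec numactl --cpunodebind="$node" --membind="$node" "$@"
fi
```

I would then expect to launch it as `srun -N <num-nodes> -n <4 * num-nodes> -c 8 ./numa_wrap.sh <app>`. I have not checked how this interacts with any CPU binding that srun itself applies, so I do not know whether this is the right approach.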