Submitting Slurm job to head node via compute node?


I've set up a Slurm cluster on AWS ParallelCluster for a customer who needs to be able to launch nested Slurm jobs. For example, from a login node, we need to be able to launch a single job on a compute node that can launch hundreds/thousands of jobs on separate nodes in the cluster.
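To make the pattern concrete, here is a minimal sketch of the kind of workflow involved; the script names launcher.sh and worker.sh are placeholders for illustration, not the client's actual code:

    # Outer job: runs on a single compute node and fans out the real work.
    sbatch --partition all launcher.sh

    # launcher.sh (sketch)
    #!/bin/bash
    for i in $(seq 1 1000); do
        # Each iteration is intended to become its own job elsewhere in the cluster.
        srun --partition all --ntasks 1 ./worker.sh "$i" &
    done
    wait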

Even if this is considered to be against best practice for Slurm job architecture, we can't simply ask our client to rewrite all of their jobs; we need to get to a working state with their existing jobs written the way they are.

When running:

    srun --partition all srun --partition all echo hi

the initial job gets instantiated, but from there the compute node that runs the root-level job seems to be unable to submit jobs to the cluster.

Error message:

    srun: error: Unable to create step for job 2: Job/step already completing or completed

What I think might be happening is that the first job is allocating all of the resources on the compute node it runs on, and that the compute node is trying to run the second Slurm job on itself instead of sending it back to the head node so it can be scheduled on another node/partition. What I don't know is how to reconfigure the cluster so that compute nodes can resubmit jobs into the queue.
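For clarity, the behavior we are after looks roughly like the sketch below. Both variants (submitting the inner work with sbatch, or clearing the inherited SLURM_* environment, SLURM_JOB_ID in particular, before the inner srun) are guesses at possible workarounds rather than anything we have confirmed works on ParallelCluster:

    # Variant 1: use sbatch for the inner submission, which should create a new
    # job in the queue rather than a step inside the outer job's allocation.
    srun --partition all sbatch --partition all --wrap "echo hi"

    # Variant 2: unset the SLURM_* variables inherited from the outer job so the
    # inner srun asks the controller for a fresh allocation instead of a job step.
    srun --partition all bash -c 'unset $(env | grep -o "^SLURM_[A-Za-z0-9_]*"); srun --partition all echo hi'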
