I'm using GNU parallel to run many instances of an executable in batches. Each instance must be pinned to its own processor. On a single local host this is straightforward: I set the number of job slots to the number of processors, then run:
taskset -c $(( {%} - 1)) ...
I'm not sure how to extend this to multiple hosts over ssh, since the slot index no longer translates directly to a processor index.
One idea: my nodes all have the same number of processors, so if the slots could be initially assigned contiguously, filling up each node before any are assigned to the next, I could calculate a processor index from the slot index. However, it seems that slots are assigned in a round-robin fashion, and I haven't been able to figure out how to change that.
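To make the idea concrete, here is a minimal sketch of the mapping I have in mind. It assumes slots are numbered contiguously per node (node 1 gets slots 1..N, node 2 gets N+1..2N, and so on), which is exactly the behaviour I have not been able to get. `slot_to_cpu` is a made-up helper name, and the 4 processors per node are only example values.

```shell
#!/bin/sh
# Hypothetical helper: map a 1-based global slot number to a 0-based
# processor index on whichever node the slot belongs to.
# ASSUMPTION: slots are assigned contiguously, filling one node before
# the next, and every node has the same number of processors.
slot_to_cpu() {
    slot=$1            # 1-based slot number, i.e. the value of {%}
    cpus_per_node=$2   # identical on every node by assumption
    echo $(( (slot - 1) % cpus_per_node ))
}

# Example with 4 processors per node:
slot_to_cpu 1 4   # first slot on the first node  -> 0
slot_to_cpu 4 4   # last slot on the first node   -> 3
slot_to_cpu 5 4   # first slot on the second node -> 0
```

With round-robin assignment this mapping breaks down, because two slots on the same node can then map to the same processor index.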
Any ideas on how I can get this to work? Thanks.