Do modern nVIDIA GPUs perform sub-warp scheduling of work?

725 Views Asked by At

In recent nVIDIA GPU uarchitectures, a single streaming multiprocessor seems to be broken up into 4 sub-units; with each of them having horizontal or vertical 'bars' of 8 'squares', corresponding to different functional units: integer ops, 32-bit flops, 64-bit flops, and load/store. A single warp scheduler seems to be associated with each such "quarter-SM".

enter image description here

Now, in the CUDA programming model, the threads of each warp (= 32 threads) are instruction-locked together. However, when actually executing work, and in a situation where, say, only the second half or latter quarter of the threads in a warp are active - can these sub-warps be scheduled to 2 or 3 quarter-SMs, with the other quarter-SM doing some other work?

1

There are 1 best solutions below

0
On

No, they don't.

Based on Robert's comments, sub-warp scheduling does not happen - scheduling is always of full warps (at least as far as anyone using the chip is concerned). Internally it may or may not be the case that sub-warp scheduling is possible.