Do modern nVIDIA GPUs perform sub-warp scheduling of work?

Asked by einpoklum on 28 June 2025 at 16:24

In recent nVIDIA GPU microarchitectures, a single streaming multiprocessor (SM) seems to be divided into 4 sub-units, each with horizontal or vertical "bars" of 8 "squares" corresponding to different functional units: integer ops, 32-bit flops, 64-bit flops, and load/store. A single warp scheduler seems to be associated with each such "quarter-SM".

Now, in the CUDA programming model, the threads of each warp (32 threads) are instruction-locked: they execute the same instruction together. However, when work is actually executed in a situation where, say, only the second half or the last quarter of a warp's threads are active, can these sub-warps be scheduled onto 2 or 3 quarter-SMs, leaving the remaining quarter-SM(s) free to do other work?



Answered by einpoklum on 05 January 2018 at 16:08

No, they don't.

Based on Robert's comments, sub-warp scheduling does not happen: scheduling is always done at the granularity of full warps (at least as far as anyone using the chip can observe). Internally, the hardware may or may not be capable of sub-warp scheduling.
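To make the question's scenario concrete, here is a minimal CUDA sketch in which only the upper half of each warp (lanes 16-31) takes a branch. The kernel and helper names (`upper_half_branch`, `run_upper_half`) are hypothetical, chosen for illustration. `__ballot_sync` is evaluated while all 32 lanes are still converged, so it yields exactly the set of lanes that will execute the branch body. The sketch only shows *which* lanes are active; it cannot reveal how the hardware issues the instruction. Consistent with the answer, the branch body is still issued for the warp as a whole, with the lanes outside the mask disabled rather than handed to another quarter-SM.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: lanes 16..31 of each warp take the branch, lanes 0..15 skip it.
__global__ void upper_half_branch(unsigned *masks, int *counts)
{
    const unsigned full_mask = 0xffffffffu;  // assumes 32-lane warps
    const int lane = threadIdx.x % warpSize;
    const int warp = threadIdx.x / warpSize;
    const bool takes_branch = lane >= 16;

    // All 32 lanes are converged here, so the ballot sees every lane's predicate.
    const unsigned branch_mask = __ballot_sync(full_mask, takes_branch);

    if (takes_branch) {
        // Only the lanes set in branch_mask execute this body; the rest are masked off.
        if (lane == 16) {  // one writer per warp: the lowest lane inside the branch
            masks[warp] = branch_mask;
            counts[warp] = __popc(branch_mask);
        }
    }
}

// Hypothetical helper: launches one block of `warps` warps and copies per-warp
// results back to the host. Returns false if any CUDA call fails (e.g. no GPU).
bool run_upper_half(int warps, unsigned *masks, int *counts)
{
    unsigned *d_masks = nullptr;
    int *d_counts = nullptr;
    bool ok = cudaMalloc(&d_masks, warps * sizeof(unsigned)) == cudaSuccess &&
              cudaMalloc(&d_counts, warps * sizeof(int)) == cudaSuccess;
    if (ok) {
        upper_half_branch<<<1, warps * 32>>>(d_masks, d_counts);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(masks, d_masks, warps * sizeof(unsigned),
                        cudaMemcpyDeviceToHost) == cudaSuccess &&
             cudaMemcpy(counts, d_counts, warps * sizeof(int),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_masks);  // cudaFree(nullptr) is a no-op
    cudaFree(d_counts);
    return ok;
}

int main()
{
    const int warps = 2;
    unsigned masks[warps];
    int counts[warps];
    if (!run_upper_half(warps, masks, counts)) {
        fprintf(stderr, "CUDA call failed\n");
        return 1;
    }
    for (int w = 0; w < warps; ++w) {
        printf("warp %d: branch mask 0x%08x, %d of 32 lanes active\n",
               w, masks[w], counts[w]);
        // Bits 16..31 set: exactly the upper half of the warp is active in the branch.
        assert(masks[w] == 0xffff0000u && counts[w] == 16);
    }
    return 0;
}
```

On a CUDA-capable GPU this should report a branch mask of `0xffff0000` (16 of 32 lanes) for every warp. `__ballot_sync` is used instead of `__activemask()` inside the branch because the latter only reports lanes that happen to be converged at the call, which is not guaranteed on architectures with independent thread scheduling.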
