On today's GPUs, can warps be recombined dynamically?


GPUs run threads by grouping them into warps, and this grouping is apparently fixed. If the threads diverge (because of branching), then in each processor cycle only a fraction of the threads in the warp actually execute, and GPU efficiency drops at that point. Traditionally this is implemented with a single PC for the whole warp, a mask with one bit per thread, and instructions that push and pop the PC and the mask at the right places.
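To make the divergence cost concrete, here's a toy model in plain Python (my own sketch, not how any real GPU is implemented): a warp with one shared PC and a per-thread active mask, where an if/else costs one issue slot per branch side that has at least one active thread.

```python
WARP_SIZE = 32

def branch_passes(cond):
    """Issue slots an if/else costs a warp under mask-based SIMT.

    cond: one bool per thread, True if that thread takes the 'if' side.
    Each side of the branch that has any active lane needs its own pass,
    executed with the complementary lanes masked off.
    """
    taken = sum(cond)
    passes = 0
    if taken > 0:                # 'if' side runs with `taken` lanes active
        passes += 1
    if taken < WARP_SIZE:        # 'else' side runs with the rest active
        passes += 1
    return passes

# A uniform branch costs 1 pass; a diverged warp costs 2,
# each pass running at partial occupancy.
print(branch_passes([True] * WARP_SIZE))                  # 1
print(branch_passes([i % 2 == 0 for i in range(WARP_SIZE)]))  # 2
```

The mask stack in real hardware handles nesting as well, but the per-branch cost shown here is the part that matters for the question.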

The Volta and Turing GPUs moved the PC to be per-thread, in a technique called Independent Thread Scheduling (e.g. see here). However, the basics seem to remain the same: the thread-to-warp mapping is fixed, and in a given cycle the GPU executes one of these fixed warps, or a subset of one.

Am I correct? Or is there some other technique that lets the GPU recombine threads into new warps dynamically? I ask because it looks like a good performance opportunity: for example, after an "if" that is true for a random ~50% of threads, the fixed-warp approach runs each branch side at half capacity. In the (I assume) common case, your program consists not of a single warp but of a large number of them, ready to be scheduled. So although only ~16 of the 32 threads in any one warp are ready to run at a given PC, many more threads in other warps may be at that same PC. Is there no logic in the GPU that could consolidate those threads into non-diverged warps?
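A back-of-the-envelope simulation of the idea (again a toy Python model with made-up parameters, and it deliberately ignores whatever hardware cost such repacking would have):

```python
import math
import random

WARP_SIZE = 32
NUM_WARPS = 64
random.seed(0)

# Each thread independently takes the "if" side with probability 0.5.
takes = [[random.random() < 0.5 for _ in range(WARP_SIZE)]
         for _ in range(NUM_WARPS)]

# Fixed warps: every diverged warp issues one pass per branch side,
# each pass running at partial occupancy.
fixed_passes = 0
for warp in takes:
    k = sum(warp)
    fixed_passes += (k > 0) + (k < WARP_SIZE)

# Hypothetical repacking: pool all threads at the same PC across warps
# and refill warps to full width before issuing.
taken_total = sum(map(sum, takes))
other_total = NUM_WARPS * WARP_SIZE - taken_total
repacked_passes = (math.ceil(taken_total / WARP_SIZE)
                   + math.ceil(other_total / WARP_SIZE))

print(fixed_passes, repacked_passes)
```

With a 50/50 branch essentially every warp diverges, so the fixed scheme issues about twice as many passes as the repacked one, which is the factor-of-two gap the question is about.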

(I'm aware of special GPU instructions that assume fixed warps. Maybe the driver would enable this optimization I'm talking about only if the code doesn't use these instructions.)
