Work-item branch divergence in OpenCL, how does it work?

Asked by AudioBubble on 29 June 2022 at 16:11

I'm studying OpenCL and I don't fully understand the concept of "work-item divergence", also called "divergent control flow".

As we can see in the picture below, there are several warps or wavefronts (the name depends on the GPU vendor), each of which executes one instruction or another.

Now, my question is: will the whole warp/wavefront execute the if branch and then the else branch, or only one of them (only the if or only the else), as in the normal control flow of a program?

This question may be very stupid, but I couldn't find anything on the web, and the other material I have doesn't make the point clear to me.


There is 1 best solution below

pmdj On 29 June 2022 at 19:31

The key to understanding the GPU-style SIMD execution model is that all threads in a wavefront/SIMD group always execute the exact same instruction at the same time. If a thread doesn't need to run an instruction that at least one other thread in the group must execute, the instruction has no side effects for that thread (its register values won't change, etc.), but it still costs as much in terms of performance as if the thread really had run it.

If the branching condition is either true for all threads or false for all threads in a wavefront/SIMD group, then all threads run only that one branch, and the other branch is skipped. So if the condition is the same for almost all threads in your workload, or if you can arrange for the condition to be the same for all threads in a group, then you don't pay the divergence cost (or it becomes negligible).
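To make the "arrange for the condition to be the same for all threads in a group" point concrete, here is a minimal host-side sketch in plain C. It is not OpenCL code and does not touch a GPU: it only counts how many groups of consecutive work-items would contain both outcomes of a branch condition. The function name `count_divergent_groups` and the idea of passing the per-work-item condition as a `bool` array are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: counts how many groups of `group_size` consecutive
 * work-items contain BOTH true and false values of the branch condition.
 * Only those groups would have to step through both branches. */
size_t count_divergent_groups(const bool *cond, size_t n, size_t group_size)
{
    size_t divergent = 0;

    if (group_size == 0) {
        return 0; /* nothing sensible to count */
    }

    for (size_t start = 0; start < n; start += group_size) {
        size_t end = (n - start < group_size) ? n : start + group_size;
        bool any_true = false;
        bool any_false = false;

        for (size_t i = start; i < end; ++i) {
            if (cond[i]) {
                any_true = true;
            } else {
                any_false = true;
            }
        }

        if (any_true && any_false) {
            ++divergent; /* mixed outcomes: this group diverges */
        }
    }
    return divergent;
}
```

With a group size of 4, a condition pattern that alternates true/false across eight work-items makes both groups divergent, while the same eight values arranged as four true followed by four false give zero divergent groups. The real group width depends on the hardware, so 4 is only a stand-in here.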

If the condition diverges within the group, the whole wavefront needs to execute both branches, one after the other. When this happens, the threads that don't need to run a given branch still step through its instructions at exactly the same time as the threads that do; the instructions simply have no effect for them. Unlike hardware CPU threads, a GPU thread can't run different code from other threads in the same SIMD group: it can only run the same code on different data, or wait until the other threads have finished the code it doesn't need to run.
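The behaviour described above can be sketched as a small simulation in plain C. This is only a model under stated assumptions, not how any particular GPU is implemented: the wavefront width `LANES` is set to 8 for readability, the branch is the made-up example `if (x is even) x *= 2; else x += 1;`, and the function name `run_wavefront` is hypothetical. Each branch body is one "pass" that every lane steps through, with an execution mask deciding which lanes actually see an effect.

```c
#include <stdbool.h>
#include <stddef.h>

#define LANES 8 /* assumed wavefront width, chosen small for illustration */

/* Simulates one wavefront running:
 *     if (x % 2 == 0) x *= 2; else x += 1;
 * Returns the number of branch bodies the whole wavefront stepped through
 * (1 if the condition was uniform across lanes, 2 if it diverged). */
int run_wavefront(int x[LANES])
{
    bool mask[LANES];
    bool any_true = false;
    bool any_false = false;
    int passes = 0;

    /* Every lane evaluates the condition in lockstep. */
    for (size_t i = 0; i < LANES; ++i) {
        mask[i] = (x[i] % 2 == 0);
        if (mask[i]) {
            any_true = true;
        } else {
            any_false = true;
        }
    }

    /* "if" body: issued once for the whole wavefront if any lane needs it.
     * Lanes with mask == false step through it with no effect. */
    if (any_true) {
        for (size_t i = 0; i < LANES; ++i) {
            if (mask[i]) {
                x[i] *= 2;
            }
        }
        ++passes;
    }

    /* "else" body: same idea with the inverted mask. */
    if (any_false) {
        for (size_t i = 0; i < LANES; ++i) {
            if (!mask[i]) {
                x[i] += 1;
            }
        }
        ++passes;
    }

    return passes;
}
```

If all eight inputs are even, the function returns 1 because the else body is skipped entirely. If the inputs are mixed, it returns 2: the wavefront pays for both bodies even though each lane only benefits from one of them.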
