OpenGL compute shader mapping to nVidia warps


Let's say I have an OpenGL compute shader with local_size=8*8*8. How do the invocations map to nVidia GPU warps? Would invocations with the same gl_LocalInvocationID.x end up in the same warp? Or the same y? Or z? I don't need the exact placement of every invocation, just the general rule by which they are grouped.

I am asking for optimization reasons: at certain points only some of the invocations have work to do, and I want those invocations to be in the same warp.


There are 2 best solutions below

2
On

According to this: https://www.khronos.org/opengl/wiki/Compute_Shader#Inputs

  gl_LocalInvocationIndex =
          gl_LocalInvocationID.z * gl_WorkGroupSize.x * gl_WorkGroupSize.y +
          gl_LocalInvocationID.y * gl_WorkGroupSize.x + 
          gl_LocalInvocationID.x;

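If the implementation fills warps in gl_LocalInvocationIndex order (likely, but not guaranteed by the specification), then x varies fastest within a warp. With local_size=8*8*8 and a warp size of 32, each warp would hold four consecutive rows of eight x values: all invocations in a warp share gl_LocalInvocationID.z, and their gl_LocalInvocationID.y falls in the same group of four (0-3 or 4-7). Invocations that share only gl_LocalInvocationID.x would be spread across every warp, so the axis to group work by is z first, then y.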
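To make the mapping concrete, here is a small host-side Python sketch (not shader code). It assumes a warp size of 32 and that warps are filled in gl_LocalInvocationIndex order; neither is guaranteed by the OpenGL specification. The helper names `local_index`, `warp_of` and `active_warps` are made up for this illustration.

```python
from itertools import product

WORK_GROUP_SIZE = (8, 8, 8)  # local_size_x, local_size_y, local_size_z
WARP_SIZE = 32               # assumption: typical NVIDIA warp width


def local_index(x, y, z, size=WORK_GROUP_SIZE):
    """gl_LocalInvocationIndex, following the formula quoted above."""
    sx, sy, _ = size
    return z * sx * sy + y * sx + x


def warp_of(x, y, z, size=WORK_GROUP_SIZE, warp_size=WARP_SIZE):
    """Warp number, ASSUMING warps are filled in linear-index order."""
    return local_index(x, y, z, size) // warp_size


def active_warps(predicate, size=WORK_GROUP_SIZE, warp_size=WARP_SIZE):
    """Count warps holding at least one invocation that satisfies predicate."""
    sx, sy, sz = size
    warps = {
        warp_of(x, y, z, size, warp_size)
        for x, y, z in product(range(sx), range(sy), range(sz))
        if predicate(x, y, z)
    }
    return len(warps)


if __name__ == "__main__":
    # The same amount of work (128 of 512 invocations), selected along each axis.
    print("x < 2:", active_warps(lambda x, y, z: x < 2))
    print("y < 2:", active_warps(lambda x, y, z: y < 2))
    print("z < 2:", active_warps(lambda x, y, z: z < 2))
```

Under these assumptions, selecting the active invocations by `x < 2` leaves all 16 warps partially busy, `y < 2` touches 8 warps, and `z < 2` packs the same 128 invocations into 4 fully occupied warps. Testing gl_LocalInvocationIndex directly against a threshold gives the same packing without depending on the work group shape.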

11
On

The compute shader execution model allows the number of invocations to (greatly) exceed the number of individual execution units in a warp/wavefront. For example, hardware warp/wavefront sizes tend to be between 16 and 64, while the number of invocations within a work group (GL_MAX_COMPUTE_WORK_GROUP_INVOCATIONS) is required in OpenGL to be no less than 1024.
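As a rough illustration of the numbers above, the following Python sketch computes the minimum number of warps/wavefronts a work group would need for a few plausible hardware widths. The function name `min_warps_per_work_group` is hypothetical, and the result is only a lower bound: it assumes every warp is filled completely, which an implementation is not required to do.

```python
import math


def min_warps_per_work_group(local_size, warp_size):
    """Lower bound on warps needed: ceil(invocations / warp_size)."""
    invocations = math.prod(local_size)
    # Integer ceiling division avoids floating-point rounding.
    return -(-invocations // warp_size)


if __name__ == "__main__":
    for warp_size in (16, 32, 64):
        print(
            f"warp size {warp_size}:",
            "8x8x8 ->", min_warps_per_work_group((8, 8, 8), warp_size),
            "| 1024x1x1 ->", min_warps_per_work_group((1024, 1, 1), warp_size),
        )
```

Even at a width of 64, the 8*8*8 work group from the question needs at least 8 warps, so it can never fit into a single one.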

When a work group spans multiple warps/wavefronts, barrier calls and shared variable access essentially work by halting every warp/wavefront until all of them have reached that particular point, and then performing whatever memory flushing is needed so that they can access each other's variables (based on memory barrier usage, of course). If all of the invocations in a work group fit into a single warp, then it's possible to avoid such things.

Basically, you have no control over how CS invocations are grouped into warps. You can assume that the implementation is not trying to be slow (that is, it will generally group invocations from the same work group into the same warp), but you cannot assume that all invocations within the same work group will be in the same warp.

Nor should you assume that each warp only executes invocations from the same work group.