Mapping between Compute Shaders and CUDA


I am trying to understand the differences between compute shaders and CUDA and how they operate. I have only used CUDA, and as I understand it:

In shader-based computing the number of shaders is equal to the number of pixels, while in CUDA we can have kernel threads that operate on more than one 'pixel' (data element).

In CUDA we have various memory types from which we can fetch data (global, shared, constant, texture), but what happens in shader-based computing? Are there different memory types, and how is the computation mapped onto graphics (kernel, input, output)? Is it true that in compute shaders there is no communication among processes (like shared memory and synchronization in CUDA)? And are there any other limitations on compute shader kernels?


There is 1 best solution below


They serve the same general purpose; the real difference is that compute shaders are an extension of graphics APIs such as OpenGL and Direct3D. Compute shaders allow you to bypass the normal programmable graphics pipeline (e.g. vertex -> tessellation -> geometry -> fragment) and access the underlying compute power of the host GPU without having to shoehorn your algorithm somewhere into the aforementioned pipeline. There absolutely is shared memory and synchronization between invocations in a work group in compute shaders, much as there is between threads in a CUDA block.

Also, I do not know where this notion of "pixels" is coming from. The whole point of creating compute shaders was to unburden development from constructs that only apply to the actual graphics pipeline (e.g. vertices, fragments/pixels) and strip everything down to general-purpose (hence the term GPGPU) compute and memory functionality. Granted, when D3D/OpenGL compute shaders are used rather than a dedicated API such as OpenCL or CUDA, it is often to accomplish something related to rendering, but this is by no means a requirement.
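To make the "no pixels involved" point concrete, here is a minimal CUDA sketch (not taken from either API's documentation) in which each thread handles several elements of a plain float buffer. The names `saxpy` and `saxpyOnGpu` and the launch size of 2 blocks of 64 threads are my own choices for illustration, and the GLSL built-ins in the comments are only the rough analogues a compute shader would use for the same indexing. Error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Each thread walks the buffer with a grid-sized stride, so a single thread
// may process several elements. Nothing here refers to pixels or vertices.
__global__ void saxpy(int n, float a, const float* x, float* y) {
    // Rough GLSL analogue of this index: gl_GlobalInvocationID.x
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // Rough GLSL analogue of the stride: gl_NumWorkGroups.x * gl_WorkGroupSize.x
    int stride = gridDim.x * blockDim.x;
    for (; i < n; i += stride) {
        y[i] = a * x[i] + y[i];
    }
}

// Hypothetical host helper: copies x and y to the GPU, runs the kernel,
// and returns the updated y. Assumes x and y have the same length.
std::vector<float> saxpyOnGpu(float a, const std::vector<float>& x,
                              std::vector<float> y) {
    const int n = static_cast<int>(x.size());
    const size_t bytes = n * sizeof(float);
    float* dx = nullptr;
    float* dy = nullptr;
    cudaMalloc(&dx, bytes);
    cudaMalloc(&dy, bytes);
    cudaMemcpy(dx, x.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dy, y.data(), bytes, cudaMemcpyHostToDevice);

    // Deliberately launch far fewer threads (128) than there are elements.
    saxpy<<<2, 64>>>(n, a, dx, dy);

    cudaMemcpy(y.data(), dy, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dx);
    cudaFree(dy);
    return y;
}

int main() {
    std::vector<float> x(1000, 1.0f), y(1000, 2.0f);
    std::vector<float> result = saxpyOnGpu(3.0f, x, y);
    std::printf("y[0] = %.1f, y[999] = %.1f\n", result[0], result[999]);
    return 0;
}
```

The same loop could be written in a GLSL compute shader over a storage buffer; the point is only that the number of invocations is whatever you dispatch, not something tied to a render target.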

To put all of this into a more formal context, consider how the GLSL specification introduces the compute processor. The key point to take away is that compute shaders are a new type of shader, but not a new stage in the graphics pipeline; they exist on their own.


GLSL 4.4 Spec - 2.6 Compute Processor - p. 8

Compute Processor

The compute processor is a programmable unit that operates independently from the other shader processors.

[...]

A compute shader has access to many of the same resources as fragment and other shader processors, including textures, buffers, image variables, and atomic counters. It does not have any predefined inputs nor any fixed-function outputs. It is not part of the graphics pipeline and its visible side effects are through changes to images, storage buffers, and atomic counters.

A compute shader operates on a group of work items called a work group. A work group is a collection of shader invocations that execute the same code, potentially in parallel. An invocation within a work group may share data with other members of the same work group through shared variables and issue memory and control barriers to synchronize with other members of the same work group.
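Since the question comes from a CUDA background, the work group model quoted above may be easier to see in CUDA terms: a work group corresponds to a thread block, a shader invocation to a thread, a `shared` variable to `__shared__` memory, and `barrier()` roughly to `__syncthreads()`. The sketch below is my own illustration under those assumptions, not code from the specification; `blockSum`, `sumOnGpu`, and the block size of 128 are hypothetical names and values, and error checking is omitted.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

constexpr int kBlockSize = 128;  // GLSL analogue: layout(local_size_x = 128) in;

// Each block (work group) sums its slice of the input in shared memory.
// Must be launched with blockDim.x == kBlockSize (a power of two).
__global__ void blockSum(const float* in, float* out, int n) {
    __shared__ float tile[kBlockSize];             // GLSL: shared float tile[128];
    int local = threadIdx.x;                       // GLSL: gl_LocalInvocationID.x
    int global = blockIdx.x * blockDim.x + local;  // GLSL: gl_GlobalInvocationID.x

    tile[local] = (global < n) ? in[global] : 0.0f;
    __syncthreads();  // Roughly GLSL barrier(), often paired with memoryBarrierShared()

    // Tree reduction: halve the number of active threads each step.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (local < s) {
            tile[local] += tile[local + s];
        }
        __syncthreads();
    }

    if (local == 0) {
        out[blockIdx.x] = tile[0];                 // GLSL: index by gl_WorkGroupID.x
    }
}

// Hypothetical host helper: runs blockSum and adds the per-block partial
// sums on the CPU.
float sumOnGpu(const std::vector<float>& data) {
    const int n = static_cast<int>(data.size());
    const int blocks = (n + kBlockSize - 1) / kBlockSize;
    float* dIn = nullptr;
    float* dOut = nullptr;
    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dOut, blocks * sizeof(float));
    cudaMemcpy(dIn, data.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    blockSum<<<blocks, kBlockSize>>>(dIn, dOut, n);

    std::vector<float> partial(blocks);
    cudaMemcpy(partial.data(), dOut, blocks * sizeof(float),
               cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);

    float total = 0.0f;
    for (float p : partial) total += p;
    return total;
}

int main() {
    std::vector<float> data(1000, 1.0f);
    std::printf("sum = %.1f\n", sumOnGpu(data));
    return 0;
}
```

As in the quoted text, sharing and synchronization are limited to members of the same group: neither `__syncthreads()` nor `barrier()` synchronizes across blocks or work groups, which is why the per-block results are combined in a separate step here.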