CUDA atomic operations and concurrent kernel launch

I am currently developing a GPU-based program that uses multiple kernels launched concurrently on multiple streams.

In my application, several kernels need to access a shared queue/stack, and I plan to use atomic operations for this.

However, I do not know whether atomic operations work across multiple concurrently launched kernels. Could anyone who knows the exact mechanism of atomic operations on the GPU, or who has experience with this issue, please help?

There is 1 best solution below

Atomics are implemented in the L2 cache hardware of the GPU, through which all memory operations must pass. There is no hardware to ensure coherency between host and device memory, or between different GPUs; but as long as the kernels are running on the same GPU and using device memory on that GPU to synchronize, atomics will work as expected.
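
To illustrate the point, here is a minimal sketch of two kernels, launched on separate streams of the same GPU, that reserve slots in a shared device-memory queue with `atomicAdd`. The `Queue` struct, the `producer` kernel and the `run_concurrent_push` helper are hypothetical names made up for this example, not part of any library, and the sketch covers only the push side (a queue that is also popped concurrently needs more care). Whether the two kernels actually overlap in time depends on the hardware and on available resources; the atomic counter should be correct either way.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical fixed-capacity queue that lives entirely in device memory.
struct Queue {
    int *items;             // device buffer with `capacity` slots
    unsigned int *tail;     // device counter: index of the next free slot
    unsigned int capacity;
};

// Each thread reserves one slot with atomicAdd and writes its value there.
__global__ void producer(Queue q, int base, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    unsigned int slot = atomicAdd(q.tail, 1u);   // unique slot per thread
    if (slot < q.capacity) q.items[slot] = base + i;
}

// Launches two producers on different streams and returns the number of
// items pushed, or -1 if a CUDA call failed or an item was lost/duplicated.
int run_concurrent_push(int n_per_kernel) {
    Queue q;
    q.capacity = 2u * static_cast<unsigned int>(n_per_kernel);
    if (cudaMalloc(&q.items, q.capacity * sizeof(int)) != cudaSuccess) return -1;
    if (cudaMalloc(&q.tail, sizeof(unsigned int)) != cudaSuccess) return -1;
    cudaMemset(q.tail, 0, sizeof(unsigned int));
    cudaDeviceSynchronize();   // make sure the counter is zeroed before launch

    cudaStream_t s0, s1;
    cudaStreamCreate(&s0);
    cudaStreamCreate(&s1);

    int threads = 128;
    int blocks = (n_per_kernel + threads - 1) / threads;
    // Both kernels update the same counter in device memory on the same GPU.
    producer<<<blocks, threads, 0, s0>>>(q, 0, n_per_kernel);
    producer<<<blocks, threads, 0, s1>>>(q, n_per_kernel, n_per_kernel);

    bool ok = (cudaDeviceSynchronize() == cudaSuccess);
    unsigned int tail = 0;
    std::vector<int> host(q.capacity);
    ok = ok && cudaMemcpy(&tail, q.tail, sizeof(tail),
                          cudaMemcpyDeviceToHost) == cudaSuccess;
    ok = ok && cudaMemcpy(host.data(), q.items, q.capacity * sizeof(int),
                          cudaMemcpyDeviceToHost) == cudaSuccess;

    // Every value in [0, capacity) should appear exactly once.
    std::vector<char> seen(q.capacity, 0);
    for (unsigned int k = 0; ok && k < q.capacity; ++k) {
        int v = host[k];
        if (v < 0 || static_cast<unsigned int>(v) >= q.capacity || seen[v]) ok = false;
        else seen[v] = 1;
    }
    ok = ok && (tail == q.capacity);

    cudaStreamDestroy(s0);
    cudaStreamDestroy(s1);
    cudaFree(q.items);
    cudaFree(q.tail);
    return ok ? static_cast<int>(tail) : -1;
}

int main() {
    int pushed = run_concurrent_push(1 << 16);
    printf("items pushed: %d\n", pushed);
    return pushed < 0;
}
```

If the counter were incremented with a plain read-modify-write instead of `atomicAdd`, threads from either kernel could obtain the same slot and overwrite each other's items, which the uniqueness check in the helper is meant to catch.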