Using atomic arithmetic operations in CUDA Unified Memory multi-GPU or multi-processor


I am trying to implement a CUDA program that uses Unified Memory. I have two unified arrays and sometimes they need to be updated atomically.

The question below has an answer for a single-GPU environment, but I am not sure how to extend that answer to a multi-GPU platform.

Question: cuda atomicAdd example fails to yield correct output

In case it matters, I have four Tesla K20 cards, and each of them updates a part of those arrays; those updates must be done atomically.

I would appreciate any help/recommendations.

1 Answer (accepted)

To summarize comments into an answer:

  • You can perform this sort of address-space-wide atomic operation using atomicAdd_system.
  • However, this only works on devices of compute capability 6.x or newer (7.2 or newer on Tegra).
  • Specifically, this means you must compile for the correct compute capability, e.g. with -arch=sm_60 or similar.
  • You state in the question that you are using Tesla K20 cards; these are compute capability 3.5 and do not support any of the system-wide atomic functions.

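To make the points above concrete, here is a minimal sketch (hypothetical kernel and variable names; assumes devices of compute capability 6.x or newer and compilation with something like `nvcc -arch=sm_60`) of updating a single managed counter from several GPUs at once with `atomicAdd_system`:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: every thread atomically increments a counter that
// lives in Unified Memory. atomicAdd_system makes the update atomic with
// respect to the whole system (all GPUs and the CPU); a plain atomicAdd
// is only guaranteed atomic within the launching device.
__global__ void accumulate(int *counter, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        atomicAdd_system(counter, 1);
}

int main()
{
    int *counter = nullptr;
    cudaMallocManaged(&counter, sizeof(int));  // unified (managed) allocation
    *counter = 0;

    int nDevices = 0;
    cudaGetDeviceCount(&nDevices);

    const int n = 1 << 20;
    const int block = 256;

    // Launch the same kernel on every device; all launches update the
    // same managed allocation concurrently.
    for (int dev = 0; dev < nDevices; ++dev) {
        cudaSetDevice(dev);
        accumulate<<<(n + block - 1) / block, block>>>(counter, n);
    }

    // Wait for all devices before reading the result on the host.
    for (int dev = 0; dev < nDevices; ++dev) {
        cudaSetDevice(dev);
        cudaDeviceSynchronize();
    }

    printf("counter = %d (expected %d)\n", *counter, nDevices * n);
    cudaFree(counter);
    return 0;
}
```

On compute capability 3.5 hardware such as the K20, `atomicAdd_system` is unavailable, and compiling the sketch for `sm_60` would simply refuse to run on those cards.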
As always, this information is neatly summarized in the relevant section of the Programming Guide.