Using atomic arithmetic operations in CUDA Unified Memory multi-GPU or multi-processor


I am trying to implement a CUDA program that uses Unified Memory. I have two unified arrays and sometimes they need to be updated atomically.

The question below has an answer for a single-GPU environment, but I am not sure how to extend that answer to a multi-GPU platform.

Question: cuda atomicAdd example fails to yield correct output

In case it matters, I have four Tesla K20 cards, and each of them updates a part of those arrays; those updates must be done atomically.

I would appreciate any help/recommendations.

1 Answer (accepted)

To summarize comments into an answer:

  • You can perform this sort of address-space-wide atomic operation using atomicAdd_system.
  • However, this only works on devices of compute capability 6.x or newer (7.2 or newer on Tegra).
  • Specifically, this means you must compile for the correct compute capability, e.g. with -arch=sm_60 or similar.
  • You state in the question that you are using Tesla K20 cards; these are compute capability 3.5 and do not support any of the system-wide atomic functions.

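To make the points above concrete, here is a minimal sketch (hypothetical kernel and variable names; assumes devices of compute capability 6.x or newer and compilation with something like `nvcc -arch=sm_60`) of updating a single managed counter from several GPUs at once with `atomicAdd_system`:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: every thread atomically increments a counter that
// lives in Unified Memory. atomicAdd_system makes the update atomic with
// respect to the whole system (all GPUs and the CPU); a plain atomicAdd
// is only guaranteed atomic within the launching device.
__global__ void accumulate(int *counter, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        atomicAdd_system(counter, 1);
}

int main()
{
    int *counter = nullptr;
    cudaMallocManaged(&counter, sizeof(int));  // unified (managed) allocation
    *counter = 0;

    int nDevices = 0;
    cudaGetDeviceCount(&nDevices);

    const int n = 1 << 20;
    const int block = 256;

    // Launch the same kernel on every device; all launches update the
    // same managed allocation concurrently.
    for (int dev = 0; dev < nDevices; ++dev) {
        cudaSetDevice(dev);
        accumulate<<<(n + block - 1) / block, block>>>(counter, n);
    }

    // Wait for all devices before reading the result on the host.
    for (int dev = 0; dev < nDevices; ++dev) {
        cudaSetDevice(dev);
        cudaDeviceSynchronize();
    }

    printf("counter = %d (expected %d)\n", *counter, nDevices * n);
    cudaFree(counter);
    return 0;
}
```

On compute capability 3.5 hardware such as the K20, `atomicAdd_system` is unavailable, and compiling the sketch for `sm_60` would simply refuse to run on those cards.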
As always, this information is neatly summarized in the relevant section of the Programming Guide.