If I have a __constant__ value:
__constant__ float constVal;
Which may or may not be initialized by MPI ranks on non-blocking streams:
cudaMemcpyToSymbolAsync((void*)&constVal,deviceValue,sizeof(float),0,cudaMemcpyDeviceToDevice,stream);
Is this:
- Safe to be accessed by multiple MPI ranks simultaneously within kernels? I.e. do ranks share the same instance of constVal, or do MPI semantics (each rank has a private copy) still hold?
- If the above is safe, is it safe to be initialized by multiple MPI ranks?
Neither. CUDA contexts are not shared amongst processes. If you have multiple processes you get multiple contexts, and each context has its own copy of all the statically defined symbols and code. This behaviour is independent of MPI semantics. If you are imagining that multiple processes in an MPI communicator are sharing the same GPU context and state, they aren't.
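It isn't only safe for multiple ranks to initialize the symbol, it is mandatory: each rank has to initialize the copy in its own context, because no other process can do it on its behalf.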
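As a rough illustration of what each rank would do, here is a minimal single-process sketch. It assumes every MPI rank runs this same sequence in its own process, and therefore in its own context. The kernel readConst, the CHECK macro, and the value 42.0f are hypothetical names and data made up for the example; MPI calls are left out to keep it self-contained.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// One instance of this symbol exists per CUDA context, i.e. per process.
__constant__ float constVal;

// Hypothetical helper: abort with a message if a runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s failed: %s\n", #call,                 \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Hypothetical kernel: copies the constant to global memory so the
// host can inspect what this process's context actually holds.
__global__ void readConst(float *out)
{
    *out = constVal;
}

int main()
{
    cudaStream_t stream;
    CHECK(cudaStreamCreateWithFlags(&stream, cudaStreamNonBlocking));

    float *deviceValue = nullptr;
    float *deviceOut = nullptr;
    CHECK(cudaMalloc(&deviceValue, sizeof(float)));
    CHECK(cudaMalloc(&deviceOut, sizeof(float)));

    // In an MPI run this value could differ from rank to rank.
    float hostValue = 42.0f;
    CHECK(cudaMemcpyAsync(deviceValue, &hostValue, sizeof(float),
                          cudaMemcpyHostToDevice, stream));

    // Initialize this process's own copy of the symbol. Work issued to
    // the same stream is ordered, so the kernel below sees the new value.
    CHECK(cudaMemcpyToSymbolAsync(constVal, deviceValue, sizeof(float), 0,
                                  cudaMemcpyDeviceToDevice, stream));

    readConst<<<1, 1, 0, stream>>>(deviceOut);
    CHECK(cudaGetLastError());

    float result = 0.0f;
    CHECK(cudaMemcpyAsync(&result, deviceOut, sizeof(float),
                          cudaMemcpyDeviceToHost, stream));
    CHECK(cudaStreamSynchronize(stream));

    printf("constVal as seen by this process: %f\n", result);

    CHECK(cudaFree(deviceValue));
    CHECK(cudaFree(deviceOut));
    CHECK(cudaStreamDestroy(stream));
    return 0;
}
```

If several ranks ran this with different values of hostValue, each would be expected to print its own value, since no rank can see or overwrite another rank's copy of constVal.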