Suppose I have a GPU and driver version supporting unified addressing; two GPUs, G0 and G1; a buffer allocated in G1 device memory; and that the current context C0 is a context for G0.
Under these circumstances, is it legitimate to cuMemcpy() from my buffer to host memory, despite it having been allocated in a different context for a different device?
So far, I've been working under the assumption that the answer is "yes". But I've recently experienced some behavior which seems to contradict this assumption.
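To make the scenario concrete, here is a minimal sketch of what I'm doing, using the CUDA driver API. Device indices, buffer size, and the `check` helper are illustrative, not my actual code; error handling is reduced to a bare exit-on-failure check.

```c
/* Sketch of the described scenario: allocate a buffer in a context for
 * G1, make a context for G0 current, then cuMemcpy the buffer to host
 * memory. Names and sizes are illustrative. */
#include <cuda.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

static void check(CUresult r, const char *what) {
    if (r != CUDA_SUCCESS) {
        const char *msg = NULL;
        cuGetErrorString(r, &msg);
        fprintf(stderr, "%s failed: %s\n", what, msg ? msg : "unknown");
        exit(EXIT_FAILURE);
    }
}

int main(void) {
    CUdevice g0, g1;
    CUcontext c0, c1;
    CUdeviceptr buf;
    size_t n = 1 << 20;
    char *host = malloc(n);

    check(cuInit(0), "cuInit");
    check(cuDeviceGet(&g0, 0), "cuDeviceGet(G0)");
    check(cuDeviceGet(&g1, 1), "cuDeviceGet(G1)");

    /* Allocate the buffer in a context for G1. */
    check(cuCtxCreate(&c1, 0, g1), "cuCtxCreate(c1)");
    check(cuMemAlloc(&buf, n), "cuMemAlloc");

    /* Now make a context for G0 current (cuCtxCreate pushes it). */
    check(cuCtxCreate(&c0, 0, g0), "cuCtxCreate(c0)");

    /* With unified addressing, the driver can resolve buf to G1 memory
     * even though the current context c0 belongs to G0; under UVA a
     * host pointer may be passed to cuMemcpy as a CUdeviceptr. */
    check(cuMemcpy((CUdeviceptr)(uintptr_t)host, buf, n), "cuMemcpy");

    free(host);
    return 0;
}
```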
Calling cuMemcpy from another context is legal, regardless of which device the context was created on. Depending on which case you are in, I recommend the following:

- If you use the cuMemAllocAsync/cuMemFreeAsync API to allocate and/or release memory, please make sure that operations are correctly stream-ordered.

If you keep experiencing issues after these steps, you can file a bug with NVIDIA here.
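The stream-ordering point above can be sketched as follows. This is a hedged example, not code from the original answer: the function name `copy_out_stream_ordered` is invented for illustration, and it assumes `cuInit` has already run and a context is current on the calling thread.

```c
/* Correct stream ordering with the stream-ordered allocator: the
 * allocation, any work that touches it, the copy out, and the free are
 * all issued on the same stream, so each operation observes the
 * previous ones in order. */
#include <cuda.h>
#include <stddef.h>

CUresult copy_out_stream_ordered(void *host_dst, size_t n) {
    CUstream s;
    CUdeviceptr d;
    CUresult r;

    if ((r = cuStreamCreate(&s, CU_STREAM_NON_BLOCKING)) != CUDA_SUCCESS)
        return r;
    /* The allocation becomes valid in stream order on s. */
    if ((r = cuMemAllocAsync(&d, n, s)) != CUDA_SUCCESS)
        return r;
    /* ... kernels that write to d would be launched on s here ... */
    /* The copy is on the same stream, so it is ordered after the
     * allocation and any preceding work on s. */
    if ((r = cuMemcpyDtoHAsync(host_dst, d, n, s)) != CUDA_SUCCESS)
        return r;
    /* The free is also stream-ordered: it takes effect after the copy. */
    if ((r = cuMemFreeAsync(d, s)) != CUDA_SUCCESS)
        return r;
    /* Synchronize before the host reads host_dst. */
    r = cuStreamSynchronize(s);
    cuStreamDestroy(s);
    return r;
}
```

Issuing the free on a different stream (or freeing with plain cuMemFree while the copy is still in flight) without an event or synchronization between them is the kind of ordering mistake the answer warns about.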