I have been running into problems when using Python multiprocessing together with CuPy's multi-GPU support to process data in parallel on different GPUs.
I have coded a minimal reproducible example (MCVE) to show the error:
import cupy as cp
from multiprocessing import Pool, current_process, set_start_method, get_context

def run(ndevices, list_data):
    process = current_process()
    pid = process.pid
    gpu_id = pid % ndevices
    data = list_data[gpu_id]
    with cp.cuda.Device(gpu_id):
        data *= 0.5
        print("Process {0} using GPU {1}, and data is on GPU {2}".format(pid, gpu_id, data.device), data)

ctx = get_context('spawn')
#pool = ctx.Pool(4)
# set_start_method('spawn', force=True)

def func():
    list_multi_gpu = []
    ndevices = 3
    for i in range(ndevices):
        with cp.cuda.Device(i):
            list_multi_gpu.append(cp.ones((2,2)))

    print("Checking GPU arrays on list")
    for i, data in enumerate(list_multi_gpu):
        print("Data from GPU {0} is on GPU {1}".format(i, data.device))

    with ctx.Pool(processes=3) as pool:
        pool.starmap(run, [(ndevices, list_multi_gpu), (ndevices, list_multi_gpu), (ndevices, list_multi_gpu)])

if __name__ == "__main__":
    func()
After executing this using 3 GPUs I get:
Checking GPU arrays on list
Data from GPU 0 is on GPU <CUDA Device 0>
Data from GPU 1 is on GPU <CUDA Device 1>
Data from GPU 2 is on GPU <CUDA Device 2>
Process 1893744 using GPU 0, and data is on GPU <CUDA Device 0> [[0.5 0.5]
[0.5 0.5]]
/home/miguel.carcamo/test_multigpu/test_3.py:10: PerformanceWarning: The device where the array resides (0) is different from the current device (2). Peer access has been activated automatically.
data *= 0.5
/home/miguel.carcamo/test_multigpu/test_3.py:10: PerformanceWarning: The device where the array resides (0) is different from the current device (1). Peer access has been activated automatically.
data *= 0.5
Process 1893743 using GPU 2, and data is on GPU <CUDA Device 0> [[0.5 0.5]
[0.5 0.5]]
Process 1893745 using GPU 1, and data is on GPU <CUDA Device 0> [[1. 1.]
[1. 1.]]
So my questions are:
- Why does the data appear on GPU 0, even though it was accessed with the correct GPU id both in the list and in cp.cuda.Device()?
- Why does the data not change to 0.5 in the last print lines?
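My suspicion (only a guess) is that the arrays get serialized when the pool ships them to the spawned workers, and that the reconstructed copies land on the default device. A minimal single-process sketch that isolates just that step would be a plain pickle round-trip (this assumes at least two GPUs and is not part of the repro above):

import pickle
import cupy as cp

with cp.cuda.Device(1):
    a = cp.ones((2, 2))  # array allocated on GPU 1

# Round-trip through pickle, which is roughly how multiprocessing would ship
# the array to a spawned worker; the copy may be reconstructed on the
# current device (GPU 0 here) rather than on the original one.
b = pickle.loads(pickle.dumps(a))
print("original:", a.device, "copy:", b.device)

If that is what happens, it would at least match the <CUDA Device 0> shown in the worker prints, but I still don't see why the last print shows 1.0 instead of 0.5.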
I attach my current environment:
OS : Linux-5.15.0-52-generic-x86_64-with-glibc2.35
Python Version : 3.9.12
CuPy Version : 12.2.0
CuPy Platform : NVIDIA CUDA
NumPy Version : 1.21.5
SciPy Version : 1.7.3
Cython Build Version : 0.29.28
Cython Runtime Version : 0.29.28
CUDA Root : /usr/local/cuda
nvcc PATH : /usr/local/cuda/bin/nvcc
CUDA Build Version : 11080
CUDA Driver Version : 11080
CUDA Runtime Version : 11080
cuBLAS Version : (available)
cuFFT Version : 10900
cuRAND Version : 10300
cuSOLVER Version : (11, 4, 1)
cuSPARSE Version : (available)
NVRTC Version : (11, 8)
Thrust Version : 101501
CUB Build Version : 101501
Jitify Build Version : <unknown>
cuDNN Build Version : None
cuDNN Version : None
NCCL Build Version : None
NCCL Runtime Version : None
cuTENSOR Version : None
cuSPARSELt Build Version : None
Device 0 Name : NVIDIA A100 80GB PCIe
Device 0 Compute Capability : 80
Device 0 PCI Bus ID : 0000:81:00.0
Device 1 Name : NVIDIA A100 80GB PCIe
Device 1 Compute Capability : 80
Device 1 PCI Bus ID : 0000:C1:00.0
Device 2 Name : NVIDIA A100 80GB PCIe
Device 2 Compute Capability : 80
Device 2 PCI Bus ID : 0000:C2:00.0
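For reference, this is roughly the pattern I am trying to end up with: each pool worker operating on data that lives on its own GPU. A stripped-down hypothetical sketch of that goal (allocating the array inside the worker instead of passing it from the parent, shapes made up for illustration) would look like:

import cupy as cp
from multiprocessing import get_context

NDEVICES = 3

def worker(gpu_id):
    # Build the data inside the worker, under the device this worker should use.
    with cp.cuda.Device(gpu_id):
        data = cp.ones((2, 2))
        data *= 0.5
        print("GPU", gpu_id, "->", data.device, data)

if __name__ == "__main__":
    ctx = get_context("spawn")
    with ctx.Pool(processes=NDEVICES) as pool:
        pool.map(worker, range(NDEVICES))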