Weird error when using multiprocessing and CUPY multi GPU


I have been running into problems when using Python multiprocessing together with CuPy on multiple GPUs, in order to process data in parallel with one GPU per process.

Here is a minimal reproducible example that shows the error:

import cupy as cp
from multiprocessing import current_process, get_context

def run(ndevices, list_data):
    process = current_process()
    pid = process.pid
    gpu_id = pid % ndevices
    data = list_data[gpu_id]
    with cp.cuda.Device(gpu_id):
        data *= 0.5
        print("Process {0} using GPU {1}, and data is on GPU {2}".format(pid, gpu_id, data.device), data)

ctx = get_context('spawn')

def func():
    list_multi_gpu = []

    ndevices = 3

    for i in range(ndevices):
        with cp.cuda.Device(i):
            list_multi_gpu.append(cp.ones((2,2)))

    print("Checking GPU arrays on list")
    for i, data in enumerate(list_multi_gpu):
        print("Data from GPU {0} is on GPU {1}".format(i, data.device))

    with ctx.Pool(processes=3) as pool:
        pool.starmap(run, [(ndevices, list_multi_gpu), (ndevices, list_multi_gpu), (ndevices, list_multi_gpu)])

if __name__ == "__main__":
    func()

After executing this on a machine with 3 GPUs, I get:

Checking GPU arrays on list
Data from GPU 0 is on GPU <CUDA Device 0>
Data from GPU 1 is on GPU <CUDA Device 1>
Data from GPU 2 is on GPU <CUDA Device 2>
Process 1893744 using GPU 0, and data is on GPU <CUDA Device 0> [[0.5 0.5]
 [0.5 0.5]]
/home/miguel.carcamo/test_multigpu/test_3.py:10: PerformanceWarning: The device where the array resides (0) is different from the current device (2). Peer access has been activated automatically.
  data *= 0.5
/home/miguel.carcamo/test_multigpu/test_3.py:10: PerformanceWarning: The device where the array resides (0) is different from the current device (1). Peer access has been activated automatically.
  data *= 0.5
Process 1893743 using GPU 2, and data is on GPU <CUDA Device 0> [[0.5 0.5]
 [0.5 0.5]]
Process 1893745 using GPU 1, and data is on GPU <CUDA Device 0> [[1. 1.]
 [1. 1.]]

So my questions are:

  1. Why does the data appear on GPU 0, even though it was accessed using the correct GPU id both in the list and in cp.cuda.Device()?
  2. Why does the data not change to 0.5 in the last printed lines?

For reference, here is my current environment:

OS                           : Linux-5.15.0-52-generic-x86_64-with-glibc2.35
Python Version               : 3.9.12
CuPy Version                 : 12.2.0
CuPy Platform                : NVIDIA CUDA
NumPy Version                : 1.21.5
SciPy Version                : 1.7.3
Cython Build Version         : 0.29.28
Cython Runtime Version       : 0.29.28
CUDA Root                    : /usr/local/cuda
nvcc PATH                    : /usr/local/cuda/bin/nvcc
CUDA Build Version           : 11080
CUDA Driver Version          : 11080
CUDA Runtime Version         : 11080
cuBLAS Version               : (available)
cuFFT Version                : 10900
cuRAND Version               : 10300
cuSOLVER Version             : (11, 4, 1)
cuSPARSE Version             : (available)
NVRTC Version                : (11, 8)
Thrust Version               : 101501
CUB Build Version            : 101501
Jitify Build Version         : <unknown>
cuDNN Build Version          : None
cuDNN Version                : None
NCCL Build Version           : None
NCCL Runtime Version         : None
cuTENSOR Version             : None
cuSPARSELt Build Version     : None
Device 0 Name                : NVIDIA A100 80GB PCIe
Device 0 Compute Capability  : 80
Device 0 PCI Bus ID          : 0000:81:00.0
Device 1 Name                : NVIDIA A100 80GB PCIe
Device 1 Compute Capability  : 80
Device 1 PCI Bus ID          : 0000:C1:00.0
Device 2 Name                : NVIDIA A100 80GB PCIe
Device 2 Compute Capability  : 80
Device 2 PCI Bus ID          : 0000:C2:00.0
