Can I split physical GPUs into multiple Logical/Virtual GPUS and pass them to dask_cuda.LocalCUDACluster?


I have a workflow that benefits greatly from GPU acceleration, but each task has relatively low memory requirements (2-4 GB). I'm using a combination of dask.dataframe, dask.distributed.Client, and dask_cuda.LocalCUDACluster. The process would benefit from more CUDA workers, so I want to split the physical GPUs (NVIDIA RTX A6000, V100, A100) into multiple virtual/logical GPUs to increase the number of workers in my dask_cuda LocalCUDACluster. My initial thought was to pass logical GPUs created in TensorFlow to the LocalCUDACluster, but I don't seem to be able to pass them into the cluster.
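For reference, this is the kind of TensorFlow logical-GPU split I had in mind (a sketch using the `tf.config` API; the 4 GB memory limit and the four-way split are illustrative, not requirements):

```python
import tensorflow as tf

# Split the first physical GPU into four 4 GB logical GPUs
# (sizes and count are illustrative).
physical_gpus = tf.config.list_physical_devices("GPU")
if physical_gpus:
    tf.config.set_logical_device_configuration(
        physical_gpus[0],
        [tf.config.LogicalDeviceConfiguration(memory_limit=4096)] * 4,
    )
logical_gpus = tf.config.list_logical_devices("GPU")
print(f"{len(physical_gpus)} physical GPU(s), {len(logical_gpus)} logical GPU(s)")
```

These logical devices only exist inside the TensorFlow runtime, though, and I don't see any way to hand them to `LocalCUDACluster`.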

I'm working in a Docker environment, and I'd like to keep the splitting inside Python. This workflow should ideally scale from a local workstation to multi-node MPI jobs, but I'm not sure that's possible and I'm open to any suggestions.

Here is a minimal example of my setup:

from dask.distributed import Client
from dask_cuda import LocalCUDACluster
from dask_cuda.initialize import initialize
import pandas as pd
import dask.dataframe as dd
import time

# Placeholder for the real work
def my_gpu_sim(x):
    """
    GPU simulation which is independent of any others (in the real workflow
    this calls a C++ program, which saves a file).
    """
    ...
    return None

# Fake data: 40 rows split into 4 partitions
dic = {'random': ['apple' for i in range(40)], 'main': [i for i in range(40)]}
df = pd.DataFrame.from_dict(dic)
ddf = dd.from_pandas(df, npartitions=4)

# Configurations
protocol = "ucx"
enable_tcp_over_ucx = True
enable_nvlink = True
enable_infiniband = False
initialize(
    create_cuda_context=True,
    enable_tcp_over_ucx=enable_tcp_over_ucx,
    enable_infiniband=enable_infiniband,
    enable_nvlink=enable_nvlink,
)
cluster = LocalCUDACluster(
    local_directory="/tmp/USERNAME",
    protocol=protocol,
    enable_tcp_over_ucx=enable_tcp_over_ucx,
    enable_infiniband=enable_infiniband,
    enable_nvlink=enable_nvlink,
    rmm_pool_size="35GB",
)
client = Client(cluster)

# Simulation
ddf.map_partitions(lambda df: df.apply(lambda x: my_gpu_sim(x.main), axis=1)).compute(scheduler=client)
