CUDA not available on Azure Data Science virtual machine

I am running a fresh Windows Server 2019 Data Science virtual machine in Azure. I'm using the NC6_Promo size, which has the Tesla K80 GPU. After the deployment completed, I checked whether CUDA was working using the following Python commands:

import torch
torch.cuda.is_available()
Out[3]: False

This returns False, indicating that CUDA is not available to PyTorch.

When I check with the nvidia-smi tool, I get the following response:

Microsoft Windows [Version 10.0.17763.2300]
(c) 2018 Microsoft Corporation. All rights reserved.

C:\Users\administrator>nvidia-smi

Wed Dec 22 11:23:36 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 472.50       Driver Version: 472.50       CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla K80           TCC  | 00000001:00:00.0 Off |                    0 |
| N/A   42C    P8    28W / 149W |      9MiB / 11448MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

This tells me that CUDA version 11.4 is available.

The Device Manager of the virtual machine also shows the Tesla K80 GPU as installed and working properly:

[Screenshot: Tesla K80 status]

Steps I've taken so far:

  1. Enabled the NVIDIA virtual machine extension
  2. (Re)installed the latest NVIDIA drivers for the Tesla K80
  3. Upgraded CUDA to version 11.5

However, I still can't use the GPU from PyTorch. Are there any other steps I could take to get this working? This really should work out of the box.

There is 1 best solution below

This is a version mismatch. The Azure Data Science VM image ships with a fixed set of preinstalled packages, so you need to fix a few things yourself before everything works correctly. First, check the CUDA version by running the "nvidia-smi" command (the number it prints is the highest CUDA version the installed driver supports). Mine reported 11.1, so I needed to install a PyTorch build that matches it.
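As a rough sketch, the version can also be read from the nvidia-smi banner in Python rather than by eye. The function name parse_nvidia_smi_header and the regular expression are made up for illustration, and the sample line is copied from the output in the question; other driver releases may format the banner differently.

```python
import re

# Matches the banner line printed by nvidia-smi, e.g.
# "| NVIDIA-SMI 472.50   Driver Version: 472.50   CUDA Version: 11.4 |"
# The pattern is an assumption based on the output shown in the question.
_HEADER = re.compile(r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)")


def parse_nvidia_smi_header(text):
    """Return (driver_version, cuda_version) from nvidia-smi output, or None."""
    match = _HEADER.search(text)
    return (match.group(1), match.group(2)) if match else None


sample = "| NVIDIA-SMI 472.50       Driver Version: 472.50       CUDA Version: 11.4     |"
print(parse_nvidia_smi_header(sample))  # ('472.50', '11.4')
```

To run it against the live machine, the text could come from subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout instead of the hard-coded sample.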

Next, I activated my desired conda env in the VM (mine was "azureml_py38_PT_and_TF") and ran "conda uninstall pytorch".

Then I went to pytorch.org and used the install tool to generate a suitable install command. Mine was:

conda install pytorch torchvision torchaudio cudatoolkit=11.1 -c pytorch-lts -c conda-forge

This installs PyTorch LTS with the correct CUDA support. After that, torch.cuda.is_available() should return True.
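To confirm the reinstall took effect, a quick check along these lines may help. It is only a sketch: describe_torch_build is a hypothetical helper name, and it relies on torch.version.cuda being None for CPU-only builds, which is a common reason for is_available() returning False even when the driver is fine.

```python
import importlib


def describe_torch_build():
    """Collect basic facts about the PyTorch build in the active environment."""
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        # PyTorch is not installed in this environment at all.
        return {"installed": False}
    return {
        "installed": True,
        "torch_version": torch.__version__,
        # None on CPU-only builds; otherwise the CUDA version PyTorch was built against.
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in describe_torch_build().items():
        print(f"{key}: {value}")
```

If built_with_cuda prints None, the environment still holds a CPU-only package and the uninstall/install steps need repeating inside the right conda env. If it prints a version but cuda_available is False, the mismatch is more likely between that version and the driver.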