I am running a fresh Windows Server 2019 Data Science Virtual Machine in Azure. I'm using the NC6_Promo size, which has a Tesla K80 GPU. After the deployment completed, I tried to check whether CUDA was working using the following Python commands:
import torch
torch.cuda.is_available()
Out[3]: False
This returns False, indicating that CUDA is not available.
When checking with the nvidia-smi tooling I get the following response:
Microsoft Windows [Version 10.0.17763.2300]
(c) 2018 Microsoft Corporation. All rights reserved.
C:\Users\administrator>nvidia-smi
Wed Dec 22 11:23:36 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 472.50 Driver Version: 472.50 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name TCC/WDDM | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla K80 TCC | 00000001:00:00.0 Off | 0 |
| N/A 42C P8 28W / 149W | 9MiB / 11448MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
This tells me that CUDA version 11.4 is available.
The Device Manager of the virtual machine also shows the Tesla K80 GPU as installed and working properly.
Steps I've taken so far:
- Enabled the NVIDIA virtual machine extension
- (Re)installed the latest NVIDIA drivers for the Tesla K80
- Upgraded CUDA to version 11.5
However, I still can't use the GPU from PyTorch. Are there any other steps I could take to get this working? This really should work out of the box.
It is a version issue. Since the Azure Data Science VM image is what it is, you need to fix some things yourself before everything works correctly. First, check the actual CUDA version by running the "nvidia-smi" command. Mine reported 11.1, so I needed to install versions that match it.
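If you would rather read that version from a script than by eye, here is a minimal sketch. The function names parse_cuda_version and driver_cuda_version are hypothetical helpers of mine, not part of any library, and the regex assumes the header keeps the "CUDA Version: X.Y" wording shown in the question's output.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_cuda_version(smi_output: str) -> Optional[str]:
    """Return the 'CUDA Version' value found in nvidia-smi text, or None."""
    # Assumption: the header contains "CUDA Version: <major>.<minor>".
    match = re.search(r"CUDA Version:\s*(\d+\.\d+)", smi_output)
    return match.group(1) if match else None


def driver_cuda_version() -> Optional[str]:
    """Run nvidia-smi if it is on PATH and parse its header (hypothetical helper)."""
    if shutil.which("nvidia-smi") is None:
        return None  # tool not found, nothing to report
    result = subprocess.run(
        ["nvidia-smi"], capture_output=True, text=True, check=False
    )
    return parse_cuda_version(result.stdout)
```

Fed the header line from the question, parse_cuda_version returns "11.4"; on a machine without nvidia-smi, driver_cuda_version returns None.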
Next, I activated my desired conda environment in the VM (mine was "azureml_py38_PT_and_TF") and ran "conda uninstall pytorch".
Then I went to pytorch.org and used the install tool to come up with a suitable install command. Mine was:
conda install pytorch torchvision torchaudio cudatoolkit=11.1 -c pytorch-lts -c conda-forge
This installs PyTorch LTS with the correct CUDA support. After that, it should work.
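To confirm the reinstall took effect, something like the sketch below may help. torch_cuda_report is a hypothetical helper name; it only reads torch.__version__, torch.version.cuda, and torch.cuda.is_available(). A CPU-only build reports None for torch.version.cuda, which is one way to spot a mismatched install.

```python
from typing import Any, Dict


def torch_cuda_report() -> Dict[str, Any]:
    """Summarize the installed PyTorch build (hypothetical helper)."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in the active environment.
        return {"installed": False}
    return {
        "installed": True,
        "torch_version": torch.__version__,
        # CUDA version the build was compiled against; None for CPU-only builds.
        "built_with_cuda": torch.version.cuda,
        # True only if the build, driver, and GPU all work together.
        "cuda_available": torch.cuda.is_available(),
    }


for key, value in torch_cuda_report().items():
    print(f"{key}: {value}")
```

Run it inside the activated conda environment. If built_with_cuda is None or cuda_available is still False, the environment is probably still picking up a build that does not match the driver.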