I am following a tutorial to fine-tune a SOTA model with MXNet. I am doing it in Google Colab: https://cv.gluon.ai/build/examples_action_recognition/finetune_custom.html
However, I am unable to make it work. I believe it has to do with the MXNet and CUDA versions in Google Colab. I am getting this error:
MXNetError: Traceback (most recent call last):
File "../src/ndarray/../operator/tensor/./../mxnet_op.h", line 1120
Name: Check failed: err == cudaSuccess (209 vs. 0) : mxnet_generic_kernel ErrStr:no kernel image is available for execution on the device
when it reaches this part:
train_loss += sum([l.mean().asscalar() for l in loss])
The version of CUDA I get is the following:
!nvcc --version # to check the CUDA version
!nvidia-smi
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Oct_12_20:09:46_PDT_2020
Cuda compilation tools, release 11.1, V11.1.105
Build cuda_11.1.TC455_06.29190527_0
Wed Mar 16 22:13:40 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03 Driver Version: 460.32.03 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 00000000:00:04.0 Off | 0 |
| N/A 71C P8 33W / 149W | 0MiB / 11441MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
The last time I ran it, it worked, but two weeks later I tried to run the same notebook and could not make it work. This is how I installed the libraries I need:
!pip install mxnet-cu110
!pip install torch==1.8.0 torchvision
!pip install gluoncv[full]
!pip install mmcv
Any help would be very much appreciated. Thanks!
Google Colab is a cloud service, and depending on load, time of day, and geographic location, a range of different hardware and software stacks can be provisioned.
For GPU-enabled sessions, this can currently include a Tesla K80 (compute capability 3.7), a Tesla P100 (compute capability 6.0), or a Tesla T4 (compute capability 7.5), running on various versions of CUDA 10 and CUDA 11. As a result, externally installed code and frameworks may run in one session and fail in another if the sessions are provisioned on VMs whose older or newer hardware or software stacks break compatibility. The "no kernel image is available for execution on the device" error is what CUDA reports when the installed binary contains no kernels built for the compute capability of the GPU it is running on.
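To see which of these GPUs a given session received before installing anything, a check along the following lines may help. This is only a minimal sketch: the helper names `lookup_compute_capability` and `provisioned_gpu_names` are made up for illustration, the lookup table contains only the three GPUs mentioned above, and whether a particular MXNet wheel ships kernels for a given compute capability is something you would still need to confirm for the wheel you install.

```python
import shutil
import subprocess

# Compute capabilities quoted in the answer above; any other GPU is unknown here.
KNOWN_COMPUTE_CAPABILITY = {
    "Tesla K80": (3, 7),
    "Tesla P100": (6, 0),
    "Tesla T4": (7, 5),
}


def lookup_compute_capability(gpu_name):
    """Return (major, minor) for a GPU name, or None if it is not in the table."""
    lowered = gpu_name.lower()
    for known_name, capability in KNOWN_COMPUTE_CAPABILITY.items():
        if known_name.lower() in lowered:
            return capability
    return None


def provisioned_gpu_names():
    """Ask nvidia-smi for the GPU names; return [] if no GPU or driver is present."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


# Report what this session was given, so a failure can be tied to the hardware.
for name in provisioned_gpu_names():
    capability = lookup_compute_capability(name)
    if capability is None:
        print(f"{name}: compute capability not in the lookup table")
    else:
        print(f"{name}: compute capability {capability[0]}.{capability[1]}")
```

Running this at the top of the notebook makes it easy to tell whether a session that fails and one that works were provisioned on different GPUs.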