Error with MXNet and CUDA in Google Colab: no kernel image is available for execution on the device

I am following a tutorial to fine-tune a SOTA model with MXNet. I am doing it in Google Colab: https://cv.gluon.ai/build/examples_action_recognition/finetune_custom.html

However, I am unable to make it work. I believe it has to do with the MXNet version and the CUDA version in Google Colab. I am getting this error:

MXNetError: Traceback (most recent call last):
  File "../src/ndarray/../operator/tensor/./../mxnet_op.h", line 1120
Name: Check failed: err == cudaSuccess (209 vs. 0) : mxnet_generic_kernel ErrStr:no kernel image is available for execution on the device

when it reaches this part:

train_loss += sum([l.mean().asscalar() for l in loss])

The version of CUDA I get is the following:

!nvcc --version # to check the CUDA version
!nvidia-smi
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Oct_12_20:09:46_PDT_2020
Cuda compilation tools, release 11.1, V11.1.105
Build cuda_11.1.TC455_06.29190527_0
Wed Mar 16 22:13:40 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 00000000:00:04.0 Off |                    0 |
| N/A   71C    P8    33W / 149W |      0MiB / 11441MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

The last time I ran it, it worked, but two weeks later I tried to run the same notebook and I am unable to make it work. This is how I installed the libraries I need:

!pip install mxnet-cu110
!pip install torch==1.8.0 torchvision
!pip install gluoncv[full]
!pip install mmcv

Any help would be very much appreciated. Thanks!

There is 1 best solution below

Google Colab is a cloud service, and depending on load, time of day, and geographic location, a range of different hardware and software stacks can be provisioned.

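For GPU-enabled sessions, this can currently include a Tesla K80 (compute capability 3.7), a Tesla P100 (compute capability 6.0), or a Tesla T4 (compute capability 7.5), running on various versions of CUDA 10 and CUDA 11. As a result, externally installed code and frameworks may run in one session and fail in another when the sessions are provisioned on VMs whose hardware or software stack is older or newer and breaks compatibility. The error "no kernel image is available for execution on the device" means that the installed binary contains no GPU code built for the compute capability of the device in the current session, which according to the nvidia-smi output in the question is a Tesla K80. That would also explain why the same notebook worked two weeks earlier, if that session was provisioned with a different GPU.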
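Because the assigned GPU can change between sessions, it may help to check which device you got before installing a framework build. The following is a minimal sketch, not an official Colab or MXNet utility: the helper names `lookup_compute_capability` and `detect_gpu_names` are made up for illustration, the lookup table contains only the three GPUs mentioned above, and it assumes `nvidia-smi` is on the PATH (it returns an empty list otherwise).

```python
import shutil
import subprocess

# Compute capabilities for the GPUs listed above; extend as needed.
KNOWN_COMPUTE_CAPABILITY = {
    "Tesla K80": "3.7",
    "Tesla P100": "6.0",
    "Tesla T4": "7.5",
}


def lookup_compute_capability(gpu_name):
    """Return the compute capability for a GPU name, or None if unknown."""
    for known_name, capability in KNOWN_COMPUTE_CAPABILITY.items():
        # nvidia-smi may append a suffix, e.g. "Tesla P100-PCIE-16GB".
        if gpu_name.startswith(known_name):
            return capability
    return None


def detect_gpu_names():
    """Return the GPU names reported by nvidia-smi, or [] if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    for name in detect_gpu_names():
        print(name, "-> compute capability", lookup_compute_capability(name))
```

If the reported compute capability is one that your installed framework binary was not built for, restarting the session to obtain a different GPU, or installing a build that targets that architecture, are the options to try.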