Using DeepAR with a GPU


I am using GluonTS to build a DeepAR model, but the training step takes a long time to run. My machine has a GPU, and I tried setting ctx = 'gpu', but it throws an error and the option does not seem to work. Any help is much appreciated.
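For context, here is a minimal sketch of the kind of setup being described, with the GPU selected by passing an explicit MXNet context to the Trainer. The dataset, frequency, and hyperparameters below are placeholders rather than the asker's actual code, and import paths can differ slightly between GluonTS versions.

import mxnet as mx
from gluonts.dataset.common import ListDataset
from gluonts.model.deepar import DeepAREstimator  # newer GluonTS versions may use gluonts.mx.model.deepar
from gluonts.mx.trainer import Trainer

# Placeholder hourly dataset with a single series.
train_data = ListDataset(
    [{"start": "2021-01-01 00:00:00", "target": [float(i % 24) for i in range(500)]}],
    freq="H",
)

estimator = DeepAREstimator(
    freq="H",
    prediction_length=24,
    trainer=Trainer(ctx=mx.gpu(), epochs=5),  # explicit MXNet GPU context
)
predictor = estimator.train(train_data)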


There are 2 best solutions below


My takeaways for training GluonTS[mxnet] models on a GPU:

  1. MXNet only supports NVIDIA GPUs (e.g. an EC2 g4dn.xlarge instance).
  2. Make sure the NVIDIA drivers are properly installed (using an AMI such as "Deep Learning AMI GPU CUDA" was a great help for me).
  3. Get the installed CUDA version with nvcc --version:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Mon_May__3_19:15:13_PDT_2021
Cuda compilation tools, release 11.3, V11.3.109
Build cuda_11.3.r11.3/compiler.29920130_0

  4. Uninstall the CPU-only MXNet: pip uninstall mxnet
  5. Install MXNet with CUDA support matching your CUDA version: pip install mxnet-cu113
  6. Optionally, set your trainer to use the GPU explicitly (it should detect the GPU automatically, but you can force it; see also the sketch at the end of this answer):
import mxnet
from gluonts.mx.trainer import Trainer

trainer = Trainer(
    ctx=mxnet.context.gpu(),  # force the GPU context explicitly
    epochs=train_conf.max_epochs,
    num_batches_per_epoch=train_conf.num_batches_per_epoch,
)
  7. Run your training and check that the GPU is being used with nvidia-smi. You should see something like this:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
| N/A   32C    P0    37W /  70W |   1101MiB / 15360MiB |     39%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     16057      C   python                           1099MiB |
+-----------------------------------------------------------------------------+
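On the "it should automatically detect the GPU" point in step 6, here is a small sketch of how you might check which context GluonTS/MXNet will pick up and override it when needed. The get_mxnet_context helper lives in gluonts.mx.context in the versions I have used; treat the exact import path as an assumption.

import mxnet as mx
from gluonts.mx.context import get_mxnet_context  # path may vary across GluonTS versions
from gluonts.mx.trainer import Trainer

default_ctx = get_mxnet_context()
print(f"Default context: {default_ctx}")  # gpu(0) with a CUDA-enabled MXNet build, otherwise cpu(0)

# Fall back to CPU only when no GPU is visible to MXNet.
ctx = mx.gpu(0) if mx.context.num_gpus() > 0 else mx.cpu()
trainer = Trainer(ctx=ctx, epochs=10)  # remaining Trainer arguments omitted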

Check your current MXNet version; I believe you are using a CPU-only build.

Please check the following:

import mxnet as mx

print(f'mxnet version: {mx.__version__}')
print(f'Number of GPUs: {mx.context.num_gpus()}')

It should return the number of GPUs available; if it prints 0, MXNet cannot see your GPU and you likely have the CPU-only build installed.
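To go one step further than counting devices, here is a quick sanity check (a sketch, not specific to the asker's setup) that the installed MXNet build can actually allocate memory and compute on the GPU:

import mxnet as mx

# Allocating on gpu(0) fails with an MXNetError on a CPU-only MXNet build.
x = mx.nd.ones((2, 3), ctx=mx.gpu(0))
print((x * 2).asnumpy())  # .asnumpy() forces the computation to actually run on the GPU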