I am using GluonTS to build a DeepAR model, but training takes a long time to run. When I set ctx = 'gpu', it throws an error. My machine has a GPU, but the option didn't work. Any help is much appreciated.
Using DeepAR GPU
Asked by Sharan
There are 2 best solutions below
My takeaways for training GluonTS[mxnet] models on a GPU:
- MXNet only supports NVIDIA GPUs (e.g. EC2 g4dn.xlarge)
- Make sure the NVIDIA drivers are properly installed (using an AMI such as "Deep Learning AMI GPU CUDA" was a great help for me)
- Get the CUDA version using
    nvcc --version
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2021 NVIDIA Corporation
    Built on Mon_May__3_19:15:13_PDT_2021
    Cuda compilation tools, release 11.3, V11.3.109
    Build cuda_11.3.r11.3/compiler.29920130_0
- Uninstall mxnet
    pip uninstall mxnet
- Install mxnet with CUDA support matching the CUDA version reported above
    pip install mxnet-cu113
- Optionally, set your trainer to use the GPU (it should detect the GPU automatically, but you may force it)
    import mxnet
    from gluonts.mx.trainer import Trainer  # module path may differ in older GluonTS releases

    # train_conf is my own configuration object
    trainer = Trainer(
        ctx=mxnet.context.gpu(),
        epochs=train_conf.max_epochs,
        num_batches_per_epoch=train_conf.num_batches_per_epoch,
    )
- Run your training and check that the GPU is being used with
    nvidia-smi
You should see something like this:
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 510.47.03    Driver Version: 510.47.03    CUDA Version: 11.6     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
    | N/A   32C    P0    37W /  70W |   1101MiB / 15360MiB |     39%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+

    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    |    0   N/A  N/A     16057      C   python                           1099MiB |
    +-----------------------------------------------------------------------------+
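The step of matching the pip package to the CUDA release can be sketched in a few lines of Python. This is only an illustration: the helper name `mxnet_package_for` is hypothetical, and the `mxnet-cu<major><minor>` naming pattern is taken from the `mxnet-cu113` example above. Not every CUDA release necessarily has a published wheel, so confirm the package exists before installing.

```python
import re


def mxnet_package_for(nvcc_output: str) -> str:
    """Guess the CUDA-enabled MXNet package name from `nvcc --version` output.

    Hypothetical helper: it only applies the mxnet-cu<major><minor> naming
    pattern and does not check that such a package is actually published.
    """
    # nvcc prints a line such as "Cuda compilation tools, release 11.3, V11.3.109"
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("could not find a CUDA release in the nvcc output")
    major, minor = match.groups()
    return f"mxnet-cu{major}{minor}"
```

For example, passing the `nvcc --version` output shown above yields `mxnet-cu113`, the package name used in the install step.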
Check which mxnet build you currently have installed; I believe you're using a CPU-only version.
Please check the following:
It should return the number of GPUs.
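One way to run such a check, as a sketch and not necessarily the snippet this answer originally contained, is to ask MXNet how many GPUs it can see via `mxnet.context.num_gpus()`. The wrapper name `count_mxnet_gpus` is hypothetical, and treating a failed import as "0 GPUs" is an assumption made so the script runs anywhere.

```python
def count_mxnet_gpus() -> int:
    """Return how many GPUs MXNet can see (0 if MXNet is unusable or CPU-only)."""
    try:
        import mxnet as mx  # imported lazily so the script still runs without MXNet
    except Exception:
        # Assumption: any import-time failure (package missing, CUDA libraries
        # not found, incompatible dependencies) is reported as "no GPU".
        return 0
    return mx.context.num_gpus()


if __name__ == "__main__":
    n = count_mxnet_gpus()
    if n == 0:
        print("MXNet sees no GPU: CPU-only build, missing install, or driver problem")
    else:
        print(f"MXNet sees {n} GPU(s)")
```

If this reports 0 on a machine that does have an NVIDIA GPU, the installed mxnet package is most likely the CPU build, or the driver is not set up correctly.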