How to handle an OOM (out of memory) issue when using TensorFlow?


I'm trying to convert a pre-trained model to ONNX format using tf2onnx.convert. This is the command I ran:

$ python3 -m tf2onnx.convert --saved-model models --output tf_model_op9.onnx

On executing the command, I hit an OOM issue and the process is killed:

2021-06-10 20:45:45.363569: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 984 MB memory) -> physical GPU (device: 0, name: Xavier, pci bus id: 0000:00:00.0, compute capability: 7.2)

2021-06-10 20:45:46,335 - INFO - Computed 2 values for constant folding

Killed

On checking /var/log/kern.log I get:

Jun 10 21:01:36 dreamvu-desktop kernel: [559821.101983] Out of memory: Kill process 27888 (python3) score 501 or sacrifice child

Jun 10 21:01:36 dreamvu-desktop kernel: [559821.102503] Killed process 27888 (python3) total-vm:18059264kB, anon-rss:3788464kB, file-rss:126752kB, shmem-rss:0kB

Jun 10 21:01:36 dreamvu-desktop kernel: [559822.232634] oom_reaper: reaped process 27888 (python3), now anon-rss:0kB, file-rss:127808kB, shmem-rss:0kB

Most of the solutions I have found suggest limiting the batch size (already 1), limiting GPU resources via sessions (already tried), changing the number of CPU threads, or changing the memory limit (not supported even in TF v2.5). I think what I actually need is to limit the RAM the process uses.

How do I do that?
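
For reference, this is roughly the GPU-side limiting I already tried (a sketch using the tf.config API; the commented-out memory cap and its 512 MB value are just illustrative):

    import tensorflow as tf

    gpus = tf.config.list_physical_devices('GPU')
    if gpus:
        try:
            # Allocate GPU memory on demand instead of grabbing it all up front.
            tf.config.experimental.set_memory_growth(gpus[0], True)
            # Alternative: cap the GPU allocation at a fixed size in MB, e.g.
            # tf.config.set_logical_device_configuration(
            #     gpus[0],
            #     [tf.config.LogicalDeviceConfiguration(memory_limit=512)])
        except RuntimeError as e:
            # These options must be set before the GPU is initialized.
            print(e)

Neither setting helped here, since the kernel log above shows the process being reaped by the OOM killer for host RAM, not failing on a GPU allocation.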

OS: Ubuntu 18.04 || Memory: 7.6 GiB

Graphics: NVIDIA Tegra Xavier (nvgpu) / integrated

Processor: ARMv8 Processor rev 0 (v8l) × 6


There is 1 answer below.


Have you considered using a swapfile to provide the extra memory you need? (Assuming you have the disk space to do so.)

As root, or with sudo, you would need to:

  1. dd if=/dev/zero of=/some/path/swapfile bs=1M count=8192
  2. mkswap /some/path/swapfile
  3. swapon /some/path/swapfile

Use the free command to confirm the extra memory is available as swap.
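
mkswap will warn if the swapfile is world-readable, so it is worth tightening its permissions, and an entry in /etc/fstab makes the swap persistent across reboots. A minimal follow-up, as root and reusing the example path from the steps above:

    chmod 600 /some/path/swapfile
    echo '/some/path/swapfile none swap sw 0 0' >> /etc/fstab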