I use the TensorFlow C++ API to do CNN inference. I already call `set_allow_growth(true)`, but it still consumes more GPU memory than it strictly needs. `set_per_process_gpu_memory_fraction` can only set an upper bound on GPU memory, and different CNN models have different upper bounds. Is there a good way to solve this problem?
Unfortunately, there's no such flag to use out of the box, but this can be done manually:

By default, TF allocates all available GPU memory. Setting `set_allow_growth` to `true` causes TF to allocate the needed memory in chunks instead of allocating all GPU memory at once: every time TF requires more GPU memory than is already allocated, it allocates another chunk. In addition, as you mentioned, TF supports `set_per_process_gpu_memory_fraction`, which specifies the maximum GPU memory the process may use, as a fraction of total GPU memory. This results in an out-of-memory (OOM) exception if TF requires more GPU memory than allowed. Unfortunately, I think the chunk size cannot be set by the user and is hard-coded in TF (for some reason I think the chunk size is 4GB, but I'm not sure).
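Both options can be set together on the session configuration. A minimal sketch, assuming the TF 1.x C++ API (both setters are generated from the `GPUOptions` protobuf, so check the names against your TF version):

```cpp
// Sketch: cap TF's GPU memory usage via SessionOptions.
#include "tensorflow/core/public/session.h"

tensorflow::SessionOptions MakeOptions(double fraction) {
  tensorflow::SessionOptions opts;
  auto* gpu = opts.config.mutable_gpu_options();
  gpu->set_allow_growth(true);                         // allocate in chunks as needed
  gpu->set_per_process_gpu_memory_fraction(fraction);  // hard upper bound (0.0-1.0)
  return opts;
}

// Usage:
//   std::unique_ptr<tensorflow::Session> sess(
//       tensorflow::NewSession(MakeOptions(0.25)));
```

With both set, TF grows its allocation chunk by chunk but raises OOM once it would exceed the given fraction.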
This means you can specify the maximum amount of GPU memory (as a fraction) that you allow TF to use. If you know how much GPU memory you have in total (it can be retrieved with `nvidia-smi`) and how much memory you want to allow, you can compute the fraction and pass it to TF. If you run a small number of neural networks, you can find the required GPU memory for each of them by running it with different allowed fractions, like a binary search, to find the minimum fraction that lets the network run. Then, setting the value you found via `set_per_process_gpu_memory_fraction` for each network will achieve what you wanted.