Using tensorflow, how can you average parameter gradient values over a number of batches and update using that average?


Like many people developing deep learning models, I am limited by my hardware resources, namely GPU memory. I have an audio classification problem for which I am trying out a number of RNNs. The data is very large, so I can only use small batch sizes and must also keep the LSTM size down. I understand that many people use spectrograms or other methods to give the network more condensed input, but I would specifically like to know how to do this with raw data.

This is what I am doing currently on a batch size of around 4:

loss_op = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=label_op))

optimisation_op = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(loss_op)

I would ideally like to calculate the gradients for each parameter for a batch on the GPU, then move those to RAM whilst calculating the gradients for the next batch. After some number of batches, I would then like to average the gradients for each parameter and use them to update the network.
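I assume the first step is to split minimize() into compute_gradients() and apply_gradients(), so the per-parameter gradients are available as tensors before any update is applied, something like:

optimiser = tf.train.AdamOptimizer(learning_rate=learning_rate)
grads_and_vars = optimiser.compute_gradients(loss_op)  # list of (gradient, variable) pairs
optimisation_op = optimiser.apply_gradients(grads_and_vars)  # equivalent to minimize(loss_op)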

I got this idea from the Inception GitHub page, which describes a similar scheme for parallelising over multiple GPUs: https://github.com/tensorflow/models/tree/master/research/inception


1 Answer


Use tf.Variable objects to store the accumulated gradients, and place those variables on the CPU so they live in host RAM rather than in GPU memory.
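A minimal sketch of what that can look like, assuming TensorFlow 1.x and the loss_op / learning_rate from the question (the names accum_vars, zero_accum_op, accum_op, apply_op and accumulation_steps are just illustrative):

import tensorflow as tf

optimiser = tf.train.AdamOptimizer(learning_rate=learning_rate)
trainable_vars = tf.trainable_variables()

# One (gradient, variable) pair per parameter for the current small batch.
grads_and_vars = optimiser.compute_gradients(loss_op, var_list=trainable_vars)

# Non-trainable accumulator variables pinned to the CPU, so the running
# gradient sums sit in host RAM rather than GPU memory.
with tf.device('/cpu:0'):
    accum_vars = [tf.Variable(tf.zeros_like(v.initialized_value()), trainable=False)
                  for v in trainable_vars]

# Reset the accumulators at the start of each accumulation cycle.
zero_accum_op = [av.assign(tf.zeros_like(av)) for av in accum_vars]

# Add this batch's gradients into the accumulators (skip params with no gradient).
accum_op = [av.assign_add(g)
            for av, (g, _) in zip(accum_vars, grads_and_vars) if g is not None]

# Apply the averaged gradients as a single parameter update.
accumulation_steps = 8  # e.g. 8 batches of 4 approximate one batch of 32
apply_op = optimiser.apply_gradients(
    [(av / accumulation_steps, v) for av, v in zip(accum_vars, trainable_vars)])

# Training loop sketch:
# sess.run(zero_accum_op)
# for _ in range(accumulation_steps):
#     sess.run(accum_op, feed_dict=...)   # gradients for one small batch
# sess.run(apply_op)                      # one update with the averaged gradients

Because the gradients are computed on the GPU but accumulated into CPU-resident variables, only one batch's activations and gradients need to fit in GPU memory at a time.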