My main question is: is averaging the loss the same thing as averaging the gradient, and how do I accumulate my loss over mini-batches and then calculate my gradient?
I have been trying to implement policy gradient in TensorFlow and ran into the issue that I cannot feed all my game states into my network at once and then update. The problem is that if I lower my network size, train on all frames at once, and take the mean of the loss, then it begins to converge nicely. But if I accumulate the gradients over mini-batches and then average them, my gradients explode and I overflow my weights.
Any help or insight would be much appreciated.
Keep in mind also, this is my first time asking a question here.
What you can do is accumulate gradients after each mini-batch and then update the weights based on the gradient averages. Consider the following simple case of fitting 50 Gaussian blobs with a single-layer perceptron:
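A minimal sketch of such a setup, assuming the TensorFlow 1.x graph-mode API (the data generation, layer shapes, and variable names here are my own illustration, since the original code is not shown):

```python
import numpy as np
import tensorflow.compat.v1 as tf  # assumption: TF 1.x-style graph mode
tf.disable_v2_behavior()

np.random.seed(0)
# 50 points sampled from two Gaussian blobs, one blob per class
x_data = np.vstack([np.random.randn(25, 2) + [2.0, 2.0],
                    np.random.randn(25, 2) + [-2.0, -2.0]]).astype(np.float32)
y_data = np.array([0] * 25 + [1] * 25, dtype=np.int32)

x = tf.placeholder(tf.float32, shape=[None, 2])
y = tf.placeholder(tf.int32, shape=[None])

# single-layer perceptron: inputs map directly to class logits
W = tf.Variable(tf.zeros([2, 2]))
b = tf.Variable(tf.zeros([2]))
logits = tf.matmul(x, W) + b

loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)
```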
The `minimize()` method of the TensorFlow optimizers calls `compute_gradients()` and then `apply_gradients()`. Instead of calling `minimize()`, I'm going to call both methods directly. First, to get the gradients we call `compute_gradients()` (which returns a list of `grads_and_vars` tuples), and to `apply_gradients()`, instead of the gradients themselves, I'm going to feed placeholders for the future gradient averages.

During the mini-batches we only compute the losses (you can accumulate the losses as well: append them to some list and then compute the average) and the gradients, without applying the gradients to the weights. At the end of each epoch we execute
the `apply_grads_op` operation while feeding the accumulated gradients to its placeholders:
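Putting it together, a self-contained sketch of the accumulation loop for the 50-blob perceptron described earlier (again assuming the TensorFlow 1.x graph-mode API; the toy data, `grad_placeholders`, and the other names are my own illustration, not from the original post):

```python
import numpy as np
import tensorflow.compat.v1 as tf  # assumption: TF 1.x-style graph mode
tf.disable_v2_behavior()

np.random.seed(0)
# toy setup: 50 points from two Gaussian blobs, single-layer perceptron
x_data = np.vstack([np.random.randn(25, 2) + [2.0, 2.0],
                    np.random.randn(25, 2) + [-2.0, -2.0]]).astype(np.float32)
y_data = np.array([0] * 25 + [1] * 25, dtype=np.int32)

x = tf.placeholder(tf.float32, shape=[None, 2])
y = tf.placeholder(tf.int32, shape=[None])
W = tf.Variable(tf.zeros([2, 2]))
b = tf.Variable(tf.zeros([2]))
logits = tf.matmul(x, W) + b
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)

# compute_gradients() gives (gradient, variable) pairs; apply_gradients() is
# fed placeholders so we can supply the averaged gradients ourselves later
grads_and_vars = optimizer.compute_gradients(loss)
grad_tensors = [g for g, _ in grads_and_vars]
grad_placeholders = [(tf.placeholder(tf.float32, shape=v.get_shape()), v)
                     for _, v in grads_and_vars]
apply_grads_op = optimizer.apply_gradients(grad_placeholders)

batch_size = 10
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for epoch in range(100):
        batch_grads = []   # one entry per mini-batch
        batch_losses = []  # accumulated losses, if you want their average too
        for i in range(0, len(x_data), batch_size):
            feed = {x: x_data[i:i + batch_size], y: y_data[i:i + batch_size]}
            # evaluate loss and gradients only -- no weight update here
            results = sess.run([loss] + grad_tensors, feed_dict=feed)
            batch_losses.append(results[0])
            batch_grads.append(results[1:])
        # average the accumulated gradients across the mini-batches ...
        mean_grads = [np.mean([g[j] for g in batch_grads], axis=0)
                      for j in range(len(grad_tensors))]
        # ... and apply them once per epoch through the placeholders
        feed = {ph: mg for (ph, _), mg in zip(grad_placeholders, mean_grads)}
        sess.run(apply_grads_op, feed_dict=feed)
    final_loss = float(sess.run(loss, feed_dict={x: x_data, y: y_data}))
```

Since all mini-batches here have the same size, the average of the per-batch gradients equals the full-batch gradient, so this behaves like full-batch gradient descent applied once per epoch.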