I am trying to implement a stochastic Armijo rule in the get_updates method of the Keras SGD optimizer. For that I need to run an additional forward pass to check whether the chosen learning_rate was good. I don't want another calculation of the gradients, but I do want to use the updated weights.

I am using Keras version 2.3.1 and TensorFlow version 1.14.0.
def get_updates(self, loss, params):
    grads = self.get_gradients(loss, params)
    self.updates = [K.update_add(self.iterations, 1)]

    lr = self.learning_rate
    if self.initial_decay > 0:
        lr = lr * (1. / (1. + self.decay * K.cast(self.iterations,
                                                  K.dtype(self.decay))))
    # momentum
    shapes = [K.int_shape(p) for p in params]
    moments = [K.zeros(shape, name='moment_' + str(i))
               for (i, shape) in enumerate(shapes)]
    self.weights = [self.iterations] + moments
    for p, g, m in zip(params, grads, moments):
        v = self.momentum * m - lr * g  # velocity
        self.updates.append(K.update(m, v))

        if self.nesterov:
            new_p = p + self.momentum * v - lr * g
        else:
            new_p = p + v

        # Apply constraints.
        if getattr(p, 'constraint', None) is not None:
            new_p = p.constraint(new_p)

        self.updates.append(K.update(p, new_p))

    ### own changes ###
    if self.armijo:
        inputs = (model._feed_inputs +
                  model._feed_targets +
                  model._feed_sample_weights)
        input_layer = model.layers[0].input
        armijo_function = K.function(inputs=input_layer, outputs=[loss],
                                     updates=self.updates, name='armijo')
        loss_next = armijo_function(inputs)
        # [....change updates if learning rate was not good enough...]
    return self.updates
Unfortunately, I don't understand the error message I get when trying to calculate "loss_next":
tensorflow.python.framework.errors_impl.InvalidArgumentError: Requested Tensor connection between nodes "conv2d_1_input" and "conv2d_1_input" would create a cycle.
Two questions here:

1. How can I access the current batch I am working on? The forward pass should only consider the actual batch, since the gradients also belong only to that batch.

2. Are there any better ideas than using K.function for applying the updates and evaluating a forward pass to calculate the loss function on that batch?
Can anyone help? Thanks in advance.
For this you can use batch_size = <total training records> in model.fit(), so that every epoch has just one forward pass and one back-propagation. That way you can analyze the gradients on epoch 1 and modify the learning rate for epoch 2, as shown in the short example below.
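For example (a minimal sketch, assuming train_images and train_labels are your training arrays and model is already compiled):

# batch_size equals the full training set size, so each epoch
# performs exactly one forward pass and one backward pass
model.fit(train_images, train_labels,
          batch_size=len(train_images),
          epochs=10)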
OR, if you are using a custom training loop, then modify the code accordingly. I do not recall any other option to evaluate gradients in TensorFlow 1.x apart from using from tensorflow.keras import backend as K. The best option is to update TensorFlow to the latest version, 2.2.0, and use tf.GradientTape. I would recommend going through this answer to capture gradients using from tensorflow.keras import backend as K in TensorFlow 1.x.
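If you have to stay on TensorFlow 1.x, a minimal sketch of the backend approach could look like the following. This is an assumption on my side, built from the same private attributes (model._feed_inputs, etc.) your question already uses, with hypothetical x_batch / y_batch arrays, so treat it as illustrative rather than official API:

import numpy as np
from tensorflow.keras import backend as K

# Symbolic gradients of the compiled model's total loss w.r.t. its weights
grads = K.gradients(model.total_loss, model.trainable_weights)

# Feed the same placeholders Keras itself feeds during training
feeds = model._feed_inputs + model._feed_targets + model._feed_sample_weights
get_grads = K.function(inputs=feeds, outputs=grads)

# Evaluate the gradients for one batch (sample weights set to 1)
grad_values = get_grads([x_batch, y_batch, np.ones(len(x_batch))])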
Below is sample code which is close to your requirement. I am using TensorFlow version 2.2.0. You can build your requirements on top of this program. We are doing the following in the program:

1. model.fit - here I am incrementing the learning rate by 0.01 every epoch using tf.keras.callbacks.LearningRateScheduler, and also displaying it at the end of every epoch using tf.keras.callbacks.Callback.

2. tf.GradientTape() after the end of every epoch - we collect the grads of every epoch into a list using append.

3. batch_size=len(train_images), as per your requirement.

Note: I am training on just 500 records from the CIFAR dataset due to memory constraints.
Code -
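The original code block is not reproduced here, so the following is a minimal reconstruction of what is described above. The dataset slicing, model architecture, epoch count, and 0.01-per-epoch learning-rate increment follow the description; the exact layer sizes are illustrative assumptions:

import tensorflow as tf
from tensorflow.keras.datasets import cifar10

# Load CIFAR-10 and keep only 500 records due to memory constraints
(train_images, train_labels), _ = cifar10.load_data()
train_images = train_images[:500].astype('float32') / 255.0
train_labels = train_labels[:500]

# A small CNN; the exact architecture is illustrative
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation='relu', input_shape=(32, 32, 3)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation='softmax'),
])
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01),
              loss=loss_fn,
              metrics=['accuracy'])

# Increment the learning rate by 0.01 every epoch
def lr_schedule(epoch, lr):
    return lr + 0.01 if epoch > 0 else lr

epoch_gradients = []

class GradientCallback(tf.keras.callbacks.Callback):
    """Displays the learning rate and captures gradients after every epoch."""
    def on_epoch_end(self, epoch, logs=None):
        lr = tf.keras.backend.get_value(self.model.optimizer.learning_rate)
        print('\nLearning rate after epoch {}: {:.4f}'.format(epoch + 1, lr))
        # Extra forward pass under tf.GradientTape to evaluate the gradients
        with tf.GradientTape() as tape:
            preds = self.model(train_images, training=True)
            loss = loss_fn(train_labels, preds)
        grads = tape.gradient(loss, self.model.trainable_weights)
        epoch_gradients.append(grads)  # collect the grads of every epoch

model.fit(train_images, train_labels,
          batch_size=len(train_images),  # one forward/backward pass per epoch
          epochs=3,
          callbacks=[tf.keras.callbacks.LearningRateScheduler(lr_schedule),
                     GradientCallback()])

The epoch_gradients list can then be inspected between epochs to drive a learning-rate decision such as your Armijo check.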
Hope this answers your question. Happy Learning.