Forward Pass calculation on current batch in "get_updates" method of Keras SGD Optimizer


I am trying to implement a stochastic Armijo rule in the get_updates method of the Keras SGD optimizer. For that I need to calculate an additional forward pass to check whether the chosen learning rate was good. I don't want to compute the gradients again, but I do want to use the updated weights.

I am using Keras version 2.3.1 and TensorFlow version 1.14.0.

def get_updates(self, loss, params):
        grads = self.get_gradients(loss, params)
        self.updates = [K.update_add(self.iterations, 1)]

        lr = self.learning_rate
        if self.initial_decay > 0:
            lr = lr * (1. / (1. + self.decay * K.cast(self.iterations,
                                                      K.dtype(self.decay))))
        # momentum
        shapes = [K.int_shape(p) for p in params]
        moments = [K.zeros(shape, name='moment_' + str(i))
                   for (i, shape) in enumerate(shapes)]
        self.weights = [self.iterations] + moments
        for p, g, m in zip(params, grads, moments):
            v = self.momentum * m - lr * g  # velocity
            self.updates.append(K.update(m, v))

            if self.nesterov:
                new_p = p + self.momentum * v - lr * g
            else:
                new_p = p + v

            # Apply constraints.
            if getattr(p, 'constraint', None) is not None:
                new_p = p.constraint(new_p)

            self.updates.append(K.update(p, new_p))

        ### own changes ###
        if self.armijo:
            inputs = (model._feed_inputs +
                      model._feed_targets +
                      model._feed_sample_weights)
            input_layer = model.layers[0].input
            armijo_function = K.function(inputs=input_layer, outputs=[loss],
                                         updates=self.updates, name='armijo')
            loss_next = armijo_function(inputs)
            [....change updates if learning rate was not good enough...]

        return self.updates

Unfortunately, I don't understand the error message I get when trying to calculate "loss_next":

tensorflow.python.framework.errors_impl.InvalidArgumentError: Requested Tensor connection between nodes "conv2d_1_input" and "conv2d_1_input" would create a cycle.
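For comparison, as far as I understand, Keras 2.3.x builds its own training function by passing the model's feed placeholders (not a layer's input tensor) as the inputs of K.function, and the numpy arrays of the current batch are supplied when the function is called. A rough sketch of that pattern (the names training_updates, x_batch, y_batch and w_batch are only placeholders here):

# sketch (assumption): the pattern Keras uses internally in Model._make_train_function --
# feed placeholders in, symbolic loss out, parameter updates as side effects
inputs = (model._feed_inputs +
          model._feed_targets +
          model._feed_sample_weights)
train_function = K.function(inputs,
                            [model.total_loss],
                            updates=training_updates,  # the optimizer's update ops
                            name='train_function')
# called with the numpy arrays of the current batch:
# loss_value = train_function([x_batch, y_batch, w_batch])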

Two questions here:

  • how do I access the current batch I am working on? The forward pass should only consider the current batch, since the gradients also belong only to that batch.

  • are there better ideas than using K.function for applying the updates and evaluating a forward pass to calculate the loss on that batch?

Can anyone help? Thanks in advance.

There is 1 answer below.

how do I access the current batch I am working on? The forward pass should only consider the current batch, since the gradients also belong only to that batch.

For this you can set batch_size equal to the total number of training records in model.fit(), so that every epoch consists of exactly one forward pass and one back-propagation. That way you can analyze the gradients of epoch 1 and modify the learning rate for epoch 2. If you are using a custom training loop instead, modify the code accordingly.

are there better ideas than using K.function for applying the updates and evaluating a forward pass to calculate the loss on that batch?

I do not recall any other option for evaluating the gradient apart from using from tensorflow.keras import backend as K in TensorFlow 1.x. The best option is to update TensorFlow to the latest version, 2.2.0, and use tf.GradientTape.
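As a rough sketch of what that could look like for your Armijo check in a custom training loop (assumptions: model, loss_fn, optimizer, train_images, train_labels and epochs are defined as in the full example further below; the acceptance test and the factor 0.5 are only placeholders):

import tensorflow as tf

dataset = tf.data.Dataset.from_tensor_slices(
    (train_images, train_labels)).batch(32)

for epoch in range(epochs):
    for x_batch, y_batch in dataset:
        # forward pass and gradients on the current batch
        with tf.GradientTape() as tape:
            logits = model(x_batch, training=True)
            loss_before = loss_fn(y_batch, logits)
        grads = tape.gradient(loss_before, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))

        # second forward pass on the same batch with the updated weights,
        # without recomputing gradients
        loss_after = loss_fn(y_batch, model(x_batch, training=True))

        # placeholder for the stochastic Armijo acceptance test
        if loss_after > loss_before:
            old_lr = tf.keras.backend.get_value(optimizer.lr)
            tf.keras.backend.set_value(optimizer.lr, 0.5 * old_lr)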

I would recommend going through this answer on capturing gradients using from tensorflow.keras import backend as K in TensorFlow 1.x.
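For reference, the usual pattern for capturing gradient values with the backend in TensorFlow 1.x looks roughly like this (a sketch under the assumption of a compiled Keras model; x_batch and y_batch stand for the numpy arrays of one batch):

from tensorflow.keras import backend as K
import numpy as np

# symbolic gradients of the total loss w.r.t. the trainable weights
grads = model.optimizer.get_gradients(model.total_loss, model.trainable_weights)

get_gradients = K.function(inputs=(model._feed_inputs +
                                   model._feed_targets +
                                   model._feed_sample_weights),
                           outputs=grads)

# evaluate the gradients on one batch (uniform sample weights assumed)
grad_values = get_gradients([x_batch, y_batch, np.ones(len(x_batch))])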

Below is sample code that comes close to your requirement. I am using TensorFlow version 2.2.0. You can build your requirements on top of this program.

The program does the following -

  1. We alter the learning rate after every epoch. You can do that using the callbacks argument of model.fit. Here I am incrementing the learning rate by 0.01 every epoch using tf.keras.callbacks.LearningRateScheduler and also displaying it at the beginning of every epoch using a tf.keras.callbacks.Callback.
  2. We compute the gradients using tf.GradientTape() at the end of every epoch and collect the grads of every epoch in a list with append.
  3. We have also set batch_size=len(train_images) as per your requirement.

Note: I am training on just 500 records from the CIFAR-10 dataset due to memory constraints.

Code -

%tensorflow_version 2.x
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, Flatten, Dropout, MaxPooling2D
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.optimizers import Adam
from tensorflow.keras import backend as K

import os
import numpy as np
import matplotlib.pyplot as plt

(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.cifar10.load_data()

train_images = train_images[:500]
train_labels = train_labels[:500]

test_images = test_images[:50]
test_labels = test_labels[:50]

model = Sequential([
    Conv2D(16, 3, padding='same', activation='relu', input_shape=(32, 32, 3)),
    MaxPooling2D(),
    Conv2D(32, 3, padding='same', activation='relu'),
    MaxPooling2D(),
    Conv2D(64, 3, padding='same', activation='relu'),
    MaxPooling2D(),
    Flatten(),
    Dense(512, activation='relu'),
    Dense(10)
])

lr = 0.01
adam = Adam(lr)

# Define the Gradient Function
epoch_gradient = []
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# Define the Required Callback Function
class GradientCalcCallback(tf.keras.callbacks.Callback):
  def on_epoch_end(self, epoch, logs={}):
    # forward pass over the training data and gradient computation
    with tf.GradientTape() as tape:
      logits = model(train_images, training=True)
      loss = loss_fn(train_labels, logits)
    grad = tape.gradient(loss, model.trainable_weights)
    # note: this also applies one extra update step per epoch
    model.optimizer.apply_gradients(zip(grad, model.trainable_variables))
    epoch_gradient.append(grad)

gradcalc = GradientCalcCallback()

# Define the Required Callback Function
class printlearningrate(tf.keras.callbacks.Callback):
    def on_epoch_begin(self, epoch, logs={}):
        optimizer = self.model.optimizer
        lr = K.eval(optimizer.lr)
        Epoch_count = epoch + 1
        print('\n', "Epoch:", Epoch_count, ', LR: {:.2f}'.format(lr))

printlr = printlearningrate() 

def scheduler(epoch):
  optimizer = model.optimizer
  return K.eval(optimizer.lr + 0.01)

updatelr = tf.keras.callbacks.LearningRateScheduler(scheduler)

model.compile(optimizer=adam,
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

epochs = 10 

history = model.fit(train_images, train_labels, epochs=epochs, batch_size=len(train_images), 
                    validation_data=(test_images, test_labels),
                    callbacks=[printlr, updatelr, gradcalc])

# Convert to a 2-dimensional array of (epoch, gradients) type
gradient = np.asarray(epoch_gradient)
print("Total number of epochs run:", epochs)
print("Gradient Array has the shape:",gradient.shape)

Output -

 Epoch: 1 , LR: 0.01
Epoch 1/10
1/1 [==============================] - 0s 427ms/step - loss: 30.1399 - accuracy: 0.0820 - val_loss: 2114.8201 - val_accuracy: 0.1800 - lr: 0.0200

 Epoch: 2 , LR: 0.02
Epoch 2/10
1/1 [==============================] - 0s 329ms/step - loss: 141.6176 - accuracy: 0.0920 - val_loss: 41.7008 - val_accuracy: 0.0400 - lr: 0.0300

 Epoch: 3 , LR: 0.03
Epoch 3/10
1/1 [==============================] - 0s 328ms/step - loss: 4.1428 - accuracy: 0.1160 - val_loss: 2.3883 - val_accuracy: 0.1800 - lr: 0.0400

 Epoch: 4 , LR: 0.04
Epoch 4/10
1/1 [==============================] - 0s 329ms/step - loss: 2.3545 - accuracy: 0.1060 - val_loss: 2.3471 - val_accuracy: 0.1800 - lr: 0.0500

 Epoch: 5 , LR: 0.05
Epoch 5/10
1/1 [==============================] - 0s 340ms/step - loss: 2.3208 - accuracy: 0.1060 - val_loss: 2.3047 - val_accuracy: 0.1800 - lr: 0.0600

 Epoch: 6 , LR: 0.06
Epoch 6/10
1/1 [==============================] - 0s 331ms/step - loss: 2.3048 - accuracy: 0.1300 - val_loss: 2.3069 - val_accuracy: 0.0600 - lr: 0.0700

 Epoch: 7 , LR: 0.07
Epoch 7/10
1/1 [==============================] - 0s 337ms/step - loss: 2.3041 - accuracy: 0.1340 - val_loss: 2.3432 - val_accuracy: 0.0600 - lr: 0.0800

 Epoch: 8 , LR: 0.08
Epoch 8/10
1/1 [==============================] - 0s 341ms/step - loss: 2.2871 - accuracy: 0.1400 - val_loss: 2.6009 - val_accuracy: 0.0800 - lr: 0.0900

 Epoch: 9 , LR: 0.09
Epoch 9/10
1/1 [==============================] - 1s 515ms/step - loss: 2.2810 - accuracy: 0.1440 - val_loss: 2.8530 - val_accuracy: 0.0600 - lr: 0.1000

 Epoch: 10 , LR: 0.10
Epoch 10/10
1/1 [==============================] - 0s 343ms/step - loss: 2.2954 - accuracy: 0.1300 - val_loss: 2.3049 - val_accuracy: 0.0600 - lr: 0.1100
Total number of epochs run: 10
Gradient Array has the shape: (10, 10)

Hope this answers your question. Happy Learning.