Train pretrained custom model with different loss function


In TF2 Keras, I have trained an autoencoder using tensorflow.keras.losses.MeanSquaredError as the loss function. Now I want to train this model further with another loss function, specifically tensorflow.keras.losses.KLDivergence. The reason is that unsupervised training is done first for representation learning. The resulting embeddings can then be clustered, and the clusters used as self-supervision (i.e. labels), enabling a second, supervised loss that improves the model further.

This is not transfer learning per se, as no new layers are added to the model; only the loss function is changed and the model continues training.
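If the targets were fixed, this would just amount to recompiling with the new loss and continuing to fit; a minimal sketch (autoencoder, x and p_targets are placeholder names, not from my actual code):

import tensorflow as tf

# Hypothetical pretrained autoencoder, originally compiled with MSE.
# Recompiling replaces the loss (and optimizer); the learned weights are kept.
autoencoder.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
                    loss=tf.keras.losses.KLDivergence())
autoencoder.fit(x, p_targets, epochs=10)  # p_targets: per-sample target distributions

In my case, however, the targets (the distribution P below) have to be recomputed during training, so a plain recompile is not enough.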

What I have tried is keeping the MSE-pretrained model as a property of the new model:

class ClusterBooster(tf.keras.Model):

    def __init__(self, base_model, centers):
        super(ClusterBooster, self).__init__()
        self.pretrained = base_model
        self.centers = centers

    def train_step(self, data):
        with tf.GradientTape() as tape:
            loss = self.compiled_loss(self.P, self.Q, regularization_losses=self.losses)

        # Compute gradients
        gradients = tape.gradient(loss, self.trainable_variables)

        # Update weights
        self.optimizer.apply_gradients(zip(gradients, self.trainable_variables))

        return {m.name: m.result() for m in self.metrics}

where the loss is the KL divergence between the distributions P and Q. The distributions are computed in a callback instead of in the model's train_step, because I need access to the current epoch (P is updated every 5 epochs, not on every epoch):

def on_epoch_begin(self, epoch, logs=None):
    z = self.model.pretrained.embed(self.feature, training=True)
    z = tf.reshape(z, [tf.shape(z)[0], 1, tf.shape(z)[1]])  # reshape for broadcasting
    # CALCULATE Q FOR EVERY EPOCH
    partial = tf.math.pow(tf.norm(z - self.model.centers, axis=2, ord='euclidean'), 2)
    numerator = 1 / (1 + partial)
    denominator = tf.math.reduce_sum(1 / (1 + partial))
    self.model.Q = numerator / denominator

    # CALCULATE P EVERY 5 EPOCHS TO AVOID INSTABILITY
    if epoch % 5 == 0:
        partial = tf.math.pow(self.model.Q, 2) / tf.math.reduce_sum(self.model.Q, axis=1, keepdims=True)
        numerator = partial
        denominator = tf.math.reduce_sum(partial, axis=0)
        self.model.P = numerator / denominator

However, when apply_gradients() is executed I get:

ValueError: No gradients provided for any variable: ['dense/kernel:0', 'dense/bias:0', 'dense_1/kernel:0', 'dense_1/bias:0', 'dense_2/kernel:0', 'dense_2/bias:0', 'dense_3/kernel:0', 'dense_3/bias:0']

I think this is because the pretrained model is never explicitly set up for further training inside the new model (only the embed() method is called, which does not train it). Is this approach correct and I am just missing something, or is there a better way?

1 Answer
It seems that whatever computation takes place in a callback isn't tracked for gradient computation and weight updates. These computations should therefore be moved inside the train_step() function of the custom Model class (ClusterBooster), so that the loss is actually connected to the trainable variables.
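For illustration, such a train_step() could recompute Q inside the GradientTape from the pretrained encoder, so the loss depends on the trainable weights (a sketch only, reusing the embed() and centers names from the question; not my exact code):

def train_step(self, data):
    x, p = data  # p: the target distribution, treated as a constant here
    with tf.GradientTape() as tape:
        z = self.pretrained.embed(x, training=True)  # gradients flow through the encoder
        z = tf.reshape(z, [tf.shape(z)[0], 1, tf.shape(z)[1]])
        partial = tf.math.pow(tf.norm(z - self.centers, axis=2, ord='euclidean'), 2)
        q = (1 / (1 + partial)) / tf.math.reduce_sum(1 / (1 + partial))
        loss = self.compiled_loss(p, q, regularization_losses=self.losses)
    gradients = tape.gradient(loss, self.trainable_variables)
    self.optimizer.apply_gradients(zip(gradients, self.trainable_variables))
    return {m.name: m.result() for m in self.metrics}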

Since I don't have access to the epoch number inside the train_step() function of ClusterBooster, I created a custom training loop without a Model class, where I can use plain Python code (which is executed eagerly).
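A minimal sketch of such a loop (assuming the same base_model with an embed() method, the centers and feature tensors, and the P/Q formulas from the question; num_epochs and the optimizer are placeholders):

import tensorflow as tf

optimizer = tf.keras.optimizers.Adam()
kld = tf.keras.losses.KLDivergence()

def soft_assignments(z, centers):
    # Similarity between embeddings and cluster centers (Q), following the question's formula
    z = tf.reshape(z, [tf.shape(z)[0], 1, tf.shape(z)[1]])
    partial = tf.math.pow(tf.norm(z - centers, axis=2, ord='euclidean'), 2)
    return (1 / (1 + partial)) / tf.math.reduce_sum(1 / (1 + partial))

for epoch in range(num_epochs):
    # Update the target distribution P every 5 epochs; it is held constant for the gradient steps
    if epoch % 5 == 0:
        q = soft_assignments(base_model.embed(feature, training=False), centers)
        weight = tf.math.pow(q, 2) / tf.math.reduce_sum(q, axis=1, keepdims=True)
        p = weight / tf.math.reduce_sum(weight, axis=0)

    with tf.GradientTape() as tape:
        # Q is computed inside the tape, so the KL loss depends on the encoder weights
        q = soft_assignments(base_model.embed(feature, training=True), centers)
        loss = kld(p, q)

    gradients = tape.gradient(loss, base_model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, base_model.trainable_variables))

With the epoch counter available as a plain Python variable, the "update P every 5 epochs" logic stays trivial, and the gradients are no longer empty because Q is produced from the encoder inside the tape.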