Computing gradients for outputs taken from intermediate layers and updating weights using optimizer

Question

116 Views Asked by Looters At 17 August 2025 at 21:23

I am trying to implement the architecture below and am not sure whether I am applying gradient tape properly.

In the above architecture, outputs are taken from multiple layers, shown as blue boxes. Each blue box is called a loss branch in the paper and contains two losses: cross-entropy and L2 loss. I wrote the architecture in TensorFlow 2 and am using gradient tape for custom training. What I am not sure about is how to backpropagate these losses and update the weights using gradient tape.

I have two questions:

1. How am I supposed to use gradient tape for multiple losses in this scenario? I am interested in seeing code!
2. For instance, consider the 3rd blue box (3rd loss branch) in the above image, which takes its input from the conv 13 layer and produces two outputs, one for classification and one for regression. After computing the losses, how am I supposed to update the weights? Should I update all the layers above it (conv 1 to conv 13), or only the layers that directly produced conv 13 (conv 11, 12 and 13)?
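To make the second question concrete, here is a minimal sketch using a hypothetical toy model: three `Dense` layers stand in for the conv stack, and the names `layer1` to `layer3`, `head_a` and `head_b` are made up for illustration, not taken from the paper. As far as I understand, the tape only produces gradients for variables that a loss actually depends on, so a branch attached to an earlier layer reaches every layer before it and none after it:

```python
import tensorflow as tf

# Hypothetical stand-in for the real network: three stacked layers,
# with one loss branch on layer2 and another on layer3.
layer1 = tf.keras.layers.Dense(8, activation="relu", name="layer1")
layer2 = tf.keras.layers.Dense(8, activation="relu", name="layer2")
layer3 = tf.keras.layers.Dense(8, activation="relu", name="layer3")
head_a = tf.keras.layers.Dense(2, name="head_a")  # branch attached to layer2
head_b = tf.keras.layers.Dense(2, name="head_b")  # branch attached to layer3

x = tf.random.normal((4, 16))
y = tf.one_hot([0, 1, 0, 1], depth=2)

# persistent=True lets us call tape.gradient more than once below.
with tf.GradientTape(persistent=True) as tape:
    h1 = layer1(x)
    h2 = layer2(h1)
    h3 = layer3(h2)
    loss_a = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=head_a(h2)))
    loss_b = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=head_b(h3)))
    total_loss = loss_a + loss_b

# The branch on layer2 depends on layer1 but not on layer3.
grads_a_wrt_layer1 = tape.gradient(loss_a, layer1.trainable_variables)
grads_a_wrt_layer3 = tape.gradient(loss_a, layer3.trainable_variables)
print([g is None for g in grads_a_wrt_layer1])  # tensors, so not None
print([g is None for g in grads_a_wrt_layer3])  # unconnected, so None

# One update from the summed loss covers every layer and both heads.
variables = (layer1.trainable_variables + layer2.trainable_variables +
             layer3.trainable_variables + head_a.trainable_variables +
             head_b.trainable_variables)
grads_total = tape.gradient(total_loss, variables)
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
optimizer.apply_gradients(zip(grads_total, variables))
del tape
```

If that is right, summing the branch losses and calling `tape.gradient` once over all trainable variables should give each layer the combined contribution of every branch downstream of it.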

I am also attaching a link where I posted a question yesterday in detail.

Below is the snippet which I have tried for gradient descent. Please correct me if I am wrong.

        images = batch.data[0]
        images = (images - 127.5) / 127.5

        targets = batch.label

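        with tensorflow.GradientTape() as tape:
            outputs = self.net(images)
            # loss_criterion returns a list with one combined loss per branch
            loss = self.loss_criterion(outputs, targets)

        self.scheduler(i, self.optimizer)
        # Differentiate the per-branch losses with respect to all trainable weights
        grads = tape.gradient(loss, self.net.trainable_variables)
        self.optimizer.apply_gradients(zip(grads, self.net.trainable_variables))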
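Since `loss_criterion` returns a list of per-branch losses, it may help to check what `tape.gradient` does with a list target. As far as I can tell from the TensorFlow documentation, it differentiates the sum of the targets, so passing the list should be equivalent to summing the losses first. A toy check with a single hypothetical variable `w`:

```python
import tensorflow as tf

w = tf.Variable(2.0)  # hypothetical stand-in for the network weights

with tf.GradientTape(persistent=True) as tape:
    losses = [w * w, 3.0 * w]   # two toy "branch" losses
    total = tf.add_n(losses)    # explicit sum of the same losses

# Gradient with the list as target vs. the explicit sum as target.
grad_from_list = tape.gradient(losses, w)
grad_from_sum = tape.gradient(total, w)
del tape

# d(w^2)/dw + d(3w)/dw = 2*2 + 3 = 7 for both calls.
print(float(grad_from_list), float(grad_from_sum))
```

Summing explicitly with `tf.add_n` makes the intent clearer and also gives a single scalar to log.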

Below is the code for custom loss function which is used as loss_criterion above.

    losses = []
    for i in range(self.num_output_scales):
        pred_score = outputs[i * 2]
        pred_bbox = outputs[i * 2 + 1]
        gt_mask = targets[i * 2]
        gt_label = targets[i * 2 + 1]

        pred_score_softmax = tensorflow.nn.softmax(pred_score, axis=1)
        loss_mask = tensorflow.ones(pred_score_softmax.shape, tensorflow.float32)

        if self.hnm_ratio > 0:
            pos_flag = (gt_label[:, 0, :, :] > 0.5)
            pos_num = tensorflow.math.reduce_sum(tensorflow.cast(pos_flag, dtype=tensorflow.float32))
            if pos_num > 0:
                # Hard negative mining: select negatives in proportion to the positives
                neg_flag = (gt_label[:, 1, :, :] > 0.5)
                neg_num = tensorflow.math.reduce_sum(tensorflow.cast(neg_flag, dtype=tensorflow.float32))
                neg_num_selected = min(int(self.hnm_ratio * pos_num), int(neg_num))
                neg_prob = tensorflow.where(neg_flag, pred_score_softmax[:, 1, :, :], \
                    tensorflow.zeros_like(pred_score_softmax[:, 1, :, :]))
                neg_prob_sort = tensorflow.sort(tensorflow.reshape(neg_prob, shape=(1, -1)), direction='ASCENDING')
                prob_threshold = neg_prob_sort[0][int(neg_num_selected)]
                neg_grad_flag = (neg_prob <= prob_threshold)
                loss_mask = tensorflow.concat([tensorflow.expand_dims(pos_flag, axis=1),
                    tensorflow.expand_dims(neg_grad_flag, axis=1)], axis=1)
            else:
                # No positives in this batch: select a fixed fraction of the negatives
                neg_choice_ratio = 0.1
                neg_num_selected = int(tensorflow.cast(tensorflow.size(pred_score_softmax[:, 1, :, :]), dtype=tensorflow.float32) * neg_choice_ratio)
                neg_prob = pred_score_softmax[:, 1, :, :]
                neg_prob_sort = tensorflow.sort(tensorflow.reshape(neg_prob, shape=(1, -1)), direction='ASCENDING')
                prob_threshold = neg_prob_sort[0][int(neg_num_selected)]
                neg_grad_flag = (neg_prob <= prob_threshold)
                loss_mask = tensorflow.concat([tensorflow.expand_dims(pos_flag, axis=1),
                    tensorflow.expand_dims(neg_grad_flag, axis=1)], axis=1)

        # Fill masked-out entries with 1 (not 0) so that log() gives 0 there instead of -inf
        pred_score_softmax_masked = tensorflow.where(loss_mask, pred_score_softmax,
            tensorflow.ones_like(pred_score_softmax, dtype=tensorflow.float32))
        pred_score_log = tensorflow.math.log(pred_score_softmax_masked)
        score_cross_entropy = - tensorflow.where(loss_mask, gt_label[:, :2, :, :], 
            tensorflow.zeros_like(gt_label[:, :2, :, :], dtype=tensorflow.float32)) * pred_score_log
        loss_score = tensorflow.math.reduce_sum(score_cross_entropy) / \
            tensorflow.cast(tensorflow.size(score_cross_entropy), tensorflow.float32)

        mask_bbox = gt_mask[:, 2:6, :, :]
        predict_bbox = pred_bbox * mask_bbox
        label_bbox = gt_label[:, 2:6, :, :] * mask_bbox
        # l2 loss of boxes
        # loss_bbox = tensorflow.math.reduce_sum(tensorflow.nn.l2_loss((label_bbox - predict_bbox)) ** 2) / 2
        loss_bbox = mse(label_bbox, predict_bbox) / tensorflow.math.reduce_sum(mask_bbox)

        # Adding only losses relevant to a branch and sending them for back prop
        losses.append(loss_score + loss_bbox)
        # losses.append(loss_bbox)
    
        # Adding all losses and sending to back prop Approach 1
        # loss_cls += loss_score
        # loss_reg += loss_bbox
        # loss_branch.append(loss_score)
        # loss_branch.append(loss_bbox)
        # loss = loss_cls + loss_reg

    return losses
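One numerical detail that might matter in a loss like this is the value used to fill masked-out entries before taking the log. Filling them with 0 gives log(0) = -inf, and multiplying that by a zero label gives NaN rather than 0, while filling with 1 gives log(1) = 0. A small check with made-up values (`probs`, `mask` and `labels` are hypothetical, not from my data):

```python
import tensorflow as tf

# Toy values: one kept entry and one masked-out entry.
probs = tf.constant([[0.9, 0.1]])
mask = tf.constant([[True, False]])
labels = tf.constant([[1.0, 0.0]])
masked_labels = tf.where(mask, labels, tf.zeros_like(labels))

# Masked-out entries filled with 0: log(0) = -inf, and 0 * -inf = NaN.
filled_zero = tf.where(mask, probs, tf.zeros_like(probs))
loss_zero_fill = tf.reduce_sum(-masked_labels * tf.math.log(filled_zero))

# Masked-out entries filled with 1: log(1) = 0, so they contribute nothing.
filled_one = tf.where(mask, probs, tf.ones_like(probs))
loss_one_fill = tf.reduce_sum(-masked_labels * tf.math.log(filled_one))

print(bool(tf.math.is_nan(loss_zero_fill)))  # True
print(bool(tf.math.is_nan(loss_one_fill)))   # False
```

A NaN in any branch loss would propagate into every gradient, so this seems worth ruling out before looking at the optimizer or the learning rate.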

I am not getting any errors, but my losses aren't decreasing. Here is the log for my training.

Someone please help me in fixing this.
