I'm training a DeeplabV3+ for semantic segmentation of remote sensing images. I built the model following the Keras tutorial (https://keras.io/examples/vision/deeplabv3_plus/), and it works fine when I train it with the focal loss predefined in Keras/TF. The masks are one-hot encoded and have the shape (batch size, height, width, channels). As the next step I wanted to test the Dice loss (implemented following "Correct Implementation of Dice Loss in Tensorflow / Keras"), which also seemed to work fine but led to a lower IoU. This is how I implemented the loss and trained the model:
import numpy as np
import tensorflow as tf
from tensorflow.keras import backend as K

number_classes = 10  # number of segmentation classes

def dice_coef(y_true, y_pred, smooth=100):
    # Dice coefficient for a single class
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    dice = (2. * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)
    return dice

def dice_coef_loss(y_true, y_pred, smooth=100):
    # turn the Dice coefficient into a loss
    return -dice_coef(y_true, y_pred, smooth)

def dice_coef_multilabel(y_true, y_pred, M=10, smooth=100):
    # compute the Dice coefficient for each of the M classes and take a simple average
    dice = 0
    for index in range(M):
        dice += dice_coef(y_true[:, :, :, index], y_pred[:, :, :, index], smooth)
    return dice / M

def dice_coef_multi_loss(y_true, y_pred, M=number_classes, smooth=100):
    return -dice_coef_multilabel(y_true, y_pred, M=M, smooth=smooth)
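For what it's worth, the Dice loss does return a single scalar when I call it on dummy tensors shaped like my batches; a minimal sanity check (shapes hard-coded to match my data, values random) looks like this:

# quick sanity check: the multi-class Dice loss yields one scalar
y_true_dummy = tf.one_hot(np.random.randint(0, 10, size=(2, 256, 256)), depth=10)
y_pred_dummy = tf.random.uniform((2, 256, 256, 10))
y_pred_dummy /= tf.reduce_sum(y_pred_dummy, axis=-1, keepdims=True)  # fake softmax output
print(dice_coef_multi_loss(y_true_dummy, y_pred_dummy))  # a single negative scalar tensor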
from tensorflow.keras.optimizers import Adam
from Seg_Models_own import DeeplabV3Plus

model = DeeplabV3Plus(image_size=256, image_channels=4, num_classes=10)
model.compile(optimizer=Adam(learning_rate=1e-4),
              loss=dice_coef_multi_loss,
              metrics=['categorical_accuracy',
                       tf.keras.metrics.MeanIoU(num_classes=10, sparse_y_true=False, sparse_y_pred=False)])

history2 = model.fit(train_img_gen,  # generator yielding the training batches
                     steps_per_epoch=steps_per_epoch,
                     epochs=35,
                     verbose=2,
                     validation_data=val_img_gen,  # generator yielding the validation batches
                     validation_steps=val_steps_per_epoch,
                     batch_size=64)
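For comparison, the focal-loss run that trains fine uses the same compile call, just with the predefined Keras loss instead (sketched from memory; I keep the default alpha/gamma):

# the working focal-loss baseline: the loss object is instantiated once and passed to compile
model.compile(optimizer=Adam(learning_rate=1e-4),
              loss=tf.keras.losses.CategoricalFocalCrossentropy(),
              metrics=['categorical_accuracy',
                       tf.keras.metrics.MeanIoU(num_classes=10, sparse_y_true=False, sparse_y_pred=False)])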
I'd now like to combine the focal loss with the Dice loss, since that could improve the model's performance. I tried to combine the losses like this (and in a few similar ways):
def dice_coef_multi_loss(y_true, y_pred, M=10, smooth=100):
    return -dice_coef_multilabel(y_true, y_pred, M=10, smooth=100) + tf.keras.losses.CategoricalFocalCrossentropy(y_true, y_pred)
But unfortunately I always receive the following error:
"TypeError: Failed to convert elements of <keras.src.losses.CategoricalFocalCrossentropy object at 0x7f8ec02546d0>
to Tensor. Consider casting elements to a supported type. See https://www.tensorflow.org/api_docs/python/tf/dtypes for supported TF dtypes
."
I also tried casting the losses and y_true / y_pred to float32.
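One of those attempts looked roughly like this (a sketch; the exact placement of the casts varied across attempts):

def dice_focal_loss(y_true, y_pred):
    # cast both tensors to float32 before computing either term
    y_true = tf.cast(y_true, tf.float32)
    y_pred = tf.cast(y_pred, tf.float32)
    dice = -dice_coef_multilabel(y_true, y_pred, M=10, smooth=100)
    # this creates a loss *object*, not a tensor, so the addition below still raises the TypeError
    focal = tf.keras.losses.CategoricalFocalCrossentropy(y_true, y_pred)
    return dice + focal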
What am I missing here? I thought each loss returns a single scalar, so adding them should be straightforward. Do I have to implement the focal loss in my script myself, similar to the Dice loss? I'd be thrilled to read what you think!