Why is YOLO training loss not decreasing significantly & mean IoU not increasing?


I am trying to implement YOLO from this paper (the paper does not call it v1, but it is the first YOLO paper, so I assume it is v1). I am implementing it on Google Colab using Keras and TensorFlow 1.x.

TL;DR: Results

Starting Epochs:

Iteration,  0
Train on 1800 samples, validate on 450 samples
Epoch 1/32
1800/1800 [==============================] - 13s 7ms/step - loss: 541.8767 - mean_iou_metric: 0.0040 - val_loss: 361.9846 - val_mean_iou_metric: 0.0043
Epoch 2/32
1800/1800 [==============================] - 11s 6ms/step - loss: 378.6184 - mean_iou_metric: 0.0042 - val_loss: 330.4124 - val_mean_iou_metric: 0.0043

Ending Epochs (a total of 320 epochs, 32 per loop):

Epoch 31/32
1800/1800 [==============================] - 11s 6ms/step - loss: 240.4603 - mean_iou_metric: 0.0038 - val_loss: 350.3984 - val_mean_iou_metric: 0.0042
Epoch 32/32
1800/1800 [==============================] - 11s 6ms/step - loss: 240.2410 - mean_iou_metric: 0.0038 - val_loss: 349.5258 - val_mean_iou_metric: 0.0042

Issue: Even after this many epochs, the loss has decreased only slightly (it has decreased, which is okay), but mean_iou has not increased, which concerns me. Why is this happening? At this stage I am unable to work out why the IoU is not increasing despite the loss decreasing. Are loss values of this magnitude normal? They do not seem so to me, so suggestions would be appreciated. I suspect I am doing something wrong in the loss function implementation.

Dataset: I am using a dataset of shape 2500x256x256x3. Each image has a white background and contains colored shapes of 3 types: rectangle, triangle and circle. An image can contain no shapes or at most 3 shapes, which may all be of the same type or of different types. The dataset can be generated using the Python file from here. Example image:

dataset image

Parameter specifications: Following the paper, I set S (the image is divided into an SxS grid), B (number of bounding boxes per grid cell) and C (number of classes) as follows:

N=len(labels)
print("No of images, ",N)
# No of bounding boxes per grid, B
B=1
# No of grids,S*S
S=16
# No. of classes, C
C=3 #3 for 3 types of shapes
# Output=SxSx(5B+C)
I_S=256 # Image dimension I_SxI_S
classes={'circle':0,'triangle':1,'rectangle':2}
lenClasses=len(classes)
#print(lenClasses)
norm_const=I_S/S

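Note that I have defined a constant called norm_const (the size of one grid cell in pixels, I_S/S) that is used to normalize the ground-truth center coordinates. The center coordinates, height and width are originally in the range 0-255; in the code below, height and width are normalized by I_S.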

How I am normalizing the image, center coordinates, height & width: The ground truth is a JSON structure with the x1, x2, y1, y2 coordinates of the bounding box of each shape in an image. I compute the center, height and width and normalize them. The final vector for each grid cell is [1,cx,cy,h,w,0,0,1], where the first value is the confidence score, the next four values are the coordinates and the last three are the one-hot class scores. If a grid cell does not contain a center, its vector stays [0,0,0,0,0,0,0,0] (from the np.zeros initialization).

x1='x1'
x2='x2'
y1='y1'
y2='y2'
boxes='boxes'
classVar='class'
for box in labels[i][boxes]:
    cx1,cx2=box[x1],box[x2]
    cy1,cy2=box[y1],box[y2]
    # Design one hot vector, [0]*3 gives [0,0,0]
    onehot=[0]*lenClasses
    onehot[classes[box[classVar]]]=1 # box[classVar] gives string 'className', which is fed into classes dictionary, which gives position (0/1/2)
    # Centers and h,w
    cx,cy,h,w=(cx1+cx2)/2.0,(cy1+cy2)/2.0,np.abs(cy2-cy1),np.abs(cx2-cx1)
    # Now, to compute where in a grid of SxS, the center would lie
    posx,posy=int((cx*S)/I_S),int((cy*S)/I_S)
    
    # NORMALIZE h,w
    h=h/I_S # I_S is image size
    w=w/I_S
    # NORMALIZE cx,cy
    cx=(cx-posx*norm_const)/norm_const
    cy=(cy-posy*norm_const)/norm_const
    # RESTORING cx,cy from normalized values
    #cxo=(cx+posx)*norm_const
    #cyo=(cy+posy)*norm_const
    gt[image_number,posx,posy]=1,cx,cy,h,w,*onehot # gt is defined as np.zeros((N,S,S,5+lenClasses))
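
To check the normalization above, here is a minimal round-trip sketch in plain Python. It assumes the same S=16 and I_S=256 as in the parameter block; encode_box and decode_box are hypothetical helper names used only for illustration, not part of the original code.

```python
S = 16                # grid cells per side (same value as above)
I_S = 256             # image side length in pixels
norm_const = I_S / S  # size of one grid cell in pixels

def encode_box(x1, y1, x2, y2):
    """Return (posx, posy, cx, cy, h, w) using the normalization above."""
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    h, w = abs(y2 - y1), abs(x2 - x1)
    posx, posy = int(cx * S / I_S), int(cy * S / I_S)
    return (posx, posy,
            (cx - posx * norm_const) / norm_const,
            (cy - posy * norm_const) / norm_const,
            h / I_S, w / I_S)

def decode_box(posx, posy, cx, cy, h, w):
    """Invert encode_box: centre, height and width back in pixels."""
    return ((cx + posx) * norm_const, (cy + posy) * norm_const,
            h * I_S, w * I_S)

encoded = encode_box(40, 100, 120, 140)
print(encoded)               # (5, 7, 0.0, 0.5, 0.15625, 0.3125)
print(decode_box(*encoded))  # (80.0, 120.0, 40.0, 80.0)
```

If the decoded values match the original box, the encoding itself is consistent, and any mismatch is more likely to come from how the tensors are indexed later.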

Loss function implementation:

import tensorflow as tf
from keras import backend as K
coord=10  # We want loss from coordinates to have more weightage
noobj=0.1
# A simple loss
# How output is arranged-> B confidence values, 4B normalized coordinates, one hot vector
def yolo_loss_trial(y_true,y_pred):
  localizationLoss=0.0
  classificationLoss=0.0
  confidenceLoss=0.0
  batchsize_as_tensor_obj=tf.shape(y_pred)[0]
  object_presence=tf.reshape(y_true[:,:,:,0],shape=[batchsize_as_tensor_obj,S,S,1]) #from batchx16x16 to batchx16x16x1

  # CLASSIFICATION LOSS
  # batch x S x S x 1 * batch x S x S x 3, allowed
  classificationLoss=K.sum(K.square((object_presence*y_true[:,:,:,5:8])-y_pred[:,:,:,5*B:5*B+C]))

  # LOCALIZATION LOSS
  for i in range(B,5*B,4):
    # batch x S x S x 1 * batch x S x S x 2, allowed
    localizationLoss=localizationLoss+(K.sum(K.square((object_presence*y_true[:,:,:,1:3])-y_pred[:,:,:,i:i+2])))
    localizationLoss=localizationLoss+(K.sum(K.square(K.sqrt(object_presence*y_true[:,:,:,3:5])-K.sqrt(y_pred[:,:,:,i+2:i+4]))))
  localizationLoss=localizationLoss*coord

  # CONFIDENCE LOSS
  for i in range(0,B):
    y_iou=return_iou_tensor(y_true,y_pred,i) #  batch x S x S
    # take 1 from 1,noobj and noobj from noobj,0
    object_presence_modified=tf.math.maximum(y_true[:,:,:,0],noobj) # batch x S x S
    confidenceLoss=confidenceLoss+(K.sum(K.square((object_presence_modified*y_true[:,:,:,0]*y_iou)-y_pred[:,:,:,i]))) # batch x S x S ops

  return localizationLoss+classificationLoss+confidenceLoss
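
One detail worth comparing against the paper: there, the coordinate and class errors are counted only for cells that contain an object, so the mask multiplies the squared difference. In the loss above, object_presence multiplies only y_true, so the raw predictions of empty cells are also pulled toward zero (with weight coord for the coordinates). The scalar sketch below contrasts the two forms for a single value; both function names are hypothetical.

```python
def masked_target_error(obj, target, pred):
    """Form used above: the mask multiplies only the target."""
    return (obj * target - pred) ** 2

def masked_difference_error(obj, target, pred):
    """Form in the YOLO paper: the mask multiplies the squared difference."""
    return obj * (target - pred) ** 2

# An empty cell (obj = 0) whose raw output happens to be 0.5
print(masked_target_error(0, 0.0, 0.5))      # 0.25
print(masked_difference_error(0, 0.0, 0.5))  # 0.0

# A cell that contains an object: both forms agree
print(masked_target_error(1, 0.75, 0.5))      # 0.0625
print(masked_difference_error(1, 0.75, 0.5))  # 0.0625
```

With S=16 there are 256 cells per image and at most 3 of them contain an object, so under the first form the empty cells may dominate the sum. The confidence term follows a similar pattern: noobj scales the target (which is 0 for empty cells) rather than the squared error, so it appears to have no effect on those cells.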

IoU (Intersection over Union) implementation for the confidence score: Inspired by the implementation here, the IoU is computed as follows:

import numpy as np
import tensorflow as tf
from keras import backend as K

# Creating INDICES tensor to add to normalized centers
indices=np.reshape(np.arange(S),[1,S]) # consists of 0 to S-1, i.e., indices.
indices_tensor_Y=tf.constant(indices,dtype=float) # 1x S
indices_tensor_Y=tf.repeat(indices_tensor_Y,repeats=[S],axis=0) # S x S; each of the S rows is 0,1,...,S-1
indices_tensor_X=tf.transpose(indices_tensor_Y) # S x S
indices_tensor_Y=tf.reshape(indices_tensor_Y,[1,S,S]) # 1 x S x S
indices_tensor_X=tf.reshape(indices_tensor_X,[1,S,S]) # 1 x S x S
#indices_tensor=tf.repeat(indices_tensor,repeats=[batch_tensor],axis=0) # batch x S x S
# repeat() will repeat axis-0 (SxS), batch_tensor number of times along the channel

# IOU Calculation between two bounding boxes
def return_iou_tensor(box_true,box_pred,i):
  '''
  box_true=batch x S x S x 8
  box_pred=batch x S x S x (5B+C)
  '''
  
  # Restored gt
  cx_restored_gt_tensor=norm_const*(indices_tensor_X+box_true[:,:,:,2]) # 1 x S x S + batch x S x S = batch x S x S
  cy_restored_gt_tensor=norm_const*(indices_tensor_Y+box_true[:,:,:,3]) # 1 x S x S + batch x S x S = batch x S x S
  h_restored_gt_tensor=box_true[:,:,:,4]*I_S # batch x S x S
  w_restored_gt_tensor=box_true[:,:,:,5]*I_S # batch x S x S

  # Restored predicted
  cx_restored_pred_tensor=norm_const*(indices_tensor_X+box_pred[:,:,:,B+4*i]) # 1 x S x S + batch x S x S = batch x S x S
  cx_restored_pred_tensor=tf.math.maximum(cx_restored_pred_tensor,0)# To remove negative values
  cy_restored_pred_tensor=norm_const*(indices_tensor_Y+box_pred[:,:,:,B+1+4*i]) # 1 x S x S + batch x S x S = batch x S x S
  cy_restored_pred_tensor=tf.math.maximum(cy_restored_pred_tensor,0)# To remove negative values
  h_restored_pred_tensor=box_pred[:,:,:,B+2+4*i]*I_S # batch x S x S
  h_restored_pred_tensor=tf.math.maximum(h_restored_pred_tensor,0)# To remove negative values
  w_restored_pred_tensor=box_pred[:,:,:,B+3+4*i]*I_S # batch x S x S
  w_restored_pred_tensor=tf.math.maximum(w_restored_pred_tensor,0)# To remove negative values

  # min max of intersection box all, batch x S x S
  x_min_tensor=tf.math.maximum(tf.math.maximum(cx_restored_gt_tensor-w_restored_gt_tensor/2,0),tf.math.maximum(cx_restored_pred_tensor-w_restored_pred_tensor/2,0))
  y_min_tensor=tf.math.maximum(tf.math.maximum(cy_restored_gt_tensor-h_restored_gt_tensor/2,0),tf.math.maximum(cy_restored_pred_tensor-h_restored_pred_tensor/2,0))
  x_max_tensor=tf.math.minimum(cx_restored_gt_tensor+w_restored_gt_tensor/2,cx_restored_pred_tensor+w_restored_pred_tensor/2)
  y_max_tensor=tf.math.minimum(cy_restored_gt_tensor+h_restored_gt_tensor/2,cy_restored_pred_tensor+h_restored_pred_tensor/2)
  w_intersection=tf.math.maximum(x_max_tensor-x_min_tensor,0)
  h_intersection=tf.math.maximum(y_max_tensor-y_min_tensor,0)
  intersection_tensor=w_intersection*h_intersection # batch x S x S
  union_tensor=(w_restored_gt_tensor*h_restored_gt_tensor)+(w_restored_pred_tensor*h_restored_pred_tensor) # batch x S x S
  smooth=1 # We are using smooth because we dont want division by 0
  return (intersection_tensor+smooth)/(union_tensor+smooth) #batch x S x S
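
For comparison, here is a plain-Python reference IoU on single boxes. It is a sketch under stated assumptions: boxes are (cx, cy, w, h) tuples in pixels, the function names are hypothetical, and iou_as_written mirrors only the final formula above (summed areas as the denominator, smooth=1), ignoring the clamping to 0.

```python
def _intersection(a, b):
    """Overlap area of two boxes given as (cx, cy, w, h) in pixels."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = min(ax + aw / 2, bx + bw / 2) - max(ax - aw / 2, bx - bw / 2)
    ih = min(ay + ah / 2, by + bh / 2) - max(ay - ah / 2, by - bh / 2)
    return max(iw, 0.0) * max(ih, 0.0)

def iou_reference(a, b):
    """Standard IoU: intersection / (area_a + area_b - intersection)."""
    inter = _intersection(a, b)
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def iou_as_written(a, b, smooth=1.0):
    """Final formula above: summed areas as denominator, plus smoothing."""
    inter = _intersection(a, b)
    return (inter + smooth) / (a[2] * a[3] + b[2] * b[3] + smooth)

box = (80.0, 120.0, 40.0, 40.0)
print(iou_reference(box, box))   # 1.0
print(iou_as_written(box, box))  # roughly 0.5
```

Because the intersection is not subtracted from the denominator, the formula as written tops out near 0.5 for a perfect prediction. Note also that the ground-truth layout described earlier, [confidence, cx, cy, h, w, ...], puts cx at index 1, whereas return_iou_tensor reads indices 2 to 5 from box_true; that mismatch may be worth checking.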

And the mean_iou_metric to observe during training:

def mean_iou_metric(y_true,y_pred):
  mean_iou=0.0
  for i in range(0,B):
    iou_tensor=y_true[:,:,:,0]*return_iou_tensor(y_true,y_pred,i)
    mean_iou=mean_iou+K.mean(iou_tensor)
  return mean_iou/B
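
Since K.mean averages over every cell (batch x S x S) and the mask zeroes all cells without an object, this metric cannot exceed the fraction of cells that contain an object. The sketch below computes that ceiling for one image, assuming each object falls in a distinct cell and the per-object IoU is at most 1; metric_ceiling is a hypothetical name.

```python
S = 16  # grid cells per side, as above

def metric_ceiling(num_objects, iou_per_object=1.0):
    """Largest value the masked mean over S*S cells can reach for one image."""
    return num_objects * iou_per_object / (S * S)

for n in range(4):
    print(n, metric_ceiling(n))
# 0 0.0
# 1 0.00390625
# 2 0.0078125
# 3 0.01171875
```

Values around 0.004 are therefore on the scale this metric can produce even with good boxes. Dividing the masked sum by the number of object cells instead of by all cells might give a more informative number.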

Model (based on Fast YOLO):

TL;DR: image

model image

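from keras import backend as K
from keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense, Reshape
from keras.models import Model
import tensorflow as tf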

def custom_activation(x):
  # LEAKY RELU
  isPositive=K.cast(K.greater(x,0),K.floatx()) # THE OUTPUT OF THE COMPARISON MUST BE CAST TO FLOAT, BOOL IS NOT ACCEPTED
  # OUTPUT OF THIS FUNCTION IS A TENSOR
  return (isPositive*x)+(1-isPositive)*0.1*x

###############  BLOCK 1 ##############################
input_=Input(shape=(256,256,3),name='input')
#zeropad1=ZeroPadding2D(padding=(3,3))(input_) # PADDING MAKES 448->3+448+3, it is required to bring output to 112

convLayer1=Conv2D(64,(7,7),strides=(2,2),padding='valid',activation=custom_activation,name='conv_layer1')(input_)
maxpoolLayer1=MaxPooling2D(pool_size=(2,2),name='max_pool_layer1')(convLayer1)
#zeropad2=ZeroPadding2D(padding=(1,1))(maxpoolLayer1)
########################################################

###############  BLOCK 2 ##############################
convLayer2=Conv2D(192,(3,3),padding='valid',activation=custom_activation,name='conv_layer2')(maxpoolLayer1)
maxpoolLayer2=MaxPooling2D(pool_size=(2,2),name='max_pool_layer2')(convLayer2)
#zeropad3=ZeroPadding2D(padding=(2,2))(maxpoolLayer2)
########################################################

###############  BLOCK 3 ##############################
convLayer3=Conv2D(128,(1,1),padding='valid',activation=custom_activation,name='conv_layer3')(maxpoolLayer2)
convLayer4=Conv2D(256,(3,3),padding='valid',activation=custom_activation,name='conv_layer4')(convLayer3)
#convLayer5=Conv2D(256,(1,1),padding='valid',activation=custom_activation,name='conv_layer5')(convLayer4)
#convLayer6=Conv2D(512,(3,3),padding='valid',activation=custom_activation,name='conv_layer6')(convLayer5)
maxpoolLayer3=MaxPooling2D(pool_size=(2,2),name='max_pool_layer3')(convLayer4)
#zeropad4=ZeroPadding2D(padding=(5,5))(maxpoolLayer3)
########################################################

###############  BLOCK 4 ##############################
convLayer7=Conv2D(256,(1,1),padding='valid',activation=custom_activation,name='conv_layer7')(maxpoolLayer3)
convLayer8=Conv2D(512,(3,3),padding='valid',activation=custom_activation,name='conv_layer8')(convLayer7)
#convLayer9=Conv2D(256,(1,1),padding='valid',activation=custom_activation,name='conv_layer9')(convLayer8)
#convLayer10=Conv2D(512,(3,3),padding='valid',activation=custom_activation,name='conv_layer10')(convLayer9)
#convLayer11=Conv2D(256,(1,1),padding='valid',activation=custom_activation,name='conv_layer11')(convLayer10)
#convLayer12=Conv2D(512,(3,3),padding='valid',activation=custom_activation,name='conv_layer12')(convLayer11)
#convLayer13=Conv2D(256,(1,1),padding='valid',activation=custom_activation,name='conv_layer13')(convLayer12)
#convLayer14=Conv2D(512,(3,3),padding='valid',activation=custom_activation,name='conv_layer14')(convLayer13)
#convLayer15=Conv2D(512,(1,1),padding='valid',activation=custom_activation,name='conv_layer15')(convLayer14)
#convLayer16=Conv2D(1024,(3,3),padding='valid',activation=custom_activation,name='conv_layer16')(convLayer15)
maxpoolLayer4=MaxPooling2D(pool_size=(2,2),name='max_pool_layer4')(convLayer8)
#zeropad5=ZeroPadding2D(padding=(4,4))(maxpoolLayer4)
###########################################################

###############  BLOCK 5 ##################################
convLayer17=Conv2D(512,(1,1),padding='valid',activation=custom_activation,name='conv_layer17')(maxpoolLayer4)
convLayer18=Conv2D(1024,(3,3),padding='valid',activation=custom_activation,name='conv_layer18')(convLayer17)
#convLayer19=Conv2D(512,(1,1),padding='valid',activation=custom_activation,name='conv_layer19')(convLayer18)
#convLayer20=Conv2D(1024,(3,3),padding='valid',activation=custom_activation,name='conv_layer20')(convLayer19)
#convLayer21=Conv2D(1024,(3,3),padding='valid',activation=custom_activation,name='conv_layer21')(convLayer20)
#convLayer22=Conv2D(1024,(3,3),strides=(2,2),padding='valid',activation=custom_activation,name='conv_layer22')(convLayer21)
#zeropad6=ZeroPadding2D(padding=(2,2))(convLayer18)
#############################################################

################ BLOCK 6 ####################################
convLayer23=Conv2D(1024,(3,3),padding='valid',activation=custom_activation,name='conv_layer23')(convLayer18)
#convLayer24=Conv2D(1024,(3,3),padding='valid',activation=custom_activation,name='conv_layer24')(convLayer23)
flattenedLayer1=Flatten()(convLayer23) # Flatten just converts 3d matrix to 1d so that it can be connected to a next Dense Layer
###############################################################

################ BLOCK 7 #########################################
denseLayer1=Dense(units=4096,activation=custom_activation)(flattenedLayer1)
##################################################################

################ BLOCK 8 #########################################
denseLayer2=Dense(units=S*S*(5*B+C),activation='linear')(denseLayer1)
output_=Reshape((S,S,(5*B+C)))(denseLayer2) # Reshapes the 1D to 3D
##################################################################

fast_model=Model(inputs=input_,outputs=output_)
fast_model.summary()
from keras.utils import plot_model
plot_model(fast_model,to_file='unet.png',show_shapes=True)
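
The spatial size that reaches Flatten can be traced by hand. This sketch assumes the usual 'valid' convolution arithmetic, floor((n - k) / stride) + 1, and a 2x2 max pool whose stride equals the pool size (the Keras default); conv_valid and pool2 are hypothetical helper names.

```python
def conv_valid(n, kernel, stride=1):
    """Output side length of a 'valid' convolution."""
    return (n - kernel) // stride + 1

def pool2(n):
    """Output side length of a 2x2 max pool with stride 2."""
    return n // 2

n = 256
n = pool2(conv_valid(n, 7, stride=2))       # block 1: 125 -> 62
n = pool2(conv_valid(n, 3))                 # block 2: 60 -> 30
n = pool2(conv_valid(conv_valid(n, 1), 3))  # block 3: 30 -> 28 -> 14
n = pool2(conv_valid(conv_valid(n, 1), 3))  # block 4: 14 -> 12 -> 6
n = conv_valid(conv_valid(n, 1), 3)         # block 5: 6 -> 4
n = conv_valid(n, 3)                        # block 6: 4 -> 2
flattened = n * n * 1024                    # 1024 filters in conv_layer23
print(n, flattened)  # 2 4096
```

Under these assumptions the flattened feature is 2x2x1024 = 4096 values, which the two dense layers then map to 16x16x8 = 2048 outputs.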

Latest training parameters:

model=fast_model # SELECT MODEL
model.save_weights('weights.hdf5')
model.compile(optimizer=Adam(learning_rate=1e-5),loss=yolo_loss_trial,metrics=[mean_iou_metric])
#model.compile(optimizer=SGD(learning_rate=1e-5),loss=yolo_loss_trial,metrics=[mean_iou_metric])
model.load_weights('weights.hdf5')
checkpointer = callbacks.ModelCheckpoint(filepath = 'weights.hdf5',save_best_only=True)
training_log = callbacks.TensorBoard(log_dir='./Model_logs')
reduce_lr = ReduceLROnPlateau(monitor='loss', factor=0.2,patience=3, min_lr=1e-5,mode='auto') # ADD IN CALLBACK in fit()
# patience is after how many epochs if improvement is not seen, then reduce lr, newlr=lr*factor

for i in range(0,10):
  print("Iteration, ",i)
  history=model.fit(X_train,Y_train,validation_data=(X_val,Y_val),batch_size=16,epochs=32,callbacks=[training_log,checkpointer,reduce_lr],shuffle=True)
  # SAVE MODEL TO DRIVE
  !cp '/content/weights.hdf5' 'gdrive/My Drive/Colab Notebooks/Colab Datasets/Breast_Cancer_HNS/Images'
  # CONFIRM EXECUTION TIMESTAMP
  from datetime import datetime
  import pytz
  tz = pytz.timezone('Asia/Calcutta')
  local_now = datetime.now(tz)
  dt_string = local_now.strftime("%d/%m/%Y %H:%M:%S")
  print(dt_string)
  #from google.colab import output
  #output.eval_js('new Audio("https://ssl.gstatic.com/dictionary/static/sounds/20180430/complete--_us_1.mp3").play()')
# SAVE WHOLE MODEL TO LOCAL/COLAB DRIVE
model.save("FastYolo") #Saves weights also according to official docs
# SAVE MODEL TO GOOGLE DRIVE
!cp '/content/FastYolo' '/content/gdrive/My Drive/Colab Notebooks/Colab Datasets/Shape_Detection_YOLO/None2500NoOverlap'
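
One thing to note about the settings above: the optimizer starts at learning_rate=1e-5 and ReduceLROnPlateau is given min_lr=1e-5. The sketch below is a simplified model of the rule described in the comment (newlr=lr*factor, clamped at min_lr), not Keras's actual code; next_lr is a hypothetical name.

```python
def next_lr(lr, factor=0.2, min_lr=1e-5):
    """Reduce lr by factor, but never below min_lr."""
    return max(lr * factor, min_lr)

lr = 1e-5  # starting learning rate passed to Adam above
for plateau in range(3):
    lr = next_lr(lr)
    print(plateau, lr)  # stays at 1e-05 on every plateau
```

If this model of the callback is right, the learning rate never changes during training, so the callback has no effect with these values.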