I am building a logo detection system using YOLOv4-tiny. I've built a custom synthetic dataset by drawing transparent logos on top of gameplay screenshots. Before being drawn onto the background, the logos were augmented (blurring, perspective transformation, resizing, rotation, etc.). Here are some examples of the sort of data that I have.
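Rotation and perspective warps usually enlarge the logo's canvas and leave transparent padding around it. If a label box is taken from the canvas size rather than from the visible pixels, every box ends up systematically too large, so this is one thing worth ruling out when predicted boxes look wrong. Below is a minimal sketch, assuming the augmented logo is an RGBA NumPy array (for example `np.asarray(pil_img.convert("RGBA"))`) and darknet's label format of `class x_center y_center width height`, normalized to the background size. `yolo_label_from_alpha` is a hypothetical helper, not part of any library.

```python
import numpy as np


def yolo_label_from_alpha(logo_rgba, paste_x, paste_y, img_w, img_h,
                          class_id, alpha_thresh=0):
    """Hypothetical helper: build a darknet label line from the logo's
    visible pixels (alpha > alpha_thresh), not from its full canvas.

    logo_rgba: HxWx4 uint8 array, already augmented, pasted with its
    top-left corner at (paste_x, paste_y) on an img_w x img_h background.
    """
    alpha = logo_rgba[:, :, 3]
    ys, xs = np.nonzero(alpha > alpha_thresh)
    if xs.size == 0:
        return None  # logo became fully transparent: emit no label
    # Tight box in background pixels (x2/y2 exclusive)
    x1 = paste_x + xs.min()
    x2 = paste_x + xs.max() + 1
    y1 = paste_y + ys.min()
    y2 = paste_y + ys.max() + 1
    # Clip in case the logo hangs off the edge of the background
    x1, x2 = max(0, x1), min(img_w, x2)
    y1, y2 = max(0, y1), min(img_h, y2)
    if x2 <= x1 or y2 <= y1:
        return None
    # darknet format: class x_center y_center width height, all in [0, 1]
    xc = (x1 + x2) / 2 / img_w
    yc = (y1 + y2) / 2 / img_h
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"
```

For example, a 100x100 canvas whose only opaque pixels are rows 20 to 39 and columns 30 to 69, pasted at (10, 50) on a 200x100 background, gives `3 0.300000 0.800000 0.200000 0.200000` for class 3. Using the full canvas instead would have produced a much larger box that also runs off the bottom of the image. If the existing generator already uses tight boxes, this changes nothing.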
For each of 1,700 background images, I paste between 1 and 5 logos (the number is chosen at random) at predefined positions. Each logo is its own class, so there are 16 classes. When I train YOLOv4-tiny on this data, the output looks like the excerpt below. The loss fluctuates and I don't understand why.
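The layout step described above can be sketched as follows, together with a tally of how many instances each class gets across the dataset (a quick way to check that no class is starved). This is a sketch under stated assumptions: the `POSITIONS` list is hypothetical and should be replaced with the positions the real generator uses, and `sample_layout` and `class_counts` are illustrative helpers.

```python
import random
from collections import Counter

NUM_CLASSES = 16  # one class per logo, as described above

# Hypothetical predefined paste positions (top-left x, y in pixels);
# replace with the positions your generator actually uses.
POSITIONS = [(32, 32), (320, 32), (608, 32), (32, 400), (320, 400), (608, 400)]


def sample_layout(rng, positions=POSITIONS, num_classes=NUM_CLASSES,
                  min_logos=1, max_logos=5):
    """Return [(class_id, x, y), ...] for one synthetic image."""
    # randint is inclusive on both ends, so this yields 1..5 logos
    k = rng.randint(min_logos, min(max_logos, len(positions)))
    # sample() picks distinct positions, so two logos never share a slot
    chosen = rng.sample(positions, k)
    return [(rng.randrange(num_classes), x, y) for x, y in chosen]


def class_counts(layouts):
    """Count how many instances of each class the whole dataset contains."""
    return Counter(cls for layout in layouts for cls, _, _ in layout)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the dataset can be regenerated
    layouts = [sample_layout(rng) for _ in range(1700)]
    print(sorted(class_counts(layouts).items()))
```

If the number of logos per image is uniform over 1 to 5, the expected count is 3 per image, so 1,700 images give about 5,100 boxes, or roughly 320 per class on average.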
v3 (iou loss, Normalizer: (iou: 0.07, obj: 1.00, cls: 1.00) Region 37 Avg (IOU: 0.281087), count: 1, class_loss = 0.738789, iou_loss = 15.317201, total_loss = 16.055990
total_bbox = 485600, rewritten_bbox = 1.175453 %
v3 (iou loss, Normalizer: (iou: 0.07, obj: 1.00, cls: 1.00) Region 30 Avg (IOU: 0.547854), count: 5, class_loss = 4.434289, iou_loss = 2.494116, total_loss = 6.928406
v3 (iou loss, Normalizer: (iou: 0.07, obj: 1.00, cls: 1.00) Region 37 Avg (IOU: 0.767385), count: 1, class_loss = 1.036910, iou_loss = 0.562877, total_loss = 1.599788
total_bbox = 485606, rewritten_bbox = 1.175439 %
(next mAP calculation at 3147 iterations)
Last accuracy mAP@0.50 = 3.82 %, best = 3.82 %
3038: 2.917786, 2.895705 avg loss, 0.002610 rate, 2.421050 seconds, 145824 images, 42.306447 hours left
Loaded: 1.968205 seconds - performance bottleneck on CPU or Disk HDD/SSD
v3 (iou loss, Normalizer: (iou: 0.07, obj: 1.00, cls: 1.00) Region 30 Avg (IOU: 0.602211), count: 3, class_loss = 2.421113, iou_loss = 0.208666, total_loss = 2.629779
v3 (iou loss, Normalizer: (iou: 0.07, obj: 1.00, cls: 1.00) Region 37 Avg (IOU: 0.000000), count: 1, class_loss = 0.000067, iou_loss = 0.000000, total_loss = 0.000067
total_bbox = 485609, rewritten_bbox = 1.175431 %
v3 (iou loss, Normalizer: (iou: 0.07, obj: 1.00, cls: 1.00) Region 30 Avg (IOU: 0.583863), count: 4, class_loss = 3.290887, iou_loss = 0.876662, total_loss = 4.167549
v3 (iou loss, Normalizer: (iou: 0.07, obj: 1.00, cls: 1.00) Region 37 Avg (IOU: 0.383844), count: 1, class_loss = 0.823941, iou_loss = 8.962539, total_loss = 9.786480
total_bbox = 485614, rewritten_bbox = 1.175419 %
v3 (iou loss, Normalizer: (iou: 0.07, obj: 1.00, cls: 1.00) Region 30 Avg (IOU: 0.566122), count: 11, class_loss = 9.910147, iou_loss = 1.705420, total_loss = 11.615566
v3 (iou loss, Normalizer: (iou: 0.07, obj: 1.00, cls: 1.00) Region 37 Avg (IOU: 0.000000), count: 1, class_loss = 0.000120, iou_loss = 0.000000, total_loss = 0.000120
total_bbox = 485625, rewritten_bbox = 1.175393 %
v3 (iou loss, Normalizer: (iou: 0.07, obj: 1.00, cls: 1.00) Region 30 Avg (IOU: 0.582394), count: 5, class_loss = 3.849459, iou_loss = 0.561072, total_loss = 4.410531
v3 (iou loss, Normalizer: (iou: 0.07, obj: 1.00, cls: 1.00) Region 37 Avg (IOU: 0.365837), count: 1, class_loss = 0.918538, iou_loss = 6.407685, total_loss = 7.326223
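The individual lines above are noisy by construction. Each `Region 30` / `Region 37` line reports one output layer (in the stock yolov4-tiny.cfg these are the two `[yolo]` layers) for a single mini-batch, and `count` is roughly how many ground-truth boxes that layer handled in that batch. With `count: 1`, a single badly placed box can produce an `iou_loss` of 15.3 on one line and 0.56 on the next, and lines with IOU 0.000000 and `iou_loss = 0.000000` suggest that layer had no boxes to regress in that batch. The figure to watch is the `avg loss` on the iteration line (`3038: 2.917786, 2.895705 avg loss`), which darknet keeps as an exponential moving average (its training loop updates it as `avg_loss = avg_loss*.9 + loss*.1`). A rough sketch for aggregating the log, assuming the line formats shown above; `parse_log`, `ema` and `summarize` are hypothetical helpers, not part of darknet:

```python
import re
from statistics import mean

# Patterns written against the log lines quoted above; other darknet
# versions may format these lines differently.
REGION_RE = re.compile(
    r"Region (\d+) Avg \(IOU: ([\d.]+)\), count: (\d+), "
    r"class_loss = ([\d.]+), iou_loss = ([\d.]+), total_loss = ([\d.]+)"
)
ITER_RE = re.compile(r"^\s*(\d+): ([\d.]+), ([\d.]+) avg loss")


def parse_log(lines):
    """Split a darknet training log into per-layer records and per-iteration losses."""
    regions, iters = [], []
    for line in lines:
        m = REGION_RE.search(line)
        if m:
            regions.append({
                "layer": int(m.group(1)),
                "iou": float(m.group(2)),
                "count": int(m.group(3)),
                "class_loss": float(m.group(4)),
                "iou_loss": float(m.group(5)),
                "total_loss": float(m.group(6)),
            })
            continue
        m = ITER_RE.match(line)
        if m:
            # (iteration, loss of this batch, running "avg loss")
            iters.append((int(m.group(1)), float(m.group(2)), float(m.group(3))))
    return regions, iters


def ema(losses, decay=0.9):
    """Exponential moving average, mirroring avg = 0.9 * avg + 0.1 * loss."""
    avg, out = None, []
    for loss in losses:
        avg = loss if avg is None else decay * avg + (1 - decay) * loss
        out.append(avg)
    return out


def summarize(regions):
    """Average IOU, object count and total loss per output layer."""
    by_layer = {}
    for r in regions:
        by_layer.setdefault(r["layer"], []).append(r)
    return {
        layer: {
            "mean_iou": mean(r["iou"] for r in rs),
            "mean_count": mean(r["count"] for r in rs),
            "mean_total_loss": mean(r["total_loss"] for r in rs),
        }
        for layer, rs in by_layer.items()
    }
```

Usage would be along the lines of `with open("train.log") as f: regions, iters = parse_log(f)` (the file name is a placeholder), then plotting the third field of `iters` against the first. A steadily falling `avg loss` with a low per-layer `mean_iou` points at localization rather than classification.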
My question is: what should I take away from this? I haven't yet tested the model against a test set to see how well it performs. Do I need to improve the augmentation of the logos? How should I interpret this output?
Update
Here is how the model performs: it correctly identified which logo it was, but got the bounding box completely wrong.
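That symptom is consistent with the low mAP in the log: mAP@0.5 only counts a detection as correct when the class matches and the predicted box overlaps the ground truth with an IoU of at least 0.5, so a right-class, wrong-box prediction is scored as a false positive and the logo as missed. A small sketch for checking individual predictions against their labels, assuming both are in darknet's normalized `(x_center, y_center, width, height)` format; `iou` and `is_hit` are hypothetical helpers:

```python
def yolo_to_corners(box):
    """(x_center, y_center, w, h) -> (x1, y1, x2, y2), same units as input."""
    xc, yc, w, h = box
    return xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2


def iou(box_a, box_b):
    """Intersection over union of two boxes in YOLO (center, size) format."""
    ax1, ay1, ax2, ay2 = yolo_to_corners(box_a)
    bx1, by1, bx2, by2 = yolo_to_corners(box_b)
    # Overlap along each axis, clamped at zero for disjoint boxes
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_hit(pred, truth, pred_class, truth_class, thresh=0.5):
    """A detection counts toward mAP@0.5 only if the class AND the box agree."""
    return pred_class == truth_class and iou(pred, truth) >= thresh
```

For instance, a correctly classified box shifted by half its width relative to the label has an IoU of 1/3 and is not counted as a hit. Running this over a handful of test images shows whether predicted boxes are consistently too large, offset, or just noisy.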