SimCLR does not learn representations

So I'm trying to train a SimCLR network with a custom lightweight ConvNet backbone (I have already tried it with a ResNet) on a dataset where each image contains two letters, randomly selected from the first five letters of the alphabet and placed at random positions. I'm unsure which augmentations to use in such a scenario, so I only use image translation to provide some degree of difference between the augmented views.
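
For reference, the augmentation is just a random shift applied independently to each view. Roughly, assuming torchvision (the translate fraction here is illustrative rather than my exact setting):

```python
import torchvision.transforms as T

# Translation-only augmentation for the two SimCLR views
# (the 0.2 translate fraction is illustrative, not my exact value)
simclr_transform = T.Compose([
    T.RandomAffine(degrees=0, translate=(0.2, 0.2)),  # shift up to 20% in x and y
    T.ToTensor(),
])

def make_views(img):
    # two independently augmented views of the same image
    return simclr_transform(img), simclr_transform(img)
```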

This sounds like an extremely trivial task, but a multi-label classifier built on top of the frozen pretrained network performs very poorly. I'm fairly certain this is due to the poor quality of the learned representations rather than the linear classifier, since the same setup works well when trained with full supervision.
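
For context, the evaluation is the usual linear probe on frozen features, roughly like this (`encoder` and `feat_dim` are placeholders for my actual backbone and its output dimension):

```python
import torch
import torch.nn as nn

def linear_probe_step(encoder, head, optimizer, criterion, x, y):
    # encoder is the frozen pretrained backbone (placeholder name)
    with torch.no_grad():
        z = encoder(x)                    # frozen features, shape (N, feat_dim)
    loss = criterion(head(z), y.float())  # y is a multi-hot label like [1, 1, 0, 0, 0]
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Setup (feat_dim is whatever the backbone outputs):
# for p in encoder.parameters(): p.requires_grad = False
# encoder.eval()
# head = nn.Linear(feat_dim, 5)           # one logit per letter
# criterion = nn.BCEWithLogitsLoss()      # multi-label targets
# optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
```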

Variations I've tried so far:

  1. Made the dataset a single letter at a random position (multi-class), and it performed very well.
  2. Made the dataset with two random letters but fixed centre positions, and it performed well. The same translation augmentation mentioned above was used in both cases.

Sample image from the dataset (the label here is [1, 1, 0, 0, 0], indicating which letters are present):

[sample image]

Can someone please help me figure out how to make this work?

1 Answer

This is not the first time I've heard of someone trying SimCLR and getting poor results...

I have some questions:

  • Have you tried other losses for the contrastive pretraining, such as the triplet loss?
  • Are the representations normalised? SimCLR's NT-Xent loss assumes L2-normalised embeddings (see the first sketch after this list).
  • Is the contrastive pretraining itself giving good results in the variations you mention?
  • Are you getting good supervised classification results with both models (ResNet and the custom ConvNet)?
  • Have you tried visualising the features learned by the model in the conv layers?
  • You could also visualise the feature maps with forward hooks to see what the network is "looking at" (see the second sketch below).
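
On the normalisation point: NT-Xent computes cosine similarities, so the projection-head outputs must be L2-normalised before the loss. A minimal sketch of the loss in PyTorch (assuming `z1`, `z2` are the projection outputs for the two views of a batch):

```python
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, temperature=0.5):
    # L2-normalise so that dot products are cosine similarities
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)        # (2N, d)
    sim = z @ z.t() / temperature         # (2N, 2N) similarity matrix
    sim.fill_diagonal_(float('-inf'))     # exclude self-similarity
    n = z1.size(0)
    # the positive for sample i is its other view, offset by N
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)
```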
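
And a minimal forward-hook sketch for inspecting feature maps (`encoder.conv1` and `sample_batch` are placeholders for your actual layer and data):

```python
import torch

activations = {}

def save_activation(name):
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

# Register on whichever conv layer you want to inspect;
# encoder.conv1 is a placeholder for your model's actual attribute.
handle = encoder.conv1.register_forward_hook(save_activation('conv1'))

with torch.no_grad():
    encoder(sample_batch)           # any batch of images

fmap = activations['conv1'][0]      # (C, H, W) feature maps for the first image
# e.g. plot each channel with matplotlib to see what the layer responds to
handle.remove()
```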