Neural net: no dropout gives the best test score. Is that bad?


I took over some code from someone, and my task was to reproduce the same model and performance in PyTorch. I was given the best hyper-parameters for that model as well. After playing around with it for quite some time, I see that if I set the dropout rate to zero, my test performance is the best; the smaller the dropout, the earlier training stops. In fact, it slightly outperforms the previous model, even though the previous model used significant dropout rates.
The data size is about 33 million rows, the neural net has 4-5 layers, and the total input embedding is ~1000 dimensions. Though I am happy to see the performance, I am wondering if it is some kind of red flag: without dropout I don't really have any other regularization, so could the fact that it performs well mean there might be some data leakage or something? Looking for some wisdom around dropout in this context.
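For concreteness, here is a minimal sketch of the kind of setup I mean, with the dropout rate exposed as a hyper-parameter. The layer widths, depth, and output head are made-up assumptions for illustration, not the actual model:

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    """Small feed-forward net with a configurable dropout rate."""
    def __init__(self, in_dim=1000, hidden=256, p_drop=0.0):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(hidden, 1),  # assumed scalar output head
        )

    def forward(self, x):
        return self.net(x)

# p_drop=0.0 turns every nn.Dropout into an identity, so the only remaining
# regularization comes from early stopping (and weight decay, if the optimizer uses it).
model = MLP(in_dim=1000, hidden=256, p_drop=0.0)
out = model(torch.randn(8, 1000))
print(out.shape)  # torch.Size([8, 1])
```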


There is 1 answer below.


Sometimes, these things happen. Once my neural net was not working, so I was advised to add Batch Normalization layers to it, and then it worked very well. But in another problem, Batch Normalization made my neural net worse. This all comes down to backpropagation: sometimes adding a layer makes the network get stuck in a local minimum, while other times it helps it escape one. I am not entirely sure why this happens, but I think it is because of backpropagation.
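For illustration only (this helper and its arguments are my own assumptions, not the asker's code), toggling these layers in PyTorch is a one-line change per block, which makes it easy to compare the variants described above:

```python
import torch.nn as nn

def hidden_block(in_dim, out_dim, use_batchnorm=False, p_drop=0.0):
    """One hidden block; toggle BatchNorm / Dropout to compare model variants."""
    layers = [nn.Linear(in_dim, out_dim)]
    if use_batchnorm:
        layers.append(nn.BatchNorm1d(out_dim))  # variant with Batch Normalization
    layers.append(nn.ReLU())
    if p_drop > 0:
        layers.append(nn.Dropout(p_drop))       # variant with dropout
    return nn.Sequential(*layers)

# e.g. hidden_block(1000, 256, use_batchnorm=True, p_drop=0.0)
```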

"might be some data leakage or something?"

The answer is no. It's just because of backpropagation.

NOTE - If you feel I am wrong anywhere in this post, then please comment.