TensorFlow Model Underpredicts Values with Dropout


I am having a problem implementing dropout as a regularization method in my dense NN model. It appears that adding a dropout value above 0 just scales down the predicted value, in a way that makes me think something is not being accounted for correctly after individual weights are set to zero. I'm sure I am implementing something incorrectly, but I can't seem to figure out what.

The code to build this model was taken directly from a TensorFlow tutorial (https://www.tensorflow.org/tutorials/keras/overfit_and_underfit), but the issue occurs no matter what architecture I use to build the model.

import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
        layers.Dense(512, activation='relu', input_shape=[len(X_train[0])]),
        layers.Dropout(0.5),
        layers.Dense(512, activation='relu'),
        layers.Dropout(0.5),
        layers.Dense(512, activation='relu'),
        layers.Dropout(0.5),
        layers.Dense(512, activation='relu'),
        layers.Dropout(0.5),
        layers.Dense(1)
    ])

Any help would be much appreciated!

[Plot: predictions generated when using a dropout rate of 0.5 between layers]

2 Answers

Answer 1:

It's perfectly normal for accuracy on the training set to decrease when adding Dropout. You usually accept this as a trade-off to increase accuracy on unseen data (the test set) and thus improve generalization.

However, try decreasing the Dropout rate to 0.10 or 0.20; you will get better results. Also, unless you are dealing with hundreds of millions of examples, try reducing the number of neurons in your network, e.g. from 512 to 128. With an overly complex neural net, the backpropagation gradients won't reach an optimum level; with a neural net that is too simple, the gradients will saturate and it won't learn, either.
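A minimal sketch of the smaller variant suggested above: 128 units per layer and a dropout rate of 0.2. The input size (`num_features = 20`) is a placeholder assumption, not from the original question.

```python
import tensorflow as tf
from tensorflow.keras import layers

num_features = 20  # placeholder; use your actual feature count

# Smaller network with a lower dropout rate, as suggested above.
model = tf.keras.Sequential([
    layers.Dense(128, activation='relu', input_shape=[num_features]),
    layers.Dropout(0.2),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.2),
    layers.Dense(1),
])
model.compile(optimizer='adam', loss='mse')
```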

One more point: you may want to apply pd.get_dummies to your output (Y), increase the last layer to Dense(2), and normalize your input data.
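For the normalization part of that suggestion, a minimal standardization sketch (the `X_train` array here is random placeholder data for illustration): scale each feature to zero mean and unit variance using statistics computed from the training set only.

```python
import numpy as np

X_train = np.random.rand(100, 20)  # placeholder data for illustration

# Compute per-feature statistics on the training set only.
mean = X_train.mean(axis=0)
std = X_train.std(axis=0) + 1e-8  # epsilon avoids division by zero

# Apply the same mean/std to any validation or test data as well.
X_train_norm = (X_train - mean) / std
```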

Answer 2:

I know this is an ancient post, but I had a similar issue, and this was one of the first posts that appeared when searching for the problem.

The dropout layer doesn't just randomly set values to zero; it also scales the remaining inputs during training. So if you have a dropout rate of 0.5, the surviving values (i.e. the values that aren't dropped to zero) are DOUBLED to compensate.
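This scaling behavior is easy to verify directly. A short sketch: feed a tensor of ones through a Keras `Dropout(0.5)` layer with `training=True` and `training=False`, and observe that surviving values become 2.0 during training while the layer is an identity at inference time.

```python
import numpy as np
import tensorflow as tf

drop = tf.keras.layers.Dropout(0.5)
x = tf.ones((1, 10))

# training=True: roughly half the values are zeroed, and the survivors
# are scaled by 1 / (1 - 0.5) = 2 so the expected sum is unchanged.
train_out = drop(x, training=True)

# training=False: dropout does nothing; the input passes through intact.
infer_out = drop(x, training=False)

print(train_out.numpy())  # mix of 0.0 and 2.0
print(infer_out.numpy())  # all 1.0
```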

Generally, I think the problem you're seeing indicates that your dropout layer is too close to the input layer of your network.

I'm not an expert, but I'd suggest you might be able to take care of this in a couple of ways...

  • Remove the dropout layer(s) closest to your input layer, or
  • After training, run a set of tests and produce a multiplier that scales your data back up to where it should be, or
  • Try different methods of reducing over-fitting.