Why is my convolutional neural network stuck in a local minimum?


I've heard that machine learning algorithms rarely get stuck in local minima, but my CNN (in TensorFlow) is predicting a constant output for all inputs. I'm using a mean squared error loss function, so I think this must be a local minimum given the properties of MSE.

I have a network with 2 convolution layers and 1 dense layer (+1 dense output layer for regression) with 24, 32 and 100 neurons respectively, but I've tried changing the number of layers/neurons and the issue isn't solved. I have ReLU activations for the hidden layers and absolute value on the output layer (I know this is uncommon, but it converges faster to a lower MSE than softplus, which still has the same problem, and I need strictly positive outputs). I also have a 50% dropout layer between the dense and output layers and a pooling layer between the 2 convolutions. I have also tried changing the learning rate (currently 0.0001) and batch size. I am using the Adam optimizer.

I have seen it suggested to change/add the bias, but I'm not sure how to initialize it in tf.layers.conv2d/tf.layers.dense (for which I have use_bias=True), and I can't see any options for bias with tf.nn.conv2d, which I used for my first layer so I could initialize the kernel easily.
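In case it matters, this is roughly what I think the bias options look like from the TF 1.x docs (I may well be wrong, so please correct me; the 0.1 initial value is just a placeholder):

# tf.layers.conv2d: bias is on by default (use_bias=True); its initializer can be set explicitly
conv2 = tf.layers.conv2d(inputs=conv1, filters=32, kernel_size=[3, 3],
                         padding="same", activation=tf.nn.relu,
                         use_bias=True,
                         bias_initializer=tf.constant_initializer(0.1))

# tf.nn.conv2d has no bias argument, so the bias would be a separate variable applied with tf.nn.bias_add
biases = tf.Variable(tf.constant(0.1, shape=[24]), name='biases')
conv1 = tf.nn.bias_add(tf.nn.conv2d(input, weights, [1, 1, 1, 1], padding='SAME'), biases)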

Any suggestions would be really appreciated, thanks.

Here's the section of my code with the network:

import tensorflow as tf

neurons = 100  # dense layer size
rate = 0.5     # dropout rate

filter_shape = [3, 3, 12, 24]  # 3x3 kernels, 12 input channels, 24 filters

def nn_model(input):
    # first conv layer: kernel created manually so it can be initialized directly
    weights = tf.Variable(tf.truncated_normal(filter_shape, mean=10, stddev=3),
                          name='weights')
    conv1 = tf.nn.conv2d(input, weights, [1, 1, 1, 1], padding='SAME')
    # second conv layer + pooling
    conv2 = tf.layers.conv2d(inputs=conv1, filters=32, kernel_size=[3, 3],
                             padding="same", activation=tf.nn.relu)
    pool = tf.layers.max_pooling2d(inputs=conv2, pool_size=[2, 2], strides=2,
                                   padding='same')
    # 5x5 input -> 3x3 after pooling, with 32 channels
    flat = tf.reshape(pool, [-1, 32 * 3 * 3])
    dense_3 = tf.layers.dense(flat, neurons, activation=tf.nn.relu)
    dropout_2 = tf.layers.dropout(dense_3, rate=rate)
    prediction = tf.layers.dense(dropout_2, 1, activation=tf.nn.softplus)
    return prediction

My inputs are 5x5 images with 12 channels of environmental data and I have ~100,000 training samples. My current MSE is ~90 on values of ~25.

1 Answer

I used to face the same problem with bigger images. I increased the number of convolution layers to solve it. Maybe you should try to add even more convolution layers.

In my opinion, the problem comes from the fact that you don't have enough parameters and thus get stuck in a local minimum. If you increase your number of parameters, it can help the updates converge to a better minimum.
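For example (a rough, untested sketch based on your posted code, with a made-up filter count of 64), an extra convolution before the pooling layer would look something like:

# third conv layer added after conv2; filter count is just an example
conv3 = tf.layers.conv2d(inputs=conv2, filters=64, kernel_size=[3, 3],
                         padding="same", activation=tf.nn.relu)
pool = tf.layers.max_pooling2d(inputs=conv3, pool_size=[2, 2], strides=2,
                               padding='same')
flat = tf.reshape(pool, [-1, 64 * 3 * 3])  # channel count changed, so the flatten size changes too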

Also, I can't see the optimizer you are using. Is it Adam? You can try to start with a bigger learning rate and use a decay to decrease it epoch after epoch.
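For example (a rough sketch only; the starting rate and decay values are placeholders, and `loss` stands for whatever MSE tensor you already have), exponential decay with Adam would look roughly like:

# global_step is incremented by minimize(), which drives the decay schedule
global_step = tf.Variable(0, trainable=False)
learning_rate = tf.train.exponential_decay(0.001, global_step,
                                           decay_steps=10000, decay_rate=0.96,
                                           staircase=True)
train_op = tf.train.AdamOptimizer(learning_rate).minimize(loss, global_step=global_step)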