Can't seem to implement L2 regularization correctly in Python — low accuracy scores


I'm trying to add L2 regularization to my MNIST digit NN classifier, which I've built using NumPy and vanilla Python. I'm currently using sigmoid activations with a cross-entropy cost function.

Without using the regularizer, I get 97% accuracy.

However, once I add the regularizer, I'm only getting about 11%, despite playing around with different hyperparameters. I've tried different learning rates:

.001, .1, 1

and different lambd values such as:

.5, .8, 1.0, 2.0, etc.

I can't seem to figure out what mistake I'm making. I feel like I might be missing a step.
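For reference, this is the regularized cost I'm trying to match (a simplified sketch, not my exact notebook code; cross_entropy here stands in for my existing unregularized cost function):

def regularized_cost(self, y_hat, y, lambd):
    '''Sketch: cross-entropy cost plus the L2 penalty (lambd / 2m) * sum ||W||^2,
    summed over every weight matrix w1 .. w(num_layers - 1).'''
    m = y.shape[0]
    l2_term = sum(np.sum(np.square(self.parameters['w' + str(l)]))
                  for l in range(1, self.num_layers))
    return cross_entropy(y_hat, y) + (lambd / (2 * m)) * l2_term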

The only changes I've made are to the derivatives of the weights. I've implemented the gradients as follows:

def calculate_gradients(self, x, y, lambd):
    '''Calculate all gradients with respect to the cost.
    Here the cost function is cross-entropy.

    last_layer_z_error = dC/dZ  (z is the logit)
    All weight gradients also include the regularization gradients.

    x.shape[0] = number of samples
    '''

    ##### First we calculate the output layer gradients #####

    gradients, activations, zs = self.gather_backprop_data(x, y)

    # gradient of cost with respect to Z of the last layer
    last_layer_z_error = activations[-1] - y

    # weight and bias derivatives of the final layer (weights include the L2 term)
    gradients['w' + str(self.num_layers - 1)] = (
        np.dot(activations[-2].T, last_layer_z_error) / x.shape[0]
        + (lambd / x.shape[0]) * self.parameters['w' + str(self.num_layers - 1)]
    )
    gradients['b' + str(self.num_layers - 1)] = np.mean(last_layer_z_error, axis=0)
    gradients['b' + str(self.num_layers - 1)] = np.expand_dims(
        gradients['b' + str(self.num_layers - 1)], 0)

    ##### Hidden layer gradients #####

    z_previous_layer = last_layer_z_error

    for i in reversed(range(1, self.num_layers - 1)):
        z_previous_layer = np.dot(z_previous_layer, self.parameters['w' + str(i + 1)].T) \
                           * sigmoid_derivative(zs[i - 1])

        gradients['w' + str(i)] = (
            np.dot(activations[i - 1].T, z_previous_layer) / x.shape[0]
            + (lambd / x.shape[0]) * self.parameters['w' + str(i)]
        )
        gradients['b' + str(i)] = np.mean(z_previous_layer, axis=0)
        gradients['b' + str(i)] = np.expand_dims(gradients['b' + str(i)], 0)

    return gradients
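After computing the gradients, the update step I'm using is plain gradient descent (a sketch with an assumed learning_rate argument); since the (lambd / m) * W term is already folded into the weight gradients above, the update itself shouldn't apply any extra weight decay:

def update_parameters(self, gradients, learning_rate):
    '''Sketch of a vanilla gradient descent step. The L2 term is already
    included in the weight gradients, so no additional decay is applied here.'''
    for l in range(1, self.num_layers):
        self.parameters['w' + str(l)] -= learning_rate * gradients['w' + str(l)]
        self.parameters['b' + str(l)] -= learning_rate * gradients['b' + str(l)]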

The entire code can be found in the notebook I've uploaded to GitHub, if needed:

https://github.com/moondra2017/Neural-Networks-from-scratch/blob/master/Neural%20Network%20from%20scratch-Testing%20expanded%20Mnist-Sigmoid%20with%20cross-entroupy-with%20L2%20regularization.ipynb
