I'm doing an experiment with a CNN.
What I want to implement is gradient descent with learning rate decay, using the update rule from AlexNet.
The algorithm I want to implement is below (transcribed from a picture I captured from the AlexNet paper):
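Since I can't attach the image here, this is the rule as I transcribed it from the paper (momentum $\mu$, weight decay $\lambda$, learning rate $\epsilon$):

$$v_{i+1} = \mu \cdot v_i \;-\; \lambda \cdot \epsilon \cdot w_i \;-\; \epsilon \cdot \left\langle \frac{\partial L}{\partial w}\Big|_{w_i} \right\rangle_{D_i}$$
$$w_{i+1} = w_i + v_{i+1}$$

where $\left\langle \cdot \right\rangle_{D_i}$ is the gradient averaged over the batch $D_i$.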
I think I implemented the learning rate decay correctly; the code is below (I verified that the learning rate decays according to global_step as expected):
# decay the learning rate by a factor of 0.1 every 100000 global steps
learning_rate = tf.train.exponential_decay(starter_learning_rate, global_step,
                                           100000, 0.1, staircase=True)
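This is how I sanity-checked the schedule (a throwaway snippet, assuming global_step is a plain tf.Variable that nothing else increments during the check):

# verify the staircase drop at multiples of 100000 steps
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in [0, 99999, 100000, 200000]:
        sess.run(tf.assign(global_step, step))
        print(step, sess.run(learning_rate))  # expect a 0.1x drop at each 100000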
Next, I need to implement the update rule (weight decay of 0.005 and momentum of 0.9). I think I handled the momentum correctly, but I could not find a way to implement the weight decay. The code is also below:
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=fc8))
train_step = tf.train.MomentumOptimizer(learning_rate, 0.9).minimize(
    cross_entropy, global_step=global_step)
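The closest thing I could come up with is folding the decay into the loss as an L2 penalty (just a sketch: my reasoning is that a decay term of $\lambda \cdot \epsilon \cdot w_i$ is what you get from differentiating $\frac{\lambda}{2}\|w\|^2$, and tf.nn.l2_loss already includes the 1/2 factor). I'm not sure this exactly matches the paper's rule, though, since MomentumOptimizer applies the learning rate after the momentum accumulation:

weight_decay = 0.005  # the decay coefficient I want (lambda)
# sum of 0.5 * ||v||^2 over every trainable variable in the graph
l2_penalty = tf.add_n([tf.nn.l2_loss(v) for v in tf.trainable_variables()])
total_loss = cross_entropy + weight_decay * l2_penalty
train_step = tf.train.MomentumOptimizer(learning_rate, 0.9).minimize(
    total_loss, global_step=global_step)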
Am I doing the "learning rate decay" and "momentum" correctly? And how can I implement the "weight decay of 0.005" correctly?
I used tf.layers.conv2d for the convolutional layers, so the weights and biases are created inside the layer itself (I sketched a per-layer regularizer idea after the code). The code is below:
conv5 = tf.layers.conv2d(
    inputs=conv4,
    filters=256,
    kernel_size=[3, 3],
    strides=1,
    padding="SAME",
    activation=tf.nn.relu,
    kernel_initializer=tf.constant_initializer(pre_trained_model["conv5"][0]),
    bias_initializer=tf.constant_initializer(pre_trained_model["conv5"][1]),
    name='conv5')
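Since the layer owns its variables, I also wondered whether I could attach the decay directly to the layer instead. A minimal sketch of that idea (assuming tf.contrib.layers.l2_regularizer and tf.losses.get_regularization_loss behave the way I think they do):

# sketch: let the layer register its own L2 penalty
conv5 = tf.layers.conv2d(
    inputs=conv4,
    filters=256,
    kernel_size=[3, 3],
    strides=1,
    padding="SAME",
    activation=tf.nn.relu,
    kernel_initializer=tf.constant_initializer(pre_trained_model["conv5"][0]),
    bias_initializer=tf.constant_initializer(pre_trained_model["conv5"][1]),
    kernel_regularizer=tf.contrib.layers.l2_regularizer(0.005),  # per-layer decay
    name='conv5')

# collect every registered penalty and add it to the objective
total_loss = cross_entropy + tf.losses.get_regularization_loss()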