The following is a portion of the code I use for a policy gradient algorithm in TensorFlow:
self.activation = tf.contrib.layers.fully_connected(
    inputs=state,
    num_outputs=num_actions,
    activation_fn=tf.nn.relu6,
    weights_initializer=tf.contrib.layers.xavier_initializer(),
    biases_initializer=tf.random_normal_initializer(mean=1.0, stddev=1.0),
    trainable=True)
action_prob = tf.nn.softmax(self.activation)
log_p = tf.log(tf.reduce_sum(tf.multiply(action_prob, action), axis=1))
tvars = tf.trainable_variables()
policy_gradients = tf.gradients(ys=log_p, xs=tvars)
The tensor log_p evaluates to reasonable values. However, the policy_gradients are all zero. Am I missing something?
Gradients can vanish when the argument of the logarithm saturates at 1 or 0: log(1) = 0, and for log(0) TensorFlow produces -inf, after which the backpropagated gradients are typically inf/nan and can end up as zeros downstream.
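For instance, a quick toy check in TF 1.x (not the question's graph) shows how the gradient of tf.log behaves at 0:

import tensorflow as tf

x = tf.constant([0.0, 0.5, 1.0])
grad = tf.gradients(tf.log(x), x)

with tf.Session() as sess:
    # the entry for x == 0 is not finite (inf), and inf/nan then
    # propagates through the rest of the backward pass
    print(sess.run(grad))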
You can try clipping the value passed to the logarithm:
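A minimal sketch of that clipping, reusing the tensor names from the question (the 1e-10 lower bound is an arbitrary epsilon, tune it to your precision needs):

action_prob = tf.nn.softmax(self.activation)
# keep the softmax output strictly inside (0, 1] before taking the log
clipped_prob = tf.clip_by_value(action_prob, 1e-10, 1.0)
log_p = tf.log(tf.reduce_sum(tf.multiply(clipped_prob, action), axis=1))

Clipping only changes the forward value when a probability underflows, so the rest of the graph stays the same.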