tensorflow Optimizer.minimize function


I am confused about the minimize function. E.g., given a distance variable X with shape [mini_batch_size, 1],

loss_1 = tf.reduce_mean(X),

loss_2 = X

then minimize(loss_1) is mini-batch gradient descent, but what about minimize(loss_2)? Is it an element-wise update? If so, is it exactly the same as stochastic gradient descent?

1 Answer

Actually this is a very technical detail of TF. loss_2 is equivalent to loss_1 up to multiplication by a constant. It is not "SGD" as other answers suggest; that is not how TF works. It is also a mini-batch update, and the only difference from loss_1 is that the gradient is multiplied by batch_size, that's it.
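
To make this concrete, here is a minimal sketch (TF 1.x graph API, as in the code further down; the variables and the batch tensor are illustrative, not taken from the question). Both minimize calls perform one mini-batch update, and the step for the non-scalar loss is simply batch_size times larger.

import tensorflow as tf

batch = tf.constant([[1.], [2.], [3.]])   # [batch_size, 1], as in the question
w1 = tf.Variable(1.0)
w2 = tf.Variable(1.0)

loss_1 = tf.reduce_mean(w1 * batch)       # scalar loss
loss_2 = w2 * batch                       # non-scalar loss, shape [3, 1]

opt = tf.train.GradientDescentOptimizer(0.1)
step_1 = opt.minimize(loss_1)
step_2 = opt.minimize(loss_2)             # no error: gradients of the 3 entries are summed

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run([step_1, step_2])
    print(sess.run([w1, w2]))             # w2's step is batch_size times w1's step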

The crucial detail is hidden in the way tf.gradients is implemented. Namely, it expects a scalar function to be passed as the first argument; however, if you pass multiple values it does not throw an error, it simply sums them. You can find this in the official TF documentation of tf.gradients:

gradients(
    ys,
    xs,
    grad_ys=None,
    name='gradients',
    colocate_gradients_with_ops=False,
    gate_gradients=False,
    aggregation_method=None
)

[...]

Constructs symbolic partial derivatives of sum of ys w.r.t. x in xs.

So in fact your loss_2 is equivalent to:

equivalent_loss_2 = tf.reduce_sum(X)

and the only difference from loss_1 is that it is not divided by batch_size. Nothing else.

import tensorflow as tf

x = tf.constant([[1., 2., 3.]])  # shape [1, 3]; stands in for a batch of 3 values

f = 2 * x                 # element-wise, so f is non-scalar
f1 = tf.reduce_mean(f)    # scalar: mean over the batch
f2 = tf.reduce_sum(f)     # scalar: sum over the batch

g = tf.gradients(f, x)    # non-scalar ys are implicitly summed before differentiating
g1 = tf.gradients(f1, x)
g2 = tf.gradients(f2, x)

with tf.Session() as sess:
    print(sess.run(g))
    print(sess.run(g1))
    print(sess.run(g2))

Prints:

[[array([ 2.,  2.,  2.], dtype=float32)]]
[[array([ 0.66666669,  0.66666669,  0.66666669], dtype=float32)]]
[[array([ 2.,  2.,  2.], dtype=float32)]]

As expected, g and g2 are the same, while g1 is just g (or g2) divided by 3 (the batch size).
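
As a hedged follow-up (not part of the original answer; the names below are illustrative): since the only difference is the missing division by batch_size, you can make minimize on the non-scalar loss apply exactly the same update as the mean-loss version by dividing the learning rate by the batch size.

import tensorflow as tf

batch_size = 3
batch = tf.constant([[1.], [2.], [3.]])   # [batch_size, 1]

w_mean = tf.Variable(1.0)
w_sum = tf.Variable(1.0)

step_mean = tf.train.GradientDescentOptimizer(0.1).minimize(
    tf.reduce_mean(w_mean * batch))
step_sum = tf.train.GradientDescentOptimizer(0.1 / batch_size).minimize(
    w_sum * batch)                        # non-scalar loss, implicitly summed

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run([step_mean, step_sum])
    print(sess.run([w_mean, w_sum]))      # both variables end up at the same value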