Custom Gradients in TensorFlow - unable to understand this example


I keep thinking that I am about to understand custom gradients, but then I test this example and I cannot figure out what is going on. I am hoping somebody can walk me through exactly what is happening below. I think this comes down to me not understanding what "dy" is in the backward function.

import tensorflow as tf

v = tf.Variable(2.0)
with tf.GradientTape() as t:
    x = v*v 
    output = x**2
print(t.gradient(output, v)) 
tf.Tensor(32.0, shape=(), dtype=float32)

Everything is good here and the gradient is what one would expect. I then test this example using a custom gradient which, given my understanding, could not possibly affect the gradient, since I have passed a huge threshold to clip_by_norm:

@tf.custom_gradient
def clip_gradients2(y):
    def backward(dy):
        return tf.clip_by_norm(dy, 20000000000000000000000000)
    return y**2, backward

v = tf.Variable(2.0)
with tf.GradientTape() as t:
    x = v * v
    output = clip_gradients2(x)

print(t.gradient(output, v))

tf.Tensor(4.0, shape=(), dtype=float32)

But the gradient is reduced to 4, so the custom gradient is somehow having an effect. How exactly does this result in a smaller gradient?

1 Answer
When writing a custom gradient, you must define the whole derivative calculation yourself. Without the custom gradient, TensorFlow computes the full derivative:

d((v**2)**2)/dv = d(v**4)/dv = 4*v**3 = 32 when v = 2

When you override the gradient calculation, your backward function returns dy unchanged (the clipping threshold is far too large to ever take effect), so the 2*y factor from y**2 is dropped. All that remains is the part of the chain TensorFlow still computes for you:

d(v**2)/dv = 2*v = 4 when v = 2
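
To answer the "dy" question directly: dy is the upstream gradient, d(loss)/d(output). When you differentiate output itself, dy is 1.0, and whatever backward returns is used as the gradient with respect to the function's input. Here is a minimal sketch that prints dy (the name square_with_logging is my own, not from the question; tf.print is used so it also works in graph mode):

@tf.custom_gradient
def square_with_logging(y):
    def backward(dy):
        # dy is the upstream gradient d(loss)/d(output);
        # differentiating `output` directly makes dy == 1.0
        tf.print("dy =", dy)
        # Returning dy unchanged claims d(output)/dy == 1,
        # silently dropping the 2*y factor from y**2
        return dy
    return y**2, backward

v = tf.Variable(2.0)
with tf.GradientTape() as t:
    x = v * v
    output = square_with_logging(x)

print(t.gradient(output, v))  # prints "dy = 1", then tf.Tensor(4.0, shape=(), dtype=float32)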

You need to include that derivative in your backward function, i.e.:

@tf.custom_gradient
def clip_gradients2(y):
    def backward(dy):
        # Apply the chain rule: d(y**2)/dy = 2*y
        dy = dy * (2 * y)
        return tf.clip_by_norm(dy, 20000000000000000000000000)
    return y**2, backward

This gives the desired behavior.
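
As a quick sanity check (a sketch assuming the corrected clip_gradients2 above is in scope), the gradient matches the original 32.0 again:

v = tf.Variable(2.0)
with tf.GradientTape() as t:
    x = v * v
    output = clip_gradients2(x)

print(t.gradient(output, v))  # tf.Tensor(32.0, shape=(), dtype=float32)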