I'm trying to use TensorFlow's @tf.custom_gradient functionality to assign a custom gradient to a function with multiple inputs. I can put together a working setup for only one input, but not for two or more.
I've based my code on TensorFlow's custom_gradient documentation, which works just fine for one input, as in this example:
import tensorflow as tf
import os

# Suppress TensorFlow startup info
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

# Custom gradient decorator on a function,
# as described in documentation
@tf.custom_gradient
def my_identity(x):
    # The custom gradient
    def grad(dy):
        return dy
    # Return the result AND the gradient
    return tf.identity(x), grad

# Make a variable, run it through the custom op
x = tf.get_variable('x', initializer=1.)
y = my_identity(x)

# Calculate loss, make an optimizer, train the variable
loss = tf.abs(y)
opt = tf.train.GradientDescentOptimizer(learning_rate=0.001)
train = opt.minimize(loss)

# Start a TensorFlow session, initialize variables, train
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(train)
This example runs silently, then closes. No issues, no errors. The variable optimizes as expected. However, in my application, I need to do such a calculation with multiple inputs, so something of this form:
@tf.custom_gradient
def my_identity(x, z):
    def grad(dy):
        return dy
    return tf.identity(x*z), grad
Running this in place of the example (and adding another variable input to the call of my_identity) results in the following error output. As best I can tell, the last parts of the error come from the dynamic generation of the op -- the format matches the C++ formatting required when registering an op (though that's about all I know about it).
Traceback (most recent call last):
  File "testing.py", line 27, in <module>
    train = opt.minimize(loss)
  File "/usr/lib/python3/dist-packages/tensorflow/python/training/optimizer.py", line 400, in minimize
    grad_loss=grad_loss)
  File "/usr/lib/python3/dist-packages/tensorflow/python/training/optimizer.py", line 519, in compute_gradients
    colocate_gradients_with_ops=colocate_gradients_with_ops)
  File "/usr/lib/python3/dist-packages/tensorflow/python/ops/gradients_impl.py", line 630, in gradients
    gate_gradients, aggregation_method, stop_gradients)
  File "/usr/lib/python3/dist-packages/tensorflow/python/ops/gradients_impl.py", line 821, in _GradientsHelper
    _VerifyGeneratedGradients(in_grads, op)
  File "/usr/lib/python3/dist-packages/tensorflow/python/ops/gradients_impl.py", line 323, in _VerifyGeneratedGradients
    "inputs %d" % (len(grads), op.node_def, len(op.inputs)))
ValueError: Num gradients 2 generated for op name: "IdentityN"
op: "IdentityN"
input: "Identity"
input: "x/read"
input: "y/read"
attr {
  key: "T"
  value {
    list {
      type: DT_FLOAT
      type: DT_FLOAT
      type: DT_FLOAT
    }
  }
}
attr {
  key: "_gradient_op_type"
  value {
    s: "CustomGradient-9"
  }
}
do not match num inputs 3
Based on other custom-gradient examples, I surmised that the issue was a missing gradient for the second input argument. So, I changed my function to this:
@tf.custom_gradient
def my_identity(x, z):
    def grad(dy):
        return dy
    return tf.identity(x*z), grad, grad
This results in the following more familiar error:
Traceback (most recent call last):
  File "testing.py", line 22, in <module>
    y = my_identity(x, z)
  File "/usr/lib/python3/dist-packages/tensorflow/python/ops/custom_gradient.py", line 111, in decorated
    return _graph_mode_decorator(f, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/tensorflow/python/ops/custom_gradient.py", line 132, in _graph_mode_decorator
    result, grad_fn = f(*args)
ValueError: too many values to unpack (expected 2)
The @tf.custom_gradient decorator only identifies the last returned element as the gradient function. So, I tried putting the two gradients into a tuple as (grad, grad) such that there would only be "two" outputs of the function. TensorFlow rejected this too, this time because it can't call a tuple like it would a Tensor -- entirely reasonable, in hindsight.
I've fussed around with the example some more, but to no avail. No matter what I try, I can't get the custom-defined gradient to deal with multiple inputs. I'm hoping that somebody with more knowledge than I regarding custom ops and gradients will have a better idea on this -- thanks in advance for the help!
If we use multiple variables as input, the number of gradients returned from the grad function should equal the number of input variables, even though we may not care about some of them.
For example:
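(The original code example seems to have been lost here; the following is a minimal sketch of what it likely showed, using the product function from the question and the name my_multiple from the note that follows.)

```python
import tensorflow as tf

@tf.custom_gradient
def my_multiple(x, z):
    def grad(dy):
        # One gradient per input, in the same order as the inputs:
        # d(x*z)/dx = z and d(x*z)/dz = x, each scaled by the
        # incoming gradient dy
        return dy * z, dy * x
    return tf.identity(x * z), grad
```

If we genuinely don't need the gradient with respect to one of the inputs, we can still return something of the right shape in its slot (e.g. tf.zeros_like(z)) rather than omitting it.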
Note that the second output of "my_multiple" is a function, not a gradient tensor.