Compute Hessian of loss function in TensorFlow


I would like to compute the Hessian of the loss function of a neural network in TensorFlow with respect to all the parameters (i.e. all trainable variables). By modifying the example code from the TensorFlow documentation (https://www.tensorflow.org/api_docs/python/tf/GradientTape), I managed to compute the Hessian w.r.t. the weight matrix of the first layer (if I'm not mistaken):

import tensorflow as tf

with tf.GradientTape(persistent=True) as tape:
    loss = tf.reduce_mean(model(x, training=True) ** 2)
    # gradient of the loss w.r.t. the first layer's weight matrix
    g = tape.gradient(loss, model.trainable_variables[0])
    # Jacobian of that gradient, i.e. the Hessian block for this variable
    h = tape.jacobian(g, model.trainable_variables[0])
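
For reference, tape.jacobian here returns a tensor of shape g.shape + variable.shape; a minimal sketch (assuming the same model and x as above) of flattening it into a square matrix:

n = int(tf.size(model.trainable_variables[0]))  # number of entries in the weight matrix
hessian_matrix = tf.reshape(h, [n, n])          # square Hessian block for this one variable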

If I instead try to compute it w.r.t. model.trainable_variables, tape.jacobian complains that 'list' object has no attribute 'shape'. So I tried flattening model.trainable_variables and computing the Hessian w.r.t. the flattened vector:

with tf.GradientTape(persistent=True) as tape:
    loss = tf.reduce_mean(model(x, training=True) ** 2)
    # flatten all trainable variables into one vector
    source = tf.concat([tf.reshape(v, [-1]) for v in model.trainable_variables], axis=0)
    g = tape.gradient(loss, source)
    h = tape.jacobian(g, source)

The problem now is that g is None for some reason. I noticed that source is of type tf.Tensor, whereas model.trainable_variables[0] is of type tf.ResourceVariable, so I tried changing this by declaring source as

source = resource_variable_ops.ResourceVariable(
    tf.concat([tf.reshape(v, [-1]) for v in model.trainable_variables], axis=0))

This didn't change anything though, so I'm guessing that this is not the issue. I also thought the problem might be that the source variable is not watched, but it is marked as trainable, and even if I call tape.watch(source), g is still None.
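
A minimal sketch reproducing the behavior with a single variable (assuming nothing but TensorFlow itself) suggests why: the concatenated tensor is a new node built from the variables, and the loss does not flow through it, so the gradient w.r.t. it is None:

import tensorflow as tf

v = tf.Variable([1.0, 2.0])
with tf.GradientTape(persistent=True) as tape:
    loss = tf.reduce_sum(v ** 2)                     # loss depends on v directly
    flat = tf.concat([tf.reshape(v, [-1])], axis=0)  # new tensor; loss does not flow through it
print(tape.gradient(loss, v))     # tf.Tensor([2. 4.], shape=(2,), dtype=float32)
print(tape.gradient(loss, flat))  # None: flat is not on the path from v to loss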

Does anybody know how I can solve this?

1 Answer

elbe:

Maybe you could use a loop over the trainable variables? I know it's a basic idea.

with tf.GradientTape(persistent=True) as tape:
    loss = tf.reduce_mean(model(x, training=True) ** 2)
    g_list, h_list = [], []
    for train_var in model.trainable_variables:
        # per-variable gradient and the corresponding diagonal Hessian block
        g = tape.gradient(loss, train_var)
        g_list.append(g)
        h_list.append(tape.jacobian(g, train_var))

You could also use a second loop before computing the Jacobian and concatenate the output lists, as in the sketch below.
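
A minimal sketch of that second-loop idea, assuming the same model and x as in the question: compute the gradient w.r.t. each variable inside the tape, take its Jacobian w.r.t. every variable (not just the same one, so cross-variable blocks are included), and concatenate the blocks into one square matrix. tape.jacobian may return None for blocks with no dependency, which the sketch replaces with zeros.

import tensorflow as tf

with tf.GradientTape(persistent=True) as tape:
    loss = tf.reduce_mean(model(x, training=True) ** 2)
    # per-variable gradients, computed inside the tape so their
    # Jacobians can be taken afterwards
    grads = [tape.gradient(loss, v) for v in model.trainable_variables]

sizes = [int(tf.size(v)) for v in model.trainable_variables]
rows = []
for g, n_g in zip(grads, sizes):
    blocks = []
    for v, n_v in zip(model.trainable_variables, sizes):
        block = tape.jacobian(g, v)  # shape: g.shape + v.shape
        if block is None:            # no dependency between these two variables
            block = tf.zeros([n_g, n_v])
        else:
            block = tf.reshape(block, [n_g, n_v])
        blocks.append(block)
    rows.append(tf.concat(blocks, axis=1))
hessian = tf.concat(rows, axis=0)  # (total_params, total_params)
del tape  # release the persistent tape's resources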