I know there is quite a bit of content out there about "computing the Hessian" in PyTorch, but nothing I've found so far works for me. To be as precise as possible: the Hessian I want is the Jacobian of the gradient of the loss with respect to the network parameters, i.e. the matrix of second-order partial derivatives of the loss with respect to the parameters.
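In symbols, for loss $L$ and the flattened parameter vector $\theta$, the entries I am after are

$$H_{ij} = \frac{\partial^2 L}{\partial \theta_i \, \partial \theta_j} = \frac{\partial}{\partial \theta_i}\left(\frac{\partial L}{\partial \theta_j}\right).$$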
I found some code that works in an intuitive way, although it shouldn't be fast. It simply computes the gradient of the gradient of the loss w.r.t. the params, w.r.t. the params again, and it does so one element (of the gradient) at a time. I think the logic is definitely right, but I am getting an error having to do with requires_grad. I'm a PyTorch beginner, so maybe it's a simple thing, but the error seems to be saying that it can't take the gradient of the env_grads variable, which is the output from the previous grad call.
Any help with this would be greatly appreciated. Here is the code, followed by the error message. I also printed out the env_grads[0] variable so we can see that it is in fact a tensor, which is the correct output from the previous grad call.
env_loss = loss_fn(env_outputs, env_targets)
total_loss += env_loss
env_grads = torch.autograd.grad(env_loss, params, retain_graph=True)
print(env_grads[0])

# second derivatives, one element of the gradient at a time
hess_params = torch.zeros_like(env_grads[0])
for i in range(env_grads[0].size(0)):
    for j in range(env_grads[0].size(1)):
        hess_params[i, j] = torch.autograd.grad(env_grads[0][i][j], params, retain_graph=True)[0][i, j]  # <--- error here
print(hess_params)
Output:
tensor([[-6.4064e-03, -3.1738e-03, 1.7128e-02, 8.0391e-03],
[ 7.1698e-03, -2.4640e-03, -2.2769e-03, -1.0687e-03],
[-3.0390e-04, -2.4273e-03, -4.0799e-02, -1.9149e-02],
...,
[ 1.1258e-02, -2.5911e-05, -9.8133e-02, -4.6059e-02],
[ 8.1502e-04, -2.5814e-03, 4.1772e-02, 1.9606e-02],
[-1.0075e-02, 6.6072e-03, 8.3118e-04, 3.9011e-04]], device='cuda:0')
Error:
Traceback (most recent call last):
  File "/home/jefferythewind/anaconda3/envs/rapids3/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/jefferythewind/anaconda3/envs/rapids3/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/jefferythewind/Projects/Irina/learning-explanations-hard-to-vary/and_mask/run_synthetic.py", line 258, in <module>
    main(args)
  File "/home/jefferythewind/Projects/Irina/learning-explanations-hard-to-vary/and_mask/run_synthetic.py", line 245, in main
    deep_mask=args.deep_mask
  File "/home/jefferythewind/Projects/Irina/learning-explanations-hard-to-vary/and_mask/run_synthetic.py", line 103, in train
    scale_grad_inverse_sparsity=scale_grad_inverse_sparsity
  File "/home/jefferythewind/Projects/Irina/learning-explanations-hard-to-vary/and_mask/and_mask_utils.py", line 154, in get_grads_deep
    hess_params[i, j] = torch.autograd.grad(env_grads[0][i][j], params, retain_graph=True)[0][i, j]
  File "/home/jefferythewind/anaconda3/envs/rapids3/lib/python3.7/site-packages/torch/autograd/__init__.py", line 157, in grad
    inputs, allow_unused)
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
Don't mind if I answer my own question here. I found the hint that I needed in this thread from the PyTorch site, about halfway down.
I noticed there is an argument for autograd.grad called create_graph, so I set this to True in the first call to grad, and that ended up solving the error. Modified working code: