I am working on a neural network in PyTorch which simply maps points from the plane to real numbers, for example:
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(2, 2), nn.ReLU(), nn.Linear(2, 1))
Since this network defines a map h: R^2 -> R, what I want to do is compute the gradient of this mapping h inside the training loop. So, for example:
for it in range(epochs):
    pred = model(X_train)
    grad = torch.autograd.grad(pred, X_train)
    ...
The training set has been defined as a tensor with requires_grad=True. My problem is that even if the output for each fixed point is a scalar, since I am propagating a set of N=100 points, the output is actually an Nx1 tensor. This leads to the error: autograd can compute the gradient only of scalar functions.
In fact, trying the small change
pred = torch.sum(model(X_train))
everything works perfectly. However, I am interested in all the individual gradients, so is there a way to compute them all together?
Of course, computing the sum as presented above gives exactly the result I expect, but I wanted to know whether this is the only possibility.
There are other possibilities, but using .sum() is the simplest way. Using .sum() on the final loss vector and computing dpred/dinput will give you the desired output. Here is why: since

pred = sum(loss) = sum(f(xi))

where i is the index of input x, dpred/dinput will be a matrix

[dpred/dx0, dpred/dx1, ...]
Consider dpred/dx0: it will be equal to df(x0)/dx0, since every other df(xi)/dx0 is 0.

PS: Please excuse the crappy mathematical expressions... SO does not support LaTeX/math expressions.
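For completeness, here is a minimal runnable sketch of the .sum() trick; the model and the N=100 input points are taken from the question, the random data is just a placeholder:

import torch
import torch.nn as nn

torch.manual_seed(0)

# setup from the question: N = 100 points in the plane
model = nn.Sequential(nn.Linear(2, 2), nn.ReLU(), nn.Linear(2, 1))
X_train = torch.randn(100, 2, requires_grad=True)

# collapse the N x 1 output to a scalar, then differentiate w.r.t. the input
pred = model(X_train).sum()
grads, = torch.autograd.grad(pred, X_train)
print(grads.shape)  # torch.Size([100, 2]); row i is df(xi)/dxi

As one of the "other possibilities": torch.autograd.grad also accepts a grad_outputs argument, and passing a vector of ones performs the same implicit sum without modifying pred:

pred = model(X_train)  # N x 1, no explicit sum
grads, = torch.autograd.grad(pred, X_train, grad_outputs=torch.ones_like(pred))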