3 questions:
What is `grad_outputs` in Chainer?
As one example, how should the `backward` code of Chainer's function `F.transpose` be understood?
```python
def backward(self, inputs, grad_outputs):
    gy = grad_outputs[0]
    inv_axes = self.axes
    if self.axes:
        axes = tuple(ax % len(self.axes) for ax in self.axes)
        inv_axes = tuple(numpy.argsort(axes))
    gx = gy.transpose(inv_axes)
    return gx,
```
Suppose I want to implement my own function, but `inputs[0]` and `inputs[1]` have different shapes. To back-propagate with the chain rule, I would have to write something like the following in `backward`:

```python
a, b = inputs
gy = grad_outputs[0]
return a * gy, b * gy
```

But `a` and `b` do not have the same shape, so won't `a * gy` and `b * gy` raise an error because the shapes don't match for multiplication?
*This answer applies to Chainer v2; the `Function` class's internal behavior may change after Chainer v3 to support differentiable backpropagation.*

Back propagation proceeds from the final layer to the first layer, propagating gradients in order to calculate the gradient of each layer's parameters.
The function's `backward` method receives the gradient of its output and needs to calculate and return the gradient of its input. `grad_outputs` is the gradient of this function's output, in array (`numpy` or `cupy`) form.

The derivative of `F.transpose` is itself just a transpose, so its `backward` simply returns the transpose of the output gradient `gy`. Strictly speaking, the transpose order used in the forward computation is kept as `self.axes`, and the backward computation needs to apply the inverse of that permutation (the `ax % len(self.axes)` step only normalizes negative axis indices). `inv_axes` is that inverse ordering, obtained with `numpy.argsort`, and it is used to compute the gradient of the input, `gx`.
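As a quick illustration (not part of the original answer), a small `numpy` check shows that `argsort` of a permutation yields its inverse, so transposing back with `inv_axes` restores the original axis order:

```python
import numpy

axes = (2, 0, 1)                       # transpose order used in forward
inv_axes = tuple(numpy.argsort(axes))  # (1, 2, 0): the inverse permutation

x = numpy.zeros((3, 4, 5))
y = x.transpose(axes)                  # shape (5, 3, 4)
print(y.transpose(inv_axes).shape)     # (3, 4, 5): original layout restored
```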
Regarding `return a * gy, b * gy`: shape does not matter, and it can be different for each of the function's inputs (as well as for the return values of `backward`); each returned gradient just has to match the shape of its corresponding input.
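As a sketch (not taken from the original answer), here is a hypothetical Chainer v2-style `Function` whose two inputs have different shapes; the name `MatVec` and the operation `y = W.dot(x)` are illustrative assumptions:

```python
import numpy
import chainer


class MatVec(chainer.Function):
    """y = W.dot(x) for W of shape (n, m) and x of shape (m,)."""

    def forward(self, inputs):
        W, x = inputs
        return W.dot(x),                  # output has shape (n,)

    def backward(self, inputs, grad_outputs):
        W, x = inputs
        gy = grad_outputs[0]              # shape of the output: (n,)
        gW = gy[:, None] * x[None, :]     # shape (n, m), matches W
        gx = W.T.dot(gy)                  # shape (m,),  matches x
        return gW, gx


W = chainer.Variable(numpy.random.randn(3, 4).astype(numpy.float32))
x = chainer.Variable(numpy.random.randn(4).astype(numpy.float32))
y = MatVec()(W, x)
y.grad = numpy.ones(3, dtype=numpy.float32)
y.backward()
print(W.grad.shape, x.grad.shape)         # (3, 4) (4,)
```

Here `gy` always has the shape of the output, and the two returned gradients have the shapes of `W` and `x` respectively, even though those shapes differ.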