ReLU is not differentiable at 0, but PyTorch's implementation has to handle that case somehow. Is the derivative at 0 set to 0 by default, or something else?
I tried setting the weights and bias to zero (so that the input to ReLU is zero) and checked the gradients during backpropagation: the gradient of the weights is 0, except for the last conv layer in the residual block, where it is not zero.
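A minimal sketch of the kind of check I mean (the conv shapes and sizes here are only illustrative, not my actual model):

```python
import torch
import torch.nn as nn

# Zero out a conv layer so the input to the following ReLU is exactly 0
conv = nn.Conv2d(3, 3, kernel_size=3, padding=1)
nn.init.zeros_(conv.weight)
nn.init.zeros_(conv.bias)

x = torch.randn(1, 3, 8, 8)
out = torch.relu(conv(x))
out.sum().backward()

# If ReLU'(0) == 0, the upstream gradient is killed and the weight gradient is all zeros
print(conv.weight.grad.abs().max())  # tensor(0.)
```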
PyTorch implements the derivative of ReLU at x = 0 by outputting zero, according to this article. The article also goes on to elaborate on the differences in outcomes between using ReLU'(0) = 0 and ReLU'(0) = 1, and notes that the effect is stronger when the numbers are in lower precision.
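You can verify this behavior directly (a minimal sketch, just passing a zero input through torch.relu):

```python
import torch

# Gradient of ReLU evaluated exactly at x = 0
x = torch.tensor(0.0, requires_grad=True)
y = torch.relu(x)
y.backward()
print(x.grad)  # tensor(0.) -> PyTorch uses ReLU'(0) = 0
```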