My model works when I use torch.sigmoid. I tried to make the sigmoid steeper by creating a new sigmoid function:
```python
def sigmoid(x):
    return 1 / (1 + torch.exp(-1e5*x))
```
But for some reason the gradient doesn't flow through it (I get NaN). Is there a problem in my function, or is there a way to simply change the PyTorch implementation to be steeper, like my function?
Code example:

```python
import torch

def sigmoid(x):
    return 1 / (1 + torch.exp(-1e5*x))

a = torch.tensor(0.0, requires_grad=True)
b = torch.tensor(0.58, requires_grad=True)
c = sigmoid(a - b)
c.backward()
a.grad
>>> tensor(nan)
```
The issue seems to be that when the input to your sigmoid implementation is negative, the argument to torch.exp becomes very large, causing an overflow. Using torch.autograd.set_detect_anomaly(True), as suggested here, you can see the error raised at the operation that produced the NaN.

If you really need to use the function you have defined, a possible workaround could be to put a conditional check on the argument (but I am not sure if it would be stable, so I cannot comment on its usefulness).
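A minimal sketch of such a conditional check (assuming a scalar input, so a plain Python if/else suffices):

```python
import torch

def sigmoid(x):
    if x >= 0:
        # The argument to torch.exp is <= 0 here, so no overflow
        return 1 / (1 + torch.exp(-1e5*x))
    else:
        # Same function with numerator and denominator multiplied
        # by torch.exp(1e5*x), keeping the exponent negative
        return torch.exp(1e5*x) / (torch.exp(1e5*x) + 1)
```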
Here, the expression in the else branch is equivalent to the original function; it is obtained by multiplying the numerator and denominator by torch.exp(1e5*x). This ensures that the argument to torch.exp is always negative or close to zero.
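Re-running the earlier example with this piecewise version (the sketch above) gives a gradient that is at least finite, though note its value:

```python
a = torch.tensor(0.0, requires_grad=True)
b = torch.tensor(0.58, requires_grad=True)
c = sigmoid(a - b)  # piecewise version from above
c.backward()
print(a.grad)  # tensor(0.) -- no more NaN, but the gradient is zero
```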
As noted by trialNerror, the exponent value is so high that, except for inputs extremely close to zero, your gradient will evaluate to zero everywhere, since the actual slope is so small that it cannot be resolved by the data type (which is exactly what happens above). So if you plan to use it in a network, you will likely find it very difficult to learn anything, since the gradients will almost always be zero. It might be better to select a smaller exponent, depending on your use case.
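For example, a sketch of a steeper but still trainable sigmoid built on the numerically stable torch.sigmoid (the slope k = 10 is just an assumed value to tune):

```python
import torch

def steep_sigmoid(x, k=10.0):
    # Scaling the input of torch.sigmoid steepens the curve
    # without the overflow of a hand-rolled exp
    return torch.sigmoid(k * x)

a = torch.tensor(0.0, requires_grad=True)
b = torch.tensor(0.58, requires_grad=True)
c = steep_sigmoid(a - b)
c.backward()
print(a.grad)  # finite and nonzero (roughly 0.03 for k=10)
```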