How to change PyTorch sigmoid function to be steeper


My model works when I use torch.sigmoid. I tried to make the sigmoid steeper by creating a new sigmoid function:

def sigmoid(x):
    return 1 / (1 + torch.exp(-1e5*x))

But for some reason the gradient doesn't flow through it (I get NaN). Is there a problem with my function, or is there a way to simply make the PyTorch implementation steeper, like my function?

Code example:

def sigmoid(x):
    return 1 / (1 + torch.exp(-1e5*x))

a = torch.tensor(0.0, requires_grad=True)
b = torch.tensor(0.58, requires_grad=True)

c = sigmoid(a-b)
c.backward()
a.grad
>>> tensor(nan)

Best answer:

The issue is that when the input to your sigmoid implementation is negative, the argument to torch.exp becomes very large, causing an overflow. With torch.autograd.set_detect_anomaly(True) enabled, you can see the error:

RuntimeError: Function 'ExpBackward' returned nan values in its 0th output.
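A minimal reproduction of this diagnosis, assuming a recent PyTorch build (where the op may be reported as 'ExpBackward0'): with anomaly detection enabled, the backward pass raises at the op that produced the NaN instead of silently propagating it.

```python
import torch

# Make autograd raise as soon as a backward op produces nan/inf.
torch.autograd.set_detect_anomaly(True)

def sigmoid(x):
    return 1 / (1 + torch.exp(-1e5 * x))

a = torch.tensor(0.0, requires_grad=True)
b = torch.tensor(0.58, requires_grad=True)

try:
    c = sigmoid(a - b)  # forward overflows: exp(58000) -> inf
    c.backward()
except RuntimeError as e:
    print(e)  # points at ExpBackward as the source of the nan
```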

If you really need to use the function you have defined, a possible workaround could be to put a conditional check on the argument (but I am not sure if it would be stable, so I cannot comment on its usefulness):

def sigmoid(x):
    # Scalar-only: the Python `if` works on 0-d tensors but not on batches.
    if x >= 0:
        # For x >= 0 the argument of torch.exp is non-positive: no overflow.
        return 1./(1+torch.exp(-1e5*x))
    else:
        # Algebraically identical form whose exp argument is negative.
        return torch.exp(1e5*x)/(1+torch.exp(1e5*x))

Here, the expression in the else branch is equivalent to the original function: multiply the numerator and denominator by torch.exp(1e5*x). This ensures that the argument to torch.exp is never positive, so it cannot overflow.
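A sketch of an elementwise variant of the same trick, using torch.where so it also works on batched tensors (the if/else form above only handles scalar tensors); the name stable_sigmoid and the default k are my own choices, not from the question:

```python
import torch

def stable_sigmoid(x, k=1e5):
    # exp(-k*|x|) is always in (0, 1], so neither branch can overflow,
    # and torch.where picks the algebraically matching form per element.
    z = torch.exp(-k * torch.abs(x))
    return torch.where(x >= 0, 1.0 / (1.0 + z), z / (1.0 + z))

x = torch.tensor([-0.58, 0.0, 0.58], requires_grad=True)
y = stable_sigmoid(x)
y.sum().backward()
print(y)       # tensor([0.0000, 0.5000, 1.0000], ...)
print(x.grad)  # finite everywhere (though ~0 away from x = 0)
```

Note that both torch.where branches stay finite here, which matters: if one branch overflowed to inf, its NaN could still poison the gradient even when unselected.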

As noted by trialNerror, the exponent is so large that, except for values extremely close to zero, your gradient will evaluate to zero everywhere: the actual slope is so small that it cannot be resolved by the data type. So if you plan to use this in a network, you will likely find it very difficult to learn anything, since the gradients will almost always be zero. It might be better to choose a smaller exponent, depending on your use case.
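A quick check of this vanishing-gradient point, using the two-branch sigmoid from above: even with the overflow fixed, an input only 0.01 away from the step already has a slope that underflows to exactly zero.

```python
import torch

def sigmoid(x):
    # Overflow-safe two-branch version from the answer above (scalar-only).
    if x >= 0:
        return 1. / (1 + torch.exp(-1e5 * x))
    else:
        return torch.exp(1e5 * x) / (1 + torch.exp(1e5 * x))

x = torch.tensor(0.01, requires_grad=True)  # only 0.01 away from the step
sigmoid(x).backward()
print(x.grad)  # tensor(0.) -- the slope underflows to exactly zero
```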

Second answer:

You put a scale factor of 1e5 in your exponential. The exponential of 1e5 is so unbelievably large that there is no hope of getting a meaningful result here. You are getting a NaN because you are backpropagating through a computational graph that at some point evaluates to inf (and beyond!).

Anyway, to make the slope of a function steeper, remember that d/dx f(a·x) = a·f′(a·x), so you need to multiply its argument by a value greater than 1 (and not negative, or you will flip the sign of your derivative), but not that huge! Try 10, maybe; it also depends on the order of magnitude of the inputs you are going to feed into your function.
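Following this suggestion, one simple sketch is to scale the argument of the numerically stable builtin rather than reimplementing the formula; the factor 10 here is just an illustrative choice:

```python
import torch

def steep_sigmoid(x, steepness=10.0):
    # torch.sigmoid handles large-magnitude inputs without overflow,
    # so scaling the argument is all that is needed for a steeper curve.
    return torch.sigmoid(steepness * x)

a = torch.tensor(0.0, requires_grad=True)
b = torch.tensor(0.58, requires_grad=True)

c = steep_sigmoid(a - b)
c.backward()
print(a.grad)  # finite, non-zero gradient instead of nan
```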