I'm curious why PyTorch's binary_cross_entropy function is implemented in such a way that it calculates ln(0) = -100.
From a mathematical point of view, binary cross entropy calculates:
H = -[ p_0*log(q_0) + p_1*log(q_1) ]
In PyTorch's binary_cross_entropy function, q is the first argument and p is the second.
Now suppose I do p = [1,0] and q = [0.25, 0.75]. In this case, F.binary_cross_entropy(q,p) returns, as expected: -ln(0.25) = 1.386.
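As a quick sanity check, the same number falls out of the per-element formula binary_cross_entropy applies, -[p*ln(q) + (1-p)*ln(1-q)] with mean reduction, computed by hand on the same tensors:
> import torch
> pT = torch.Tensor([1, 0])
> qT = torch.Tensor([0.25, 0.75])
> # per-element BCE: -[p*ln(q) + (1-p)*ln(1-q)], then the mean over elements
> -(pT * qT.log() + (1 - pT) * (1 - qT).log()).mean()
tensor(1.3863)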
If we reverse the arguments and try F.binary_cross_entropy(p,q), this should raise an error, since it would require calculating -0.75*ln(0), and ln(0) is -infinity in the limit.
Nonetheless, if I do F.binary_cross_entropy(p,q) it gives me 75 as the answer (see below):
> import torch
> import torch.nn.functional as F
> pT = torch.Tensor([1, 0])
> qT = torch.Tensor([0.25, 0.75])
> F.binary_cross_entropy(pT, qT)
tensor(75.)
Why was it implemented this way?
PyTorch is indeed clamping the log value at -100. You can find an example of that here.
This is most likely a hack to avoid an infinite loss caused by an input probability being accidentally rounded to zero.
Technically speaking, the input probabilities to binary_cross_entropy are supposed to be generated by a sigmoid function, which is bounded asymptotically between (0, 1). This means the input should never actually be zero, but it may become zero due to numerical precision issues for very small values.
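A minimal sketch of that behaviour (this only mimics the clamp on the question's tensors; it is not the actual native implementation): flooring each log term at -100 by hand reproduces the 75 from above, and a sufficiently negative logit shows how a sigmoid output can underflow to exactly zero in float32 in the first place.
> import torch
> import torch.nn.functional as F
> pT = torch.Tensor([1, 0])
> qT = torch.Tensor([0.25, 0.75])
> # floor the log terms at -100 instead of letting them go to -inf
> logp = torch.clamp(torch.log(pT), min=-100)
> log1mp = torch.clamp(torch.log(1 - pT), min=-100)
> -(qT * logp + (1 - qT) * log1mp).mean()
tensor(75.)
> F.binary_cross_entropy(pT, qT)
tensor(75.)
> # a very negative logit underflows to exactly zero after the sigmoid
> torch.sigmoid(torch.tensor(-200.))
tensor(0.)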