I'm totally new to PyTorch. I was taking an e-course and experimenting with PyTorch, and I came across these two loss functions (the stated reason for having both is numerical stability with logits):
nn.BCEWithLogitsLoss()
and
nn.BCELoss()
With the appropriate adjustments to the code for each of these two loss functions, I got quite different accuracy curves! For example, with nn.BCELoss(), as in the snippet below:
model = nn.Sequential(
nn.Linear(D, 1),
nn.Sigmoid()
)
criterion = nn.BCELoss()
The accuracy plot was: [accuracy plot image]
And for nn.BCEWithLogitsLoss(), as below:
model = nn.Linear(D, 1)
criterion = nn.BCEWithLogitsLoss()
The accuracy plot was: [accuracy plot image]
The rest of the code is the same for both examples. (Note that the loss curves were similar and decent.) The learning curves for both snippets looked like this: [learning-curve image]. I couldn't figure out what is causing this problem (whether there is a bug in my code or something wrong with my PyTorch setup). Thank you in advance for your time and help.
nn.BCELoss() expects your output to be probabilities, that is, with the sigmoid activation applied.
nn.BCEWithLogitsLoss() expects your output to be logits, that is, without the sigmoid activation.
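As a quick sanity check (a minimal sketch with made-up values, not your data), the two losses agree when nn.BCELoss() is fed the sigmoid of the same logits that nn.BCEWithLogitsLoss() receives raw:

```python
import torch
import torch.nn as nn

# Hypothetical logits and binary targets, purely for illustration
logits = torch.tensor([0.8, -1.2, 2.5])
targets = torch.tensor([1.0, 0.0, 1.0])

# BCELoss on probabilities (sigmoid of the logits)
bce = nn.BCELoss()(torch.sigmoid(logits), targets)

# BCEWithLogitsLoss directly on the raw logits
bce_logits = nn.BCEWithLogitsLoss()(logits, targets)

# The two values match up to floating-point error
print(torch.allclose(bce, bce_logits))  # True
```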
I think you may have calculated something incorrectly (such as the accuracy). Here is a simple example based on your code:
With probabilities:
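A minimal sketch (the toy data X, y, the dimension D, and the batch size are assumptions, since your dataset isn't shown). With a final nn.Sigmoid(), the model outputs probabilities, so predictions come from thresholding at 0.5:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
D = 4
X = torch.randn(32, D)           # toy inputs (assumption: stand-in for your data)
y = (X.sum(dim=1) > 0).float()   # toy binary labels

model = nn.Sequential(
    nn.Linear(D, 1),
    nn.Sigmoid()
)
criterion = nn.BCELoss()

outputs = model(X).squeeze(1)    # probabilities in (0, 1)
loss = criterion(outputs, y)

# Outputs are probabilities, so threshold at 0.5
preds = (outputs > 0.5).float()
accuracy = (preds == y).float().mean()
```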
Now with logits:
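The same sketch without the sigmoid (again with assumed toy data). Here the model outputs raw logits, so you must threshold at 0, or equivalently apply torch.sigmoid first and threshold at 0.5. Thresholding logits at 0.5 is the likely bug behind the differing accuracy curves:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
D = 4
X = torch.randn(32, D)           # toy inputs (assumption: stand-in for your data)
y = (X.sum(dim=1) > 0).float()   # toy binary labels

model = nn.Linear(D, 1)
criterion = nn.BCEWithLogitsLoss()

outputs = model(X).squeeze(1)    # raw logits, any real number
loss = criterion(outputs, y)

# Outputs are logits: threshold at 0, since sigmoid(0) = 0.5
preds = (outputs > 0).float()
accuracy = (preds == y).float().mean()
```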