Vision Transformer (ViT) implementation in PyTorch keeps returning the same class label in output tensors


I have 15,000 datapoints and 5 classes. When I run this part of my code:

model.eval()
inputs, labels = next(iter(test_dataloader))
inputs, labels = inputs.to(device), labels.to(device)
outputs = model(inputs)
print("Predicted classes", outputs.argmax(-1))
print("Actual classes", labels)

I get the following output:

Predicted classes tensor([2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
        2, 2, 2, 2, 2, 2, 2, 2])
Actual classes tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0])

Each tensor contains 32 items, since my batch size is 32.

0, 1, 2, 3 and 4 each correspond to 'healthy', 'mild npdr', 'moderate npdr', 'severe npdr', and 'pdr', respectively. I have 5382 datapoints in my healthy class, 2443 in mild, 5292 in moderate, 1049 in severe and 708 in pdr.
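Given how imbalanced those counts are (708 vs 5382), one option I've been considering is weighting the loss by inverse class frequency. This is only a sketch of what that could look like with a standard `nn.CrossEntropyLoss`, not code from my repo:

```python
import torch
import torch.nn as nn

# class counts from my dataset, in label order 0..4
counts = torch.tensor([5382.0, 2443.0, 5292.0, 1049.0, 708.0])

# inverse-frequency weights: rarer classes get larger weights
weights = counts.sum() / (len(counts) * counts)

# pass the weights to the loss so minority-class errors cost more
criterion = nn.CrossEntropyLoss(weight=weights)
print(weights)
```

With this weighting, 'pdr' (the smallest class) gets the largest weight and 'healthy' (the largest class) the smallest.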


What I've also found is that after a couple of epochs, the test accuracy gets stuck at exactly the same value:

>>> Epoch 1 train loss: 1.4067111531252503 train accuracy: 0.35078578031767377
>>> Epoch 1 test loss: 1.3799789259510655 test accuracy: 0.35126050420168065
Validation loss decreased (inf --> 1.379979).  Saving model ...
epoch=1, learning rate=0.0010
>>> Epoch 2 train loss: 1.3790244847856543 train accuracy: 0.35750903437263637
>>> Epoch 2 test loss: 1.3944347340573546 test accuracy: 0.36168067226890754
EarlyStopping counter: 1 out of 3
epoch=2, learning rate=0.0010
>>> Epoch 3 train loss: 1.3741713842397094 train accuracy: 0.3529708378855366
>>> Epoch 3 test loss: 1.372046325796394 test accuracy: 0.36168067226890754
Validation loss decreased (1.379979 --> 1.372046).  Saving model ...
epoch=3, learning rate=0.0010
>>> Epoch 4 train loss: 1.371311823206563 train accuracy: 0.3564165055887049
>>> Epoch 4 test loss: 1.367056119826532 test accuracy: 0.35126050420168065
Validation loss decreased (1.372046 --> 1.367056).  Saving model ...
epoch=4, learning rate=0.0010
>>> Epoch 5 train loss: 1.369389453241902 train accuracy: 0.3569207496428271
>>> Epoch 5 test loss: 1.373937726020813 test accuracy: 0.36168067226890754
EarlyStopping counter: 1 out of 3
epoch=5, learning rate=0.0010
>>> Epoch 6 train loss: 1.3699962490348405 train accuracy: 0.36162702748130093
>>> Epoch 6 test loss: 1.3644213702089043 test accuracy: 0.36168067226890754
Validation loss decreased (1.367056 --> 1.364421).  Saving model ...
epoch=6, learning rate=0.0010
>>> Epoch 7 train loss: 1.369478802847606 train accuracy: 0.3548197327506513
>>> Epoch 7 test loss: 1.3675817212750834 test accuracy: 0.36168067226890754
EarlyStopping counter: 1 out of 3
epoch=7, learning rate=0.0010
>>> Epoch 8 train loss: 1.3686860168492923 train accuracy: 0.35876964450794185
>>> Epoch 8 test loss: 1.3695218537443428 test accuracy: 0.36168067226890754
EarlyStopping counter: 2 out of 3
Epoch 00015: reducing learning rate of group 0 to 2.0000e-04.
epoch=8, learning rate=0.0002
>>> Epoch 9 train loss: 1.366326274730826 train accuracy: 0.3552399361290865
>>> Epoch 9 test loss: 1.3646431353784376 test accuracy: 0.36168067226890754
EarlyStopping counter: 3 out of 3
Early stopping
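For what it's worth, that plateau value looks suspiciously close to the share of the largest class in my training data, which would be consistent with the model collapsing to a single-class prediction. A quick back-of-the-envelope check using the counts above (the test set comes from a different dataset, so the match can only be approximate):

```python
# class counts from the training data
counts = {"healthy": 5382, "mild npdr": 2443, "moderate npdr": 5292,
          "severe npdr": 1049, "pdr": 708}
total = sum(counts.values())

# accuracy a model would get by always predicting the most common class
baseline = max(counts.values()) / total
print(f"majority-class baseline: {baseline:.4f}")  # ≈ 0.3618
```

That baseline (~0.362) is very close to the 0.3617 the test accuracy keeps landing on.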

Furthermore, when I print my outputs:

print(outputs)

I get a matrix of shape (batch_size, num_classes) = (32, 5), where every row is identical:

tensor([[ 0.4999, -0.3836,  0.4566, -1.7039, -1.3557],
    [ 0.4999, -0.3836,  0.4566, -1.7039, -1.3557],
    [ 0.4999, -0.3836,  0.4566, -1.7039, -1.3557],
    [ 0.4999, -0.3836,  0.4566, -1.7039, -1.3557],
    [ 0.4999, -0.3836,  0.4566, -1.7039, -1.3557],
    [ 0.4999, -0.3836,  0.4566, -1.7039, -1.3557],
    [ 0.4999, -0.3836,  0.4566, -1.7039, -1.3557],
    [ 0.4999, -0.3836,  0.4566, -1.7039, -1.3557],
    [ 0.4999, -0.3836,  0.4566, -1.7039, -1.3557],
    [ 0.4999, -0.3836,  0.4566, -1.7039, -1.3557],
    [ 0.4999, -0.3836,  0.4566, -1.7039, -1.3557],
    [ 0.4999, -0.3836,  0.4566, -1.7039, -1.3557],
    [ 0.4999, -0.3836,  0.4566, -1.7039, -1.3557],
    [ 0.4999, -0.3836,  0.4566, -1.7039, -1.3557],
    [ 0.4999, -0.3836,  0.4566, -1.7039, -1.3557],
    [ 0.4999, -0.3836,  0.4566, -1.7039, -1.3557],
    [ 0.4999, -0.3836,  0.4566, -1.7039, -1.3557],
    [ 0.4999, -0.3836,  0.4566, -1.7039, -1.3557],
    [ 0.4999, -0.3836,  0.4566, -1.7039, -1.3557],
    [ 0.4999, -0.3836,  0.4566, -1.7039, -1.3557],
    [ 0.4999, -0.3836,  0.4566, -1.7039, -1.3557],
    [ 0.4999, -0.3836,  0.4566, -1.7039, -1.3557],
    [ 0.4999, -0.3836,  0.4566, -1.7039, -1.3557],
    [ 0.4999, -0.3836,  0.4566, -1.7039, -1.3557],
    [ 0.4999, -0.3836,  0.4566, -1.7039, -1.3557],
    [ 0.4999, -0.3836,  0.4566, -1.7039, -1.3557],
    [ 0.4999, -0.3836,  0.4566, -1.7039, -1.3557],
    [ 0.4999, -0.3836,  0.4566, -1.7039, -1.3557],
    [ 0.4999, -0.3836,  0.4566, -1.7039, -1.3557],
    [ 0.4999, -0.3836,  0.4566, -1.7039, -1.3557],
    [ 0.4999, -0.3836,  0.4566, -1.7039, -1.3557],
    [ 0.4999, -0.3836,  0.4566, -1.7039, -1.3557]],
   grad_fn=<AddmmBackward0>)

When I print the outputs on a subset of the data, the values in the matrix do vary a little. But after training for a couple of epochs, once the accuracy plateaus, every row of the output matrix becomes identical.
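One side note on that printout: the `grad_fn=<AddmmBackward0>` at the end suggests the forward pass is running with autograd enabled. As I understand it, inference should be wrapped in `torch.no_grad()` so no gradient graph is built. A sketch with a stand-in model (the real ViT would be used the same way):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 5)   # stand-in for the ViT
model.eval()

inputs = torch.randn(32, 10)
with torch.no_grad():      # disables autograd tracking during inference
    outputs = model(inputs)

print(outputs.requires_grad)  # False, so no grad_fn is attached
```

This doesn't affect the predictions themselves, only memory use and the grad_fn in the printout.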

This has been happening since the start of the project. I originally suspected underfitting, which is why I added EarlyStopping and a learning rate scheduler, but neither has made much difference. I now suspect a bug involving my inputs and labels, though I'm not certain.

**The code for this can be found in my GitHub repository:** https://github.com/HydraulicSponge/VisionTransformer/blob/main/main.py

Any help in debugging would be appreciated. Also, please let me know if there is anything wrong with my code in general (mostly regarding the training/eval phases, EarlyStopping + lr scheduler, as well as the ViT, Attention, PositionalEncoding and Patch Embedding classes)

I may also just be calculating loss and accuracy wrong in my code. Please check over that and let me know.
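In case it helps, this is roughly how I understand per-epoch loss and accuracy should be accumulated. It's a self-contained sketch with a dummy model and dummy batches; the names are illustrative, not taken from my repo:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 5)                      # stand-in for the ViT
criterion = nn.CrossEntropyLoss()
# dummy (inputs, labels) batches shaped like a DataLoader's output
loader = [(torch.randn(8, 10), torch.randint(0, 5, (8,))) for _ in range(4)]

total_loss, correct, seen = 0.0, 0, 0
model.eval()
with torch.no_grad():
    for inputs, labels in loader:
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        total_loss += loss.item() * labels.size(0)  # sum of per-sample losses
        correct += (outputs.argmax(-1) == labels).sum().item()
        seen += labels.size(0)

print("loss:", total_loss / seen, "accuracy:", correct / seen)
```

The key detail is dividing by the total number of samples seen, not the number of batches, so uneven final batches don't skew the averages.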

By the way, this is how my folders are structured:

root_dir
└── data
    ├── training_data
    │   ├── pdr
    │   ├── severe npdr
    │   ├── mild npdr
    │   ├── moderate npdr
    │   └── healthy
    └── testing_data
        ├── pdr
        ├── severe npdr
        ├── mild npdr
        ├── moderate npdr
        └── healthy
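One thing that might be worth double-checking given those folder names: if the data is loaded with torchvision's `ImageFolder`, labels are assigned by sorting the class directory names alphabetically, which would not match the 0–4 mapping I listed above ('pdr' sorts before 'severe npdr'). This reproduces the mapping `ImageFolder` would produce, without needing the images on disk:

```python
# ImageFolder assigns class indices by sorting the class directory names
class_dirs = ["pdr", "severe npdr", "mild npdr", "moderate npdr", "healthy"]
class_to_idx = {c: i for i, c in enumerate(sorted(class_dirs))}
print(class_to_idx)
# {'healthy': 0, 'mild npdr': 1, 'moderate npdr': 2, 'pdr': 3, 'severe npdr': 4}
```

So with these folder names, index 3 would be 'pdr' and index 4 'severe npdr', the opposite of the mapping I assumed; printing `dataset.class_to_idx` on the actual dataset would confirm which one is in effect.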

This is the dataset I used for training:

https://www.kaggle.com/datasets/amanneo/diabetic-retinopathy-resized-arranged

I made sure to delete a lot of the "healthy" class images, since that class had far more datapoints than the others.

For testing, I used this dataset:

https://www.kaggle.com/competitions/aptos2019-blindness-detection/data
