Fine tuning error : Expected input batch_size (1) to match target batch_size (64)


I am new to PyTorch. I'm fine-tuning a BERT model (DistilBERT) for binary text classification. When I run my code, I get the following error:

Expected input batch_size (1) to match target batch_size (64).

Here's my model structure:

import torch
from torch import nn

# Get CPU or GPU device for training.
device = "cuda" if torch.cuda.is_available() else "cpu"

#device = "cpu"

print(f"Using {device} device")

class DistilBertClassification(nn.Module):
    
    def __init__(self):
        super(DistilBertClassification, self).__init__()
        self.dbert = dbert_pt
        self.dropout = nn.Dropout(p=0.1)
        self.linear1 = nn.Linear(768,64)
        self.ReLu = nn.ReLU()
        self.linear2 = nn.Linear(64,2)

    def forward(self, x):
        
        #print('----forward checkpoint 1----')
        
        x = self.dbert(input_ids=x[0])

        x = x["last_hidden_state"][:,0,:]
  
        x = self.dropout(x)
        x = self.linear1(x)
        x = self.ReLu(x)
        logits = self.linear2(x)
      
        return logits

model_pt = DistilBertClassification().to(device)
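A likely reason for the [1, 2] prediction is the x[0] in forward: indexing with [0] keeps only the first element along the batch dimension before the encoder is called. The sketch below is a minimal illustration under stated assumptions, not the asker's setup: it assumes (hypothetically) that the loader yields input ids shaped [batch, 1, seq_len], and it replaces dbert_pt with a hypothetical FakeEncoder so it runs without downloading pretrained weights.

```python
import torch
from torch import nn


class FakeEncoder(nn.Module):
    """Hypothetical stand-in for dbert_pt: returns a dict with "last_hidden_state"."""

    def __init__(self, vocab_size: int = 1000, dim: int = 768):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)

    def forward(self, input_ids):
        # [batch, seq_len] -> [batch, seq_len, dim], mimicking the encoder's output layout
        return {"last_hidden_state": self.embed(input_ids)}


class DistilBertClassification(nn.Module):
    def __init__(self, encoder: nn.Module):
        super().__init__()
        self.dbert = encoder
        self.dropout = nn.Dropout(p=0.1)
        self.linear1 = nn.Linear(768, 64)
        self.relu = nn.ReLU()
        self.linear2 = nn.Linear(64, 2)

    def forward(self, input_ids):
        # Assumed layout [batch, 1, seq_len]: drop the singleton dimension
        # instead of indexing [0], which would keep only the first sample.
        if input_ids.dim() == 3:
            input_ids = input_ids.squeeze(1)
        hidden = self.dbert(input_ids=input_ids)["last_hidden_state"]
        cls = hidden[:, 0, :]  # first-token embedding per sample: [batch, 768]
        x = self.relu(self.linear1(self.dropout(cls)))
        return self.linear2(x)  # [batch, 2]


model = DistilBertClassification(FakeEncoder())
X = torch.randint(0, 1000, (64, 1, 128))  # hypothetical batch of 64 token-id rows
print(X[0].shape)      # torch.Size([1, 128]): a single sample
print(model(X).shape)  # torch.Size([64, 2]): one row of logits per sample
```

With the real dbert_pt you would typically also pass the attention_mask produced by the tokenizer so that padding tokens are ignored; that is left out of this sketch.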

And here's the part of the code that raises the error. It happens when I compute the loss with loss = criterion(prediction, y.cuda()):

from tqdm import tqdm

# for each epoch
for e in range(epochs):
    
    
    model_pt.train() 
    
    train_loss = 0.0
    train_accuracy = []
    
    # Loop on batches
    for X, y in tqdm(train_loader_pt):
        
        # Get prediction & loss
        
        prediction = model_pt(X.cuda())
        
        loss = criterion(prediction , y.cuda())
        
        # Adjust the parameters of the model
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        train_loss += loss.item()
        
        prediction_index = prediction.argmax(axis=1)
        accuracy = (prediction_index==y.cuda())
        train_accuracy += accuracy
    
    train_accuracy = (sum(train_accuracy) / len(train_accuracy)).item()
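The loop above can be made to fail early, with both shapes in the message, when the batch sizes disagree. This is a sketch rather than the asker's code: train_one_epoch is a hypothetical helper, it assumes criterion is nn.CrossEntropyLoss (the question does not show it), it moves tensors with .to(device) so it also runs on CPU, and it counts correct predictions directly instead of building a list of per-sample tensors.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def train_one_epoch(model, loader, criterion, optimizer, device):
    """Hypothetical helper: one pass over `loader`, returning (mean loss, accuracy)."""
    model.train()
    total_loss, correct, seen = 0.0, 0, 0
    for X, y in loader:
        X, y = X.to(device), y.to(device)  # move once; reuse for loss and accuracy
        prediction = model(X)  # expected shape: [batch_size, 2]
        # Fail early, with both shapes in the message, if the batch dimension was lost.
        assert prediction.shape[0] == y.shape[0], (prediction.shape, y.shape)
        loss = criterion(prediction, y)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        total_loss += loss.item()
        correct += (prediction.argmax(dim=1) == y).sum().item()
        seen += y.numel()
    return total_loss / len(loader), correct / seen


# Toy usage with random data; nn.Linear(10, 2) is a hypothetical stand-in for model_pt.
torch.manual_seed(0)
data = TensorDataset(torch.randn(256, 10), torch.randint(0, 2, (256,)))
loader = DataLoader(data, batch_size=64)
toy_model = nn.Linear(10, 2)
opt = torch.optim.SGD(toy_model.parameters(), lr=0.1)
avg_loss, acc = train_one_epoch(toy_model, loader, nn.CrossEntropyLoss(), opt, "cpu")
print(f"loss={avg_loss:.4f} accuracy={acc:.2%}")
```

Accumulating integer counts keeps the accuracy computation to one .item() call per batch, instead of appending one tensor per sample to a Python list.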

The shape of prediction is torch.Size([1, 2]), while the shape of y is torch.Size([64]).
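Those two shapes are enough to reproduce the error in isolation. The sketch below assumes criterion is nn.CrossEntropyLoss, which the question does not show (the argmax over two logits suggests it); the random tensors are placeholders, not real model outputs. The [1, 2] prediction should trigger the same batch_size mismatch, while a [64, 2] prediction against the same 64 labels gives a scalar loss.

```python
import torch
from torch import nn

# Assumption: criterion is nn.CrossEntropyLoss; the question does not show it.
criterion = nn.CrossEntropyLoss()
y = torch.randint(0, 2, (64,))  # 64 integer class labels, like the question's y

# A [1, 2] prediction (one sample) against 64 targets reproduces the mismatch.
bad_prediction = torch.randn(1, 2)
try:
    criterion(bad_prediction, y)
    message = ""
except (ValueError, RuntimeError) as err:
    message = str(err)
print(message)

# A [64, 2] prediction (one row of two logits per sample) matches the 64 targets.
good_prediction = torch.randn(64, 2)
loss = criterion(good_prediction, y)
print(loss.shape)  # torch.Size([])
```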

Answer:

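First, make sure your input data X is batched properly when it is fed into the network. Based on the shape of y, I assume you are using a batch size of 64, yet the prediction has a batch size of 1, so the model only sees one example per step. A likely cause is x[0] in forward: it selects a single element along the first (batch) dimension before calling self.dbert, so the other examples are dropped. Pass the whole batch instead (for example x.squeeze(1) if X has the shape [64, 1, seq_len]).

Second, be careful with the shapes your loss function expects. With two output logits and nn.CrossEntropyLoss (which your use of argmax suggests), prediction should have the shape [batch_size, 2] and y the shape [batch_size], holding class indices. Alternatively, since this is binary classification, you can change the final layer to output a single logit and use nn.BCEWithLogitsLoss; in that case prediction and y must have the same shape, for example [batch_size, 1], and y must be a float tensor.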
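To make the two loss options concrete, here is a small shape-only sketch. The random tensors are hypothetical stand-ins for the model's output; only the shapes and dtypes matter. It contrasts nn.CrossEntropyLoss with two logits and nn.BCEWithLogitsLoss with a single logit.

```python
import torch
from torch import nn

batch_size = 64
y = torch.randint(0, 2, (batch_size,))  # integer labels, as in the question

# Option 1 (two logits + CrossEntropyLoss): prediction [64, 2], target [64] of class indices.
ce_loss = nn.CrossEntropyLoss()(torch.randn(batch_size, 2), y)

# Option 2 (one logit + BCEWithLogitsLoss): prediction and target share the shape [64, 1],
# and the target must be a float tensor.
single_logit = torch.randn(batch_size, 1)  # e.g. from nn.Linear(64, 1) as the last layer
bce_loss = nn.BCEWithLogitsLoss()(single_logit, y.float().unsqueeze(1))

# For option 2, threshold the logit at 0 (probability 0.5) instead of using argmax.
predicted_class = (single_logit.squeeze(1) > 0).long()
accuracy = (predicted_class == y).float().mean().item()
print(ce_loss.item(), bce_loss.item(), accuracy)
```

With a single logit, argmax over dim 1 would always return 0, which is why the sketch thresholds the logit instead.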