How can I change the number of self-attention layers and the number of multi-head attention heads in my model with PyTorch?


I am working on a sarcasm dataset, and my model is set up as shown below.

I first tokenize my input text:

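import torch
from torch.utils.data import Dataset, DataLoader
from transformers import AutoTokenizer

PRETRAINED_MODEL_NAME = "roberta-base"
MAX_LEN = 100
# `device` is used below but was not defined in the original snippet
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = AutoTokenizer.from_pretrained(PRETRAINED_MODEL_NAME)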

Then I define a class for my dataset:

class SentimentDataset (Dataset):
    def __init__(self,dataframe):
        self.dataframe = dataframe

    def __len__(self):
        return len(self.dataframe)
    
    def __getitem__(self, idx):
        df = self.dataframe.iloc[idx]

        text = [df["comment"]]
        label = [df["label"]]

        data_t = tokenizer(text,max_length = MAX_LEN, return_tensors="pt", padding="max_length", truncation=True)
        label_t = torch.LongTensor(label)

        return {
             "input_ids":data_t["input_ids"].squeeze().to(device),
             "label": label_t.squeeze().to(device),
        }

Then I create an object of this class for the training set and set the other parameters:

train_dataset = SentimentDataset(train_df)
BATCH_SIZE = 32
train_dataloader = DataLoader(train_dataset, batch_size = BATCH_SIZE)
from transformers import AutoModelForSequenceClassification, AutoConfig

# Load the model structure and pretrained weights:
model = AutoModelForSequenceClassification.from_pretrained(PRETRAINED_MODEL_NAME).to(device)

optimizer = torch.optim.Adam(model.parameters(), lr=2e-5, weight_decay=1e-5)

Then I use the dataloader to train the model:

import numpy as np
from sklearn.metrics import classification_report

train_dataloader = DataLoader(train_dataset, batch_size = BATCH_SIZE)
EPOCHS = 5
for epoch in range(EPOCHS):
    print("\n******************\n epoch=",epoch)
    i = 0
    logits_list = []
    labels_list = []
    for batch in train_dataloader:
        i += 1
        optimizer.zero_grad()
        output_model = model(input_ids = batch["input_ids"], labels = batch["label"])
        loss = output_model.loss
        logits = output_model.logits
        logits_list.append(logits.cpu().detach().numpy())
        labels_list.append(batch["label"].cpu().detach().numpy())
        loss.backward()
        optimizer.step()
    #scheduler.step()
        if i % 50 ==0 :
            print("training loss:",loss.item())
            #print("validation loss:",loss.item())
    logits_list = np.concatenate(logits_list, axis=0)
    labels_list = np.concatenate(labels_list, axis=0)
    logits_list = np.argmax(logits_list, axis =1)
    print(classification_report(labels_list, logits_list))

My question is: how can I change the number of self-attention layers and the number of heads in the multi-head attention of my model?

There are 1 best solutions below


The short answer is: You can't.

You are using a pretrained model:

model = AutoModelForSequenceClassification.from_pretrained(PRETRAINED_MODEL_NAME).to(device)

You can't easily change a pretrained model, because its weights were trained for one specific architecture (a fixed number of layers and attention heads). Changing it is possible, but it is definitely not straightforward. You can download a different pretrained model, or you can train any model you like from scratch (which would probably take too much time and too many computational resources). The only thing you can easily change is the "depth" of the model: you can discard some of the transformer blocks.
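To make the two options above concrete, here is a minimal sketch, not a definitive recipe. It assumes the Hugging Face `RobertaConfig` and `RobertaForSequenceClassification` classes; the tiny sizes (`vocab_size=1000`, `hidden_size=64`, and so on) are illustrative values chosen so the example runs quickly without downloading anything, and `keep_first_layers` is a hypothetical helper name, not a library function. The first part builds a randomly initialised model with a custom number of layers and heads (the "train from scratch" route); the second part discards the upper transformer blocks of an existing model.

```python
import torch
from transformers import RobertaConfig, RobertaForSequenceClassification

# Option 1: a randomly initialised model with a custom depth and head count.
# hidden_size must be divisible by num_attention_heads.
# All sizes here are small illustrative assumptions, not roberta-base values.
config = RobertaConfig(
    vocab_size=1000,
    hidden_size=64,
    num_hidden_layers=4,       # number of self-attention (transformer) layers
    num_attention_heads=8,     # heads per multi-head attention layer
    intermediate_size=128,
    max_position_embeddings=130,
    num_labels=2,
)
scratch_model = RobertaForSequenceClassification(config)


def keep_first_layers(model, n_layers):
    """Hypothetical helper: keep only the first n_layers transformer blocks."""
    model.roberta.encoder.layer = model.roberta.encoder.layer[:n_layers]
    model.config.num_hidden_layers = n_layers
    return model


# Option 2: reduce the depth of an existing model by discarding blocks.
shallow_model = keep_first_layers(scratch_model, 2)

# Quick forward pass on random token ids to check the model still runs.
input_ids = torch.randint(3, config.vocab_size, (2, 16))
with torch.no_grad():
    logits = shallow_model(input_ids=input_ids).logits

print(len(shallow_model.roberta.encoder.layer), tuple(logits.shape))
```

For the setup in the question, the same helper would presumably be applied to the model returned by `from_pretrained` before creating the optimizer. The remaining blocks keep their pretrained weights, but the truncated model usually needs fine-tuning to recover accuracy.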