I am working on a sarcasm dataset and my model looks like below:
I first tokenize my input text:
import torch
from torch.utils.data import Dataset, DataLoader
from transformers import AutoTokenizer

PRETRAINED_MODEL_NAME = "roberta-base"
MAX_LEN = 100

# Define the device once so the dataset and the model can share it.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tokenizer = AutoTokenizer.from_pretrained(PRETRAINED_MODEL_NAME)
Then I defined a class for my dataset:
class SentimentDataset(Dataset):
    def __init__(self, dataframe):
        self.dataframe = dataframe

    def __len__(self):
        return len(self.dataframe)

    def __getitem__(self, idx):
        row = self.dataframe.iloc[idx]
        data_t = tokenizer(row["comment"], max_length=MAX_LEN, return_tensors="pt",
                           padding="max_length", truncation=True)
        label_t = torch.tensor(row["label"], dtype=torch.long)
        return {
            "input_ids": data_t["input_ids"].squeeze(0).to(device),
            # Return the attention mask too, so the model can ignore the padding tokens.
            "attention_mask": data_t["attention_mask"].squeeze(0).to(device),
            "label": label_t.to(device),
        }
Then I create an object of my class for the training set and set the other parameters:
train_dataset = SentimentDataset(train_df)
BATCH_SIZE = 32
train_dataloader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True)  # shuffle the training data
from transformers import AutoModelForSequenceClassification
# Load the model structure and the pretrained weights
# (the sequence-classification head is initialized randomly):
model = AutoModelForSequenceClassification.from_pretrained(PRETRAINED_MODEL_NAME).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=2e-5, weight_decay=1e-5)
Then I use the dataloader to train my model:
import numpy as np
from sklearn.metrics import classification_report

EPOCHS = 5
model.train()  # make sure dropout etc. are in training mode
for epoch in range(EPOCHS):
    print("\n******************\n epoch =", epoch)
    logits_list = []
    labels_list = []
    for i, batch in enumerate(train_dataloader, start=1):
        optimizer.zero_grad()
        output_model = model(input_ids=batch["input_ids"],
                             attention_mask=batch["attention_mask"],
                             labels=batch["label"])
        loss = output_model.loss
        logits = output_model.logits
        logits_list.append(logits.detach().cpu().numpy())
        labels_list.append(batch["label"].detach().cpu().numpy())
        loss.backward()
        optimizer.step()
        # scheduler.step()
        if i % 50 == 0:
            print("training loss:", loss.item())
            # print("validation loss:", loss.item())
    logits_all = np.concatenate(logits_list, axis=0)
    labels_all = np.concatenate(labels_list, axis=0)
    preds = np.argmax(logits_all, axis=1)
    print(classification_report(labels_all, preds))
My question is: how can I change the number of self-attention layers and the number of heads in the multi-head attention in my model?
The short answer is: You can't.
You are using a pretrained model:
You can't easily change the architecture of a pretrained model, because its weights were trained for one specific configuration: changing the number of attention heads or layers would leave those weights meaningless, so you would effectively be training from scratch anyway. It is possible to modify pretrained models, but it is definitely not straightforward. You can download a different pretrained model, or you can train any architecture you like from scratch (which would probably take too much time and computational resources). The only thing you can easily change is the "depth" of the model: you can discard some of the transformer blocks while keeping the pretrained weights of the rest.
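For illustration, here is a minimal sketch of both options, assuming roberta-base with its default 12 layers and 12 heads. Truncating the encoder's layer list keeps the pretrained weights of the remaining blocks; changing num_attention_heads requires a fresh config and training from scratch (and the head count must evenly divide the hidden size, which is 768 for roberta-base):

from transformers import AutoConfig, AutoModelForSequenceClassification

# Option 1: keep the pretrained weights, but discard transformer blocks (reduce depth).
model = AutoModelForSequenceClassification.from_pretrained("roberta-base")
model.roberta.encoder.layer = model.roberta.encoder.layer[:6]  # keep the first 6 of 12 blocks
model.config.num_hidden_layers = 6

# Option 2: pick any number of layers/heads, but start from randomly initialized weights.
config = AutoConfig.from_pretrained("roberta-base", num_hidden_layers=6, num_attention_heads=8)
scratch_model = AutoModelForSequenceClassification.from_config(config)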