I am working on a sarcasm dataset and my model looks like below:
I first tokenize my input text:
import torch
from torch.utils.data import Dataset, DataLoader
from transformers import AutoTokenizer

PRETRAINED_MODEL_NAME = "roberta-base"
MAX_LEN = 100

# Define the device once so the dataset and the model can share it.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tokenizer = AutoTokenizer.from_pretrained(PRETRAINED_MODEL_NAME)
Then I defined a class for my dataset:
class SentimentDataset(Dataset):
    def __init__(self, dataframe):
        self.dataframe = dataframe

    def __len__(self):
        return len(self.dataframe)

    def __getitem__(self, idx):
        row = self.dataframe.iloc[idx]
        data_t = tokenizer(row["comment"], max_length=MAX_LEN, return_tensors="pt",
                           padding="max_length", truncation=True)
        label_t = torch.tensor(row["label"], dtype=torch.long)
        return {
            "input_ids": data_t["input_ids"].squeeze(0).to(device),
            # Return the attention mask too, so the model can ignore the padding tokens.
            "attention_mask": data_t["attention_mask"].squeeze(0).to(device),
            "label": label_t.to(device),
        }
Then I create an object of my class for the training set and set the other parameters:
train_dataset = SentimentDataset(train_df)
BATCH_SIZE = 32
train_dataloader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True)  # shuffle the training data
from transformers import AutoModelForSequenceClassification
# Load the model structure and the pretrained weights
# (the sequence-classification head is initialized randomly):
model = AutoModelForSequenceClassification.from_pretrained(PRETRAINED_MODEL_NAME).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=2e-5, weight_decay=1e-5)
Then I use the dataloader to train my model:
import numpy as np
from sklearn.metrics import classification_report

EPOCHS = 5
model.train()  # make sure dropout etc. are in training mode
for epoch in range(EPOCHS):
    print("\n******************\n epoch =", epoch)
    logits_list = []
    labels_list = []
    for i, batch in enumerate(train_dataloader, start=1):
        optimizer.zero_grad()
        output_model = model(input_ids=batch["input_ids"],
                             attention_mask=batch["attention_mask"],
                             labels=batch["label"])
        loss = output_model.loss
        logits = output_model.logits
        logits_list.append(logits.detach().cpu().numpy())
        labels_list.append(batch["label"].detach().cpu().numpy())
        loss.backward()
        optimizer.step()
        # scheduler.step()
        if i % 50 == 0:
            print("training loss:", loss.item())
            # print("validation loss:", loss.item())
    logits_all = np.concatenate(logits_list, axis=0)
    labels_all = np.concatenate(labels_list, axis=0)
    preds = np.argmax(logits_all, axis=1)
    print(classification_report(labels_all, preds))
My question is: how can I change the number of self-attention layers and the number of heads in the multi-head attention in my model?
The short answer is: You can't.
You are using a pretrained model:
You can't easily change the architecture of a pretrained model, because its weights were trained for one specific configuration: changing the number of attention heads or layers would leave those weights meaningless, so you would effectively be training from scratch anyway. It is possible to modify pretrained models, but it is definitely not straightforward. You can download a different pretrained model, or you can train any architecture you like from scratch (which would probably take too much time and computational resources). The only thing you can easily change is the "depth" of the model: you can discard some of the transformer blocks while keeping the pretrained weights of the rest.
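For illustration, here is a minimal sketch of both options, assuming roberta-base with its default 12 layers and 12 heads. Truncating the encoder's layer list keeps the pretrained weights of the remaining blocks; changing num_attention_heads requires a fresh config and training from scratch (and the head count must evenly divide the hidden size, which is 768 for roberta-base):

from transformers import AutoConfig, AutoModelForSequenceClassification

# Option 1: keep the pretrained weights, but discard transformer blocks (reduce depth).
model = AutoModelForSequenceClassification.from_pretrained("roberta-base")
model.roberta.encoder.layer = model.roberta.encoder.layer[:6]  # keep the first 6 of 12 blocks
model.config.num_hidden_layers = 6

# Option 2: pick any number of layers/heads, but start from randomly initialized weights.
config = AutoConfig.from_pretrained("roberta-base", num_hidden_layers=6, num_attention_heads=8)
scratch_model = AutoModelForSequenceClassification.from_config(config)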