Say I have a sequence of tokens:
my_seq = "Chocolate has a history of human consumption tracing back to 400 AD and is rich in polyphenols such as catechins, anthocyanidins, and proanthocyanidins. As chocolate and cocoa product consumption, along with interest in them as functional foods, increases worldwide, there is a need to systematically and critically appraise the available clinical evidence on their health effects."
Is it possible to increase or decrease the attention scores so that my model puts more or less emphasis on the first 10 tokens of that sequence? Here is my setup:
from transformers import AutoTokenizer, GPT2LMHeadModel, AutoConfig
import torch

tokenizer = AutoTokenizer.from_pretrained("gpt2")
device = "cuda" if torch.cuda.is_available() else "cpu"

config = AutoConfig.from_pretrained(
    "gpt2",
    vocab_size=len(tokenizer),
    n_ctx=1024,  # define your desired context length
    bos_token_id=tokenizer.bos_token_id,
    eos_token_id=tokenizer.eos_token_id,
)

# from_pretrained loads the trained weights; GPT2LMHeadModel(config) alone
# would build a randomly initialized model that generates gibberish.
gpt2_model = GPT2LMHeadModel.from_pretrained("gpt2", config=config).to(device)

input_ids = tokenizer.encode(my_seq, return_tensors="pt").to(device)
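For clarity, the 10 tokens I want to emphasize or de-emphasize are the first 10 positions of input_ids, which can be inspected like this:

# Show exactly which subword tokens occupy the first 10 positions
print(tokenizer.convert_ids_to_tokens(input_ids[0, :10].tolist()))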
I found that models can often take an attention_mask tensor:
# attention_mask must be defined before it is used; an all-ones mask
# (the tokenizer's default) attends to every token equally.
attention_mask = torch.ones_like(input_ids)

output = gpt2_model.generate(
    input_ids=input_ids,
    attention_mask=attention_mask,
    max_new_tokens=50,  # max_length=50 would be shorter than the prompt itself
    num_return_sequences=1,  # number of output sequences you want
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no dedicated pad token
)
But from here I understand that attention_mask is just a binary tensor (1 = attend to the token, 0 = ignore it entirely), so it cannot express intermediate, soft weights.
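For example, the closest I can get to "less emphasis" with a binary mask is to hide the first 10 tokens entirely, which is all-or-nothing rather than a graded weighting:

attention_mask = torch.ones_like(input_ids)
attention_mask[:, :10] = 0  # the first 10 tokens are now ignored completely, not just down-weighted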
Is there another way to do this, i.e. to weight tokens continuously instead of just switching them on and off?
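To make the goal concrete, here is a toy sketch in plain PyTorch of the kind of score manipulation I have in mind, written against a standalone scaled dot-product attention rather than the transformers internals (attention_with_bias and the bias values are purely illustrative, and causal masking is omitted for brevity):

import torch

def attention_with_bias(q, k, v, score_bias=None):
    # Standard scaled dot-product attention, except that an additive bias
    # is applied to the raw scores before the softmax.
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)  # (seq, seq)
    if score_bias is not None:
        scores = scores + score_bias  # >0 boosts a position, <0 dampens it
    weights = torch.softmax(scores, dim=-1)
    return weights @ v, weights

seq_len, d = 20, 8
q, k, v = (torch.randn(seq_len, d) for _ in range(3))

# Boost the attention every query pays *to* the first 10 key positions;
# a negative value here would de-emphasize them instead.
bias = torch.zeros(seq_len, seq_len)
bias[:, :10] = 2.0

out, weights = attention_with_bias(q, k, v, bias)
print(weights[0, :10].sum(), weights[0, 10:].sum())  # mass shifts toward the first 10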