I'm following the tutorial here and have tried to adapt it to my dataset.
But I've noticed that during evaluation the Seq2SeqTrainer calls compute_metrics 3 times.
The first time it passes the correct validation/test set, but I have no idea what it is passing on the other 2 calls, or why compute_metrics is being called 3 times at all.
Notice in the screenshot below that the validation set has 6400 samples, which is correctly passed to compute_metrics the first time the Seq2SeqTrainer calls it. But the second call passes predictions and labels of size 127, and the third call passes what look like scalar values for both the predictions and the labels.
Could anyone explain what is going on here?
Why does the Seq2SeqTrainer call compute_metrics 3 times, when it should only call it once, passing the actual predictions and labels of the validation set, which has 6400 samples?
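For reference, a compute_metrics along the lines of the tutorial's summarization example looks roughly like this (a sketch, assuming ROUGE via the evaluate library; the exact metric code may differ):

import numpy as np
import evaluate

rouge = evaluate.load("rouge")

def compute_metrics(eval_pred):
    preds, labels = eval_pred
    # With predict_with_generate=True, preds are generated token ids
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
    # Replace -100 (ignored positions) with the pad token id before decoding
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    result = rouge.compute(predictions=decoded_preds, references=decoded_labels, use_stemmer=True)
    return {k: round(v, 4) for k, v in result.items()}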
from transformers import Seq2SeqTrainingArguments, Seq2SeqTrainer

training_args = Seq2SeqTrainingArguments(
    output_dir="t5_checkpoints",
    learning_rate=2e-5,
    per_device_train_batch_size=640,
    per_device_eval_batch_size=640,
    num_train_epochs=10,
    weight_decay=0.01,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    save_total_limit=3,
    predict_with_generate=True,   # decode with generate() during evaluation
    generation_max_length=128,
    generation_num_beams=4,
    load_best_model_at_end=True,
    logging_steps=1,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_data["train"],
    eval_dataset=tokenized_data["valid"],
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)
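To reproduce the observation of the 6400 / 127 / scalar sizes, the shapes can be logged on every call with a small wrapper like this (a sketch; debug_compute_metrics is a hypothetical helper name, not from the tutorial):

import numpy as np

def debug_compute_metrics(eval_pred):
    preds, labels = eval_pred
    # np.shape works for arrays and plain scalars alike (scalars give ())
    print("preds shape:", np.shape(preds), "labels shape:", np.shape(labels))
    return compute_metrics(eval_pred)

Passing debug_compute_metrics instead of compute_metrics to the Seq2SeqTrainer would then print the shapes each time it is called during evaluation.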

It turns out the issue was a hidden state in the Jupyter notebook which was recursively calling
compute_metrics, passing parts of preds and labels as arguments until there was nothing left but a scalar value. Now things "seem to work" (i.e., no errors), but it seems the model is not learning when tested, even though the task is simple: train accuracy reaches 90%, but upon evaluation the model produces wrong text summaries compared to the true labels.
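This is roughly how the generated summaries can be compared against the true labels after training (a sketch, assuming the tokenized validation split from above):

import numpy as np

# predict_with_generate=True makes predictions the generated token ids
pred_output = trainer.predict(tokenized_data["valid"])
labels = np.where(pred_output.label_ids != -100, pred_output.label_ids, tokenizer.pad_token_id)

decoded_preds = tokenizer.batch_decode(pred_output.predictions, skip_special_tokens=True)
decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)

# Spot-check the first few prediction/label pairs
for pred, label in zip(decoded_preds[:5], decoded_labels[:5]):
    print("PRED :", pred)
    print("LABEL:", label)
    print("---")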