I am trying to fine-tune T5 for spelling and grammar correction in the healthcare domain. The shape of my loss graph is confusing to me. What could have gone wrong?
My dataset is:

DatasetDict({
    train: Dataset({
        features: ['sentence_with_error', 'original_sentence'],
        num_rows: 2506554
    })
    validation: Dataset({
        features: ['sentence_with_error', 'original_sentence'],
        num_rows: 75104
    })
})
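For context, tokenized_data (used by the trainer below) comes from a preprocessing step roughly like this. This is only a sketch: the preprocess function is mine, but the column names and the 150-token limit match my setup.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(pretrained_model)
max_length = 150  # my max sequence length

def preprocess(batch):
    # inputs are the corrupted sentences, labels are the clean originals
    model_inputs = tokenizer(
        batch["sentence_with_error"],
        max_length=max_length,
        truncation=True,
    )
    labels = tokenizer(
        text_target=batch["original_sentence"],
        max_length=max_length,
        truncation=True,
    )
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized_data = dataset.map(
    preprocess,
    batched=True,
    remove_columns=["sentence_with_error", "original_sentence"],
)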
Also, my training arguments are as follows:
from transformers import (
    EarlyStoppingCallback,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

early_stopping_callback = EarlyStoppingCallback(
    early_stopping_patience=3,      # number of evaluations to wait for improvement
    early_stopping_threshold=0.01,  # threshold that counts as an improvement
)

args = Seq2SeqTrainingArguments(
    pretrained_model,               # first positional argument is output_dir
    evaluation_strategy="steps",
    eval_steps=600,
    logging_strategy="steps",
    logging_steps=600,
    save_strategy="steps",
    save_steps=600,
    learning_rate=5e-4,
    per_device_train_batch_size=24,
    per_device_eval_batch_size=24,
    weight_decay=0.01,
    save_total_limit=3,
    num_train_epochs=1,
    predict_with_generate=True,
    fp16=True,
    load_best_model_at_end=True,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized_data["train"],
    eval_dataset=tokenized_data["validation"],
    data_collator=data_collator,
    tokenizer=tokenizer,
    callbacks=[early_stopping_callback],  # include the early stopping callback
)
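For scale, one epoch with these settings works out to roughly 104k optimizer steps, so eval_steps=600 gives around 174 evaluation/checkpoint points (quick arithmetic, assuming a single GPU and no gradient accumulation):

train_rows = 2_506_554
batch_size = 24
steps_per_epoch = train_rows // batch_size  # ~104,439 steps per epoch
evals_per_epoch = steps_per_epoch // 600    # ~174 evaluations
print(steps_per_epoch, evals_per_epoch)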
Here is the loss graph I am seeing:
I have tried adjusting the learning rate as well. My maximum sequence length is set to 150.
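In case the plotting itself matters, this is roughly how I pull the curves out of the trainer. It is a minimal sketch reading trainer.state.log_history after training; matplotlib is an extra assumption on my side.

import matplotlib.pyplot as plt

history = trainer.state.log_history  # list of dicts, logged every 600 steps here
train_points = [(h["step"], h["loss"]) for h in history if "loss" in h]
eval_points = [(h["step"], h["eval_loss"]) for h in history if "eval_loss" in h]

plt.plot(*zip(*train_points), label="train loss")
plt.plot(*zip(*eval_points), label="eval loss")
plt.xlabel("step")
plt.ylabel("loss")
plt.legend()
plt.show()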