What is the meaning of flat loss for both training and evaluation?


I am fine-tuning T5 for spelling and grammar correction in healthcare. The loss curve confuses me: both the training loss and the evaluation loss stay essentially flat. What could have gone wrong?

My dataset is:

DatasetDict({
    train: Dataset({
        features: ['sentence_with_error', 'original_sentence'],
        num_rows: 2506554
    })
    validation: Dataset({
        features: ['sentence_with_error', 'original_sentence'],
        num_rows: 75104
    })
})
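
The tokenized_data and data_collator passed to the trainer below come from preprocessing roughly like this (a simplified sketch, not my exact code; dataset stands for the DatasetDict above, and max_length matches the 150 I mention at the end):

from transformers import AutoTokenizer, DataCollatorForSeq2Seq

tokenizer = AutoTokenizer.from_pretrained(pretrained_model)

def preprocess(batch):
    # Source is the corrupted sentence, target is the clean one
    model_inputs = tokenizer(
        batch["sentence_with_error"],
        max_length=150,
        truncation=True,
    )
    labels = tokenizer(
        text_target=batch["original_sentence"],
        max_length=150,
        truncation=True,
    )
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

# "dataset" is the DatasetDict shown above (name used here for illustration)
tokenized_data = dataset.map(
    preprocess,
    batched=True,
    remove_columns=["sentence_with_error", "original_sentence"],
)
data_collator = DataCollatorForSeq2Seq(tokenizer, model=model)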

My training arguments are:


from transformers import EarlyStoppingCallback

early_stopping_callback = EarlyStoppingCallback(
    early_stopping_patience=3,      # number of evaluations to wait for improvement
    early_stopping_threshold=0.01,  # minimum change that counts as an improvement
)


from transformers import Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    pretrained_model,               # first positional argument is output_dir
    evaluation_strategy="steps",
    eval_steps=600,
    logging_strategy="steps",
    logging_steps=600,
    save_strategy="steps",
    save_steps=600,
    learning_rate=5e-4,
    per_device_train_batch_size=24,
    per_device_eval_batch_size=24,
    weight_decay=0.01,
    save_total_limit=3,
    num_train_epochs=1,
    predict_with_generate=True,
    fp16=True,
    load_best_model_at_end=True,
)


from transformers import Seq2SeqTrainer

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized_data["train"],
    eval_dataset=tokenized_data["validation"],
    data_collator=data_collator,
    tokenizer=tokenizer,
    callbacks=[early_stopping_callback],  # include the early stopping callback
)

Here is the loss curve I get (both training and evaluation loss stay flat): [loss plot]

I have also tried adjusting the learning rate. My max sequence length is set to 150.
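
For reference, the plot above comes from roughly this, reading the logged values out of trainer.state.log_history (a simplified sketch of my plotting code):

import matplotlib.pyplot as plt

# log_history is a list of dicts; training loss is logged every 600 steps
# under "loss", evaluation loss under "eval_loss"
history = trainer.state.log_history

train_points = [(h["step"], h["loss"]) for h in history if "loss" in h]
eval_points = [(h["step"], h["eval_loss"]) for h in history if "eval_loss" in h]

plt.plot(*zip(*train_points), label="train loss")
plt.plot(*zip(*eval_points), label="eval loss")
plt.xlabel("step")
plt.ylabel("loss")
plt.legend()
plt.show()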
