How can I check the loss when training RoBERTa in huggingface/transformers?

I trained a RoBERTa model from scratch using transformers, following this notebook:

https://colab.research.google.com/github/huggingface/blog/blob/master/notebooks/01_how_to_train.ipynb

In the notebook, the loss is supposed to be printed every 500 steps, but in my run no training loss appears in the output at all:

Iteration: 100%|█████████▉| 20703/20711 [4:42:54<00:07,  1.14it/s]
Iteration: 100%|█████████▉| 20704/20711 [4:42:54<00:05,  1.24it/s]
Iteration: 100%|█████████▉| 20705/20711 [4:42:55<00:05,  1.20it/s]
Iteration: 100%|█████████▉| 20706/20711 [4:42:56<00:04,  1.18it/s]
Iteration: 100%|█████████▉| 20707/20711 [4:42:57<00:03,  1.19it/s]
Iteration: 100%|█████████▉| 20708/20711 [4:42:58<00:02,  1.16it/s]
Iteration: 100%|█████████▉| 20709/20711 [4:42:59<00:01,  1.14it/s]
Iteration: 100%|█████████▉| 20710/20711 [4:43:00<00:00,  1.13it/s]
Iteration: 100%|██████████| 20711/20711 [4:43:00<00:00,  1.45it/s]
Iteration: 100%|██████████| 20711/20711 [4:43:00<00:00,  1.22it/s]
Epoch: 100%|██████████| 13/13 [61:14:16<00:00, 16952.06s/it]
Epoch: 100%|██████████| 13/13 [61:14:16<00:00, 16958.16s/it]

compress roberta.20200717.zip on ./pretrained
save roberta.20200717.zip on minio(petcharts)

No loss values are printed, so I can't tell whether training converged. How can I monitor the loss during training?

1 Answer

Just try executing the notebook again (e.g. directly in Colab) with a newer version of the library, in which logging has been reworked. Be mindful that Trainer may need a slightly different set of arguments because of deprecations.
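
In the newer versions, logging goes through a callback system, so if you want the loss printed in your own format you can hook into it. Here is a minimal sketch, my own addition rather than part of the notebook, assuming the transformers 4.x TrainerCallback API:

from transformers import TrainerCallback

class LossPrinterCallback(TrainerCallback):
    # on_log is invoked every time the Trainer logs metrics,
    # i.e. every logging_steps training steps by default
    def on_log(self, args, state, control, logs=None, **kwargs):
        if logs is not None and "loss" in logs:
            print(f"step {state.global_step}: loss = {logs['loss']:.4f}")

The callback would be passed to the Trainer via its callbacks argument, e.g. Trainer(..., callbacks=[LossPrinterCallback()]). That said, no custom callback is needed for the default behaviour: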

Re-running the example shows the training loss in the output without issues:

[screenshot: "Training metrics", with the loss logged during training]

I'm using

tokenizers      0.9.4
transformers    4.0.0rc1

and these training arguments seem to do the job:

from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="./EsperBERTo",
    overwrite_output_dir=True,
    num_train_epochs=1,
    per_device_train_batch_size=64,
    save_steps=10_000,
    save_total_limit=2,
    # logging_steps is left at its default of 500, so the
    # training loss is logged every 500 steps
)

# model, data_collator and dataset are defined in the
# earlier cells of the notebook
trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=dataset,
)
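
After that, trainer.train() runs the training and prints the loss as it goes. As a quick sanity check afterwards, the logged values can also be read back from trainer.state.log_history; a small sketch, assuming the 4.x Trainer API (the exact keys in the log entries can vary between versions):

trainer.train()

# each logging step appends a dict such as
# {"loss": ..., "learning_rate": ..., "epoch": ..., "step": ...}
# to trainer.state.log_history
for entry in trainer.state.log_history:
    if "loss" in entry:
        print(f"step {entry['step']}: loss = {entry['loss']:.4f}")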