Wandb website for Huggingface Trainer shows plots and logs only for the first model


I am finetuning multiple models using a for loop as follows.

for file in os.listdir(args.data_dir):
    finetune(args, file)

But the wandb website shows plots and logs only for the first file, i.e., file1 in data_dir, even though it is training and saving models for the other files. This behavior seems very strange.

wandb: Synced bertweet-base-finetuned-file1: https://wandb.ai/***/huggingface/runs/***

This is a small snippet of the finetuning code with the Hugging Face Trainer:

from transformers import Trainer, TrainingArguments

def finetune(args, file):
    # model, model_name, tokenized_dataset, and data_collator are defined elsewhere
    training_args = TrainingArguments(
        output_dir=f'{model_name}-finetuned-{file}',
        overwrite_output_dir=True,
        evaluation_strategy='no',
        num_train_epochs=args.epochs,
        learning_rate=args.lr,
        weight_decay=args.decay,
        per_device_train_batch_size=args.batch_size,
        per_device_eval_batch_size=args.batch_size,
        fp16=True, # mixed-precision training to boost speed
        save_strategy='no',
        seed=args.seed,
        dataloader_num_workers=4,
    )

    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=tokenized_dataset['train'],
        eval_dataset=None,
        data_collator=data_collator,
    )
    trainer.train()
    trainer.save_model()

2 Answers

BEST ANSWER

wandb.init(reinit=True) and run.finish() let me log each model as a separate run on the wandb website.

The working code looks like this:

import wandb

def finetune(args, file):
    run = wandb.init(reinit=True)  # start a fresh run for this file
    ...
    run.finish()  # close the run so the next file gets its own

for file in os.listdir(args.data_dir):
    finetune(args, file)

Reference: https://docs.wandb.ai/guides/track/launch#how-do-i-launch-multiple-runs-from-one-script
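
For illustration, here is a minimal standalone sketch of this run lifecycle (the project name, run names, and logged metric are invented for the example):

import wandb

for i in range(3):
    # reinit=True allows wandb.init() to be called more than once
    # in the same script; each call starts a distinct run
    run = wandb.init(project='my-project', name=f'run-{i}', reinit=True)
    run.log({'loss': 1.0 / (i + 1)})  # metrics go to the current run only
    run.finish()  # close this run before the next wandb.init()

After this runs, the W&B project page shows three separate runs (run-0, run-1, run-2), each with its own plots, instead of everything collapsing into a single run.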


Hey, I work at Weights & Biases. My first guess is that you need to call wandb.finish() at the end of your finetune function. This closes the wandb process, so when you start the next iteration a new wandb process should be spun up.

If you would like to log additional config data that isn't logged by the W&B integration in the Trainer, you can always call wandb.init before kicking off your training (see the wandb.init docs) and log to that run. The Trainer should detect that there is already a wandb process running and will log to that process instead of spinning up a new one.

import wandb

def finetune():
    config = {'my-config-thing1': 44, 'my-config-thing12': 'cats'}

    # the Trainer will pick up this active run and log to it
    wandb.init(project='my-project-name', config=config)  # plus any other init kwargs

    # ... your Hugging Face code here

    wandb.finish()
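
Relatedly, the Trainer's W&B integration can also be steered from TrainingArguments. A small sketch: report_to and run_name are real TrainingArguments options, but the values here are illustrative.

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir='bertweet-base-finetuned-file1',   # illustrative value
    report_to='wandb',                            # send Trainer logs to W&B explicitly
    run_name='bertweet-base-finetuned-file1',     # run name shown in the W&B UI
)

Setting run_name per file makes each loop iteration easy to tell apart in the W&B UI.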