Failed to converge using the AdaBelief optimizer


I'm using the `optax.adabelief()` optimizer to fine-tune a ViT model, but training fails to converge. I varied the learning rate from 0.0001 to 0.1 and also changed the other hyperparameters, but the loss never converges, as the training graph shows.

Here is the piece of code in `train.py` where I swapped the optimizer from SGD to AdaBelief; with AdaBelief it does not converge:


  total_steps = config.total_steps

  lr_fn = utils.create_learning_rate_schedule(total_steps, config.base_lr,
                                              config.decay_type,
                                              config.warmup_steps)
  tx = optax.chain(
      optax.clip_by_global_norm(config.grad_norm_clip),
      optax.adabelief(
          learning_rate=0.00001,  # fixed rate; the SGD branch below used lr_fn
          b1=0.9,
          b2=0.999,
          eps=1e-8,
          eps_root=1e-16,
          # note: optax.adabelief takes no weight_decay argument
      ),
      # optax.sgd(
      #     learning_rate=lr_fn,
      #     momentum=0.9,
      #     accumulator_dtype='bfloat16',
      #     nesterov=True,
      # ),
  )

I also made changes in the fine-tuning part; here is that piece of code:

# 100 Steps take approximately 15 minutes in the TPU runtime.
total_steps = 100
warmup_steps = 5
decay_type = 'cosine'
grad_norm_clip = 0.25  
# This controls how many forward passes the batch is split into. 8 works well on a TPU runtime
# with 8 devices; 64 should work on a GPU. You can of course also adjust the batch_size above,
# but that would require adjusting the learning rate accordingly.
accum_steps = 8
base_lr = 0.03

If anyone could tell me what I'm doing incorrectly, and what I need to do to get convergence, that'd be awesome.
