PyTorch schedule learning rate

I am trying to re-implement a paper, which suggests adjusting the learning rate as follows:

The learning rate is decreased by a factor of the regression value with patience epochs 10 on the change value of 0.0001.

Should I use the torch.optim.lr_scheduler.ReduceLROnPlateau()?

I am not sure what value I should pass to each parameter.

  1. Does the change value in the statement correspond to the parameter threshold?

  2. Does the factor in the statement correspond to the parameter factor?


PyTorch offers many ways to reduce the learning rate. They are explained quite well here:

@Antonino DiMaggio explained ReduceLROnPlateau quite well. I just want to complement that answer by replying to the comment from @Yan-JenHuang:

Is it possible to decrease the learning_rate by minus a constant value instead by a factor?

First of all, you should be very careful to avoid negative values of lr! Second, subtracting a value from the learning rate is not common practice. But in any case...

You first have to write a custom LR scheduler (I modified the code of LambdaLR):

import warnings
from torch.optim.lr_scheduler import _LRScheduler

class SubtractLR(_LRScheduler):
    def __init__(self, optimizer, lr_lambda, last_epoch=-1, min_lr=1e-6):
        self.optimizer = optimizer
        self.min_lr = min_lr  # floor that keeps the learning rate > 0

        # accept a single function (shared by all param groups) or one function per group
        if not isinstance(lr_lambda, (list, tuple)):
            self.lr_lambdas = [lr_lambda] * len(optimizer.param_groups)
        else:
            if len(lr_lambda) != len(optimizer.param_groups):
                raise ValueError("Expected {} lr_lambdas, but got {}".format(
                    len(optimizer.param_groups), len(lr_lambda)))
            self.lr_lambdas = list(lr_lambda)
        self.last_epoch = last_epoch
        super(SubtractLR, self).__init__(optimizer, last_epoch)

    def get_lr(self):
        if not self._get_lr_called_within_step:
            warnings.warn("To get the last learning rate computed by the scheduler, "
                          "please use `get_last_lr()`.")

        # subtract lmbda(epoch) from the base lr, never going below min_lr
        return [max(base_lr - lmbda(self.last_epoch), self.min_lr)
                for lmbda, base_lr in zip(self.lr_lambdas, self.base_lrs)]

Then you can use it in your training.

lambda1 = lambda epoch: 1e-4  # constant subtracted from the base lr (same rate every epoch)
scheduler = SubtractLR(optimizer, lr_lambda=[lambda1])
for epoch in range(100):
    ...  # train and validate for one epoch
    scheduler.step()

# Alternatively:
lambda1 = lambda epoch: epoch * 1e-6  # value to subtract grows in proportion to the epoch
scheduler = SubtractLR(optimizer, lr_lambda=[lambda1])
for epoch in range(100):
    ...  # train and validate for one epoch
    scheduler.step()
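To see what the two lambdas produce, the rule inside get_lr can be evaluated without PyTorch. This is only a plain-Python stand-in (the helper name subtract_schedule is made up for illustration). It assumes, as in the class above, that the value is subtracted from the base learning rate and not from the previous epoch's rate.

```python
def subtract_schedule(base_lr, lmbda, num_epochs, min_lr=1e-6):
    """Learning rate per epoch under lr = max(base_lr - lmbda(epoch), min_lr)."""
    return [max(base_lr - lmbda(epoch), min_lr) for epoch in range(num_epochs)]

# Constant lambda: every epoch gets the same rate, base_lr - 1e-4.
constant = subtract_schedule(1e-3, lambda epoch: 1e-4, num_epochs=3)

# Epoch-proportional lambda: the rate falls linearly until it is clamped at min_lr.
linear = subtract_schedule(1e-3, lambda epoch: epoch * 1e-4, num_epochs=12)

print(constant)
print(linear[0], linear[-1])
```

So a constant lambda does not keep shrinking the rate from epoch to epoch. If that is what you want, the lambda has to grow with the epoch, as in the second variant.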

You can also modify the code of ReduceLROnPlateau to subtract from the learning rate instead of multiplying it. You should change the line new_lr = max(old_lr * self.factor, self.min_lrs[i]) to something like new_lr = max(old_lr - self.factor, self.min_lrs[i]). You can take a look at the code yourself:
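As a rough illustration of how the two update rules behave over several consecutive reductions, here is a plain-Python comparison. The function names and the numbers (factor 0.1, step 4e-4, floor 1e-6) are assumptions chosen for the example, not values from the paper or from PyTorch.

```python
def reduce_by_factor(old_lr, factor, min_lr):
    """Multiplicative rule: new_lr = max(old_lr * factor, min_lr)."""
    return max(old_lr * factor, min_lr)

def reduce_by_subtraction(old_lr, step, min_lr):
    """Subtractive variant: new_lr = max(old_lr - step, min_lr)."""
    return max(old_lr - step, min_lr)

lr_mul = lr_sub = 1e-3
for _ in range(4):  # four consecutive reductions
    lr_mul = reduce_by_factor(lr_mul, factor=0.1, min_lr=1e-6)
    lr_sub = reduce_by_subtraction(lr_sub, step=4e-4, min_lr=1e-6)
    print(f"{lr_mul:.1e}  {lr_sub:.1e}")
```

Without the max(..., min_lr) clamp, the subtractive rule would go negative on the third reduction, which is why the floor matters more here than in the multiplicative case. Also note that the ReduceLROnPlateau constructor rejects factor >= 1.0, so if you reuse the factor argument as the step, it has to stay below 1.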


torch.optim.lr_scheduler.ReduceLROnPlateau is indeed what you are looking for. I summarized all of the important stuff for you.

mode=min: lr will be reduced when the quantity monitored has stopped decreasing

factor: factor by which the learning rate will be reduced

patience: number of epochs with no improvement after which learning rate will be reduced

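threshold: threshold for measuring the new optimum, so that only significant changes count as improvements (this is the change value). Say we have threshold=0.0001 with threshold_mode='abs': if the best loss so far is 18.0 at epoch n and the loss is 17.99995 at epoch n+1, the drop is smaller than the threshold, so epoch n+1 counts as an epoch with no improvement. Once the number of such epochs in a row exceeds patience, the current learning rate is multiplied by the factor.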

criterion = torch.nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min',
    factor=0.1, patience=10, threshold=0.0001, threshold_mode='abs')

for epoch in range(20):
    # training loop stuff
    loss = criterion(...)
    # pass the monitored quantity (ideally a validation loss) to the scheduler once per epoch
    scheduler.step(loss)
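If it helps to see how patience and threshold interact, the bookkeeping can be imitated in a few lines of plain Python. This is a simplified sketch, not PyTorch's implementation: it assumes mode='min' and threshold_mode='abs', ignores cooldown, min_lr and eps, and the function name simulate_plateau is made up.

```python
def simulate_plateau(losses, lr, factor=0.1, patience=10, threshold=1e-4):
    """Return the learning rate after each epoch for a list of monitored losses."""
    best = float("inf")
    bad_epochs = 0
    history = []
    for loss in losses:
        if loss < best - threshold:  # significant improvement
            best = loss
            bad_epochs = 0
        else:                        # no (or too small an) improvement
            bad_epochs += 1
        if bad_epochs > patience:    # patience exhausted: reduce and restart the count
            lr *= factor
            bad_epochs = 0
        history.append(lr)
    return history

# The first epoch sets the best value; after that the loss stays flat.
history = simulate_plateau([1.0] * 12, lr=1e-3)
print(history[10], history[11])
```

With patience=10, the ten flat epochs after the first one are tolerated, and the rate only drops on the eleventh epoch without improvement.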

You can check more details in the documentation:


As a supplement to the answer above on ReduceLROnPlateau: threshold also has a mode (threshold_mode, 'rel' or 'abs') in PyTorch's LR scheduler (at least for versions >= 1.6), and the default is 'rel'. This means that if your loss is 18, it has to decrease by at least 18*0.0001=0.0018 to be recognized as an improvement. So watch out for the threshold mode as well.
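The difference between the two modes is easy to check by hand. The helper below is a small stand-alone sketch (the name is_improvement is made up) of the mode='min' comparison as the documentation describes it: best * (1 - threshold) for 'rel' and best - threshold for 'abs'.

```python
def is_improvement(current, best, threshold=1e-4, threshold_mode="rel"):
    """Does `current` beat `best` by more than the threshold (mode='min')?"""
    if threshold_mode == "rel":
        return current < best * (1 - threshold)
    if threshold_mode == "abs":
        return current < best - threshold
    raise ValueError("threshold_mode must be 'rel' or 'abs'")

best = 18.0
current = 17.999  # a drop of 0.001

# 'abs': the drop of 0.001 is larger than 0.0001, so it counts.
print(is_improvement(current, best, threshold_mode="abs"))
# 'rel': the drop of 0.001 is smaller than 18 * 0.0001 = 0.0018, so it does not.
print(is_improvement(current, best, threshold_mode="rel"))
```

The same loss curve can therefore trigger a reduction in one mode and not in the other, so if the paper's "change value" is meant as an absolute difference, threshold_mode='abs' has to be set explicitly.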