I'm looking at VW's docs for update rule options, and I'm confused about the equation that specifies the learning rate schedule using the parameters `initial_t`, `power_t`, and `decay_learning_rate`.
Based on the equation below this line in the docs, "specify the learning rate schedule whose generic form", if `initial_t` is equal to zero (which is the setting by default), it seems that the learning rate will always be zero, for all timesteps and epochs. Is this right?
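
For reference, the schedule in the docs has, as far as I can tell, the following generic form in the (k+1)-th epoch. The symbol names are my reading of the VW command-line documentation and may not match the current docs exactly: λ is `--learning_rate`, d is `--decay_learning_rate`, t_0 is `initial_t`, p is `power_t`, and w_t is the sum of the importance weights of the examples seen so far.

```latex
\eta_t = \lambda \, d^{k} \left( \frac{t_0}{t_0 + w_t} \right)^{p}
```

Read literally, with t_0 = 0 and p > 0 the bracketed factor is 0 whenever w_t > 0, which is what prompts the question.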
Also, what would happen if both `initial_t` and `power_t` are set to zero? I tried initializing a VW instance with those settings and it didn't complain.
`initial_t` is set to zero by default. By default, the initial learning rate is not calculated from `initial_t`; it starts off at its default value, which is `0.5`.

Per the documentation, the flags `adaptive`, `normalized`, and `invariant` are on by default. If any of them is specified, the other flags are turned off. If you turn on only the `invariant` flag (so that neither normalized nor adaptive updates are in use), the initial learning rate is calculated from the `initial_t` and `power_t` values, and the default `initial_t` is set to one instead of zero.

If `initial_t` is explicitly set to zero and the `invariant` flag is set, then yes, the learning rate will also be zero.

If the initial learning rate is calculated from `initial_t` and `power_t` and both are explicitly set to zero, C++ should evaluate `powf(0,0)` to `1`, which leaves the learning rate at its default value, which can be specified with `--learning_rate`.
If you are running vowpalwabbit via the command line, you should be able to see what these values are set to: