Adam algorithm in Keras:

Initialize m_0 as the 1st-moment vector and v_0 as the 2nd-moment vector. The update rule for a parameter theta with gradient g_t is:
lr_t = learning_rate * sqrt(1 - beta_2^t) / (1 - beta_1^t)
m_t = beta_1 * m_{t-1} + (1 - beta_1) * g_t
v_t = beta_2 * v_{t-1} + (1 - beta_2) * g_t^2
theta_t = theta_{t-1} - lr_t * m_t / (sqrt(v_t) + epsilon)
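To make the equations concrete, here is a pure-Python sketch that applies them to a single scalar parameter and records (m_t, v_t) after every step. The function name, the gradient list, and the default hyperparameters are illustrative assumptions; this traces the math, it is not the Keras implementation.

```python
import math

def adam_trace(grads, learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-7):
    """Apply the Adam update above to a scalar theta for each gradient
    in `grads`, returning the final theta and the (m_t, v_t) history."""
    theta, m, v = 0.0, 0.0, 0.0
    history = []
    for t, g in enumerate(grads, start=1):
        lr_t = learning_rate * math.sqrt(1 - beta_2**t) / (1 - beta_1**t)
        m = beta_1 * m + (1 - beta_1) * g      # 1st-moment estimate m_t
        v = beta_2 * v + (1 - beta_2) * g**2   # 2nd-moment estimate v_t
        theta = theta - lr_t * m / (math.sqrt(v) + epsilon)
        history.append((m, v))                 # save m_t and v_t at step t
    return theta, history
```

In tf.keras (TF 2.x) the corresponding state lives in the optimizer's slot variables, which could presumably be read out in a callback after each batch, though the exact accessor depends on the TensorFlow/Keras version.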
I want to know how to save m_t and v_t at each step t in Keras.