In sklearn's t-SNE implementation, the gradient update is done as follows (the _gradient_descent function in _t_sne.py on sklearn's GitHub):
error, grad = objective(p, *args, **kwargs)    # cost and gradient at the current embedding p
grad_norm = linalg.norm(grad)
inc = update * grad < 0.0                      # entries where the previous update and the new gradient have opposite signs
dec = np.invert(inc)                           # entries where they have the same sign
gains[inc] += 0.2
gains[dec] *= 0.8
np.clip(gains, min_gain, np.inf, out=gains)    # keep every gain at or above min_gain
grad *= gains                                  # rescale each gradient entry by its gain
update = momentum * update - learning_rate * grad
p += update
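To make sure I am reading the loop correctly, here is a self-contained toy version I put together (everything outside the loop body, including toy_objective and the constant values, is mine and purely illustrative; the loop body is copied from above):

import numpy as np
from scipy import linalg

def toy_objective(p):
    # stand-in for the KL divergence: error 0.5 * ||p||^2 with gradient p
    return 0.5 * np.dot(p, p), p.copy()

p = np.array([3.0, -2.0, 1.0])            # the "embedding" being optimized
update = np.zeros_like(p)                 # previous step
gains = np.ones_like(p)                   # per-parameter gains, start at 1
learning_rate, momentum, min_gain = 0.1, 0.8, 0.01

for _ in range(100):
    error, grad = toy_objective(p)
    grad_norm = linalg.norm(grad)
    inc = update * grad < 0.0             # gradient kept its sign since the last step: raise the gain
    dec = np.invert(inc)                  # gradient flipped sign (overshot along that coordinate): shrink the gain
    gains[inc] += 0.2
    gains[dec] *= 0.8
    np.clip(gains, min_gain, np.inf, out=gains)
    grad *= gains
    update = momentum * update - learning_rate * grad
    p += update

print(p, gains)

Running this, the gains grow on coordinates where the sign of the gradient stays stable and shrink where it oscillates, so each coordinate effectively gets its own step size.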
What is unclear to me is where the += 0.2 and *= 0.8 come from. I couldn't find anything about them in the original t-SNE paper, and I can't reconcile the updates in the sklearn implementation with the update formula given in the paper:

$$\mathcal{Y}^{(t)} = \mathcal{Y}^{(t-1)} + \eta \frac{\partial C}{\partial \mathcal{Y}} + \alpha(t)\left(\mathcal{Y}^{(t-1)} - \mathcal{Y}^{(t-2)}\right),$$

where $\eta$ is the learning rate and $\alpha(t)$ is the momentum at iteration $t$.
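For comparison, my literal reading of that formula, with no gains at all, would be a plain momentum step like this (same toy setup as above; I have written the gradient term with a minus sign so the step descends, as sklearn does):

import numpy as np

def toy_objective(p):
    # same toy quadratic as above: error 0.5 * ||p||^2, gradient p
    return 0.5 * np.dot(p, p), p.copy()

p = np.array([3.0, -2.0, 1.0])
update = np.zeros_like(p)
learning_rate, momentum = 0.1, 0.8

for _ in range(100):
    error, grad = toy_objective(p)
    # momentum * update plays the role of alpha(t) * (Y^(t-1) - Y^(t-2))
    update = momentum * update - learning_rate * grad
    p += update

print(p, error)

So the momentum and learning-rate parts match up, and the extra piece in sklearn is the per-parameter gains with the += 0.2 / *= 0.8 rules, which is the part I can't trace back to the paper.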
Does anybody know the logic behind the implementation or how I can reconcile the two?
Thanks in advance.