Gradient descent update rule in sklearn's t-SNE implementation


In sklearn's t-SNE implementation, the gradient update is done as follows (the _gradient_descent function in sklearn/manifold/_t_sne.py on sklearn's GitHub):

    error, grad = objective(p, *args, **kwargs)
    grad_norm = linalg.norm(grad)

    # Per-parameter gains: raised additively where the previous update
    # and the current gradient disagree in sign (i.e. the step along
    # -grad keeps moving in the same direction as the last step),
    # damped multiplicatively everywhere else.
    inc = update * grad < 0.0
    dec = np.invert(inc)
    gains[inc] += 0.2
    gains[dec] *= 0.8
    np.clip(gains, min_gain, np.inf, out=gains)
    grad *= gains

    # Momentum step on the gain-scaled gradient.
    update = momentum * update - learning_rate * grad
    p += update
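
To make sure I am reading the gains logic correctly, here is a minimal standalone sketch of just that step on toy values (the quadratic objective and all constants here are made up for illustration; they are not sklearn's defaults):

    import numpy as np

    p = np.array([1.0, -2.0, 0.5])  # toy stand-in for the flattened embedding
    update = np.zeros_like(p)
    gains = np.ones_like(p)
    learning_rate, momentum, min_gain = 0.1, 0.5, 0.01

    for i in range(3):
        grad = p  # pretend the objective is 0.5 * ||p||**2, so grad = p

        # Same adaptive-gains step as in the sklearn snippet above.
        inc = update * grad < 0.0
        dec = np.invert(inc)
        gains[inc] += 0.2
        gains[dec] *= 0.8
        np.clip(gains, min_gain, np.inf, out=gains)

        update = momentum * update - learning_rate * gains * grad
        p += update
        print(i, gains, p)

Running this, every gain is first damped to 0.8 (the initial update is all zeros, so update * grad < 0.0 is False everywhere) and then grows again once consecutive steps keep pointing the same way.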

What is unclear to me is where the += 0.2 and *= 0.8 come from. I couldn't find anything about them in the original t-SNE paper (van der Maaten & Hinton, 2008), and I can't reconcile the updates in the sklearn implementation with the update rule given in the paper:

$$\mathcal{Y}^{(t)} = \mathcal{Y}^{(t-1)} + \eta \frac{\partial C}{\partial \mathcal{Y}} + \alpha(t) \left( \mathcal{Y}^{(t-1)} - \mathcal{Y}^{(t-2)} \right)$$
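
For what it's worth, if I unroll the last two lines of the snippet (writing g for the gain-scaled gradient and using the fact that the previous update equals p^(t-1) - p^(t-2)), I get

$$p^{(t)} = p^{(t-1)} - \eta \, g + \alpha \left( p^{(t-1)} - p^{(t-2)} \right),$$

which, with all gains pinned at 1, looks like the paper's rule up to the sign convention on the gradient term. So it is really only the gains block with the 0.2 and 0.8 constants that I cannot map onto the paper.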

Does anybody know the logic behind the implementation or how I can reconcile the two?

Thanks in advance.
