GRU in PyTorch and Keras uses a different formula than Wikipedia


I am implementing a GRU on an iOS device using the cblas library. I used the GRU formula from Wikipedia, which is the same one I learned on Coursera. With identical weights, my implementation and tf.keras produce different results. After debugging, I found that the GRU in Keras and Torch uses a different formula for calculating h_t:

The Wikipedia formula is:

h_t = (1 - z) * h_t_previous + z * h_tilde

whereas Keras and Torch use:

h_t = (1 - z) * h_tilde + z * h_t_previous

Can someone explain why they differ? Isn't it more logical for the update gate to multiply the new value (the part I want to take from the new candidate)? Fun fact: MPSGRUDescriptor has a flipOutputGates property to work around the difference between these two formulas.
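To make the difference concrete, here is a minimal pure-Python sketch of the two blending rules quoted above. The function names and the sample numbers are made up for illustration and do not come from any framework. It relies only on the identity 1 - sigmoid(a) = sigmoid(-a), so the two conventions give the same h_t once the update gate is flipped. It does not attempt to explain why the frameworks chose their convention.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def blend_wiki(z, h_prev, h_tilde):
    # Convention quoted from Wikipedia: z weights the new candidate h_tilde.
    return [(1 - zi) * hp + zi * ht for zi, hp, ht in zip(z, h_prev, h_tilde)]

def blend_keras_torch(z, h_prev, h_tilde):
    # Convention quoted for Keras and Torch: z weights the previous state.
    return [(1 - zi) * ht + zi * hp for zi, hp, ht in zip(z, h_prev, h_tilde)]

# Hypothetical values, chosen only for the demonstration.
pre_activation = [-1.5, 0.0, 2.0]   # update-gate input before the sigmoid
h_prev = [0.1, -0.4, 0.7]
h_tilde = [0.9, 0.2, -0.3]

z = [sigmoid(a) for a in pre_activation]
z_flipped = [sigmoid(-a) for a in pre_activation]  # numerically equal to 1 - z

out_wiki = blend_wiki(z, h_prev, h_tilde)
out_framework = blend_keras_torch(z_flipped, h_prev, h_tilde)

# Largest element-wise gap between the two results; expected to be ~0.
max_diff = max(abs(a - b) for a, b in zip(out_wiki, out_framework))
print(max_diff)
```

Under this sketch's assumptions, weights trained with one convention could in principle be reused with the other by negating the update gate's weights and biases (so that z becomes 1 - z), or by swapping the roles of z and 1 - z in the final blend. That appears to be the kind of switch flipOutputGates is meant to provide.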
