Implementations of the `GRU` cell differ from the descriptions

115 Views Asked by Alexey Birukov At 07 June 2025 at 15:08

I need to run a GRU cell for inference on specific hardware. As I just found out, the definitions available online from multiple sources, for example https://en.wikipedia.org/wiki/Gated_recurrent_unit, do not agree with the cell implementations in both PyTorch (https://pytorch.org/docs/master/generated/torch.nn.GRU.html) and TensorFlow, namely:

$\hat{h}_t = \phi_h(W_{h} x_t + U_{h} (f_t \odot h_{t-1}) + b_h)$
vs
$n_t = \tanh(W_{in} x_t + b_{in} + r_t * (W_{hn} h_{(t-1)}+ b_{hn}))$

In the former case the gate is applied before the matrix multiplication; in the latter, after it.
I am pretty surprised, and I can't find any discussion of this issue. GRU already has some variants (see Wikipedia), but those can be covered by one maximal implementation, whereas here we have incompatible versions. To make inference work, I have to build the pipeline exactly as it was during training. Is this just how things are, so that I must carefully check each possible source of a monolithic cell, or is there one right implementation? What is the canonical GRU cell for measurements?
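
To make the difference concrete, here is a minimal sketch in plain Python of the two candidate-state formulas above. It is not the PyTorch or TensorFlow code: the helper names (`matvec`, `candidate_reset_before`, `candidate_reset_after`) and the toy weights are made up for illustration, and the gate (written $f_t$ in the first formula and $r_t$ in the second) is simply passed in as a precomputed vector `r`.

```python
import math

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def candidate_reset_before(x, h_prev, r, W, U, b):
    """First formula: tanh(W x + U (r * h_prev) + b); gate applied BEFORE U."""
    gated_h = [ri * hi for ri, hi in zip(r, h_prev)]
    wx = matvec(W, x)
    uh = matvec(U, gated_h)
    return [math.tanh(a + c + bi) for a, c, bi in zip(wx, uh, b)]

def candidate_reset_after(x, h_prev, r, W_in, b_in, W_hn, b_hn):
    """Second formula: tanh(W_in x + b_in + r * (W_hn h_prev + b_hn)); gate applied AFTER W_hn."""
    wx = matvec(W_in, x)
    wh = matvec(W_hn, h_prev)
    return [
        math.tanh(a + bi + ri * (c + bh))
        for a, bi, ri, c, bh in zip(wx, b_in, r, wh, b_hn)
    ]

# Toy values (assumptions, chosen only so the arithmetic is easy to follow).
x = [1.0, -0.5]
h_prev = [0.3, -0.8]
W = [[0.5, -0.2], [0.1, 0.4]]    # input weights, shared by both variants
U = [[0.7, 0.3], [-0.6, 0.2]]    # recurrent weights, shared by both variants
zero = [0.0, 0.0]                # all biases set to zero

# A non-uniform gate: the two variants give different candidate states.
r = [0.9, 0.1]
before = candidate_reset_before(x, h_prev, r, W, U, zero)
after = candidate_reset_after(x, h_prev, r, W, zero, U, zero)
max_diff = max(abs(p - q) for p, q in zip(before, after))

# A gate of all ones: the gating is a no-op and both variants coincide.
ones = [1.0, 1.0]
before_ones = candidate_reset_before(x, h_prev, ones, W, U, zero)
after_ones = candidate_reset_after(x, h_prev, ones, W, zero, U, zero)
max_diff_ones = max(abs(p - q) for p, q in zip(before_ones, after_ones))

print("gate before matmul:", before)
print("gate after matmul: ", after)
print("max difference:    ", max_diff)
```

With the same weights, the two functions agree only when the gate is the same scalar in every component (for example all ones), because an elementwise gate does not commute with a general matrix product. So weights trained with one form cannot, in general, be dropped into the other.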
