Implementations of the `GRU` cell differ from the descriptions


I need to run a GRU cell for inference on specific hardware. As I just found out, the definitions available online from multiple sources, for example https://en.wikipedia.org/wiki/Gated_recurrent_unit, do not agree with the cell implementations in both PyTorch (https://pytorch.org/docs/master/generated/torch.nn.GRU.html) and TensorFlow, namely:
[image: candidate activation formula as given on Wikipedia]
vs
[image: candidate activation formula as given in the PyTorch documentation]

In the former case, the reset gate is applied before the matrix multiplication; in the latter, after it.
I am pretty surprised, and I can't find any discussion of this issue. GRU already has some variants (see Wikipedia), but those can be covered by a single maximal implementation, whereas here we have incompatible versions. To make inference work, I have to build the pipeline exactly as it was during training. Is this expected, so that I just have to check carefully which variant each monolithic cell implementation uses, or is there one right implementation? What is the canonical GRU cell for measurements?
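
To make the difference concrete, here is a minimal pure-Python sketch of the two candidate-state computations as I understand them. All function and variable names are my own (hypothetical), not taken from either library. The first form follows the Wikipedia-style definition, `tanh(W x + U (r * h_prev) + b)`. The second follows the form given in the PyTorch documentation, `tanh(W x + b_in + r * (U h_prev + b_hn))`. The weights and inputs are made-up numbers chosen only for illustration.

```python
import math


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def candidate_gate_before(W, U, b, x, h_prev, r):
    """Wikipedia-style: reset gate scales h_prev BEFORE the multiplication by U."""
    gated_h = [ri * hi for ri, hi in zip(r, h_prev)]
    wx = matvec(W, x)
    uh = matvec(U, gated_h)
    return [math.tanh(a + c + d) for a, c, d in zip(wx, uh, b)]


def candidate_gate_after(W, U, b_in, b_hn, x, h_prev, r):
    """PyTorch-docs-style: reset gate scales (U h_prev + b_hn) AFTER the multiplication."""
    wx = matvec(W, x)
    uh = matvec(U, h_prev)
    return [
        math.tanh(a + bi + ri * (c + bh))
        for a, bi, ri, c, bh in zip(wx, b_in, r, uh, b_hn)
    ]


# Made-up example values (assumptions, for illustration only).
W = [[0.5, -0.3], [0.1, 0.8]]
U = [[0.2, 0.7], [-0.4, 0.6]]   # non-diagonal, so the order of gating matters
zero = [0.0, 0.0]
x = [1.0, 2.0]
h_prev = [0.5, -1.0]
r = [0.9, 0.1]                  # reset gate values, assumed already computed

print("gate before matmul:", candidate_gate_before(W, U, zero, x, h_prev, r))
print("gate after matmul: ", candidate_gate_after(W, U, zero, zero, x, h_prev, r))
```

With a non-diagonal `U` and a reset gate that is not uniform, the two results differ, so weights trained with one form cannot in general be dropped into the other. The two forms coincide in special cases such as `r` being all ones (with `b_hn` equal to zero), which is what the check below relies on.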
