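I need to deploy a GRU cell for inference on specific hardware. As I just found, the definitions available on the Internet from multiple sources (for example, https://en.wikipedia.org/wiki/Gated_recurrent_unit) do not agree with the cell implementations in both PyTorch (https://pytorch.org/docs/master/generated/torch.nn.GRU.html) and TensorFlow. Namely, the candidate hidden state is

$$\hat{h}_t = \tanh\left(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h\right)$$

vs.

$$\hat{h}_t = \tanh\left(W_h x_t + b_h + r_t \odot (U_h h_{t-1} + b_{hn})\right)$$

In the former case, the reset gate $r_t$ is applied before the matrix multiplication by $U_h$; in the latter, after it.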
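To make the difference concrete, here is a minimal pure-Python sketch of one GRU step with both placements of the reset gate. The `reset_after` flag name is borrowed from the Keras `GRU` layer. The update-gate interpolation $h_t = (1 - z_t) \odot \hat{h}_t + z_t \odot h_{t-1}$ follows the PyTorch convention. Both are assumptions of this sketch, not something stated above, and the weights are arbitrary toy values.

```python
import math
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def add(*vs: Vector) -> Vector:
    """Element-wise sum of equally sized vectors."""
    return [sum(xs) for xs in zip(*vs)]


def mul(a: Vector, b: Vector) -> Vector:
    """Element-wise (Hadamard) product."""
    return [x * y for x, y in zip(a, b)]


def sigmoid(v: Vector) -> Vector:
    return [1.0 / (1.0 + math.exp(-x)) for x in v]


def tanh(v: Vector) -> Vector:
    return [math.tanh(x) for x in v]


def gru_step(x: Vector, h: Vector,
             W: Dict[str, Matrix], U: Dict[str, Matrix],
             b_x: Dict[str, Vector], b_h: Dict[str, Vector],
             reset_after: bool) -> Vector:
    """One GRU step; parameters are keyed by 'z' (update), 'r' (reset), 'n' (candidate).

    reset_after=False: r multiplies h_{t-1} BEFORE the U_n product (Wikipedia form).
    reset_after=True:  r multiplies U_n h_{t-1} + b_h['n'] AFTER the product (PyTorch form).
    """
    z = sigmoid(add(matvec(W["z"], x), b_x["z"], matvec(U["z"], h), b_h["z"]))
    r = sigmoid(add(matvec(W["r"], x), b_x["r"], matvec(U["r"], h), b_h["r"]))
    if reset_after:
        # Gate the already-projected recurrent term.
        rec = mul(r, add(matvec(U["n"], h), b_h["n"]))
    else:
        # Gate the raw previous state, then project it.
        rec = add(matvec(U["n"], mul(r, h)), b_h["n"])
    n = tanh(add(matvec(W["n"], x), b_x["n"], rec))
    # PyTorch-style interpolation: h_t = (1 - z) * n + z * h_{t-1}.
    return [(1.0 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]


# Toy weights: input size 1, hidden size 2 (arbitrary values).
W = {"z": [[0.5], [-0.5]], "r": [[1.0], [-1.0]], "n": [[1.0], [0.5]]}
U = {"z": [[0.1, 0.2], [0.3, 0.4]], "r": [[0.5, -0.5], [0.2, 0.3]],
     "n": [[0.0, 1.0], [1.0, 0.0]]}  # off-diagonal U_n mixes the units
zero = {k: [0.0, 0.0] for k in "zrn"}  # all biases zero
x = [1.0]
h0 = [0.0, 0.0]

h1_before = gru_step(x, h0, W, U, zero, zero, reset_after=False)
h1_after = gru_step(x, h0, W, U, zero, zero, reset_after=True)
h2_before = gru_step(x, h1_before, W, U, zero, zero, reset_after=False)
h2_after = gru_step(x, h1_after, W, U, zero, zero, reset_after=True)
print(h1_before, h1_after)  # same: h0 = 0 and no recurrent bias
print(h2_before, h2_after)  # different once h_{t-1} is nonzero
```

With a zero initial state and no recurrent bias, the two variants agree on the first step. A single-step check can therefore miss the mismatch. They diverge as soon as $h_{t-1}$ is nonzero and $U_h$ mixes units.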
I was quite surprised, and I can't find any discussion of this issue. GRU already has some variants (see Wikipedia), but those can be covered by a single maximal implementation, whereas here we have two incompatible versions. To make inference work, I have to reproduce the pipeline exactly as it was during training. Is that really the situation, so that I must carefully check each possible source of a monolithic cell, or is there one correct implementation? What is the canonical GRU cell for measurements?