As we all know, PyTorch's LSTM implementation is a stacked, optionally bidirectional LSTM.
The first layer's input is expected to have shape (L, N, H_in). If we use a bidirectional LSTM, the output of the first layer has shape (L, N, 2*H_hidden) (official doc).
I can't figure out how this output is fed into the second LSTM layer. Will the outputs of the backward and forward directions be merged (e.g. summed) or concatenated?
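For concreteness, a minimal shape check (the sizes here are arbitrary, picked only for illustration):

```python
import torch
import torch.nn as nn

L, N, H_in, H = 5, 3, 10, 20  # arbitrary sizes for illustration
lstm = nn.LSTM(input_size=H_in, hidden_size=H, num_layers=2, bidirectional=True)

x = torch.randn(L, N, H_in)
output, (h_n, c_n) = lstm(x)

print(output.shape)  # torch.Size([5, 3, 40]) -> (L, N, 2*H_hidden)
print(h_n.shape)     # torch.Size([4, 3, 20]) -> (num_layers * 2, N, H_hidden)
```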
I checked the source code of its implementation (source code), but I fail to understand it:
```python
layers = [_LSTMLayer(self.input_size, self.hidden_size,    # first layer: input_size
                     self.bias, batch_first=False,
                     bidirectional=self.bidirectional,
                     **factory_kwargs)]
for layer in range(1, num_layers):
    layers.append(_LSTMLayer(self.hidden_size, self.hidden_size,    # deeper layers: hidden_size
                             self.bias, batch_first=False,
                             bidirectional=self.bidirectional,
                             **factory_kwargs))

for idx, layer in enumerate(self.layers):
    x, hxcx[idx] = layer(x, hxcx[idx])
```
Why can the output of the first layer (shape (L, N, 2*H_hidden)) be fed into the second layer, which appears to expect inputs of shape (L, N, H_hidden) rather than (L, N, 2*H_hidden)?
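For comparison, checking the weight shapes of the standard torch.nn.LSTM (rather than the quantizable module quoted above) shows that its deeper layers are sized for the concatenated features:

```python
import torch.nn as nn

lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=2, bidirectional=True)

# Layer 0 consumes the raw input (H_in = 10):
print(lstm.weight_ih_l0.shape)  # torch.Size([80, 10]) -> (4*H_hidden, H_in)
# Layer 1 consumes the concatenated forward+reverse output (2*H_hidden = 40):
print(lstm.weight_ih_l1.shape)  # torch.Size([80, 40]) -> (4*H_hidden, 2*H_hidden)
```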
Yes, the outputs are concatenated: at every time step, the forward and reverse hidden states are concatenated along the feature dimension, giving an output of shape (L, N, 2*H_hidden). In particular, the last element of output contains the last step of the forward direction together with the first step of the reverse direction.
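A minimal sketch to verify this (sizes are arbitrary):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
L, N, H_in, H = 5, 3, 10, 20  # arbitrary sizes
lstm = nn.LSTM(H_in, H, num_layers=1, bidirectional=True)

x = torch.randn(L, N, H_in)
output, (h_n, c_n) = lstm(x)

# Forward and reverse outputs are concatenated on the feature dimension:
assert output.shape == (L, N, 2 * H)

# The forward half of the last step equals the forward final hidden state,
assert torch.allclose(output[-1, :, :H], h_n[0])
# while the reverse half of the first step equals the reverse final hidden state.
assert torch.allclose(output[0, :, H:], h_n[1])
```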
Reference: PyTorch LSTM documentation