As we all know, PyTorch's LSTM implementation is a stacked, optionally bidirectional LSTM.
The first layer's input is expected to have shape (L, N, H_in). If we use a bidirectional LSTM, the output of the first layer has shape (L, N, 2*H_hidden) (official doc).
I can't figure out how this output is fed into the second LSTM layer. Will the outputs of the backward and forward directions be merged (e.g. summed) or concatenated?
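For concreteness, a minimal shape check (the sizes here are arbitrary, picked only for illustration):

```python
import torch
import torch.nn as nn

L, N, H_in, H = 5, 3, 10, 20  # arbitrary sizes for illustration
lstm = nn.LSTM(input_size=H_in, hidden_size=H, num_layers=2, bidirectional=True)

x = torch.randn(L, N, H_in)
output, (h_n, c_n) = lstm(x)

print(output.shape)  # torch.Size([5, 3, 40]) -> (L, N, 2*H_hidden)
print(h_n.shape)     # torch.Size([4, 3, 20]) -> (num_layers * 2, N, H_hidden)
```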
I checked the source code of its implementation (source code), but I fail to understand it:
```python
layers = [_LSTMLayer(self.input_size, self.hidden_size,    # first layer: input_size
                     self.bias, batch_first=False,
                     bidirectional=self.bidirectional,
                     **factory_kwargs)]
for layer in range(1, num_layers):
    layers.append(_LSTMLayer(self.hidden_size, self.hidden_size,    # deeper layers: hidden_size
                             self.bias, batch_first=False,
                             bidirectional=self.bidirectional,
                             **factory_kwargs))

for idx, layer in enumerate(self.layers):
    x, hxcx[idx] = layer(x, hxcx[idx])
```
Why can the output of the first layer (shape (L, N, 2*H_hidden)) be fed into the second layer, which appears to expect inputs of shape (L, N, H_hidden) rather than (L, N, 2*H_hidden)?
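For comparison, checking the weight shapes of the standard torch.nn.LSTM (rather than the quantizable module quoted above) shows that its deeper layers are sized for the concatenated features:

```python
import torch.nn as nn

lstm = nn.LSTM(input_size=10, hidden_size=20, num_layers=2, bidirectional=True)

# Layer 0 consumes the raw input (H_in = 10):
print(lstm.weight_ih_l0.shape)  # torch.Size([80, 10]) -> (4*H_hidden, H_in)
# Layer 1 consumes the concatenated forward+reverse output (2*H_hidden = 40):
print(lstm.weight_ih_l1.shape)  # torch.Size([80, 40]) -> (4*H_hidden, 2*H_hidden)
```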
Yes, the outputs are concatenated: at every time step, the forward and reverse hidden states are concatenated along the feature dimension, giving an output of shape (L, N, 2*H_hidden). In particular, the last element of output contains the last step of the forward direction together with the first step of the reverse direction.
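A minimal sketch to verify this (sizes are arbitrary):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
L, N, H_in, H = 5, 3, 10, 20  # arbitrary sizes
lstm = nn.LSTM(H_in, H, num_layers=1, bidirectional=True)

x = torch.randn(L, N, H_in)
output, (h_n, c_n) = lstm(x)

# Forward and reverse outputs are concatenated on the feature dimension:
assert output.shape == (L, N, 2 * H)

# The forward half of the last step equals the forward final hidden state,
assert torch.allclose(output[-1, :, :H], h_n[0])
# while the reverse half of the first step equals the reverse final hidden state.
assert torch.allclose(output[0, :, H:], h_n[1])
```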
Reference: PyTorch LSTM documentation