For the background of this problem, refer to this post: Does the TensorFlow embedding_attention_seq2seq method implement a bidirectional RNN Encoder by default?
I am working on the same model and want to replace the unidirectional LSTM layer with a bidirectional layer. I realize I have to use static_bidirectional_rnn instead of static_rnn, but I am getting an error due to a mismatch in tensor shapes.
I replaced the following line:
encoder_outputs, encoder_state = core_rnn.static_rnn(encoder_cell, encoder_inputs, dtype=dtype)
with the line below:
encoder_outputs, encoder_state_fw, encoder_state_bw = core_rnn.static_bidirectional_rnn(encoder_cell, encoder_cell, encoder_inputs, dtype=dtype)
That gives me the following error:
InvalidArgumentError (see above for traceback): Incompatible shapes: [32,5,1,256] vs. [16,1,1,256] [[Node: gradients/model_with_buckets/embedding_attention_seq2seq/embedding_attention_decoder/attention_decoder/Attention_0/add_grad/BroadcastGradientArgs = BroadcastGradientArgs[T=DT_INT32, _device="/job:localhost/replica:0/task:0/cpu:0"](gradients/model_with_buckets/embedding_attention_seq2seq/embedding_attention_decoder/attention_decoder/Attention_0/add_grad/Shape, gradients/model_with_buckets/embedding_attention_seq2seq/embedding_attention_decoder/attention_decoder/Attention_0/add_grad/Shape_1)]]
I understand that the outputs of the two methods are different, but I do not know how to modify the attention code to account for that. How do I pass both the forward and backward states to the attention module? Do I concatenate the two hidden states?
From the error message I can see that the shapes of two tensors don't match somewhere: one is 32 and the other is 16. I suspect this is because the outputs of the bidirectional RNN are twice the size of those of the unidirectional one (each output is the concatenation of a forward and a backward output), and the code that follows the encoder is not adjusted to that.
You can use the code below as a reference.
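This is only a minimal sketch of the encoder-side changes, assuming the TF 1.x API (tf.nn.static_bidirectional_rnn plays the role of core_rnn.static_bidirectional_rnn in your snippet); the sizes, placeholder inputs, and cell choices are illustrative and not taken from embedding_attention_seq2seq itself:

import tensorflow as tf

# Sketch only: the sizes and placeholder inputs here are illustrative.
size = 256                      # per-direction encoder cell size
seq_length = 5                  # number of encoder time steps in this bucket
dtype = tf.float32

# In the real model these would be the embedded encoder inputs.
encoder_inputs = [tf.placeholder(dtype, [None, size]) for _ in range(seq_length)]

# Use two separate cell instances so the forward and backward passes do not share variables.
encoder_cell_fw = tf.contrib.rnn.BasicLSTMCell(size)
encoder_cell_bw = tf.contrib.rnn.BasicLSTMCell(size)

encoder_outputs, encoder_state_fw, encoder_state_bw = tf.nn.static_bidirectional_rnn(
    encoder_cell_fw, encoder_cell_bw, encoder_inputs, dtype=dtype)

# Each element of encoder_outputs now has depth 2 * size (forward and backward concatenated),
# so the attention states must be built with 2 * size instead of cell.output_size.
top_states = [tf.reshape(o, [-1, 1, 2 * size]) for o in encoder_outputs]
attention_states = tf.concat(top_states, 1)

# Merge the forward and backward final states into a single state for the decoder.
# With BasicLSTMCell both states are LSTMStateTuples, so concatenate c and h separately.
encoder_state = tf.contrib.rnn.LSTMStateTuple(
    c=tf.concat([encoder_state_fw.c, encoder_state_bw.c], axis=1),
    h=tf.concat([encoder_state_fw.h, encoder_state_bw.h], axis=1))

# The decoder cell then needs 2 * size units so that its state matches the
# concatenated encoder state fed into the attention decoder.
decoder_cell = tf.contrib.rnn.BasicLSTMCell(2 * size)

Instead of doubling the decoder size, another common option is to project the concatenated forward/backward state down to size with an extra dense layer; either way, the key point is that both attention_states and the initial decoder state must reflect the doubled encoder output dimension.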