I am trying to perform character-level translation with a Keras seq2seq model, but I'm unable to add an attention layer.
I used the Keras seq2seq example as a reference: https://keras.io/examples/nlp/lstm_seq2seq/
#Encoder
encoder_inputs = Input(shape=(None, num_encoder_tokens))
#Encoder Bi-LSTM 1
enc_lstm1 = Bidirectional(LSTM(512,return_sequences=True,return_state=True,dropout=0.2, name="LSTM_1"))
encoder_outputs1, forw_state_h, forw_state_c, back_state_h, back_state_c = enc_lstm1(encoder_inputs)
#Encoder Bi-LSTM Combine
final_enc_h = Concatenate()([forw_state_h,back_state_h])
final_enc_c = Concatenate()([forw_state_c,back_state_c])
#Encoder States
encoder_states =[final_enc_h, final_enc_c]
#Decoder
decoder_inputs = Input(shape=(None, num_decoder_tokens))
#Decoder LSTM
decoder_lstm = LSTM(1024, return_sequences=True, return_state=True, dropout=0.4)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
##################
attention_layer = Attention()([encoder_outputs,decoder_outputs])
decoder_concat_input = Concatenate(axis=-1, name='concat_layer')([decoder_outputs, attention_layer])
#Dense
decoder_dense = Dense(num_decoder_tokens, activation='softmax')
final_output = decoder_dense(decoder_concat_input)
model = Model(inputs=[encoder_inputs, decoder_inputs], outputs=decoder_outputs)
##########################
model.summary()
model.summary() doesn't include the attention layer:
Model: "model_4"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_15 (InputLayer) [(None, None, 149)] 0 []
bidirectional_21 (Bidirectiona [(None, None, 1024) 2711552 ['input_15[0][0]']
l) , (None, 512),
(None, 512),
(None, 512),
(None, 512)]
bidirectional_22 (Bidirectiona [(None, None, 1024) 6295552 ['bidirectional_21[0][0]']
l) , (None, 512),
(None, 512),
(None, 512),
(None, 512)]
bidirectional_23 (Bidirectiona [(None, None, 1024) 6295552 ['bidirectional_22[0][0]']
l) , (None, 512),
(None, 512),
(None, 512),
(None, 512)]
input_16 (InputLayer) [(None, None, 73)] 0 []
concatenate_14 (Concatenate) (None, 1024) 0 ['bidirectional_23[0][1]',
'bidirectional_23[0][3]']
concatenate_15 (Concatenate) (None, 1024) 0 ['bidirectional_23[0][2]',
'bidirectional_23[0][4]']
lstm_7 (LSTM) [(None, None, 1024) 4497408 ['input_16[0][0]',
, (None, 1024), 'concatenate_14[0][0]',
(None, 1024)] 'concatenate_15[0][0]']
==================================================================================================
Total params: 19,800,064
Trainable params: 19,800,064
Non-trainable params: 0
__________________________
And when I try to compile and train the model
optimizer = tf.keras.optimizers.Adam()
model.compile(optimizer=optimizer, loss='sparse_categorical_crossentropy')
###########################################
model.fit(X_train, y_train, epochs=10)
I get the following error.
Epoch 1/10
---------------------------------------------------------------------------
InvalidArgumentError Traceback (most recent call last)
<ipython-input-24-28437da6d50a> in <module>()
6 steps_per_epoch = steps_per_epoch,
7 validation_data = data_batch_generator(X_test, y_test),
----> 8 validation_steps=steps_per_epoch_val,
9 #validation_split=0.2,
10 #callbacks=callbacks_list
1 frames
/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
53 ctx.ensure_initialized()
54 tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
---> 55 inputs, attrs, num_outputs)
56 except core._NotOkStatusException as e:
57 if name is not None:
InvalidArgumentError: Graph execution error:
When I tried the same architecture without attention, it worked. Please help me solve this error. Thank you in advance.
One thing I noticed is that you never defined "encoder_outputs" in the snippet you posted. Consider changing the Attention line to Attention()([encoder_outputs1, decoder_outputs]). You should also consider placing the attention layer before the decoder LSTM, and you might need an embedding layer in both the encoder and the decoder. Look at the decoder sketch below.
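Something along these lines could work. It is only a minimal sketch: it keeps the attention after the decoder LSTM (Luong style) and the one-hot character inputs, with num_encoder_tokens = 149 and num_decoder_tokens = 73 taken from your model summary. tf.keras.layers.Attention expects its inputs as [query, value], so the decoder outputs are passed first here. Note also that your snippet builds the Model with outputs=decoder_outputs instead of final_output, which is why the attention and dense layers never show up in model.summary(); the sketch below uses the Dense softmax output as the model output.
from tensorflow.keras.layers import (Input, LSTM, Bidirectional, Dense,
                                     Concatenate, Attention)
from tensorflow.keras.models import Model

num_encoder_tokens = 149  # from the model summary in the question
num_decoder_tokens = 73

# Encoder: Bi-LSTM whose per-timestep outputs are kept for attention
encoder_inputs = Input(shape=(None, num_encoder_tokens))
encoder_outputs1, fh, fc, bh, bc = Bidirectional(
    LSTM(512, return_sequences=True, return_state=True, dropout=0.2))(encoder_inputs)
final_enc_h = Concatenate()([fh, bh])
final_enc_c = Concatenate()([fc, bc])

# Decoder: LSTM initialised with the concatenated encoder states (2 * 512 = 1024)
decoder_inputs = Input(shape=(None, num_decoder_tokens))
decoder_lstm = LSTM(1024, return_sequences=True, return_state=True, dropout=0.4)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs,
                                     initial_state=[final_enc_h, final_enc_c])

# Attention: query = decoder outputs, value = encoder outputs
context = Attention()([decoder_outputs, encoder_outputs1])
decoder_concat = Concatenate(axis=-1, name='concat_layer')([decoder_outputs, context])

# Project to the target vocabulary and use this as the model output, so the
# attention and dense layers are part of the graph (and of model.summary())
final_output = Dense(num_decoder_tokens, activation='softmax')(decoder_concat)

model = Model([encoder_inputs, decoder_inputs], final_output)
# categorical_crossentropy assumes one-hot targets as in the linked Keras example;
# with integer targets, use sparse_categorical_crossentropy instead
model.compile(optimizer='adam', loss='categorical_crossentropy')
model.summary()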