How to add keras attention layer in seq2seq encoder decoder model?

I was trying to perform character-level translation using a Keras seq2seq model, but I'm unable to add an attention layer.

I used the Keras seq2seq example as a reference: https://keras.io/examples/nlp/lstm_seq2seq/

#Encoder
encoder_inputs = Input(shape=(None, num_encoder_tokens))

#Encoder Bi-LSTM 1
enc_lstm1 = Bidirectional(LSTM(512,return_sequences=True,return_state=True,dropout=0.2, name="LSTM_1"))
encoder_outputs1, forw_state_h, forw_state_c, back_state_h, back_state_c = enc_lstm1(encoder_inputs)


#Encoder Bi-LSTM Combine
final_enc_h = Concatenate()([forw_state_h,back_state_h])
final_enc_c = Concatenate()([forw_state_c,back_state_c])

#Encoder States
encoder_states =[final_enc_h, final_enc_c]

#Decoder
decoder_inputs = Input(shape=(None, num_decoder_tokens))

#Decoder LSTM
decoder_lstm = LSTM(1024, return_sequences=True, return_state=True, dropout=0.4) 
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)


##################
attention_layer = Attention()([encoder_outputs,decoder_outputs])
decoder_concat_input = Concatenate(axis=-1, name='concat_layer')([decoder_outputs, attention_layer])

#Dense
decoder_dense = Dense(num_decoder_tokens, activation='softmax')
final_output = decoder_dense(decoder_concat_input)

model = Model(inputs=[encoder_inputs, decoder_inputs], 
              outputs=decoder_outputs)

##########################

model.summary()

model.summary() doesn't include the attention layer.

Model: "model_4"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
==================================================================================================
 input_15 (InputLayer)          [(None, None, 149)]  0           []                               
                                                                                                  
 bidirectional_21 (Bidirectiona  [(None, None, 1024)  2711552    ['input_15[0][0]']               
 l)                             , (None, 512),                                                    
                                 (None, 512),                                                     
                                 (None, 512),                                                     
                                 (None, 512)]                                                     
                                                                                                  
 bidirectional_22 (Bidirectiona  [(None, None, 1024)  6295552    ['bidirectional_21[0][0]']       
 l)                             , (None, 512),                                                    
                                 (None, 512),                                                     
                                 (None, 512),                                                     
                                 (None, 512)]                                                     
                                                                                                  
 bidirectional_23 (Bidirectiona  [(None, None, 1024)  6295552    ['bidirectional_22[0][0]']       
 l)                             , (None, 512),                                                    
                                 (None, 512),                                                     
                                 (None, 512),                                                     
                                 (None, 512)]                                                     
                                                                                                  
 input_16 (InputLayer)          [(None, None, 73)]   0           []                               
                                                                                                  
 concatenate_14 (Concatenate)   (None, 1024)         0           ['bidirectional_23[0][1]',       
                                                                  'bidirectional_23[0][3]']       
                                                                                                  
 concatenate_15 (Concatenate)   (None, 1024)         0           ['bidirectional_23[0][2]',       
                                                                  'bidirectional_23[0][4]']       
                                                                                                  
 lstm_7 (LSTM)                  [(None, None, 1024)  4497408     ['input_16[0][0]',               
                                , (None, 1024),                   'concatenate_14[0][0]',         
                                 (None, 1024)]                    'concatenate_15[0][0]']         
                                                                                                  
==================================================================================================
Total params: 19,800,064
Trainable params: 19,800,064
Non-trainable params: 0
__________________________

And when I try to compile and train the model

optimizer = tf.keras.optimizers.Adam()
model.compile(optimizer=optimizer, loss='sparse_categorical_crossentropy')

###########################################

model.fit(X_train, y_train,epochs = 10)

I get the following error.

Epoch 1/10
---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
<ipython-input-24-28437da6d50a> in <module>()
      6     steps_per_epoch = steps_per_epoch,
      7     validation_data = data_batch_generator(X_test, y_test),
----> 8     validation_steps=steps_per_epoch_val,
      9     #validation_split=0.2,
     10     #callbacks=callbacks_list

1 frames
/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     53     ctx.ensure_initialized()
     54     tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
---> 55                                         inputs, attrs, num_outputs)
     56   except core._NotOkStatusException as e:
     57     if name is not None:

InvalidArgumentError: Graph execution error:

When I try the same architecture without attention, it works. Please help me solve this error. Thank you in advance.

1 Answer

One thing I noticed is that you never defined encoder_outputs in the snippet you posted; the encoder output sequence is called encoder_outputs1. Also, Keras's Attention layer expects its inputs as [query, value], so the call should be Attention()([decoder_outputs, encoder_outputs1]). A second problem is that your Model is built with outputs=decoder_outputs, so the attention, concatenate and dense layers are never connected to the model's output, which is why they don't show up in model.summary(); build it with outputs=final_output instead. You should also consider placing the attention layer before the decoder LSTM, and you might need an embedding layer in both the encoder and decoder. Look at the decoder code below.

# Decoder input is now a sequence of token ids, so it can go through an Embedding layer
decoder_inputs = Input(shape=(None,))

# Vocabulary size first, then the embedding dimension; mask_zero=True propagates a padding mask
decoder_embedding = tf.keras.layers.Embedding(num_decoder_tokens, embed_dim, mask_zero=True)(decoder_inputs)

# Keras Attention expects [query, value]: the decoder sequence attends over the encoder sequence.
# With the default dot-product score, embed_dim has to match the encoder output dimension
# (1024 here, i.e. 2 * 512 from the Bi-LSTM), which also lets the residual addition below work.
attention_layer = keras.layers.Attention()
attention_sequence = attention_layer(
    inputs=[decoder_embedding, encoder_outputs1],
    # the encoder mask only exists if the encoder input itself is masked
    # (e.g. an Embedding layer with mask_zero=True on the encoder side)
    mask=[decoder_embedding._keras_mask, encoder_outputs1._keras_mask])

# Residual connection plus layer normalization around the attention output
normalization = keras.layers.LayerNormalization()
attention_sequence = normalization(decoder_embedding + attention_sequence)

# latent_dim must equal the size of encoder_states (1024 here) for initial_state to be valid
decoder_LSTM = LSTM(
    latent_dim, return_sequences=True, return_state=True,
    dropout=0.1, recurrent_dropout=0.1)
decoder_outputs, _, _ = decoder_LSTM(attention_sequence, initial_state=encoder_states)

dense_layer = keras.layers.Dense(num_decoder_tokens, activation='softmax')
final_output = dense_layer(decoder_outputs)
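
If you would rather keep the question's original one-hot, character-level inputs (no embedding layer), here is a minimal sketch of how the attention can be wired in after the decoder LSTM instead, reusing the variable names and layer sizes from the question; the loss and target format are assumptions you should adapt to your data:

from tensorflow.keras.layers import Input, LSTM, Bidirectional, Concatenate, Attention, Dense
from tensorflow.keras.models import Model

# Encoder: same Bi-LSTM as in the question, keeping the full output sequence for attention
encoder_inputs = Input(shape=(None, num_encoder_tokens))
encoder_outputs1, forw_h, forw_c, back_h, back_c = Bidirectional(
    LSTM(512, return_sequences=True, return_state=True, dropout=0.2))(encoder_inputs)
final_enc_h = Concatenate()([forw_h, back_h])   # (None, 1024)
final_enc_c = Concatenate()([forw_c, back_c])   # (None, 1024)
encoder_states = [final_enc_h, final_enc_c]

# Decoder LSTM, initialised with the concatenated encoder states
decoder_inputs = Input(shape=(None, num_decoder_tokens))
decoder_lstm = LSTM(1024, return_sequences=True, return_state=True, dropout=0.4)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)

# Attention over the encoder sequence: inputs are [query, value] = [decoder, encoder]
context = Attention(name='attention_layer')([decoder_outputs, encoder_outputs1])
decoder_concat_input = Concatenate(axis=-1, name='concat_layer')([decoder_outputs, context])

decoder_dense = Dense(num_decoder_tokens, activation='softmax')
final_output = decoder_dense(decoder_concat_input)

# Ending the model at final_output is what makes the attention layer appear in model.summary()
model = Model([encoder_inputs, decoder_inputs], final_output)

# Pick the loss to match your targets: categorical_crossentropy for one-hot targets,
# sparse_categorical_crossentropy for integer ids
model.compile(optimizer='adam', loss='categorical_crossentropy')
model.summary()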