I am building an LSTM network using ELMo embeddings with Keras. My objective is to minimize the RMSE. The ELMo embeddings are obtained using the following code segment:
def ElmoEmbedding(x):
    return elmo_model(
        inputs={
            "tokens": tf.squeeze(tf.cast(x, tf.string)),
            "sequence_len": tf.constant(batch_size * [max_len]),
        },
        signature="tokens",
        as_dict=True,
    )["elmo"]
The model is defined as follows:
def create_model(max_len):
    input_text = Input(shape=(max_len,), dtype=tf.string)
    embedding = Lambda(ElmoEmbedding, output_shape=(max_len, 1024))(input_text)
    x = Bidirectional(LSTM(units=512, return_sequences=False,
                           recurrent_dropout=0.2, dropout=0.2))(embedding)
    out = Dense(1, activation="relu")(x)
    model = Model(input_text, out)
    return model
The model is compiled as:
model.compile(optimizer="rmsprop", loss=root_mean_squared_error,
              metrics=[root_mean_squared_error])
And then trained as:
model.fit(np.array(X_tr), y_tr, validation_data=(np.array(X_val), y_val),
          batch_size=batch_size, epochs=5, verbose=1)
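Since ElmoEmbedding fixes sequence_len to batch_size * [max_len], every batch must contain exactly batch_size samples, so I trim the train and validation sets to multiples of batch_size beforehand (hence 8704 and 928 samples rather than the full 9652). A minimal sketch, with trim_to_batch as an illustrative name:

import numpy as np

def trim_to_batch(X, y, batch_size):
    # Drop the tail so the sample count is an exact multiple of
    # batch_size (8704 = 272 * 32, 928 = 29 * 32).
    n = (len(X) // batch_size) * batch_size
    return np.array(X[:n]), np.array(y[:n])

X_tr, y_tr = trim_to_batch(X_tr, y_tr, batch_size)
X_val, y_val = trim_to_batch(X_val, y_val, batch_size)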
The root_mean_squared_error loss is defined as:
from keras import backend as K

def root_mean_squared_error(y_true, y_pred):
    return K.sqrt(K.mean(K.square(y_pred - y_true), axis=-1))
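As a quick sanity check of the loss itself, the same formula written in plain NumPy gives the expected values on toy data (rmse_np is just a mirror of the Keras function above):

import numpy as np

def rmse_np(y_true, y_pred):
    # NumPy equivalent of the Keras loss above.
    return np.sqrt(np.mean(np.square(y_pred - y_true), axis=-1))

print(rmse_np(np.array([1.0, 2.0, 3.0]), np.array([1.5, 2.0, 2.0])))  # ~0.6455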
My dataset consists of 9652 sentences, each labelled with a numeric value, and is split into a train set and a validation set. The maximum sentence length is 142, so I pad every sentence with a __PAD__ token up to length 142. A sentence therefore looks like this before and after padding:
['france', 'is', 'hunting', 'down', 'its', 'citizens', 'who', 'joined', 'twins', 'without', 'trial', 'in', 'iraq']
['france', 'is', 'hunting', 'down', 'its', 'citizens', 'who', 'joined', 'twins', 'without', 'trial', 'in', 'iraq', '__PAD__', '__PAD__', '__PAD__', ..., '__PAD__']
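The padding is done with a small helper along these lines (pad_tokens, train_sentences and val_sentences are illustrative names):

def pad_tokens(tokens, max_len=142, pad_token="__PAD__"):
    # Truncate to max_len, then right-pad so every sentence comes out
    # exactly max_len tokens long.
    tokens = tokens[:max_len]
    return tokens + [pad_token] * (max_len - len(tokens))

X_tr = [pad_tokens(sent) for sent in train_sentences]
X_val = [pad_tokens(sent) for sent in val_sentences]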
When I train this model, I get the following output:
Train on 8704 samples, validate on 928 samples
Epoch 1/5
8704/8704 [==============================] - 655s 75ms/step - loss: 0.9960 - root_mean_squared_error: 0.9960 - val_loss: 0.9389 - val_root_mean_squared_error: 0.9389
Epoch 2/5
8704/8704 [==============================] - 650s 75ms/step - loss: 0.9354 - root_mean_squared_error: 0.9354 - val_loss: 0.9389 - val_root_mean_squared_error: 0.9389
Epoch 3/5
8704/8704 [==============================] - 650s 75ms/step - loss: 0.9354 - root_mean_squared_error: 0.9354 - val_loss: 0.9389 - val_root_mean_squared_error: 0.9389
Epoch 4/5
8704/8704 [==============================] - 650s 75ms/step - loss: 0.9354 - root_mean_squared_error: 0.9354 - val_loss: 0.9389 - val_root_mean_squared_error: 0.9389
Epoch 5/5
8704/8704 [==============================] - 650s 75ms/step - loss: 0.9354 - root_mean_squared_error: 0.9354 - val_loss: 0.9389 - val_root_mean_squared_error: 0.9389
Both the loss and the metric stop improving after the first epoch and remain exactly the same from epochs 2 through 5.
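For reference, one quick check would be whether the network has collapsed to a (near-)constant prediction, which would be consistent with such a flat loss:

# If min, max and std are all (almost) identical/zero, the model is
# effectively predicting the same constant for every input.
preds = model.predict(np.array(X_val[:batch_size]), batch_size=batch_size)
print(preds.min(), preds.max(), preds.std())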
I am not sure what is wrong here. Any help would be appreciated.