I'm currently working on a forecasting project using an LSTM. The code is the following:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras.wrappers.scikit_learn import KerasRegressor  # or scikeras.wrappers on newer versions
from sklearn.model_selection import GridSearchCV

def lstm_mulv(neurons, activation):
    # Stacked LSTM: three layers return full sequences, the last one returns a vector.
    model = Sequential()
    model.add(LSTM(units=neurons, return_sequences=True, input_shape=(1, 2), activation=activation))
    model.add(LSTM(units=neurons, return_sequences=True, activation=activation))
    model.add(LSTM(units=neurons, return_sequences=True, activation=activation))
    model.add(LSTM(units=neurons, activation=activation))
    model.add(Dense(units=1, activation=activation))
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model

model_mul = KerasRegressor(build_fn=lstm_mulv, epochs=100, batch_size=5)
grid_mul = GridSearchCV(
    estimator=model_mul,
    param_grid=param_grid,  # param_grid (defined elsewhere) holds the neurons/activation candidates
    cv=5,
    scoring=['neg_mean_squared_error', 'neg_root_mean_squared_error', 'neg_mean_absolute_percentage_error'],
    refit='neg_root_mean_squared_error',
)
grid_result_mul = grid_mul.fit(train_X, train_y)

mul_means_mse = grid_result_mul.cv_results_['mean_test_neg_mean_squared_error']
mul_means_rmse = grid_result_mul.cv_results_['mean_test_neg_root_mean_squared_error']
mul_means_mape = grid_result_mul.cv_results_['mean_test_neg_mean_absolute_percentage_error']
mul_params = grid_result_mul.cv_results_['params']
# Use fresh loop variable names so the result lists above are not clobbered.
for mse, rmse, mape, params in zip(mul_means_mse, mul_means_rmse, mul_means_mape, mul_params):
    print("mse = %f , rmse = %f , mape = %f with: %r" % (mse, rmse, mape, params))
However, I'm facing an issue with the tanh activation function. Specifically, during the tuning process, my LSTM model produces NaN values, while other activation functions work just fine. Does anybody know why? (https://i.stack.imgur.com/lhJ2F.png)
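For debugging, I imagine something like the following sketch would at least rule out bad inputs and catch the NaN early (an assumption on my part: the callbacks kwarg should be forwarded through GridSearchCV and KerasRegressor down to model.fit):

import numpy as np
from tensorflow.keras.callbacks import TerminateOnNaN

# Rule out bad inputs first: any NaN/inf already in the scaled arrays will poison training.
print(np.isnan(train_X).any(), np.isinf(train_X).any())
print(np.isnan(train_y).any(), np.isinf(train_y).any())

# Stop a grid-search candidate as soon as its loss turns NaN instead of
# finishing the remaining epochs; fit kwargs are passed on to model.fit.
grid_result_mul = grid_mul.fit(train_X, train_y, callbacks=[TerminateOnNaN()])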
Before the tuning phase, I had already scaled my data using MinMaxScaler. I expected training to proceed smoothly without encountering NaN values, because the tanh activation function is widely used in LSTM networks.
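For context, my scaling step is roughly the following sketch (assuming train_X has the (samples, 1, 2) shape implied by input_shape above; note MinMaxScaler defaults to the [0, 1] range, not the (-1, 1) range that tanh outputs span):

from sklearn.preprocessing import MinMaxScaler

# MinMaxScaler works on 2-D arrays, so flatten the (samples, timesteps, features)
# windows, scale each feature to [0, 1], then restore the original 3-D shape.
x_scaler = MinMaxScaler()
flat = train_X.reshape(-1, train_X.shape[-1])
train_X = x_scaler.fit_transform(flat).reshape(train_X.shape)

# The target gets its own scaler so predictions can be inverse-transformed later.
y_scaler = MinMaxScaler()
train_y = y_scaler.fit_transform(train_y.reshape(-1, 1)).ravel()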