I am working with an LSTM model on time series data. To mitigate overfitting, I want to add recurrent dropout to the LSTM layer. By default, Keras uses a fused cuDNN kernel to speed up LSTM training on GPU, but according to the official Keras documentation, recurrent_dropout is not compatible with the cuDNN kernel. Francois Chollet, the creator of Keras, recommends setting unroll=True on the LSTM layer so that training with recurrent dropout still runs reasonably fast.
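For reference, here is a minimal sketch of the two configurations as I understand them (the variable names are just illustrative):

from tensorflow.keras import layers

# Default arguments: eligible for the fused cuDNN kernel on GPU.
lstm_fast = layers.LSTM(32)

# recurrent_dropout disables the cuDNN kernel, so Keras falls back to a
# generic GPU kernel; unroll=True is the suggested workaround, but it
# requires a fixed number of timesteps.
lstm_unrolled = layers.LSTM(32, recurrent_dropout=0.25, unroll=True)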
The model I am working with is as follows:
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(sequence_length, raw_data.shape[-1]))
x = layers.LSTM(32, recurrent_dropout=0.25, unroll=True)(inputs)
x = layers.BatchNormalization()(x)  # avoid exploding gradients
outputs = layers.Dense(1)(x)
model = keras.Model(inputs, outputs)
callbacks = [
    keras.callbacks.ModelCheckpoint("basic_lstm6.tf",
                                    save_best_only=True)
]
model.compile(optimizer="rmsprop", loss="mse", metrics=["mae"])
history = model.fit(
    train_dataset,
    epochs=20,
    validation_data=val_dataset,
    callbacks=callbacks
)
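One thing worth flagging: sequence_length is a plain Python int (40, defined below), so as far as I understand, keras.Input should declare a fixed timestep dimension. A quick sanity check right after building the input (my expectation, not verified output):

print(inputs.shape)  # I would expect (None, 40, 2)

Yet, as shown below, the layer receives inputs of shape (None, None, 2) at fit time.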
When training starts, I get the following error:
WARNING:tensorflow:Layer lstm_7 will not use cuDNN kernels since it doesn't meet the criteria. It will use a generic GPU kernel as fallback when running on GPU.
Epoch 1/20
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-19-caf68cfbe99f> in <cell line: 13>()
11
12 model.compile(optimizer="rmsprop", loss="mse", metrics=["mae"])
---> 13 history = model.fit(
14 train_dataset,
15 epochs=20,
1 frames
/usr/local/lib/python3.10/dist-packages/keras/src/engine/training.py in tf__train_function(iterator)
13 try:
14 do_return = True
---> 15 retval_ = ag__.converted_call(ag__.ld(step_function), (ag__.ld(self), ag__.ld(iterator)), None, fscope)
16 except:
17 do_return = False
ValueError: Exception encountered when calling layer 'lstm_7' (type LSTM).
Unrolling requires a fixed number of timesteps.
Call arguments received by layer 'lstm_7' (type LSTM):
• inputs=tf.Tensor(shape=(None, None, 2), dtype=float32)
• mask=None
• training=True
• initial_state=None
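So the layer receives inputs with an unknown (None) timestep dimension, which is exactly what unrolling cannot handle. As a diagnostic (this is my assumption about the cause, not verified output), the static shapes the model sees can be inspected on the dataset defined below:

print(train_dataset.element_spec)
# I would expect something like:
# (TensorSpec(shape=(None, None, 2), ...), TensorSpec(shape=(None,), ...))
# i.e. the timestep dimension is not statically known, consistent with
# the (None, None, 2) shape reported in the error above.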
The "train_dataset" dataset is calculated as follows:
from tensorflow import keras

raw_data = merged_df[['VIX', 'Bid price']].to_numpy()
bid_prices = raw_data[:, 1]  # target series: the "Bid price" column

num_train_samples = int(0.5 * len(raw_data))
num_val_samples = int(0.25 * len(raw_data))
num_test_samples = len(raw_data) - num_train_samples - num_val_samples

sampling_rate = 2    # keep one data point out of every 2
sequence_length = 40
delay = sampling_rate * (sequence_length + 24 - 1)
batch_size = 100
train_dataset = keras.utils.timeseries_dataset_from_array(
    raw_data[:-delay],
    targets=bid_prices[delay:],
    sampling_rate=sampling_rate,
    sequence_length=sequence_length,
    shuffle=True,
    batch_size=batch_size,
    start_index=0,
    end_index=num_train_samples)

val_dataset = keras.utils.timeseries_dataset_from_array(
    raw_data[:-delay],
    targets=bid_prices[delay:],
    sampling_rate=sampling_rate,
    sequence_length=sequence_length,
    shuffle=True,
    batch_size=batch_size,
    start_index=num_train_samples,
    end_index=num_train_samples + num_val_samples)

test_dataset = keras.utils.timeseries_dataset_from_array(
    raw_data[:-delay],
    targets=bid_prices[delay:],
    sampling_rate=sampling_rate,
    sequence_length=sequence_length,
    shuffle=True,
    batch_size=batch_size,
    start_index=num_train_samples + num_val_samples)
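To make the delay concrete, it works out to

delay = sampling_rate * (sequence_length + 24 - 1)
      = 2 * (40 + 24 - 1)
      = 126   # rows of raw_data between a window's start and its target

so, following the book's convention, each target should be the bid price 24 sampled steps after the end of its input window.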
I used timeseries_dataset_from_array(), and when I print the shapes of one batch from train_dataset, I get: samples shape: (100, 40, 2), targets shape: (100,).
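For completeness, this is the kind of check I used to print those shapes (one batch via take(1)):

for samples, targets in train_dataset.take(1):
    print("samples shape:", samples.shape)   # (100, 40, 2)
    print("targets shape:", targets.shape)   # (100,)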
The code closely follows Chapter 10 of "Deep Learning with Python, Second Edition" by Francois Chollet; the corresponding notebook is on his GitHub: https://github.com/fchollet/deep-learning-with-python-notebooks/blob/master/chapter10_dl-for-timeseries.ipynb