Why do my GRU's training loss and testing loss differ so much from each other?


I want to use a GRU to detect different sections in a time series. Basically, the input data consists of noisy sections and sections where the value is constant for a couple of time steps. I have synthetically generated data and corresponding labels. The labels are arrays of the same length as the time series and are 0 for the noisy sections and 1 for the flat sections.
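For context, here is a minimal sketch of the kind of generator described above. The function name, section lengths, and noise level are made up for illustration; the actual generator may differ:

```python
import numpy as np

def make_series(n_sections=6, section_len=50, noise_std=0.3, seed=0):
    """Toy generator: alternating noisy and flat sections with 0/1 labels."""
    rng = np.random.default_rng(seed)
    x_parts, y_parts = [], []
    for i in range(n_sections):
        if i % 2 == 0:
            # Noisy section -> label 0
            x_parts.append(rng.normal(0.0, noise_std, section_len))
            y_parts.append(np.zeros(section_len))
        else:
            # Flat section (constant value) -> label 1
            x_parts.append(np.full(section_len, rng.normal()))
            y_parts.append(np.ones(section_len))
    return np.concatenate(x_parts), np.concatenate(y_parts)

x, y = make_series()
print(x.shape, y.shape)  # (300,) (300,)
```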

After training, my network has an MSE loss of around 0.01-0.02. However, even if I predict with the network on the same training data (or on other instances from the synthetically generated data), the MSE loss is always around 10 times higher. The network does seem to pick up the patterns of the data, since it generates noisy and flat sections itself in its predictions, but it does not seem to approach the values 0 or 1 and stays somewhere in between.

  • What could be the reason that training and testing loss are so different? It can't really be overfitting, since the loss is just as high when I predict on instances from the training data.

  • Can I somehow output the predictions during training that apparently give a loss of around 0.01?

  • What are other strategies to debug the network?
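One sanity check worth doing, sketched below with made-up numbers: a model that outputs a constant 0.5 on balanced 0/1 labels already achieves an MSE of 0.25, so a prediction MSE of 0.1-0.2 is only modestly better than that trivial baseline.

```python
import numpy as np

# Hypothetical balanced 0/1 label sequence
y = np.concatenate([np.zeros(100), np.ones(100)])

# Trivial baseline: predict 0.5 everywhere
baseline = np.full_like(y, 0.5)
print(np.mean((y - baseline) ** 2))  # 0.25
```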

This is my code:

from tensorflow import keras
import numpy as np
import matplotlib.pyplot as plt

with open('training_data', 'rb') as f:
    X_train = np.load(f)
    y_train = np.load(f)

X_test = X_train[0]
y_test = y_train[0]
X_train = X_train[1:]
y_train = y_train[1:]

mse = keras.losses.MeanSquaredError()
def GRUmodel():
    units1 = 40
    units2 = 60
    dropout = 0.1
    learning_rate = 0.01
    gru_model = keras.models.Sequential([
        keras.layers.GRU(units1, return_sequences=True, input_shape=[None, 1], dropout=dropout),
        keras.layers.GRU(units2, return_sequences=True, dropout=dropout),
        keras.layers.GRU(1, return_sequences=True)
    ])
    optimizer = keras.optimizers.Adam(learning_rate=learning_rate)
    gru_model.compile(loss=mse, optimizer=optimizer, metrics=['accuracy'])
    return gru_model
name = "grutest.h5"
checkpoint = keras.callbacks.ModelCheckpoint(name)
model = GRUmodel()
X_test = X_test.reshape(-1, 1)
history = model.fit(X_train, y_train, epochs=40, callbacks=[checkpoint])
model = keras.models.load_model(name)
prediction = model.predict(X_test)
prediction = prediction.reshape(-1,)
print('MSE:', mse(y_test, prediction).numpy())
plt.plot(X_test)
plt.plot(y_test)
plt.plot(prediction)
plt.show()
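One thing I have also double-checked (shapes only, with dummy data): with input_shape=[None, 1] the GRU layers expect 3-D input of shape (batch, timesteps, 1), whereas X_test.reshape(-1, 1) produces a 2-D array. This is a guess at a possible source of the discrepancy, not a confirmed diagnosis:

```python
import numpy as np

x = np.random.randn(300)      # one series of 300 time steps
as_2d = x.reshape(-1, 1)      # (300, 1) -- no batch dimension
as_3d = x.reshape(1, -1, 1)   # (1, 300, 1) -- batch of one sequence
print(as_2d.shape, as_3d.shape)
```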

This is what the training instances (blue) and the labels (orange) look like:

Plot of the training data with labels

And this is the output of the network:

Plot of the network's predictions

I expect the network to output values either close to 0 or close to 1, given that it clearly picks up the patterns in the data.
