Why do my GRU's training loss and testing loss differ so much from each other?


I want to use a GRU to detect different sections in a time series. Basically, the input data consists of noisy sections and sections where the value is constant for a couple of time steps. I have synthetically generated data and corresponding labels. The labels are arrays of the same length as the time series and are 0 for the noisy sections and 1 for the flat sections.
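For context, here is a minimal sketch of the kind of generator described above. The function name, section lengths, and noise level are made up for illustration; the actual generator may differ:

```python
import numpy as np

def make_series(n_sections=6, section_len=50, noise_std=0.3, seed=0):
    """Toy generator: alternating noisy and flat sections with 0/1 labels."""
    rng = np.random.default_rng(seed)
    x_parts, y_parts = [], []
    for i in range(n_sections):
        if i % 2 == 0:
            # Noisy section -> label 0
            x_parts.append(rng.normal(0.0, noise_std, section_len))
            y_parts.append(np.zeros(section_len))
        else:
            # Flat section (constant value) -> label 1
            x_parts.append(np.full(section_len, rng.normal()))
            y_parts.append(np.ones(section_len))
    return np.concatenate(x_parts), np.concatenate(y_parts)

x, y = make_series()
print(x.shape, y.shape)  # (300,) (300,)
```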

After training, my network has an MSE loss of around 0.01-0.02. However, even if I predict with the network on the same training data (or on other instances from the synthetically generated data), the MSE loss is always around 10 times higher. The network does seem to pick up the patterns of the data, since it generates noisy and flat sections itself in its predictions, but it does not seem to approach the values 0 or 1 and stays somewhere in between.

  • What could be the reason that training and testing loss are so different? It can't really be overfitting, since the loss is just as high when I predict on instances from the training data.

  • Can I somehow output the predictions during training that apparently give a loss of around 0.01?

  • What are other strategies to debug the network?
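One sanity check worth doing, sketched below with made-up numbers: a model that outputs a constant 0.5 on balanced 0/1 labels already achieves an MSE of 0.25, so a prediction MSE of 0.1-0.2 is only modestly better than that trivial baseline.

```python
import numpy as np

# Hypothetical balanced 0/1 label sequence
y = np.concatenate([np.zeros(100), np.ones(100)])

# Trivial baseline: predict 0.5 everywhere
baseline = np.full_like(y, 0.5)
print(np.mean((y - baseline) ** 2))  # 0.25
```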

This is my code:

from tensorflow import keras
import numpy as np
import matplotlib.pyplot as plt

with open('training_data', 'rb') as f:
    X_train = np.load(f)
    y_train = np.load(f)

X_test = X_train[0]
y_test = y_train[0]
X_train = X_train[1:]
y_train = y_train[1:]

mse = keras.losses.MeanSquaredError()
def GRUmodel():
    units1 = 40
    units2 = 60
    dropout = 0.1
    learning_rate = 0.01
    gru_model = keras.models.Sequential([
        keras.layers.GRU(units1, return_sequences=True, input_shape=[None, 1], dropout=dropout),
        keras.layers.GRU(units2, return_sequences=True, dropout=dropout),
        keras.layers.GRU(1, return_sequences=True)
    ])
    optimizer = keras.optimizers.Adam(learning_rate=learning_rate)
    gru_model.compile(loss=mse, optimizer=optimizer, metrics=['accuracy'])
    return gru_model
name = "grutest.h5"
checkpoint = keras.callbacks.ModelCheckpoint(name)
model = GRUmodel()
X_test = X_test.reshape(-1, 1)
history = model.fit(X_train, y_train, epochs=40, callbacks=[checkpoint])
model = keras.models.load_model(name)
prediction = model.predict(X_test)
prediction = prediction.reshape(-1,)
print('MSE:', mse(y_test, prediction).numpy())
plt.plot(X_test)
plt.plot(y_test)
plt.plot(prediction)
plt.show()
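One thing I have also double-checked (shapes only, with dummy data): with input_shape=[None, 1] the GRU layers expect 3-D input of shape (batch, timesteps, 1), whereas X_test.reshape(-1, 1) produces a 2-D array. This is a guess at a possible source of the discrepancy, not a confirmed diagnosis:

```python
import numpy as np

x = np.random.randn(300)      # one series of 300 time steps
as_2d = x.reshape(-1, 1)      # (300, 1) -- no batch dimension
as_3d = x.reshape(1, -1, 1)   # (1, 300, 1) -- batch of one sequence
print(as_2d.shape, as_3d.shape)
```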

This is what the training instances (blue) and the labels (orange) look like:

Plot of the training data with labels

And this is the output of the network:

Plot of the network's predictions

I expect the network to output values either close to 0 or close to 1, given that it clearly picks up the patterns in the data.
