Why does saving and loading a neupy network with the dill library return different predictions for the same time period?


First of all, thank you for reading this, and thank you in advance if you can help. This is the algorithm I'm using for supervised learning:

# Define neural network
cgnet = algorithms.LevenbergMarquardt(
    connection=[
        layers.Input(XTrain.shape[1]),
        layers.Relu(6),
        layers.Linear(1)
    ],
    mu_update_factor=2,
    mu=0.1,
    shuffle_data=True,
    verbose=True,
    decay_rate=0.1,
    addons=[algorithms.WeightElimination]
)

Cross validation results are good (k=10):

[0.16767815652364237, 0.13396493112368024, 0.19033966833586402, 0.12023567250054788, 0.11826824035439124, 0.13115856672872392, 0.14250003819441104, 0.12729442202775898, 0.31073760721487326, 0.19299511349686768]
[0.9395976956178138, 0.9727526340820827, 0.9410503161549465, 0.9740922179654977, 0.9764171089773663, 0.9707258917808179, 0.9688830174583372, 0.973160633351555, 0.8551738446276884, 0.936661707991699]
MAE: 0.16 (+/- 0.11)
R2: 0.95 (+/- 0.07)

After training I have saved the algorithm with dill:

with open('network-storage.dill', 'wb') as f:
    dill.dump(cgnet, f)

Then, if I load the network with dill and predict on the X values of the entire training set, I get the same R2 (0.9691), so up to this point everything is fine. These are the results:

[Figure: prediction of the entire training series (1992 to 2022)]

If I try to do the same thing with only the last few years (2018 to 2022), i.e. predict y from the X training values for 2018 to 2022, I get this:

[Figure: prediction for part of the training series (2018 to 2022)]

Instead of this (prediction of y from the X training values for 1992 to 2022):

[Figure: prediction of the entire training series (1992 to 2022)]

Why do I get different predictions for the same period when I feed in a different range of X values? With X input from 1992 to 2022, the y prediction for 1992 to 2022 is fine; with X input from 2018 to 2022, the y prediction for 2018 to 2022 is not.

This is the code:

import numpy as np
import pandas as pd
import datetime as dt
import matplotlib.pyplot as plt
import dill
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn import metrics
from sklearn.model_selection import KFold
from scipy.interpolate import Rbf
from scipy import stats
from neupy import layers, environment, algorithms
from neupy import plots


# Import data 
data = pd.read_excel('DataAL_Incremento.xlsx', index_col=0, header=1).iloc[:,[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,-1]]
data.columns = ['PPO4L(in)','PPO4(in)','NH4L(in)','NH4(in)','NO3L(in)','NNO3(in)','CBOL(in)', 'CBO(in)','Temp(In)','Temp(alb)','Tair ','Tdew',
                'Wvel','Cl_aL(in)','Cl_a(in)','ODL(in)','OD(in)','Qin(in)','ODalb','PPO4(alb)','NNO3(alb)']


# Add filtered data
tmp0 = data.iloc[:,[9, 6, 14]].rolling(9, center=False, axis=0).mean()
tmp0.columns = ['Temp(alb)_09','CBOL(in)_09','Cl_a(in)_09']
tmp1 = data.iloc[:,[9, 6, 14]].rolling(15, center=False, axis=0).mean()
tmp1.columns = ['Temp(alb)_15', 'CBOL(in)_15','Cl_a(in)_15']
tmp2 = data.iloc[:,[9, 6, 14]].rolling(31, center=False, axis=0).mean()
tmp2.columns = ['Temp(alb)_31', 'CBOL(in)_31','Cl_a(in)_31']
data = pd.concat((data, tmp0, tmp1, tmp2), axis=1)

# Drop empty records
data = data.dropna()

# Define data
X = data.loc[:, ['CBOL(in)', 'CBO(in)','Temp(In)','Temp(alb)','Tair ','Cl_aL(in)','Cl_a(in)','OD(in)','Temp(alb)_31', 'CBOL(in)_31','Cl_a(in)_31']]

y = data.loc[:, ['ODalb']]


years = data.index.year
yearsTrain = range(1992,2022)
yearsTest = 2019,2020,2021

#yearsTrain, yearsTest = train_test_split(np.unique(years), test_size=0.2, train_size=0.8, random_state=None)

XTrain = X.query('@years in @yearsTrain')
yTrain = y.query('@years in @yearsTrain').values.ravel()
XTest = X.query('@years in @yearsTest')
yTest = y.query('@years in @yearsTest').values.ravel()

results = y.query('@years in @yearsTest')


#===============================================================================
# Neural network
#===============================================================================

# Define neural network
cgnet = algorithms.LevenbergMarquardt(
    connection=[
        layers.Input(XTrain.shape[1]),
        layers.Relu(6),
        layers.Linear(1)
    ],
    mu_update_factor=2,
    mu=0.1,
    shuffle_data=True,
    verbose=True,
    decay_rate=0.1,
    addons=[algorithms.WeightElimination]
)

# Scale
XScaler = StandardScaler()
XScaler.fit(XTrain)
XTrainScaled = XScaler.transform(XTrain)
XTestScaled = XScaler.transform(XTest)

yScaler = StandardScaler()
yScaler.fit(yTrain.reshape(-1, 1))
yTrainScaled = yScaler.transform(yTrain.reshape(-1, 1)).ravel()
yTestScaled = yScaler.transform(yTest.reshape(-1, 1)).ravel()

# Train 
cgnet.train(XTrainScaled, yTrainScaled, XTestScaled, yTestScaled, epochs=30)

yEstTrain = yScaler.inverse_transform(cgnet.predict(XTrainScaled).reshape(-1, 1)).ravel()
mae = np.mean(np.abs(yTrain-yEstTrain))
results['ANN'] = yScaler.inverse_transform(cgnet.predict(XTestScaled).reshape(-1, 1)).ravel()

# Metrics
mse  = np.mean((yTrain-yEstTrain)**2)
mseTes = np.mean((yTest-results['ANN'])**2)
maeTes = np.mean(np.abs(yTest-results['ANN']))
meantrain = np.mean(yTrain)
ssTest = (yTrain-meantrain)**2
r2=(1-(mse/(np.mean(ssTest))))
meantest = np.mean(yTest)
ssTrain = (yTest-meantest)**2
r2Tes=(1-(mseTes/(np.mean(ssTrain))))


# Plot results
print("NN MAE: %f (All), %f (Test) " % (mae, maeTes))
print ("NN MSE: %f (All), %f (Test) " % (mse, mseTes))
print ("NN R2: %f (All), %f (Test) " % (r2, r2Tes))

results.plot()
plt.show(block=True)

plots.error_plot(cgnet)
plt.show(block=True)

plt.scatter(yTest,results['ANN'])
plt.xlabel('True Values')
plt.ylabel('Predictions')

plt.show(block=True)


#===============================================================================
# Save algorithms - Neural network
#===============================================================================

with open('network-storage.dill', 'wb') as f:
    dill.dump(cgnet, f)

#===============================================================================
# Load algorithms - Neural network
#===============================================================================

#Prepare data

dataVal = pd.read_excel('DataAL_IncrementoTeste.xlsx', index_col=0, header=1).iloc[:,[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,-1]]

dataVal.columns = ['PPO4L(in)','PPO4(in)','NH4L(in)','NH4(in)','NO3L(in)','NNO3(in)','CBOL(in)', 'CBO(in)','Temp(In)','Temp(alb)','Tair ','Tdew',
                   'Wvel','Cl_aL(in)','Cl_a(in)','ODL(in)','OD(in)','Qin(in)','ODalb','PPO4(alb)','NNO3(alb)']


# Add filtered data
tmp0 = dataVal.iloc[:,[9, 6, 14]].rolling(9, center=False, axis=0).mean()
tmp0.columns = ['Temp(alb)_09','CBOL(in)_09','Cl_a(in)_09']
tmp1 = dataVal.iloc[:,[9, 6, 14]].rolling(15, center=False, axis=0).mean()
tmp1.columns = ['Temp(alb)_15', 'CBOL(in)_15','Cl_a(in)_15']
tmp2 = dataVal.iloc[:,[9, 6, 14]].rolling(31, center=False, axis=0).mean()
tmp2.columns = ['Temp(alb)_31', 'CBOL(in)_31','Cl_a(in)_31']
dataVal = pd.concat((dataVal, tmp0, tmp1, tmp2), axis=1)

# Drop records with missing values (the rolling means leave NaNs in the first rows)
dataVal = dataVal.dropna()

# Define data
Xval = dataVal.loc[:, ['CBOL(in)', 'CBO(in)','Temp(In)','Temp(alb)','Tair ','Cl_aL(in)','Cl_a(in)','OD(in)','Temp(alb)_31', 'CBOL(in)_31','Cl_a(in)_31']]
yval = dataVal.loc[:, ['ODalb']]

years = dataVal.index.year
yearsTrain = range(2018,2022)

XFinalVal = Xval.query('@years in @yearsTrain')
yFinalVal = yval.query('@years in @yearsTrain').values.ravel()
resultsVal = yval.query('@years in @yearsTrain')


# Load algorithm
with open('network-storage.dill', 'rb') as f:
    cgnet = dill.load(f)

# Scale X
XScaler = StandardScaler()
XScaler.fit(XFinalVal)
XFinalScaled = XScaler.transform(XFinalVal)

# Scale y
yScaler = StandardScaler()
yScaler.fit(yFinalVal.reshape(-1, 1))
yTrainScaled = yScaler.transform(yFinalVal.reshape(-1, 1)).ravel()

# Predict
y_predicted = yScaler.inverse_transform(cgnet.predict(XFinalScaled).reshape(-1, 1)).ravel()

resultsVal['ANN'] = y_predicted
scoreMean = metrics.mean_absolute_error(yFinalVal, y_predicted)
scoreR2 = metrics.r2_score(yFinalVal, y_predicted)


print(scoreMean)
print(scoreR2)


plt.scatter(yFinalVal,y_predicted)

plt.xlabel('True Values')
plt.ylabel('Predictions')

plt.show(block=True)

resultsVal.plot()
plt.show(block=True)


#===============================================================================
# Cross validation - Neural network
#===============================================================================
XScaler = StandardScaler()
XScaler.fit(XTrain)
XTrainScaled = XScaler.transform(XTrain)
XTestScaled = XScaler.transform(XTest)

yScaler = StandardScaler()
yScaler.fit(yTrain.reshape(-1, 1))
yTrainScaled = yScaler.transform(yTrain.reshape(-1, 1)).ravel()
yTestScaled = yScaler.transform(yTest.reshape(-1, 1)).ravel()

kfold = KFold(n_splits=10, shuffle=True, random_state=None)
scoresMean = []   
scoresR2 = [] 

for train, test in kfold.split(XTrainScaled):
    x_train, x_test = XTrainScaled[train], XTrainScaled[test]
    y_train, y_test = yTrainScaled[train], yTrainScaled[test]

    cgnet = algorithms.LevenbergMarquardt(
        connection=[
            layers.Input(XTrain.shape[1]),
            layers.Relu(6),
            layers.Linear(1)
        ],
        mu_update_factor=2,
        mu=0.1,
        shuffle_data=True,
        verbose=True,
        decay_rate=0.1,
        addons=[algorithms.WeightElimination]
    )

    cgnet.train(x_train, y_train, epochs=100)
    y_predicted = cgnet.predict(x_test)

    scoreMean = metrics.mean_absolute_error(y_test, y_predicted)
    scoreR2 = metrics.r2_score(y_test, y_predicted)
    scoresMean.append(scoreMean)
    scoresR2.append(scoreR2)

print(scoresMean)
print(scoresR2)
scoresMean = np.array(scoresMean)
scoresR2 = np.array(scoresR2)

print("MEA: %0.2f (+/- %0.2f)" % (scoresMean.mean(), scoresMean.std() * 2))

print("R2: %0.2f (+/- %0.2f)" % (scoresR2.mean(), scoresR2.std() * 2))

There is 1 answer below.

BEST ANSWER

I think one of the problems might be the scaling that you apply before training. In the training stage you fit the scaler using the training data:

XScaler = StandardScaler()
XScaler.fit(XTrain)

But after you load the network with dill, you fit the scaler with different data (specifically, the validation data):

XScaler = StandardScaler()
XScaler.fit(XFinalVal)

In the second case you use a different scaling for prediction, one the network hasn't seen during training. The new scaling may produce a different distribution of samples compared to the one the network expects.
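
Here is a minimal sketch of the effect on synthetic data (the array and numbers are hypothetical, just to illustrate): the same raw value maps to a very different scaled input depending on which range the scaler was fitted on.

import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical feature with an upward trend over time
x = np.arange(100, dtype=float).reshape(-1, 1)

full = StandardScaler().fit(x)        # fitted on the full range, as in training
tail = StandardScaler().fit(x[80:])   # fitted on the last 20 samples only

# The same raw value becomes a very different network input
print(full.transform([[90.0]]))   # ~ +1.40 (well above the full-range mean)
print(tail.transform([[90.0]]))   # ~ +0.09 (near the mean of the tail window)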

To make the effect of training reproducible, you also need to save XScaler and load it at the same time as you load the network.

Everything I've described is also true for yScaler.
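
For example, a minimal sketch of that idea (the dict keys and the single-file layout are arbitrary choices, not a neupy convention): store the network together with both fitted scalers, then reuse the stored scalers instead of refitting them on the validation data.

import dill

# After training: persist the network together with the fitted scalers
with open('network-storage.dill', 'wb') as f:
    dill.dump({'network': cgnet, 'XScaler': XScaler, 'yScaler': yScaler}, f)

# Later: restore everything and reuse the training-time scaling
with open('network-storage.dill', 'rb') as f:
    stored = dill.load(f)

cgnet = stored['network']
XScaler = stored['XScaler']
yScaler = stored['yScaler']

# Transform (do not refit!) the new data with the scaler fitted on the training set
XFinalScaled = XScaler.transform(XFinalVal)
y_predicted = yScaler.inverse_transform(
    cgnet.predict(XFinalScaled).reshape(-1, 1)
).ravel()

This way the validation data goes through exactly the same transformation the network saw during training, and predictions for a sub-period should match the predictions for the same period computed from the full series.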