Why do I get three different MSE values?

I wrote an MLP and want to start tuning it to get the best results, but I'm stuck on several different MSE values.

from pandas import read_csv
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasRegressor
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn import metrics
import numpy
import joblib

# load dataset
#dataframe = read_csv("housing.csv", delim_whitespace=True, header=None)
dataframe = read_csv("100.csv",  header=None)
dataset = dataframe.values
# split into input (X) and output (Y) variables
X = dataset[:,0:6]
Y = dataset[:,6]
# define the model
def larger_model():
    # create model
    model = Sequential()
    model.add(Dense(20, input_dim=6, kernel_initializer='normal', activation='relu'))
    model.add(Dense(50, kernel_initializer='normal', activation='relu'))
    model.add(Dense(1, kernel_initializer='normal', activation='linear'))
    # Compile model
    model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mae','mse'])
    return model
# evaluate model with standardized dataset
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('mlp', KerasRegressor(build_fn=larger_model, epochs=100, batch_size=5, verbose=1)))
pipeline = Pipeline(estimators)
kfold = KFold(n_splits=2)
results = cross_val_score(pipeline, X, Y, cv=kfold)
pipeline.fit(X, Y)
prediction = pipeline.predict(X)

result_test = Y

print("%.2f (%.2f) MSE" % (results.mean(), results.std()))
print('Mean Absolute Error:', metrics.mean_absolute_error(prediction, result_test))
print('Mean Squared Error:', metrics.mean_squared_error(prediction, result_test))

This gives me the following output:

Epoch 98/100
200/200 [==============================] - 0s 904us/step - loss: 0.0086 - mae: 0.0669 - mse: 0.0086
Epoch 99/100
200/200 [==============================] - 0s 959us/step - loss: 0.0032 - mae: 0.0382 - mse: 0.0032
Epoch 100/100
200/200 [==============================] - 0s 894us/step - loss: 0.0973 - mae: 0.2052 - mse: 0.0973
200/200 [==============================] - 0s 600us/step
21.959478

-0.03 (0.02) MSE
Mean Absolute Error: 0.1959771416462339
Mean Squared Error: 0.0705598179059006

So I see three different MSE results here. Why is that, and which one should I keep in mind as the overall model score when I tune the model?

Best answer

As I understand it, if you print the results variable you will see two scores, one per fold, because you used n_splits=2.

-0.03 (0.02) MSE

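The output above is the mean and the standard deviation of those two fold scores. The value is negative because cross_val_score follows the scikit-learn convention that a higher score is better, so a loss such as MSE comes back with its sign flipped; the average fold MSE is therefore about 0.03.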
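To make the per-fold scores and the sign visible, here is a minimal sketch. It is not the asker's Keras model: it assumes synthetic data and a plain LinearRegression as a stand-in, and it asks for scoring="neg_mean_squared_error" explicitly. The names X_demo, y_demo, demo_pipeline, fold_scores and fold_mse are made up for the illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 200 rows, 6 features, like the question's shapes.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 6))
y_demo = X_demo @ np.arange(1.0, 7.0) + rng.normal(scale=0.1, size=200)

# Same pipeline layout as in the question, with a linear model instead of the MLP.
demo_pipeline = Pipeline([
    ("standardize", StandardScaler()),
    ("model", LinearRegression()),
])

# One score per fold; with this scorer each score is the negated MSE of that fold.
fold_scores = cross_val_score(
    demo_pipeline, X_demo, y_demo,
    cv=KFold(n_splits=2),
    scoring="neg_mean_squared_error",
)

# Flip the sign back to get ordinary (non-negative) MSE values.
fold_mse = -fold_scores
print("per-fold scores:", fold_scores)
print("%.4f (%.4f) MSE" % (fold_mse.mean(), fold_mse.std()))
```

Printing fold_scores next to the summary line shows that the mean/std line is a summary of two separate held-out evaluations, not a single number.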

Epoch 100/100
200/200 [==============================] - 0s 894us/step - loss: 0.0973 - mae: 0.2052 - mse: 0.0973

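The output above shows mse = 0.0973. I think this comes from one of the n_splits=2 fits, which trains on only 50% of the whole data (X) and keeps the remaining 50% as validation data. Either way, it is a training loss averaged over the batches of that epoch while the weights are still being updated, so it will not match an MSE computed afterwards with the final weights.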
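One more reason the last-epoch number in the log need not match an MSE computed afterwards: the figure logged during training is, as far as I know, an average of per-batch losses taken while the weights keep changing. The sketch below imitates that with plain mini-batch gradient descent on a linear model in NumPy. It is an assumption-laden toy, not Keras and not Adam, and every name in it (X_sim, y_sim, logged_epoch_mse, mse_after_epoch) is hypothetical.

```python
import numpy as np

# Synthetic data (assumption): 200 rows, 6 features, linear target plus small noise.
rng = np.random.default_rng(0)
X_sim = rng.normal(size=(200, 6))
true_w = np.arange(1.0, 7.0)
y_sim = X_sim @ true_w + rng.normal(scale=0.1, size=200)

w = np.zeros(6)          # start from untrained weights
learning_rate = 0.02
batch_size = 5           # same batch size as in the question

batch_losses = []
for start in range(0, len(X_sim), batch_size):
    xb = X_sim[start:start + batch_size]
    yb = y_sim[start:start + batch_size]
    error = xb @ w - yb
    # Loss of this batch, measured BEFORE the weights are updated.
    batch_losses.append(np.mean(error ** 2))
    # Gradient step on the batch MSE.
    w -= learning_rate * 2.0 * xb.T @ error / len(xb)

# What a progress bar would show for the epoch: the mean of the batch losses.
logged_epoch_mse = float(np.mean(batch_losses))
# MSE on the same data, computed once with the weights left after the epoch.
mse_after_epoch = float(np.mean((X_sim @ w - y_sim) ** 2))

print("logged during epoch:", logged_epoch_mse)
print("recomputed afterwards:", mse_after_epoch)
```

The two printed values differ even though both use the same data, because the first mixes losses from 40 different weight states and the second uses only the last one.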

Mean Squared Error: 0.0705598179059006

The output above comes from predicting on the whole dataset, not 50% of it, with the model fitted by pipeline.fit(X, Y). So it is expected that the three prints give three different MSEs.

I am working on a very similar problem. My suggestion: split the dataset into train and test sets, train on the train set only, predict on the test set, and calculate the MSE on the test data. Otherwise, keep the script as it is and take Mean Squared Error: 0.0705598179059006 as your final MSE, bearing in mind that it is measured on the same data the model was trained on.
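A minimal sketch of the train/test suggestion, assuming synthetic data and a LinearRegression stand-in so that it runs without Keras; in the real script the KerasRegressor step would sit where "model" is. The 80/20 split, random_state=42 and the names X_all, y_all and holdout_pipeline are assumptions for the example, not something from the question.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): 200 rows, 6 features.
rng = np.random.default_rng(1)
X_all = rng.normal(size=(200, 6))
y_all = X_all @ np.arange(1.0, 7.0) + rng.normal(scale=0.1, size=200)

# Hold out 20% of the rows; the model never sees them during fitting.
X_train, X_test, y_train, y_test = train_test_split(
    X_all, y_all, test_size=0.2, random_state=42
)

holdout_pipeline = Pipeline([
    ("standardize", StandardScaler()),   # scaler statistics come from X_train only
    ("model", LinearRegression()),       # stand-in for the KerasRegressor step
])
holdout_pipeline.fit(X_train, y_train)

# In-sample error (optimistic) versus error on the held-out rows.
train_mse = mean_squared_error(y_train, holdout_pipeline.predict(X_train))
test_mse = mean_squared_error(y_test, holdout_pipeline.predict(X_test))
test_mae = mean_absolute_error(y_test, holdout_pipeline.predict(X_test))

print("train MSE:", train_mse)
print("test MSE: ", test_mse)
print("test MAE: ", test_mae)
```

The test MSE is the number to compare between tuning runs; the train MSE is only useful as a check for overfitting.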