sklearn gridsearch scores don't match explicitly evaluated scores


I'm trying to figure out how grid search works, but I can't understand why the scores on the individual splits don't match the scores evaluated explicitly on those same splits. To be concrete, here is an example.

My cross validation scheme is:

tscv = TimeSeriesSplit(n_splits=3, test_size=1)

Let [y|X] be my sample, with y of shape (n_samples, 2) and X of shape (n_samples, 10). If I define:

clf_Lasso = GridSearchCV(estimator=Lasso(),
                 param_grid={'alpha': [10]},
                 refit=True,
                 cv=tscv,
                 return_train_score=True,
                 scoring='neg_mean_squared_error'
                 )
model_Lasso = clf_Lasso.fit(X, y)
grid_search_scores_Lasso = pd.DataFrame(
         model_Lasso.cv_results_)[['param_alpha', 'split0_train_score', 'split1_train_score', 'split2_train_score']]

I expect the last line to return a pandas DataFrame with a single row, containing the (negated) mean squared errors evaluated on each of my three splits for alpha = 10.
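As a side note, `cv_results_` is a plain dict, so its available keys can be listed directly. A minimal sketch, using synthetic stand-in data with the same shapes (the real `X` and `y` come from my dataset):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

# Synthetic stand-in data with the shapes from the question
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(20, 10)))
y = pd.DataFrame(rng.normal(size=(20, 2)))

tscv = TimeSeriesSplit(n_splits=3, test_size=1)
clf = GridSearchCV(Lasso(), {'alpha': [10]}, cv=tscv,
                   return_train_score=True,
                   scoring='neg_mean_squared_error').fit(X, y)

# With return_train_score=True, cv_results_ holds BOTH per-split
# train scores and per-split test scores
split_keys = sorted(k for k in clf.cv_results_ if k.startswith('split'))
print(split_keys)
# → ['split0_test_score', 'split0_train_score', 'split1_test_score',
#    'split1_train_score', 'split2_test_score', 'split2_train_score']
```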

I then run:

mse_Lasso = []
for train, test in tscv.split(X):
    Xcv = X.iloc[train]; ycv = y.iloc[train]
    Xcv_test = X.iloc[test]; ycv_test = y.iloc[test]
    tmp = Lasso(alpha=10).fit(Xcv, ycv)
    mse_Lasso.append(mean_squared_error(ycv_test, tmp.predict(Xcv_test)))

I expect mse_Lasso to be a list containing the same values as the first row of the previous DataFrame, having imported:

from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error

While I get, in the first case:

  param_alpha  split0_train_score  split1_train_score  split2_train_score  
0          10           -8.075127           -8.073908           -8.067685   

and:

[10.227336344351109, 12.195915550359423, 16.63612266112668]

in the second one. What am I doing wrong?

Please help...

PS: if I run multiple values of alpha and select the best one, both approaches provide the same predictions.
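For anyone reproducing this, here is a self-contained sketch of both computations side by side, again on synthetic stand-in data (the real `X` and `y` come from my dataset). Note that the grid-search columns selected above are `split{i}_train_score`, while the loop computes test-fold MSE, and that `neg_mean_squared_error` is the *negated* MSE:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

# Synthetic stand-in data with the shapes from the question
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(20, 10)))
y = pd.DataFrame(rng.normal(size=(20, 2)))

tscv = TimeSeriesSplit(n_splits=3, test_size=1)
clf = GridSearchCV(Lasso(), {'alpha': [10]}, cv=tscv,
                   return_train_score=True,
                   scoring='neg_mean_squared_error').fit(X, y)

# Manual per-split TEST MSE, as in the explicit loop
manual = []
for train, test in tscv.split(X):
    m = Lasso(alpha=10).fit(X.iloc[train], y.iloc[train])
    manual.append(mean_squared_error(y.iloc[test], m.predict(X.iloc[test])))

# Grid-search per-split TEST scores, negated back into plain MSE
grid = [-clf.cv_results_[f'split{i}_test_score'][0] for i in range(3)]

print(np.allclose(manual, grid))  # → True
```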
