Hyperparameter tuning evaluation set


I want to compare the performance of an estimator with randomly set parameters, and one that has been tuned via grid search.

I understand that this is the general procedure for conducting a grid search with cross-validation:

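from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=0)
param_grid = {'C': [0.01, 0.1, 1, 10, 100],
              'gamma': [0.001, 0.01, 0.1, 1, 10, 100]}

# 5-fold cross-validated grid search, fitted on the training set only
grid_search = GridSearchCV(SVC(), param_grid, cv=5)
grid_search.fit(X_train, y_train)

# best parameter combination found (the attribute has a trailing underscore)
grid_search.best_params_
# evaluation on the held-out test set
grid_search.score(X_test, y_test)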

Calling .fit() performs cross validation on the training set and avoids the need to manually split the training set into another validation set, saving the final test set for evaluation when the model is trained using the ideal parameters.

However, if the purpose of a grid search is to find the ideal parameters, why is there a need to evaluate on a test set? The final score is hence inconsequential as the parameters are already fixed.

Would it hence be possible to call .fit(iris.data, iris.target) and cross-validate using the entire dataset to obtain the parameters, then perform a separate cross-validation using the entire dataset to compare the performance of the tuned model and the random model?


There is 1 answer below.


> I want to compare the performance of an estimator with randomly set parameters, and one that has been tuned via grid search.

A single model with randomly set hyperparameters can, by chance, match or outperform the grid-searched one, so you'd want to try multiple random draws rather than just one. Could you explain what the motivation is for this setup? If you're after a baseline model to compare against the tuned model, then the default parameters could be used.
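As a rough sketch of that comparison, the snippet below scores a default `SVC`, several random draws, and the grid-searched model on the same held-out test set. It assumes the random draws come from the same grid as in the question and uses `ParameterSampler` to pick them; the choice of 10 draws and the variable names are illustrative, not something from the question.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, ParameterSampler, train_test_split
from sklearn.svm import SVC

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0)

param_grid = {'C': [0.01, 0.1, 1, 10, 100],
              'gamma': [0.001, 0.01, 0.1, 1, 10, 100]}

# Baseline: SVC with scikit-learn's default hyperparameters.
baseline_score = SVC().fit(X_train, y_train).score(X_test, y_test)

# Several random draws from the grid instead of a single one
# (n_iter=10 is an arbitrary, illustrative choice).
random_scores = [
    SVC(**params).fit(X_train, y_train).score(X_test, y_test)
    for params in ParameterSampler(param_grid, n_iter=10, random_state=0)
]

# Tuned model: 5-fold cross-validated grid search on the training set.
grid_search = GridSearchCV(SVC(), param_grid, cv=5).fit(X_train, y_train)
tuned_score = grid_search.score(X_test, y_test)

print(f"default: {baseline_score:.3f}")
print(f"random:  min={min(random_scores):.3f}, max={max(random_scores):.3f}")
print(f"tuned:   {tuned_score:.3f}")
```

Looking at the spread of the random scores, rather than one number, makes it easier to tell whether the tuned model is really better or whether one draw just got lucky.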

> However, if the purpose of a grid search is to find the ideal parameters, why is there a need to evaluate on a test set? The final score is hence inconsequential as the parameters are already fixed.

After choosing your final hyperparameters using grid search, the test score is used to estimate how well the tuned model generalises to new and unseen data. You can't use the training data for this: the hyperparameters were picked because they gave the best cross-validation score on that data, so that score is an optimistic estimate of generalisation ability.
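To make the distinction concrete, here is a minimal sketch that prints both numbers for the setup in the question: `best_score_`, the mean cross-validated accuracy that was used to select the hyperparameters, and the accuracy on the untouched test set. The variable names `cv_score` and `test_score` are just illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0)

param_grid = {'C': [0.01, 0.1, 1, 10, 100],
              'gamma': [0.001, 0.01, 0.1, 1, 10, 100]}

grid_search = GridSearchCV(SVC(), param_grid, cv=5).fit(X_train, y_train)

# Mean cross-validated accuracy of the winning combination. It was used
# to pick the hyperparameters, so it tends to be optimistic.
cv_score = grid_search.best_score_

# Accuracy on data that played no part in tuning or fitting.
test_score = grid_search.score(X_test, y_test)

print("best params:", grid_search.best_params_)
print(f"cross-validation score (used for selection): {cv_score:.3f}")
print(f"test score (held out):                       {test_score:.3f}")
```

On a small, easy dataset such as iris the two numbers may be close; the gap generally matters more with larger grids or noisier data.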

> Would it hence be possible to call .fit(iris.data, iris.target) and cross-validate using the entire dataset to obtain the parameters, then perform a separate cross-validation using the entire dataset to compare the performance of the tuned model and the random model?

If you just want to compare two models against each other, and you don't care about how a model would cope with new data, you can ditch the test set and do as you've described above. However, without a test set, you won't be able to report an unbiased measure of generalisation ability. If you can afford to, consider keeping a small test set (e.g. 5%) just in case you need it.
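A minimal sketch of that test-set-free comparison might look like the following. The "random" model's values (`C=0.01`, `gamma=100`) are arbitrary placeholders standing in for whatever random setting you actually use, and reusing one shuffled `StratifiedKFold` for both models is an assumption made here so that they are scored on identical folds.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.svm import SVC

iris = load_iris()
X, y = iris.data, iris.target

param_grid = {'C': [0.01, 0.1, 1, 10, 100],
              'gamma': [0.001, 0.01, 0.1, 1, 10, 100]}

# Step 1: tune on the whole dataset (no test set is held back).
grid_search = GridSearchCV(SVC(), param_grid, cv=5).fit(X, y)
tuned_model = SVC(**grid_search.best_params_)

# Placeholder for the "random" model; these values are hypothetical.
random_model = SVC(C=0.01, gamma=100)

# Step 2: a separate cross-validation, with the same folds for both models.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
tuned_scores = cross_val_score(tuned_model, X, y, cv=cv)
random_scores = cross_val_score(random_model, X, y, cv=cv)

print(f"tuned:  {tuned_scores.mean():.3f} +/- {tuned_scores.std():.3f}")
print(f"random: {random_scores.mean():.3f} +/- {random_scores.std():.3f}")
```

Because the tuned hyperparameters were chosen on the same data they are then scored on, the tuned scores here are fine for ranking the two models but should not be reported as an estimate of performance on new data.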