Problem faced while using GridSearchCV on RandomForestClassifier


I am working on a classification problem related to heart disease using RandomForestClassifier. While performing hyperparameter tuning on RandomForestClassifier, I am facing the following issue. I am using sklearn Pipeline and ColumnTransformer for preprocessing.

Error: 720 fits failed out of a total of 2160.
The score on these train-test partitions for these parameters will be set to nan.
If these failures are not expected, you can try to debug them by setting error_score='raise'.
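UserWarning: One or more of the test scores are non-finite

from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Scale the numeric columns
numerical_pipeline = Pipeline(
    steps=[('scaler', StandardScaler())]
)

# One-hot encode the categorical columns, ignoring categories unseen during fit
categorical_pipeline = Pipeline(
    steps=[('encoder', OneHotEncoder(handle_unknown='ignore'))]
)

preprocessor = ColumnTransformer(
    [('numerical_pipeline', numerical_pipeline, numerical_features),
     ('categorical_pipeline', categorical_pipeline, categorical_features)]
)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

# Fit the preprocessing on the training split only, then apply it to the test split
scaled_X_train = preprocessor.fit_transform(X_train)
scaled_X_test = preprocessor.transform(X_test)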

param_grid={'max_depth':[3,5,10,None],
          'n_estimators':[10,100,200],
          'max_features':[1,3,5,7],
          'min_samples_leaf':[1,2,3],
          'min_samples_split':[1,2,3]
       }

grid = GridSearchCV(RandomForestClassifier(),param_grid=param_grid,cv=5,scoring='accuracy',verbose=True,n_jobs=-1)
grid.fit(scaled_X_train,y_train)

There is 1 solution below

Muhammed Yunus

The error message says that some hyperparameter combinations make the fit fail while the rest run fine. Remove 1 from the list of values for min_samples_split: as an integer it has to be 2 or greater. That value makes up one third of your grid, which matches the 720 failed fits out of 2160.
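The failure count in the warning can be checked against the grid without fitting any model. This is a minimal sketch that assumes the grid and cv=5 from the question; the names candidates, total_fits and failing_fits are only illustrative.

```python
from sklearn.model_selection import ParameterGrid

# Grid copied from the question, including the invalid min_samples_split=1.
param_grid = {
    'max_depth': [3, 5, 10, None],
    'n_estimators': [10, 100, 200],
    'max_features': [1, 3, 5, 7],
    'min_samples_leaf': [1, 2, 3],
    'min_samples_split': [1, 2, 3],
}
cv = 5  # number of folds used in the question

# ParameterGrid enumerates every combination that GridSearchCV would try.
candidates = list(ParameterGrid(param_grid))

# Each candidate is fitted once per fold.
total_fits = len(candidates) * cv

# Count the fits that use the invalid integer value 1.
failing_fits = sum(p['min_samples_split'] == 1 for p in candidates) * cv

print(total_fits, failing_fits)
```

If the two numbers match the warning, min_samples_split=1 alone likely explains every failed fit.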

If that doesn't resolve the error, add error_score='raise' to GridSearchCV, so that a failing fit raises its exception with the full stack trace instead of being scored as nan.
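Both suggestions can be tried on a small example. The sketch below is not the asker's setup: it assumes synthetic data from make_classification in place of the heart disease dataset and uses a reduced grid so it runs quickly. The names X_demo, y_demo, bad_search and good_search are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (assumption: the real dataset is not shown in the question).
X_demo, y_demo = make_classification(n_samples=200, n_features=8, random_state=0)

# 1) With error_score='raise', the invalid value surfaces as an exception
#    instead of a nan score and a warning.
bad_search = GridSearchCV(
    RandomForestClassifier(n_estimators=10, random_state=0),
    param_grid={'min_samples_split': [1]},
    cv=5,
    error_score='raise',
)
try:
    bad_search.fit(X_demo, y_demo)
    raised = False
except Exception as exc:  # the exact exception class depends on the scikit-learn version
    raised = True
    print(type(exc).__name__, exc)

# 2) With 1 removed from min_samples_split, every fit completes.
good_search = GridSearchCV(
    RandomForestClassifier(n_estimators=10, random_state=0),
    param_grid={'min_samples_split': [2, 3], 'max_features': [1, 3, 5, 7]},
    cv=5,
    scoring='accuracy',
    error_score='raise',
)
good_search.fit(X_demo, y_demo)
print(good_search.best_params_)
```

Once the grid is valid, error_score can go back to its default so that a single unexpected failure does not abort a long search.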