When fitting regression trees, how do you decide whether to use tune_grid() or fit_resamples()?
I tried these two things:
1. Using tune_grid():
tune_spec <- decision_tree(min_n = tune(), tree_depth = tune(), cost_complexity = tune()) %>%
  set_engine("rpart") %>%
  set_mode("regression")
tree_grid <- tune_spec %>% extract_parameter_set_dials() %>% grid_regular(levels = 3)
set.seed(275)
folds <- vfold_cv(train_set, v = 3)
tune_results <- tune_grid(tune_spec, outcome ~ ., resamples = folds, grid = tree_grid, metrics = metric_set(rmse))
That resulted in the following error:
factor has new levels... there were issues with some computations
2. Using fit_resamples():
tune_results <- fit_resamples(tune_spec, outcome ~ ., resamples = folds, grid = tree_grid, metrics = metric_set(rmse))
That resulted in this error:
! 3 arguments have been tagged for tuning in these components: model_spec.
Please use one of the tuning functions (e.g. `tune_grid()`) to optimize them.
Before I try to figure out what's going wrong, I'd like to know which one I'm supposed to be using in the first place.
You should use fit_resamples() if you don't have any arguments set to tune(). Otherwise you should use tune_grid() or one of the finetune variants.
So in your situation, since you have used tune(), you want to use tune_grid(), which you did. But you are getting the error "factor has new levels... there were issues with some computations". This happens because some of your predictors are categorical. When the model is fit inside tune_grid(), it is first trained on the analysis set and then predicts on the corresponding assessment set. One or more of the categorical variables had levels that appear only in the assessment set.
One way to deal with this is to use recipes for preprocessing. The step step_novel() was created to deal with this exact problem.
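To see how a category can end up only in an assessment set, here is a small sketch using rsample. The data frame `toy` and its columns are made up for illustration and are not the asker's data; the value "c" occurs in a single row, so whichever fold holds that row out will never see "c" during training.

```r
library(rsample)

# Simulated data, purely illustrative: the category "c" occurs in one row only.
toy <- data.frame(
  outcome = c(1.2, 3.4, 2.2, 5.1, 4.8, 3.9),
  group   = c("a", "a", "b", "b", "a", "c")
)

set.seed(275)
toy_folds <- vfold_cv(toy, v = 3)

# For each fold, list the categories seen in the assessment rows
# that never occur in the matching analysis rows.
unseen <- lapply(toy_folds$splits, function(split) {
  setdiff(
    as.character(assessment(split)$group),
    as.character(analysis(split)$group)
  )
})

unseen
```

Any fold with a non-empty entry in `unseen` is one where a model trained on the analysis rows would be asked to predict on a category it has never encountered.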
Your code would then look like this, where I used a workflow() to combine the recipe and the model specification.
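A minimal sketch of that setup follows. It assumes train_set has a numeric outcome column and at least one factor predictor; since the real data is not shown, a simulated train_set with hypothetical columns x1 and group stands in for it. The names tree_rec, tree_wf, and fixed_wf are also just illustrative.

```r
library(tidymodels)

# Simulated stand-in for train_set (assumption: the real data has a numeric
# outcome and at least one categorical predictor).
set.seed(275)
train_set <- tibble(
  outcome = rnorm(90),
  x1      = rnorm(90),
  group   = factor(sample(c("a", "b", "c"), 90, replace = TRUE))
)

tune_spec <- decision_tree(min_n = tune(), tree_depth = tune(), cost_complexity = tune()) %>%
  set_engine("rpart") %>%
  set_mode("regression")

tree_grid <- tune_spec %>%
  extract_parameter_set_dials() %>%
  grid_regular(levels = 3)

# step_novel() reserves an extra level for nominal predictors, so categories
# not seen in the analysis set are mapped to it instead of causing an error.
tree_rec <- recipe(outcome ~ ., data = train_set) %>%
  step_novel(all_nominal_predictors())

# The workflow bundles the recipe and the model specification.
tree_wf <- workflow() %>%
  add_recipe(tree_rec) %>%
  add_model(tune_spec)

folds <- vfold_cv(train_set, v = 3)

# The spec contains tune() placeholders, so tune_grid() is the right function.
tune_results <- tune_grid(
  tree_wf,
  resamples = folds,
  grid      = tree_grid,
  metrics   = metric_set(rmse)
)
collect_metrics(tune_results)

# By contrast, a spec with no tune() placeholders goes to fit_resamples(),
# which takes no grid argument.
fixed_wf <- workflow() %>%
  add_recipe(tree_rec) %>%
  add_model(decision_tree(mode = "regression") %>% set_engine("rpart"))

fixed_results <- fit_resamples(fixed_wf, resamples = folds, metrics = metric_set(rmse))
collect_metrics(fixed_results)
```

Because the recipe is part of the workflow, it is re-estimated on the analysis set of every fold, so the preprocessing only ever learns from the rows the model is trained on.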