Regression trees with tidymodels

Asked by Lillian Welsh on 17 June 2023 at 20:23

When using regression trees, how do you decide when to use tune_grid() and when to use fit_resamples()?

I tried these two things:

1. Using tune_grid():

library(tidymodels)

tune_spec <- decision_tree(min_n = tune(), tree_depth = tune(), cost_complexity = tune()) %>%
  set_engine("rpart") %>%
  set_mode("regression")
tree_grid <- tune_spec %>% extract_parameter_set_dials() %>% grid_regular(levels = 3)
set.seed(275)
folds <- vfold_cv(train_set, v = 3)
tune_results <- tune_grid(tune_spec, outcome ~ ., resamples = folds, grid = tree_grid, metrics = metric_set(rmse))

That resulted in the following error:

factor has new levels... there were issues with some computations

2. Using fit_resamples():

tune_results <- fit_resamples(tune_spec, outcome ~ ., resamples = folds, grid = tree_grid, metrics = metric_set(rmse))

That resulted in this error:

! 3 arguments have been tagged for tuning in these components: model_spec. 
Please use one of the tuning functions (e.g. `tune_grid()`) to optimize them.

Before I try to figure out what's going wrong, I'd like to know which one I'm supposed to be using in the first place.


Answer by EmilHvitfeldt on 19 June 2023 at 18:25

Use fit_resamples() when none of your arguments are marked with tune(). Otherwise, use tune_grid() or one of the variants from the finetune package (such as tune_race_anova()).
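A minimal sketch of that rule, assuming mtcars as stand-in data with mpg as the outcome (the question's train_set is not available). The first specification has fixed values and goes through fit_resamples(); the second keeps the tune() placeholders, and fit_resamples() rejects it with the same kind of error shown in the question. The values min_n = 5, tree_depth = 4 and cost_complexity = 0.01 are arbitrary choices for illustration, not recommendations.

```r
library(tidymodels)

# Hypothetical stand-in data: mtcars with mpg as the outcome (not the question's train_set)
set.seed(275)
folds <- vfold_cv(mtcars, v = 3)

# Every argument has a fixed value, so there is nothing to tune: fit_resamples() applies
fixed_spec <- decision_tree(min_n = 5, tree_depth = 4, cost_complexity = 0.01) %>%
  set_engine("rpart") %>%
  set_mode("regression")

fixed_results <- fit_resamples(fixed_spec, mpg ~ ., resamples = folds,
                               metrics = metric_set(rmse))
collect_metrics(fixed_results)

# The same model with tune() placeholders: fit_resamples() refuses it,
# so tune_grid() (or a finetune variant) is required instead
tuned_spec <- decision_tree(min_n = tune(), tree_depth = tune(), cost_complexity = tune()) %>%
  set_engine("rpart") %>%
  set_mode("regression")

# Capture the error message instead of stopping the script
fit_resamples_error <- tryCatch(
  fit_resamples(tuned_spec, mpg ~ ., resamples = folds),
  error = function(e) conditionMessage(e)
)
fit_resamples_error
```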

In your situation you have used tune(), so tune_grid() is the right choice, which is what you did. The error factor has new levels... there were issues with some computations comes from the data rather than from the choice of function. Some of your predictors are categorical, and for each resample tune_grid() first trains the model on the analysis set and then predicts on the corresponding assessment set. One or more of the categorical variables had levels that appeared only in the assessment set, so the fitted model did not know how to handle them at prediction time.
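To check this diagnosis on your own data, you can compare the levels in each assessment set against those in its analysis set. The sketch below uses a hypothetical toy data frame (toy and group are made up for this example) in which the level "C" occurs in a single row; because every row lands in exactly one assessment set, the fold that holds "C" for assessment is trained without ever seeing it. Swap in your own data and column names to run the same check.

```r
library(tidymodels)

# Hypothetical toy data: the rare level "C" appears in only one row
set.seed(275)
toy <- data.frame(
  outcome = rnorm(12),
  group = c(rep("A", 6), rep("B", 5), "C")
)
folds <- vfold_cv(toy, v = 3)

# For each fold, list the levels present in the assessment set but absent from the analysis set
unseen_levels <- lapply(folds$splits, function(split) {
  setdiff(unique(assessment(split)$group), unique(analysis(split)$group))
})
unseen_levels
```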

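One way to deal with this is to do the preprocessing with recipes. The step step_novel() was created for this exact problem: it adds a placeholder level (named "new" by default) and assigns any level not seen during training to it.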
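A small illustration of what step_novel() does, using hypothetical toy data (train_toy and new_toy are made up for this sketch): after prep(), the recorded levels include the placeholder "new", and an unseen level in new data is mapped to that placeholder when the recipe is baked, instead of reaching the model as an unknown level.

```r
library(tidymodels)

# Hypothetical training data with two known levels
train_toy <- data.frame(
  outcome = c(1.2, 2.3, 3.1, 4.8),
  group = factor(c("A", "A", "B", "B"))
)
# Hypothetical new data containing a level ("C") never seen during training
new_toy <- data.frame(outcome = 5.0, group = factor("C"))

rec <- recipe(outcome ~ group, data = train_toy) %>%
  step_novel(all_nominal_predictors()) %>%
  prep(training = train_toy)

# The levels recorded at prep time now include the placeholder "new"
levels(bake(rec, new_data = NULL)$group)

# The unseen "C" is mapped to "new" when the recipe is applied to new data
bake(rec, new_data = new_toy)$group
```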

Your code would then look like this, using a workflow() to combine the recipe and the model specification:

library(tidymodels)

# step_novel() maps factor levels unseen during training to a "new" level
rec_spec <- recipe(outcome ~ ., data = train_set) %>%
  step_novel(all_nominal_predictors())

tune_spec <- decision_tree(
    min_n = tune(), tree_depth = tune(), cost_complexity = tune()
  ) %>% 
  set_engine("rpart") %>% 
  set_mode("regression")

wf_spec <- workflow(rec_spec, tune_spec)

tree_grid <- wf_spec %>% 
  extract_parameter_set_dials() %>% 
  grid_regular(levels = 3)

set.seed(275)
folds <- vfold_cv(train_set, v=3)

tune_results <- tune_grid(
  wf_spec, 
  resamples = folds, 
  grid = tree_grid, 
  metrics = metric_set(rmse)
)
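Once tune_grid() has finished, the two functions fit together: select the best candidate, plug it into the workflow with finalize_workflow(), and the resulting workflow has no tune() placeholders left, so fit_resamples() and fit() both accept it. The sketch below is self-contained under stated assumptions: a hypothetical train_toy built from mtcars (mpg renamed to outcome, cyl converted to a factor) stands in for train_set, and grid = 5 asks tune_grid() for a small automatic grid rather than the regular grid used above.

```r
library(tidymodels)

# Hypothetical stand-in for train_set: mtcars with a nominal predictor added
train_toy <- mtcars %>%
  mutate(cyl = factor(cyl)) %>%
  rename(outcome = mpg)

rec_spec <- recipe(outcome ~ ., data = train_toy) %>%
  step_novel(all_nominal_predictors())

tune_spec <- decision_tree(min_n = tune(), tree_depth = tune(), cost_complexity = tune()) %>%
  set_engine("rpart") %>%
  set_mode("regression")

wf_spec <- workflow(rec_spec, tune_spec)

set.seed(275)
folds <- vfold_cv(train_toy, v = 3)

# tune() placeholders are present, so tuning goes through tune_grid()
tune_results <- tune_grid(wf_spec, resamples = folds, grid = 5,
                          metrics = metric_set(rmse))

# Pick the best candidate and substitute its values for the tune() placeholders
best_params <- select_best(tune_results, metric = "rmse")
final_wf <- finalize_workflow(wf_spec, best_params)

# No tune() placeholders remain, so fit_resamples() now accepts the workflow
final_resamples <- fit_resamples(final_wf, resamples = folds,
                                 metrics = metric_set(rmse))
collect_metrics(final_resamples)

# Fit the finalized workflow once on the full training data
final_fit <- fit(final_wf, data = train_toy)
```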
