Hello, I have the following ranger model:
library(ranger)
library(caret)  # used further down for trainControl() and train()
X <- train_df[, -1]
y <- train_df$Price
rf_model <- ranger(Price ~ ., data = train_df, mtry = 11, splitrule = "extratrees", min.node.size = 1, num.trees = 100)
I am trying to accomplish two things:
- Get an average performance metric by cross-validating across non-overlapping subsets of the data, so that the accuracy estimate stays stable when the seed value changes.
- Set up cross-validation to find the best combination of mtry and num.trees.
What I have tried:
**The following worked for tuning mtry, splitrule and min.node.size, but I cannot add the number of trees to the grid, because doing so gives me an error.**
# define the parameter grid to search over
param_grid <- expand.grid(mtry = 1:ncol(X),
                          splitrule = c("variance", "extratrees", "maxstat"),
                          min.node.size = c(1, 5, 10))
# set up the cross-validation scheme
cv_scheme <- trainControl(method = "cv",
                          number = 5,
                          verboseIter = TRUE)
# perform the grid search using caret
rf_model <- train(x = X,
                  y = y,
                  method = "ranger",
                  trControl = cv_scheme,
                  tuneGrid = param_grid)
# view the best parameter values
rf_model$bestTune
One easy way to do it is to add a `num.trees` argument in `train` and iterate over that argument. The other way is to create your own customized model; see the chapter Using Your Own Model in the caret documentation.
There is an RPubs paper by Pham Dinh Khanh demonstrating that here.
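The first approach could look roughly like the sketch below. caret's built-in "ranger" method only accepts mtry, splitrule and min.node.size in `tuneGrid`, which is most likely why adding num.trees to the grid fails. Here num.trees is instead passed through `train()`'s `...` to `ranger()` and looped over by hand. The synthetic `train_df`, the reduced grid, the `tree_counts` values and the fixed `folds` object are all assumptions made for illustration, not part of the original post.

```r
library(caret)   # train(), trainControl(), createFolds()
library(ranger)  # backend used by method = "ranger"

set.seed(42)

# Hypothetical stand-in for train_df: Price in the first column, four predictors.
n <- 200
train_df <- data.frame(Price = 0, x1 = rnorm(n), x2 = rnorm(n),
                       x3 = rnorm(n), x4 = rnorm(n))
train_df$Price <- 3 * train_df$x1 - 2 * train_df$x2 + rnorm(n, sd = 0.5)

X <- train_df[, -1]
y <- train_df$Price

# Only the three parameters that caret's "ranger" method can tune go in the grid.
param_grid <- expand.grid(mtry = c(2, 4),
                          splitrule = c("variance", "extratrees"),
                          min.node.size = c(1, 5))

# Fix the fold assignment once so every num.trees value is scored on the same splits.
folds <- createFolds(y, k = 5, returnTrain = TRUE)
cv_scheme <- trainControl(method = "cv", index = folds)

tree_counts <- c(50, 100, 200)  # assumed candidate values

results <- lapply(tree_counts, function(nt) {
  fit <- train(x = X, y = y,
               method = "ranger",
               trControl = cv_scheme,
               tuneGrid = param_grid,
               num.trees = nt)  # forwarded via `...` to ranger()
  cbind(num.trees = nt, fit$results)
})
all_results <- do.call(rbind, results)

# Best combination of all four parameters by cross-validated RMSE.
best <- all_results[which.min(all_results$RMSE), ]
print(best)
```

Because the folds are shared, differences in RMSE between tree counts are not caused by different splits. For a metric that moves less from seed to seed, one option is `trainControl(method = "repeatedcv", number = 5, repeats = 3)`, which averages over several independent fold assignments at the cost of a longer run time.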
Created on 2023-04-19 with reprex v2.0.2