I am following this example and I want to change one part of the code from:
# default RF model
m1 <- randomForest(
formula = Sale_Price ~ .,
data = ames_train
)
# number of trees with lowest MSE
btree <- which.min(m1$mse)
to its equivalent ranger-based code. The issue is that ranger does not directly expose the error for each number of trees, so I cannot read off the number of trees with the lowest MSE. How can I calculate the number of trees with the lowest MSE and store it in a variable (which I call btree)?
library(rsample) # data splitting
library(randomForest) # basic implementation
library(ranger) # a faster implementation of randomForest
set.seed(123)
ames_split <- initial_split(AmesHousing::make_ames(), prop = .7)
ames_train <- training(ames_split)
ames_test <- testing(ames_split)
# for reproducibility
set.seed(123)
# default RF model
m1 <- randomForest(
formula = Sale_Price ~ .,
data = ames_train
)
# the equivalent in ranger
m1 <- ranger(
formula = Sale_Price ~ .,
data = ames_train
)
# number of trees with lowest MSE (works for a randomForest object only; a ranger object has no $mse element)
btree <- which.min(m1$mse)
Based on the ranger documentation:
prediction.error: Overall out-of-bag prediction error. For classification this is accuracy (proportion of misclassified observations), for probability estimation the Brier score, for regression the mean squared error and for survival one minus Harrell's C-index.
So if I do:
m1 <- ranger(
formula = Sale_Price ~ .,
data = ames_train
)
# number of trees with highest r2
btree = which.max(m1$prediction.error)
print(btree)
The result is:
[1] 1
which obviously is not right.
I don't think there is a way to get this directly from the ranger output: prediction.error is a single overall value, so which.max() on it always returns 1. But you could get the predictions from each tree and calculate it yourself. For example:
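One way to sketch this, assuming the goal is an out-of-bag MSE curve comparable to what randomForest stores in m1$mse. The helper name oob_mse_by_trees and its arguments are hypothetical, not part of ranger. It relies on keep.inbag = TRUE in ranger() and predict.all = TRUE in predict(), and treats a row as out-of-bag for a tree when its in-bag count is zero.

```r
library(ranger)

# Hypothetical helper (not part of ranger): OOB MSE after 1, 2, ..., num.trees trees.
# `data` is a data frame, `response` the name of a numeric outcome column.
oob_mse_by_trees <- function(data, response, num.trees = 500, seed = 123) {
  fit <- ranger(
    dependent.variable.name = response,
    data       = data,
    num.trees  = num.trees,
    keep.inbag = TRUE,   # remember which rows each tree was trained on
    seed       = seed
  )

  # n x num.trees matrix: prediction of every single tree for every row
  tree_preds <- predict(fit, data = data, predict.all = TRUE)$predictions

  # n x num.trees logical matrix: TRUE where the row was out-of-bag for that tree
  oob <- do.call(cbind, fit$inbag.counts) == 0

  # running mean of the OOB predictions across trees, row by row
  tree_preds[!oob] <- 0
  cum_sum  <- t(apply(tree_preds, 1, cumsum))
  cum_n    <- t(apply(oob, 1, cumsum))
  cum_pred <- cum_sum / cum_n   # NaN until a row has been OOB at least once

  # MSE over the rows that already have an OOB prediction
  colMeans((cum_pred - data[[response]])^2, na.rm = TRUE)
}
```

With the data from the question this would be used as oob_mse <- oob_mse_by_trees(ames_train, "Sale_Price") followed by btree <- which.min(oob_mse). The last element of the curve should roughly agree with the prediction.error of a ranger fit using the same settings, which is a quick sanity check. If a held-out error is preferred over the OOB error, the same cumulative averaging can be applied to per-tree predictions on ames_test without the in-bag mask.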