Random forest variable importance: mlr3 results differ from the randomForest package


I fit the same random forest on iris with mlr3, randomForest, and tidymodels, using the same seed each time. Why are the results different? Thanks for the help!

mlr3

library(mlr3)
library(mlr3verse)
library(mlr3learners)
library(mlr3extralearners)  # provides the "classif.randomForest" learner
library(randomForest)
library(tidyverse)
library(tidymodels)

tasks = as_task_classif(iris, target = 'Species')
learners = lrn("classif.randomForest", predict_type = "prob", importance = "gini")
set.seed(123, kind = "Mersenne-Twister")

mlr3_result = learners$train(tasks)
mlr3_result$model
Call:
 randomForest(formula = formula, data = data, classwt = classwt,      cutoff = cutoff, importance = TRUE) 
               Type of random forest: classification
                     Number of trees: 500
No. of variables tried at each split: 2

        OOB estimate of  error rate: 4%
Confusion matrix:
           setosa versicolor virginica class.error
setosa         50          0         0        0.00
versicolor      0         47         3        0.06
virginica       0          3        47        0.06

mlr3_result$model$importance

                  setosa  versicolor   virginica MeanDecreaseAccuracy MeanDecreaseGini
Petal.Length 0.335345171 0.307701242 0.294076163          0.310261341        43.088229
Petal.Width  0.330789167 0.311838647 0.273864031          0.303464596        44.258135
Sepal.Length 0.037133413 0.020715911 0.042752581          0.034013485         9.715093
Sepal.Width  0.008714192 0.004354224 0.008025792          0.006962512         2.238209
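
The rows above are ordered Petal.* before Sepal.*, unlike the column order of iris itself. A small check of the order in which the task exposes its features might look like the sketch below. It assumes only that mlr3 is installed; `task` is a fresh variable used here for illustration.

```r
library(mlr3)

task <- as_task_classif(iris, target = "Species")

# Order in which mlr3 exposes the features to the learner
print(task$feature_names)

# Order of the feature columns in the original data frame
iris_features <- setdiff(names(iris), "Species")
print(iris_features)

# TRUE only if both orders agree
print(identical(task$feature_names, iris_features))
```

If the last line prints FALSE, the learner receives the same data with the columns in a different order than the direct randomForest call does.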


randomForest

set.seed(123, kind = "Mersenne-Twister")
randomForest_result <- randomForest(iris[,1:4], 
                    iris$Species, 
                    importance = TRUE)
randomForest_result

Call:
 randomForest(x = iris[1:4], y = iris$Species, importance = TRUE) 
               Type of random forest: classification
                     Number of trees: 500
No. of variables tried at each split: 2

        OOB estimate of  error rate: 4.67%
Confusion matrix:
           setosa versicolor virginica class.error
setosa         50          0         0        0.00
versicolor      0         47         3        0.06
virginica       0          4        46        0.08

randomForest_result[["importance"]]
                  setosa  versicolor   virginica MeanDecreaseAccuracy MeanDecreaseGini
Sepal.Length 0.036131008 0.023774906 0.038354330          0.033578769         9.798189
Sepal.Width  0.008306837 0.001895114 0.007878582          0.006106488         2.236535
Petal.Length 0.328732260 0.308242048 0.293766472          0.307467294        43.093269
Petal.Width  0.337599283 0.315079112 0.267375505          0.303321451        44.042372

tidymodels

library(tidyverse)
library(tidymodels)
set.seed(123, kind = "Mersenne-Twister")
tidy_results = rand_forest(mode = "classification") %>%
  set_engine("randomForest",importance = T) %>%
  fit(
    Species ~.,
    data = iris
  )
tidy_results

Call:
 randomForest(x = maybe_data_frame(x), y = y, importance = ~T) 
               Type of random forest: classification
                     Number of trees: 500
No. of variables tried at each split: 2

        OOB estimate of  error rate: 4.67%
Confusion matrix:
           setosa versicolor virginica class.error
setosa         50          0         0        0.00
versicolor      0         47         3        0.06
virginica       0          4        46        0.08

tidy_results[["fit"]][["importance"]]
                  setosa  versicolor   virginica MeanDecreaseAccuracy MeanDecreaseGini
Sepal.Length 0.036131008 0.023774906 0.038354330          0.033578769         9.798189
Sepal.Width  0.008306837 0.001895114 0.007878582          0.006106488         2.236535
Petal.Length 0.328732260 0.308242048 0.293766472          0.307467294        43.093269
Petal.Width  0.337599283 0.315079112 0.267375505          0.303321451        44.042372

tidymodels and randomForest give identical results, but mlr3 and randomForest do not, even though I set the same seed before each fit. Did I make a mistake in my code? I'm confused...
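
One difference visible in the outputs above is the feature order: the mlr3 importance table lists Petal.* before Sepal.*, while the other two keep the iris column order. A possible check, sketched below, is to refit randomForest directly with the columns reordered and compare it against the original fit under the same seed. The working assumption is that column order changes which variables are drawn at each split. `mlr3_order`, `rf_original`, and `rf_reordered` are names introduced here for illustration. This does not reproduce the mlr3 call exactly, since mlr3 uses the formula interface and also passes classwt and cutoff.

```r
library(randomForest)

# Feature order as it appears in the mlr3 importance table above
mlr3_order <- c("Petal.Length", "Petal.Width", "Sepal.Length", "Sepal.Width")

# Fit with the original iris column order
set.seed(123, kind = "Mersenne-Twister")
rf_original <- randomForest(iris[, 1:4], iris$Species, importance = TRUE)

# Fit with the same seed but the reordered columns
set.seed(123, kind = "Mersenne-Twister")
rf_reordered <- randomForest(iris[, mlr3_order], iris$Species, importance = TRUE)

# Put both Gini importances side by side, rows aligned by feature name
imp_original  <- rf_original$importance[mlr3_order, "MeanDecreaseGini"]
imp_reordered <- rf_reordered$importance[mlr3_order, "MeanDecreaseGini"]
print(cbind(imp_original, imp_reordered))
```

If the two columns differ, then the same seed with a different column order is already enough to produce a different forest, and the gap between mlr3 and randomForest may be ordinary run-to-run variation rather than a mistake in the code.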

Package versions: randomForest 4.7.1.1, mlr3 0.17.0, tidymodels 1.1.1
