R ranger treeInfo final nodes have the same class

Question

R ranger treeInfo final nodes have the same class

329 Views Asked by David At 27 July 2025 at 19:16

When I use ranger for a classification model and treeInfo() to extract a tree, I see that sometimes a split results in two identical terminal nodes. Is this expected behaviour? Why does it make sense to introduce a split where the final nodes are the same?

From this question, I take that the prediction variable could be the majority class (albeit for python and another random forest implementation). The ranger ?treeInfo documentation says it should be the predicted class.

MWE

library(ranger)

data <- iris
data$is_versicolor <- factor(data$Species == "versicolor")
data$Species <- NULL

rf <- ranger(is_versicolor ~ ., data = data,
             num.trees = 1, # no need for many trees in this example
             max.depth = 3, # keep depth at an understandable level
             seed = 1351, replace = FALSE)
treeInfo(rf, 1)
#>   nodeID leftChild rightChild splitvarID splitvarName splitval terminal prediction
#> 1      0         1          2          2 Petal.Length     2.60    FALSE       <NA>
#> 2      1        NA         NA         NA         <NA>       NA     TRUE      FALSE
#> 3      2         3          4          3  Petal.Width     1.75    FALSE       <NA>
#> 4      3         5          6          2 Petal.Length     4.95    FALSE       <NA>
#> 5      4         7          8          0 Sepal.Length     5.95    FALSE       <NA>
#> 6      5        NA         NA         NA         <NA>       NA     TRUE       TRUE
#> 7      6        NA         NA         NA         <NA>       NA     TRUE       TRUE
#> 8      7        NA         NA         NA         <NA>       NA     TRUE      FALSE
#> 9      8        NA         NA         NA         <NA>       NA     TRUE      FALSE

In this example, the last four rows (final nodes with nodeID 5 and 6, as well as 7 and 8) have the prediction TRUE and FALSE. Graphically this would look like this

Original Q&A

There are 1 best solutions below

**David** · Answer 1

I think I found a (partial) answer to the issue, namely the mtry and min.node.size arguments and their functionality.

As the random forest chooses only mtry variables at each split, the final split might take only variables into account, which do not split the data in a way that results in a maximum gini difference (or whatever metric was chosen) but still in each final node, a given class might prevail.

Playing around with mtry and min.node.size can change this. But we still might get splits with the same results.

R ranger treeInfo final nodes have the same class

MWE

There are 1 best solutions below

Related Questions in R

Related Questions in RANDOM-FOREST

Related Questions in R-RANGER

Trending Questions

Popular # Hahtags

Popular Questions