Ctree classification with weights - results displayed


Let's say I want to use the iris data example, but correctly classifying versicolor is 5 times more important to me.

library(party)
data(iris)
irisct <- ctree(Species ~ ., data = iris, weights = ifelse(iris$Species == "versicolor", 5, 1))
plot(irisct)

The plotted tree then reports weighted numbers of observations and weighted conditional probabilities in each node (every versicolor observation is counted 5 times). Is there a way to "disable" this, i.e. to show the original number of observations (total = 150 for iris)?

Many thanks for your help!

Best answer

The enhanced reimplementation of ctree() in package partykit also has somewhat more flexible plotting capabilities. Specifically, the node_barplot() panel function gained a mainlab argument that can be used for customizing the main labels. For example, for the iris data:

library("partykit")
ct <- ctree(Species ~ ., data = iris)

You can set up a vector of labels and then supply a function that accesses these:

lab <- paste("Foo", 1:7)
ml <- function(id, nobs) lab[as.numeric(id)]
plot(ct, tp_args = list(mainlab = ml))

Of course, the example above is not very meaningful but could be modified to accomplish what you want with a little bit of coding.
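As one possible way to do that coding, here is a minimal sketch that puts the unweighted node sizes into the main labels of a weighted tree. The names w, ctw, node, n_orig and ml_orig are only illustrative, and the sketch assumes that predict(..., type = "node") returns the terminal node id for each row of iris. Note that it only changes the labels: the bars still show the weighted class proportions.

```r
library("partykit")

# Weighted tree as in the question (integer case weights)
w <- ifelse(iris$Species == "versicolor", 5L, 1L)
ctw <- ctree(Species ~ ., data = iris, weights = w)

# Terminal node id of every original (unweighted) observation
node <- predict(ctw, newdata = iris, type = "node")

# Unweighted number of observations per terminal node, named by node id
tab <- table(node)
n_orig <- setNames(as.integer(tab), names(tab))

# Label function: ignore the weighted nobs and look up the original count
ml_orig <- function(id, nobs) {
  sprintf("Node %s (n = %s)", id, n_orig[as.character(id)])
}

plot(ctw, tp_args = list(mainlab = ml_orig))
```

The counts in n_orig add up to 150 because every row of iris is assigned to exactly one terminal node.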

However, be careful when upweighting certain observations through the weights argument. The ctree() function really treats the weights as case weights, so the significance tests used for splitting change as well. With an increased number of observations, all p-values become smaller and hence the tree selects more splits (unless mincriterion is increased simultaneously). Compare the ct tree above, which has 4 terminal nodes, with

ct2 <- ctree(Species ~ ., data = iris, weights = rep(2, 150))
ct3 <- ctree(Species ~ ., data = iris, weights = rep(2, 150), mincriterion = 0.999)

The resulting numbers of terminal nodes are

c(width(ct), width(ct2), width(ct3))
[1] 4 6 4
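The same check can be applied to the weighting from the question. This is only a sketch: w and ct4 are illustrative names, and the output is not shown here because the resulting tree size depends on the fit. With these weights the tests see 350 rather than 150 cases (50 versicolor counted 5 times plus 100 others).

```r
library("partykit")

# Unweighted reference tree and the weighting from the question
ct <- ctree(Species ~ ., data = iris)
w <- ifelse(iris$Species == "versicolor", 5L, 1L)
ct4 <- ctree(Species ~ ., data = iris, weights = w)

# Effective sample size under the weights, then terminal node counts
c(sum(w), width(ct), width(ct4))
```

If ct4 turns out larger than ct, raising mincriterion as for ct3 is one way to counteract the extra splits.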