Let's say I want to use the iris data example, but correctly classifying versicolor is 5 times more important to me.
library(party)
data(iris)
irisct <- ctree(Species ~ .,data = iris, weights=ifelse(iris$Species=='versicolor', 5, 1))
plot(irisct)
Then the tree graph changes the number of observations and conditional probabilities in each node (it multiplies versicolor by 5). Is there a way to "disable" this, i.e. show the original number of observations (total = 150 for iris)?
Many thanks for your help!
The enhanced reimplementation of `ctree()` in package `partykit` also has somewhat more flexible plotting capabilities. Specifically, the `node_barplot()` panel function gained a `mainlab` argument that can be used for customizing the main labels. For example, for the iris data, you can set up a vector of labels and then supply a function that accesses these.
Of course, the example above is not very meaningful but could be modified to accomplish what you want with a little bit of coding.
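One way such a modification might look is sketched below. It assumes that `mainlab` may be a function of the node id and the (weighted) node size, and that `predict(..., type = "node")` without `newdata` returns the terminal node of each training row; the names `w`, `n_orig`, and `orig_lab` are made up for this illustration.

```r
library("partykit")

# Weighted fit as in the question (versicolor counts 5 times).
w <- ifelse(iris$Species == "versicolor", 5L, 1L)
irisct <- ctree(Species ~ ., data = iris, weights = w)

# Unweighted number of observations per terminal node, named by node id.
tab <- table(predict(irisct, type = "node"))
n_orig <- setNames(as.integer(tab), names(tab))

# mainlab is called with the node id and the weighted count;
# ignore the latter and look up the unweighted count instead.
orig_lab <- function(id, nobs) {
  sprintf("Node %s (n = %s)", id, n_orig[[as.character(id)]])
}

plot(irisct, tp_args = list(mainlab = orig_lab))
```

This only changes the node titles. The bars themselves would still show the weighted class proportions, so restoring the unweighted proportions as well would presumably need a custom terminal panel function.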
However, be warned about the upsampling of certain observations using the `weights` argument. The `ctree()` function really treats the `weights` as case weights, and consequently the significance tests used for splitting do change. With an increased number of observations, all p-values become smaller and hence the tree selects more splits (unless `mincriterion` is increased simultaneously). Compare the unweighted tree for the iris data, which has 4 terminal nodes, with trees fitted to the same data using larger case weights.
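A comparison along these lines can be sketched as follows. The weight values 2 and 5 and the object names are arbitrary choices for illustration; `width()` is used on the assumption that it returns the number of terminal nodes of a fitted tree.

```r
library("partykit")

# Constant case weights: every observation counted 2 or 5 times.
w2 <- rep(2L, nrow(iris))
w5 <- rep(5L, nrow(iris))

ct  <- ctree(Species ~ ., data = iris)                # unweighted
ct2 <- ctree(Species ~ ., data = iris, weights = w2)  # all weights 2
ct5 <- ctree(Species ~ ., data = iris, weights = w5)  # all weights 5

# Number of terminal nodes of each tree.
n_terminal <- c(ct = width(ct), ct2 = width(ct2), ct5 = width(ct5))
n_terminal
```

Because the data are identical in all three fits, any difference in `n_terminal` comes purely from the weights inflating the effective sample size seen by the tests.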