Let's say I want to use the iris data example, but correctly classifying versicolor is 5 times more important to me.
library(party)
data(iris)
# upweight versicolor observations by a factor of 5
irisct <- ctree(Species ~ ., data = iris,
                weights = ifelse(iris$Species == "versicolor", 5, 1))
plot(irisct)
The plotted tree then shows altered numbers of observations and conditional probabilities in each node, because every versicolor observation is counted 5 times. Is there a way to "disable" this, i.e., show the original number of observations (total = 150 for iris)?
Many thanks for your help!
The enhanced reimplementation of ctree() in package partykit also has somewhat more flexible plotting capabilities. Specifically, the node_barplot() panel function gained a mainlab argument that can be used for customizing the main labels. For example, for the iris data:
You can set up a vector of labels and then supply a function that accesses these:
Of course, the example above is not very meaningful in itself, but it could be modified to accomplish what you want with a little bit of coding.
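A minimal sketch of such a modification, assuming that partykit's node_barplot() calls mainlab with the node id and the (weighted) node size, as its default "Node id (n = ...)" labels suggest: fit the weighted tree, count the original, unweighted observations per terminal node with predict(..., type = "node"), and let the mainlab function look these counts up. The names w, nodes, n_orig, and orig_lab are illustrative only. Only the panel headers change here; the bars are assumed to still show the weighted class proportions.

```r
library("partykit")

## case weights: versicolor counts 5 times (as in the question)
w <- ifelse(iris$Species == "versicolor", 5, 1)
ct_w <- ctree(Species ~ ., data = iris, weights = w)

## terminal node id of each of the 150 original observations
nodes <- predict(ct_w, newdata = iris, type = "node")

## unweighted number of observations per terminal node
n_orig <- table(nodes)

## hypothetical label function: ignores the weighted nobs passed in
## and reports the original count for node 'id' instead
orig_lab <- function(id, nobs) {
  sprintf("Node %s (n = %d)", id, as.integer(n_orig[as.character(id)]))
}

plot(ct_w, tp_args = list(mainlab = orig_lab))
```

Because predict() on the learning data reproduces the node assignment from fitting, the counts in n_orig add up to the 150 original observations.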
However, be warned about upsampling certain observations via the weights argument. The ctree() function really treats the weights as case weights, and consequently the significance tests used for splitting change. With an increased number of observations, the p-values tend to become smaller and hence the tree selects more splits (unless mincriterion is increased simultaneously). Compare the ct tree above, which has 4 terminal nodes, with
The resulting number of terminal nodes are
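To see the effect of the case weights described above, a sketch along the following lines refits the unweighted tree (ct), the weighted tree, and the weighted tree with a stricter mincriterion, and then counts the terminal nodes of each. The value 0.999 for mincriterion is an arbitrary illustrative choice (the ctree_control() default is 0.95), and the helper n_terminal is hypothetical; no particular node counts for the weighted trees are claimed here.

```r
library("partykit")

## versicolor counts 5 times: sum(w) is 350 instead of 150
w <- ifelse(iris$Species == "versicolor", 5, 1)

## unweighted reference tree
ct <- ctree(Species ~ ., data = iris)

## weighted tree with the default mincriterion = 0.95
ct_w <- ctree(Species ~ ., data = iris, weights = w)

## same weights, but a stricter criterion (1 - p-value) for splitting;
## 0.999 is an arbitrary illustrative value
ct_w2 <- ctree(Species ~ ., data = iris, weights = w,
               control = ctree_control(mincriterion = 0.999))

## hypothetical helper: number of terminal nodes of a tree
n_terminal <- function(x) length(nodeids(x, terminal = TRUE))

sapply(list(unweighted = ct, weighted = ct_w, strict = ct_w2), n_terminal)
```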