Splitting rules in mvpart vs rpart

I would like to build classification trees to predict the presence/absence of a single bird species from several predictor variables. I know that rpart handles univariate partitioning and mvpart handles multivariate partitioning, but I'd like to use mvpart for my single-response tree because of its more flexible output. Is there any reason I should not do this? Will the splits differ between rpart and mvpart given exactly the same input?




There is no guarantee that the splits will be the same: mvpart() minimises the within-group sums of squares, whereas rpart(), when fitting a classification tree, minimises the Gini index (by default, IIRC).

You may end up with the same model/splits, but because the two functions use different measures of node impurity, any agreement may just be a fluke.
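To make the two impurity measures concrete, here is a minimal Python sketch (not R, and not the actual rpart or mvpart code) that scores every candidate threshold on one predictor by both the size-weighted Gini index and the within-group sum of squares. The data, the variable names (canopy_cover, present), and the helper functions are all made up for illustration.

```python
def gini(labels):
    """Gini index of a node holding 0/1 labels: 2 * p * (1 - p)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def within_ss(labels):
    """Sum of squared deviations from the node mean."""
    n = len(labels)
    if n == 0:
        return 0.0
    mean = sum(labels) / n
    return sum((y - mean) ** 2 for y in labels)


def score_splits(x, y):
    """Score each midpoint threshold on x.

    Returns a list of (threshold, weighted_gini, total_within_ss).
    """
    values = sorted(set(x))
    n = len(y)
    results = []
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [yi for xi, yi in zip(x, y) if xi < t]
        right = [yi for xi, yi in zip(x, y) if xi >= t]
        weighted_gini = (len(left) * gini(left) + len(right) * gini(right)) / n
        total_ss = within_ss(left) + within_ss(right)
        results.append((t, weighted_gini, total_ss))
    return results


# Hypothetical data: one predictor and a 0/1 presence indicator.
canopy_cover = [10, 20, 30, 40, 50, 60, 70, 80]
present = [0, 0, 1, 0, 1, 1, 1, 1]

splits = score_splits(canopy_cover, present)
best_by_gini = min(splits, key=lambda s: s[1])[0]
best_by_ss = min(splits, key=lambda s: s[2])[0]
print("best threshold by Gini:", best_by_gini)
print("best threshold by sum of squares:", best_by_ss)
```

In this toy setting, with a single response coded 0/1, the node sum of squares works out to n * p * (1 - p) and the Gini index to 2 * p * (1 - p), so both criteria rank the candidate thresholds the same way. The sketch ignores everything else the real functions do (priors, loss matrices, stopping rules, pruning, response coding), so it is not evidence that the two packages will produce the same tree on your data.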

FYI, mvpart() is fitting a regression model, whereas what you want is a classification model.

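Finally, consider using the party package and its function ctree(); its default output is much nicer than rpart's, but it is, again, doing something slightly different in terms of model fitting (it fits conditional inference trees, choosing splits via significance tests rather than an impurity measure).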

As an aside, also look into the plotmo package which includes enhanced plots for a number of tree-like models including, IIRC, rpart ones.