R - Random Forest - Error on Applying confusion matrix on test data


I am trying to implement a simple random forest in R to learn how R and random forests work, and to test the model's accuracy on a test set.

My sample data (five rows of 561 total rows) is:

bulbasaur[1:5,]
   Appt_date count no_of_reps PerReCount
1 2016-01-01     2          1   2.000000
2 2016-01-04   174         58   3.000000
3 2016-01-05   206         59   3.491525
4 2016-01-06   203         61   3.327869
5 2016-01-07   236         64   3.687500

The code that I have written is:

install.packages("caret")
library(caret)

leaf <- bulbasaur
ctrl = trainControl(method="repeatedcv", number=100, repeats=50, selectionFunction = "oneSE")
in_train = createDataPartition(leaf$PerReCount, p=.75, list=FALSE)

#random forest
trf = train(PerReCount ~ ., data=leaf, method="rf", metric="RMSE",trControl=ctrl, subset = in_train)


#boosting
tgbm = train(PerReCount ~ ., data=leaf, method="gbm", metric="RMSE",
             trControl=ctrl, subset = in_train, verbose=FALSE)

resampls = resamples(list(RF = trf, GBM = tgbm))
difValues = diff(resampls)
summary(difValues)



######Using it on test matrix
test = leaf[-in_train,]
test$pred.leaf.rf = predict(trf, test, "raw")
confusionMatrix(test$pred.leaf.rf, test$PerReCount)

However, I get the following error:

Error in confusionMatrix.default(test$pred.leaf.rf, test$PerReCount) : 
  the data cannot have more levels than the reference

I tried some changes, such as converting the outcome with leaf$PerReCount <- as.factor(leaf$PerReCount) and adding type = "class", but the resulting accuracy was abysmal, and I don't want to switch from regression to classification. How can I resolve this without converting to factors, or otherwise measure accuracy without a confusion matrix? Thanks.

Best answer

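As @Damiano suggested, a regression model cannot produce a confusion matrix, because its predictions are continuous values rather than class labels such as yes or no. confusionMatrix treats both vectors as factors, so nearly every distinct predicted number becomes its own level, which triggers the "more levels than the reference" error. I resolved it by computing the RMSE instead: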

piko.chu = predict(trf, test)
RMSE.forest <- sqrt(mean((piko.chu-test$PerReCount)^2))
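RMSE is only one of several error measures that suit a continuous outcome. The sketch below computes RMSE, MAE and R-squared in base R so it runs without any packages. The vectors obs and pred are hypothetical stand-ins for test$PerReCount and predict(trf, test); their values are made up for illustration and are not results from the original data.

```r
# Hypothetical observed and predicted values standing in for
# test$PerReCount and predict(trf, test); not taken from the original data.
obs  <- c(2.00, 3.00, 3.49, 3.33, 3.69)
pred <- c(2.40, 3.10, 3.30, 3.40, 3.50)

# Root mean squared error: penalises large errors more heavily.
rmse <- sqrt(mean((pred - obs)^2))

# Mean absolute error: average size of the error, in the outcome's units.
mae <- mean(abs(pred - obs))

# R-squared as 1 - SSE/SST: share of the variance in obs explained by pred.
r2 <- 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)

print(c(RMSE = rmse, MAE = mae, Rsquared = r2))
```

With caret loaded, postResample(pred, obs) should report similar metrics in a single call, though its R-squared may be defined differently (as a squared correlation), so the two values need not match exactly.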