I am trying to implement a simple random forest in R, to learn how R and random forests work, and to measure the model's accuracy on a test set.
My sample data (the first five of 561 rows) is:
bulbasaur[1:5,]
   Appt_date count no_of_reps PerReCount
1 2016-01-01     2          1   2.000000
2 2016-01-04   174         58   3.000000
3 2016-01-05   206         59   3.491525
4 2016-01-06   203         61   3.327869
5 2016-01-07   236         64   3.687500
The code that I have written is:
install.packages("caret")
library(caret)
leaf <- bulbasaur
ctrl = trainControl(method="repeatedcv", number=100, repeats=50, selectionFunction = "oneSE")
in_train = createDataPartition(leaf$PerReCount, p=.75, list=FALSE)
#random forest
trf = train(PerReCount ~ ., data=leaf, method="rf", metric="RMSE",trControl=ctrl, subset = in_train)
#boosting
tgbm = train(PerReCount ~ ., data=leaf, method="gbm", metric="RMSE",
trControl=ctrl, subset = in_train, verbose=FALSE)
resampls = resamples(list(RF = trf, GBM = tgbm))
difValues = diff(resampls)
summary(difValues)
######Using it on test matrix
test = leaf[-in_train,]
test$pred.leaf.rf = predict(trf, test, "raw")
confusionMatrix(test$pred.leaf.rf, test$PerReCount)
However, I get the following error:
Error in confusionMatrix.default(test$pred.leaf.rf, test$PerReCount) :
the data cannot have more levels than the reference
I tried some changes, such as leaf$PerReCount <- as.factor(leaf$PerReCount) and adding type = "class", but the resulting accuracy was abysmal, and I don't want to change the problem from regression to classification. How can I resolve this without converting to factors, or otherwise get an accuracy measure without using a confusion matrix? Thanks.
As @Damiano suggested, the issue is that a regression model cannot produce a confusion matrix, because its predictions are continuous values rather than class labels such as yes or no. I resolved it by evaluating the predictions with RMSE instead.
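A minimal sketch of how RMSE could be computed for the held-out rows, using base R only. The helper name rmse and the toy vectors are assumptions for illustration and do not come from the original post; with the objects from the question, the observed values would be test$PerReCount and the predictions test$pred.leaf.rf.

```r
# Root-mean-squared error between observed and predicted numeric vectors.
# `rmse` is a hypothetical helper name, not part of caret.
rmse <- function(obs, pred) {
  stopifnot(length(obs) == length(pred))
  sqrt(mean((obs - pred)^2))
}

# Toy values for illustration only (not taken from the question's data).
obs  <- c(2.0, 3.0, 3.5)
pred <- c(2.5, 3.0, 3.0)
toy_rmse <- rmse(obs, pred)
print(toy_rmse)

# With the objects from the question, the call would look like:
# rmse(test$PerReCount, test$pred.leaf.rf)
```

A lower RMSE means the predictions sit closer to the observed PerReCount values, in the same units as PerReCount. Since caret is already loaded in the question, postResample(pred, obs) is an alternative that reports RMSE together with R-squared and MAE.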