I want to cross-validate my GBM and decision tree models so that I can compare them. I am helping a family member with a school science project, and I don't know of a library that can cross-validate these two models. At the moment, the code I use for both models is:
GBM:
library(gbm, quietly = TRUE)
library(caret)
set.seed(123)
# Get the time to train the GBM model
system.time(
model_gbm <- gbm(as.integer(Class) -1 ~ .
, distribution = "bernoulli"
, data = rbind(datos.entreno,datos.test)
, n.trees = 500
, interaction.depth = 3
, n.minobsinnode = 100
, shrinkage = 0.01
, bag.fraction = 0.5
, train.fraction = nrow(datos.entreno) / (nrow(datos.entreno) + nrow(datos.test))
)
)
# Pick the number of trees that minimises the error on the held-out fraction
gbm.iter <- gbm.perf(model_gbm, method = "test")
predicted_probs.GBM.train <- predict(model_gbm, newdata = datos.entreno, n.trees = gbm.iter, type = "response")
predicted_probs.GBM.test <- predict(model_gbm, newdata = datos.test, n.trees = gbm.iter, type = "response")
# Convert the probabilities into class predictions (0 or 1)
predicted_class.GBM.train <- ifelse(predicted_probs.GBM.train >= 0.5, 1, 0)
predicted_class.GBM.test <- ifelse(predicted_probs.GBM.test >= 0.5, 1, 0)
Decision Trees:
library(rpart)
library(rpart.plot)
# Fit the tree on the training data so that datos.test stays held out for evaluation
decisionTree_model <- rpart(Class ~ ., data = datos.entreno, method = 'class')
predicted_val <- predict(decisionTree_model, datos.test, type = 'class')
probability <- predict(decisionTree_model, datos.test, type = 'prob')
rpart.plot(decisionTree_model)
PS1: Sorry if my English is hard to understand. PS2: I am not sure whether LogLoss and AUC are the right final metrics to report from CV.
I tried to use code from some forums to run CV on the GBM and decision tree models, but I did not get any results. In short, I want to run 5-fold and 10-fold CV on each model and obtain LogLoss and AUC.
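To make the goal concrete, here is a minimal base-R sketch of k-fold CV that reports mean LogLoss and AUC. It is only an illustration under stated assumptions: `log_loss`, `auc`, `cv_metrics`, `tree_fit`, `tree_prob` and the synthetic `toy` data frame are hypothetical names that do not come from any package or from my real data, the folds are random rather than stratified, and `Class` is assumed to be a two-level factor with levels "0" and "1".

```r
library(rpart)

# Hypothetical helper: binary log loss for 0/1 labels y and probabilities p.
log_loss <- function(y, p, eps = 1e-15) {
  p <- pmin(pmax(p, eps), 1 - eps)  # clip so that log(0) never occurs
  -mean(y * log(p) + (1 - y) * log(1 - p))
}

# Hypothetical helper: AUC via the rank-sum (Mann-Whitney) formula.
auc <- function(y, p) {
  r <- rank(p)                      # ties receive average ranks
  n_pos <- sum(y == 1)
  n_neg <- sum(y == 0)
  (sum(r[y == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# k-fold CV. fit_fun(train) returns a model;
# prob_fun(model, test) returns P(Class == "1") for each test row.
cv_metrics <- function(data, k, fit_fun, prob_fun, seed = 123) {
  set.seed(seed)
  folds <- sample(rep(seq_len(k), length.out = nrow(data)))
  res <- sapply(seq_len(k), function(i) {
    train <- data[folds != i, ]
    test  <- data[folds == i, ]
    p <- prob_fun(fit_fun(train), test)
    y <- as.integer(test$Class) - 1  # factor levels "0"/"1" -> 0/1
    c(LogLoss = log_loss(y, p), AUC = auc(y, p))
  })
  rowMeans(res)                      # average each metric over the folds
}

# Synthetic stand-in data (an assumption; replace with the real data frame).
set.seed(1)
n <- 500
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$Class <- factor(as.integer(toy$x1 + rnorm(n) > 0))

tree_fit  <- function(d) rpart(Class ~ ., data = d, method = "class")
tree_prob <- function(m, d) predict(m, d, type = "prob")[, "1"]

cv5  <- cv_metrics(toy, 5,  tree_fit, tree_prob)
cv10 <- cv_metrics(toy, 10, tree_fit, tree_prob)
print(rbind(cv5, cv10))
```

The same `cv_metrics` call could presumably be reused for GBM by passing a `fit_fun` that calls `gbm(...)` and a `prob_fun` that calls `predict(..., n.trees = ..., type = "response")`, so both models are scored on identical folds.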
If possible, please correct my code; I am not very familiar with RStudio and R. Also, what other indicators can I calculate to compare the two models and decide which one is better?
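As an example of the kind of additional indicators I have in mind, the sketch below computes accuracy, sensitivity, specificity, precision and F1 from 0/1 labels and predicted probabilities. The helper name `class_metrics`, the 0.5 threshold and the small example vectors are assumptions made purely for illustration; I do not know whether these are the right metrics for this comparison.

```r
# Hypothetical helper: threshold-based metrics from 0/1 labels y
# and predicted probabilities p.
class_metrics <- function(y, p, threshold = 0.5) {
  pred <- as.integer(p >= threshold)
  tp <- sum(pred == 1 & y == 1)  # true positives
  tn <- sum(pred == 0 & y == 0)  # true negatives
  fp <- sum(pred == 1 & y == 0)  # false positives
  fn <- sum(pred == 0 & y == 1)  # false negatives
  precision <- tp / (tp + fp)
  recall    <- tp / (tp + fn)
  c(Accuracy    = (tp + tn) / length(y),
    Sensitivity = recall,
    Specificity = tn / (tn + fp),
    Precision   = precision,
    F1          = 2 * precision * recall / (precision + recall))
}

# Made-up labels and probabilities, only to show the call.
y <- c(1, 1, 1, 0, 0, 0, 0, 1)
p <- c(0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7)
m <- class_metrics(y, p)
print(round(m, 3))
```

With my real models, `p` would be something like `predicted_probs.GBM.test` or the positive-class column of `probability`, and `y` would be `as.integer(datos.test$Class) - 1`.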