R - Cross Validation in GBM model and Decision Trees


I want to cross-validate my GBM and decision tree models so that I can compare them. I am helping a family member with a school science project, and I don't know of a library that would help me cross-validate these models. At the moment, the code I use for the two models is:

GBM:

library(gbm, quietly = TRUE)
library(caret)
set.seed(123)
# Get the time to train the GBM model
system.time(
  model_gbm <- gbm(as.integer(Class) -1  ~ .
                   , distribution = "bernoulli"
                   , data = rbind(datos.entreno,datos.test)
                   , n.trees = 500
                   , interaction.depth = 3
                   , n.minobsinnode = 100
                   , shrinkage = 0.01
                   , bag.fraction = 0.5
                   , train.fraction = nrow(datos.entreno) / (nrow(datos.entreno) + nrow(datos.test))
  )
)

# Pick the number of trees that minimises the error on the held-out fraction
gbm.iter <- gbm.perf(model_gbm, method = "test")

# Predicted probabilities of class 1 on the training and test sets
predicted_probs.GBM.train <- predict(model_gbm, newdata = datos.entreno, n.trees = gbm.iter, type = "response")
predicted_probs.GBM.test <- predict(model_gbm, newdata = datos.test, n.trees = gbm.iter, type = "response")

# Convert the probabilities into class predictions (0 or 1)
predicted_class.GBM.train <- ifelse(predicted_probs.GBM.train >= 0.5, 1, 0)
predicted_class.GBM.test <- ifelse(predicted_probs.GBM.test >= 0.5, 1, 0)

Decision Trees:

library(rpart)
library(rpart.plot)
# Fit the tree on the training set, then evaluate it on the held-out test set
decisionTree_model <- rpart(Class ~ ., data = datos.entreno, method = 'class')
predicted_val <- predict(decisionTree_model, datos.test, type = 'class')
probability <- predict(decisionTree_model, datos.test, type = 'prob')

rpart.plot(decisionTree_model)

PS1: Sorry if my English is hard to understand. PS2: I don't know whether LogLoss and AUC are the right final metrics to report from CV.

I tried to use code from some forums to run CV on the GBM and decision tree models, but I did not get any results. In short, I want to run 5-fold and 10-fold CV and obtain LogLoss and AUC.

If possible, please correct my code; I am not very familiar with RStudio and R. Also, what other metrics can I calculate to compare the two models and decide which one is better?
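For reference, here is a minimal sketch of what 5-fold and 10-fold CV with LogLoss and AUC could look like using only rpart and gbm. It is an illustration under assumptions, not a tested solution for the real data: the data frame `datos` is synthetic, the helpers `log_loss`, `auc` and `cv_compare` are hypothetical names written for this example, `Class` is assumed to be a two-level factor whose second level is the positive class, and `n.minobsinnode` is lowered to 10 because the synthetic set is small.

```r
library(rpart)
library(gbm)

set.seed(123)

# Synthetic stand-in for the real data (an assumption): numeric predictors
# plus a two-level factor `Class` with levels "0" and "1".
n <- 600
datos <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = runif(n))
datos$Class <- factor(
  ifelse(datos$x1 + 0.5 * datos$x2 + rnorm(n, sd = 0.7) > 0, "1", "0"),
  levels = c("0", "1")
)

# Log loss for 0/1 labels y and predicted probabilities p of class 1.
# Probabilities are clipped so that log(0) never occurs.
log_loss <- function(y, p, eps = 1e-15) {
  p <- pmin(pmax(p, eps), 1 - eps)
  -mean(y * log(p) + (1 - y) * log(1 - p))
}

# AUC via the rank (Mann-Whitney) formula; tied scores get average ranks.
auc <- function(y, p) {
  r <- rank(p)
  n_pos <- sum(y == 1)
  n_neg <- sum(y == 0)
  (sum(r[y == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# k-fold CV: fit both models on k - 1 folds, score them on the remaining fold.
cv_compare <- function(data, k) {
  # Randomly assign each row to one of k folds of near-equal size.
  fold <- sample(rep(seq_len(k), length.out = nrow(data)))
  res <- lapply(seq_len(k), function(i) {
    train <- data[fold != i, ]
    test <- data[fold == i, ]
    y_test <- as.integer(test$Class) - 1

    # Decision tree: the second probability column is the positive class.
    tree <- rpart(Class ~ ., data = train, method = "class")
    p_tree <- predict(tree, test, type = "prob")[, 2]

    # GBM with a Bernoulli loss needs a numeric 0/1 response.
    train_gbm <- train
    train_gbm$Class <- as.integer(train$Class) - 1
    fit_gbm <- gbm(Class ~ ., data = train_gbm, distribution = "bernoulli",
                   n.trees = 500, interaction.depth = 3, n.minobsinnode = 10,
                   shrinkage = 0.01, bag.fraction = 0.5, verbose = FALSE)
    p_gbm <- predict(fit_gbm, newdata = test, n.trees = 500, type = "response")

    data.frame(fold = i,
               model = c("rpart", "gbm"),
               logloss = c(log_loss(y_test, p_tree), log_loss(y_test, p_gbm)),
               auc = c(auc(y_test, p_tree), auc(y_test, p_gbm)))
  })
  do.call(rbind, res)
}

cv5 <- cv_compare(datos, k = 5)
cv10 <- cv_compare(datos, k = 10)

# Mean LogLoss and AUC per model across folds.
print(aggregate(cbind(logloss, auc) ~ model, data = cv5, FUN = mean))
print(aggregate(cbind(logloss, auc) ~ model, data = cv10, FUN = mean))
```

With the real data, `datos` would presumably be `rbind(datos.entreno, datos.test)`, so that every row is used for testing exactly once. Lower LogLoss and higher AUC indicate the better model. The caret package, which the GBM code already loads, offers a packaged alternative through `trainControl(method = "cv", number = k)`.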
