How to ensemble models in R with the caretEnsemble library (or another way)?

I am trying to ensemble several models (xgboost, glm, random forest, and so on) that I built with the caret library, all using the same trainControl on the same data. More precisely: when I predict my binary outcome with these models, they give different probabilities, and sometimes one gives < 0.5 ("no") while another gives > 0.5 ("yes"). So I'd like to combine them into one final prediction, either by averaging (not a good option) or by ensembling them (my idea).
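For reference, plain averaging of the models' probabilities ("soft voting"), with optional weights, can be sketched as below. This is a minimal base-R sketch, assuming each model's predicted probability of "yes" has already been collected into one column per model (for a caret model, e.g. `predict(model, newdata = ..., type = "prob")[, "yes"]`); the helper `average_probs` and the toy numbers are hypothetical.

```r
# Soft voting: average each model's probability of the positive class
# (optionally weighted), then threshold the averaged probability.
# prob_matrix: one column per model, one row per case (assumed layout).
average_probs <- function(prob_matrix, weights = NULL, threshold = 0.5,
                          levels = c("no", "yes")) {
  prob_matrix <- as.matrix(prob_matrix)
  if (is.null(weights)) weights <- rep(1, ncol(prob_matrix))
  stopifnot(length(weights) == ncol(prob_matrix), all(weights >= 0))
  weights <- weights / sum(weights)          # normalise weights to sum to 1
  p_yes <- drop(prob_matrix %*% weights)     # weighted mean per row
  class <- factor(ifelse(p_yes >= threshold, levels[2], levels[1]),
                  levels = levels)
  data.frame(prob_yes = p_yes, class = class)
}

# Toy probabilities of "yes" for three cases from three hypothetical models
probs <- cbind(xgb = c(0.40, 0.70, 0.10),
               rf  = c(0.60, 0.80, 0.20),
               ada = c(0.55, 0.30, 0.15))
average_probs(probs)
```

With equal weights the three toy cases average to about 0.517, 0.6 and 0.15, so the combined labels are "yes", "yes" and "no". Unequal `weights` (for example, proportional to each model's resampled performance) are a simple middle ground between plain averaging and a fitted stacking model.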

In caret, I developed the models with a trainControl like this:

fitControl <- trainControl(
  method = "LGOCV",
  number = 6,
  repeats = 1000,        # note: caret uses `repeats` only for method = "repeatedcv"
  verboseIter = TRUE,
  summaryFunction = twoClassSummary,
  classProbs = TRUE,
  returnResamp = "final"
)

and trained them like this:

adaboost_model <- train(my_binary_y ~ ., data = my_data,
                        method = "AdaBoost.M1",
                        trControl = fitControl,
                        metric = "Spec",
                        tuneGrid = my_hyperparameters_grid)

So I created models with different methods (AdaBoost, xgboost, random forest, etc.). Because of the long hyperparameter tuning, this took a huge amount of time. I saved the models with the best performance.

So now I have several good models that can predict my outcome, but I need to combine them into one final prediction. I found a possible solution in the caretEnsemble library.

I created a caretList from my models:

ensemble_list <- as.caretList(list(model1=xgb_model, model2=rf_model, model3=adaboost_model))
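The error below appears to come from caretEnsemble checking each model's saved resampling predictions (`model$pred`), which caret fills only when `savePredictions` is set in `trainControl`. A hypothetical helper like this one can list which of the saved models lack them before calling `caretStack`; the mock objects only stand in for real `train` objects.

```r
# Report, for each model in a named list, whether resampling predictions
# were kept (i.e. model$pred is not NULL).
has_saved_predictions <- function(models) {
  vapply(models, function(m) !is.null(m$pred), logical(1))
}

# Stand-in objects; in practice pass the saved train objects, e.g.
# list(model1 = xgb_model, model2 = rf_model, model3 = adaboost_model)
mock_models <- list(model1 = list(pred = data.frame(obs = "yes", yes = 0.7)),
                    model2 = list(pred = NULL))
has_saved_predictions(mock_models)
```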

Then I trained the stacked ensemble:

stackControl <- trainControl(sampling = "rose", method = "repeatedcv",
                             number = 5, repeats = 2,
                             savePredictions = TRUE, classProbs = TRUE)
set.seed(999)
stack.glm <- caretStack(ensemble_list, method = "glm", trControl = stackControl)

This results in an error:

Error in check_caretList_model_types(list_of_models) :
  No predictions saved by train. Please re-run models with trainControl set with savePredictions = TRUE.

I did not set savePredictions = TRUE in fitControl when training the original models.

The problem: I can't retrain all the models, because in reality there are about 40 of them, for different situations and outcomes, each with VERY long hyperparameter tuning.
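One possible workaround, sketched under assumptions: the expensive part was the hyperparameter search, and each saved `train` object keeps the winning values in `bestTune`. Refitting with `tuneGrid` fixed to `bestTune` resamples only that single combination, this time with `savePredictions = "final"` and one shared set of folds (`index`), since caretEnsemble stacks out-of-fold predictions and expects the models to share their resamples. The example uses simulated data from caret's `twoClassSim()` and the `knn` method purely as stand-ins for `my_data` and the real models; `slow_fit`, `folds`, `refit_ctrl`, and `refit` are hypothetical names.

```r
library(caret)

set.seed(42)
# Stand-in data: factor outcome `Class` with levels Class1 / Class2.
dat <- twoClassSim(300)

# 1. A tuning run like the original ones: no resampling predictions saved.
tune_ctrl <- trainControl(method = "cv", number = 5, classProbs = TRUE,
                          summaryFunction = twoClassSummary)
slow_fit <- train(Class ~ ., data = dat, method = "knn",
                  tuneLength = 5, metric = "Spec", trControl = tune_ctrl)

# 2. One shared set of training-row indices, reused for every refit, so all
#    models are evaluated on the same hold-out rows.
folds <- createFolds(dat$Class, k = 5, returnTrain = TRUE)
refit_ctrl <- trainControl(method = "cv", index = folds,
                           savePredictions = "final", classProbs = TRUE,
                           summaryFunction = twoClassSummary)

# 3. Refit with the grid fixed to the already-found best values, so only
#    one hyperparameter combination is resampled (no new search).
refit <- train(Class ~ ., data = dat, method = slow_fit$method,
               tuneGrid = slow_fit$bestTune, metric = "Spec",
               trControl = refit_ctrl)

head(refit$pred)   # out-of-fold predictions, now available for stacking
```

For the real models, the same `refit_ctrl` would be reused for each one (for example in an `lapply` over the saved models, with `method = m$method` and `tuneGrid = m$bestTune`), and the refitted objects passed to `as.caretList()`. Each refit still costs one resampling run per model, but not the grid search.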

Is there a way to make caretEnsemble work without having set savePredictions = TRUE originally? Or is there another way to ensemble the models? Or some other good way to combine several models into one prediction of a binary outcome?
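Another possible route that avoids resampling predictions altogether is blending: keep back a hold-out set that none of the models was trained on, predict it with every model, and fit a logistic-regression meta-model on those probabilities. A minimal base-R sketch, assuming such a hold-out set exists; the probabilities are simulated stand-ins for e.g. `predict(xgb_model, newdata = holdout, type = "prob")[, "yes"]`, and all names are hypothetical.

```r
set.seed(1)
n <- 200
# Simulated hold-out labels and per-model probabilities of "yes"
y <- factor(sample(c("no", "yes"), n, replace = TRUE), levels = c("no", "yes"))
signal <- ifelse(y == "yes", 1, -1)
holdout_probs <- data.frame(
  xgb = plogis(1.5 * signal + rnorm(n)),
  rf  = plogis(1.0 * signal + rnorm(n)),
  ada = plogis(0.5 * signal + rnorm(n))
)

# Work on the logit scale so the meta-model combines log-odds linearly.
logit_feats <- as.data.frame(lapply(holdout_probs, qlogis))

# Logistic regression meta-model; with a factor response, glm models
# the probability of the second level ("yes").
meta <- glm(y ~ ., data = data.frame(y = y, logit_feats), family = binomial)

# Combined probability of "yes" (first five hold-out rows, for illustration)
p_yes <- predict(meta, newdata = logit_feats[1:5, ], type = "response")
round(p_yes, 3)
```

The fitted coefficients act as learned weights for each model. Because the meta-model is fitted on the hold-out set, its own performance should be checked on further data it has not seen; and since the original models were trained on all of `my_data`, this route needs labelled data that was kept aside from that training.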
