I am doing a stack of models in R as follows:
ctrl <- trainControl(method="repeatedcv", number=5, repeats=3, returnResamp="final", savePredictions="final", classProbs=TRUE, selectionFunction="oneSE", verboseIter=TRUE)
models_stack <- caretStack(
model_list,
data=train_data,
tuneLength=10,
method="glmnet",
metric="ROC",
trControl=ctrl
)
1) Why am I seeing the following error? What can I do? I am stuck now.
Timing stopped at: 0.89 0.005 0.91
Show Traceback
Error in (function (x, y, family = c("gaussian", "binomial", "poisson", : unused argument (data = list(c(-0.00891097103286995, 0.455282701499392, 0.278236211515583, 0.532932725880776, 0.511036607368827, 0.688757947257125, -0.560727863490874, -0.21768155316146, 0.642219917023467, 0.220363129901216, 0.591732278371339, 1.02850020403572, -1.02417799431585, 0.806359545011601, -1.21490317454699, -0.671361009441299, 0.927344615788642, -0.10449847318776, 0.595493217624868, -1.05586363903119, -0.138457794869817, -1.026253562838, -1.38264471633224, -1.32900800143341, 0.0383617314263342, -0.82222313323842, -0.644251885665736, -0.174126438952992, 0.323934240274895, -0.124613523895458, 0.299359713721601, -0.723599218327519, -0.156528054435544, -0.76193093842169, 0.863217455799044, -1.01340448660914, -0.314365383747751, 1.19150804114605, 0.314703439577839, 1.55580594654149, -0.582911462615421, -0.515291378382375, 0.305142268138296, 0.513989405541095, -1.85093305614114, 0.436468060668601, -2.18997828727424, 1.12838871469007, -1.17619542016998, -0.218175589380355
2) Is there not supposed to have a "data" parameter? If i need to use a different dataset for my level 1 supervisor model what I can do?
3) Also I wanted to use AUC/ROC but got these errors
The metric "AUC" was not in the result set. Accuracy will be used instead.
and
The metric "ROC" was not in the result set. Accuracy will be used instead.
I saw some online examples that ROC can be used, is it because it is not for this model? What metrics can I use besides Accuracy for this model? If I need to use ROC, what are the other options.
As requested by @RLave, this is how my model_list is done
grid.xgboost <- expand.grid(.nrounds=c(40,50,60),.eta=c(0.2,0.3,0.4),
.gamma=c(0,1),.max_depth=c(2,3,4),.colsample_bytree=c(0.8),
.subsample=c(1),.min_child_weight=c(1))
grid.rf <- expand.grid(.mtry=3:6)
model_list <- caretList(y ~.,
data=train_data_0,
trControl=ctrl,
tuneList=list(
xgbTree=caretModelSpec(method="xgbTree", tuneGrid=grid.xgboost),
rf=caretModelSpec(method="rf", tuneGrid=grid.rf)
)
)
My train_data_0 and train_data are both from the same dataset. My dataset predicators are all numeric values with the label as a binary label
your question contains three questions:
caretStackshould not have adataparameter, thedatais generated based on predictions of models incaretList. Take a look at this reproducible example:using the Sonar data set:
create grid for hyper parameter tune for xgboost:
create grid for rf tune:
create train control:
tune the models:
create the stacked ensamble:
2) Is there not supposed to have a "data" parameter? If i need to use a different dataset for my level 1 supervisor model what I can do?
caretStackneeds only the predictions from the base models, in order to create an ensemble of models trained on different data you must create a newcaretListwith the appropriatedataspecified there.3) Also I wanted to use AUC/ROC but got these errors
The easiest way to use AUC as metric is to set:
summaryFunction = twoClassSummaryintrainControl