I'm fitting classification models for binary issues using MLR package in R. For each model, I perform a cross-validation with embedded feature selection using "selectFeatures" function. In output, I retrieve mean AUCs over test sets and predictions. To do so, after having get some advices (Get predictions on test sets in MLR), I use "makeFeatSelWrapper" function in combination with "resample" function. The goal seems to be achieved but results are strange. With a logistic regression as classifier, I get an AUC of 0.5 which means no variable selected. This result is unexpected as I get an AUC of 0.9824432 with this classifier using the method mentioned in the linked question. With a neural network as classifier, I get an error message
Error in sum(x) : invalid 'type' (list) of argument
What is wrong?
Here is the sample code:
# 1. Find a synthetic dataset for supervised learning (two classes)
###################################################################
install.packages("mlbench")
library(mlbench)
data(BreastCancer)
# generate 1000 rows, 21 quantitative candidate predictors and 1 target variable
p<-mlbench.waveform(1000)
# convert list into dataframe
dataset<-as.data.frame(p)
# drop thrid class to get 2 classes
dataset2 = subset(dataset, classes != 3)
# 2. Perform cross validation with embedded feature selection using logistic regression
#######################################################################################
library(BBmisc)
library(nnet)
library(mlr)
# Choice of data
mCT <- makeClassifTask(data =dataset2, target = "classes")
# Choice of algorithm i.e. neural network
mL <- makeLearner("classif.logreg", predict.type = "prob")
# Choice of cross-validations for folds
outer = makeResampleDesc("CV", iters = 10,stratify = TRUE)
# Choice of feature selection method
ctrl = makeFeatSelControlSequential(method = "sffs", maxit = NA,alpha = 0.001)
# Choice of hold-out sampling between training and test within the fold
inner = makeResampleDesc("Holdout",stratify = TRUE)
lrn = makeFeatSelWrapper(mL, resampling = inner, control = ctrl)
r = resample(lrn, mCT, outer, extract = getFeatSelResult,measures = list(mlr::auc,mlr::acc,mlr::brier),models=TRUE)
# 3. Perform cross validation with embedded feature selection using neural network
##################################################################################
library(BBmisc)
library(nnet)
library(mlr)
# Choice of data
mCT <- makeClassifTask(data =dataset2, target = "classes")
# Choice of algorithm i.e. neural network
mL <- makeLearner("classif.nnet", predict.type = "prob")
# Choice of cross-validations for folds
outer = makeResampleDesc("CV", iters = 10,stratify = TRUE)
# Choice of feature selection method
ctrl = makeFeatSelControlSequential(method = "sffs", maxit = NA,alpha = 0.001)
# Choice of sampling between training and test within the fold
inner = makeResampleDesc("Holdout",stratify = TRUE)
lrn = makeFeatSelWrapper(mL, resampling = inner, control = ctrl)
r = resample(lrn, mCT, outer, extract = getFeatSelResult,measures = list(mlr::auc,mlr::acc,mlr::brier),models=TRUE)
If you run your logistic regression part of the code a couple of times, you should also get the
Error in sum(x) : invalid 'type' (list) of argument
error. However, I find it strange that fixing a particular seed (e.g.,set.seed(1)
) before resampling does not ensure that the error does or does not appear.The error occurs in internal
mlr
code for printing the output of feature selection to the console. A very simple workaround is to simply avoid printing such output withshow.info = FALSE
inmakeFeatSelWrapper
(see code below). While this removes the error, it is possible that what caused it may have other consequences, although I it is possible the error only affects the printing code.When running your code, I only get AUC above 0.90. Please find below a your code for logistic regression, slightly re-organized and with the workaround. I have added a droplevels() to the dataset2 to remove the missing level 3 from the factor, though this is not related with the workaround.
Edit: I've reported an issue and created a pull request with a fix.