I'm trying to use information_gain and mrmr feature filtering each on its own, but also a combination of the two (the union of the features each one selects). Here is my attempt as a reprex:
library("mlr3verse")
task <- tsk('sonar')
filters = list("nop" = po("nop"),
               "information_gain" = po("filter", flt("information_gain")),
               "mrmr" = po("filter", flt("mrmr")),
               "ig_mrmr" = po("branch", c("ig2", "mrmr2"), id = "ig_mrmr") %>>%
                 gunion(list("ig2" = po("filter", flt("information_gain")),
                             "mrmr2" = po("filter", flt("mrmr")))) %>>%
                 po("featureunion", id = "union_igmrmr"))
pipe =
  po("branch", names(filters), id = "branch1") %>>%
  gunion(unname(filters)) %>>%
  po("unbranch", names(filters), id = "unbranch1") %>>%
  po(lrn('classif.rpart'))
pipe$plot()
This looks good so far, and the plot shows that I'm trying to combine the features selected by ig and mrmr.
Next I define the search space, which I think is correct:
ps <- ParamSet$new(list(
  ParamDbl$new("classif.rpart.cp", lower = 0, upper = 0.05),
  ParamInt$new("information_gain.filter.nfeat", lower = 20L, upper = 60L),
  ParamFct$new("information_gain.type", levels = c("infogain", "symuncert")),
  ParamInt$new("ig2.filter.nfeat", lower = 20L, upper = 60L),
  ParamFct$new("ig2.type", levels = c("infogain", "symuncert")),
  ParamInt$new("mrmr.filter.nfeat", lower = 20L, upper = 60L),
  ParamInt$new("mrmr2.filter.nfeat", lower = 20L, upper = 60L),
  ParamFct$new("branch1.selection", levels = names(filters)),
  ParamFct$new("ig_mrmr.selection", levels = c("ig2", "mrmr2"))
))
The dependencies are where I'm struggling. I can make the "nested" parameters depend on EITHER the outer branch or the inner branch, but I'm not sure how to make them depend on BOTH. In the example below they depend on the outer branch.
ps$add_dep("information_gain.filter.nfeat", "branch1.selection", CondEqual$new("information_gain"))
ps$add_dep("information_gain.type", "branch1.selection", CondEqual$new("information_gain"))
ps$add_dep("mrmr.filter.nfeat", "branch1.selection", CondEqual$new("mrmr"))
ps$add_dep("ig2.filter.nfeat", "branch1.selection", CondEqual$new("ig_mrmr"))
ps$add_dep("ig2.type", "branch1.selection", CondEqual$new("ig_mrmr"))
ps$add_dep("mrmr2.filter.nfeat", "branch1.selection", CondEqual$new("ig_mrmr"))
ps
glrn <- GraphLearner$new(pipe)
glrn$predict_type <- "prob"
cv5 <- rsmp("cv", folds = 5)
task$col_roles$stratum <- task$target_names
instance <- TuningInstanceSingleCrit$new(
  task = task,
  learner = glrn,
  resampling = cv5,
  measure = msr("classif.auc"),
  search_space = ps,
  terminator = trm("evals", n_evals = 5)
)
tuner <- tnr("random_search")
tuner$optimize(instance)
Note that I don't hit an error until I call tuner$optimize(instance).
Error message:
Error in self$assert(xs) :
Assertion on 'xs' failed: Parameter 'ig2.filter.nfeat' not available. Did you mean 'branch1.selection' / 'information_gain.filter.nfeat' / 'information_gain.filter.frac'?.
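From your description it sounds as if you do not intend to use a branch for c("ig2", "mrmr2"), since you intend to combine the output of these two. In other words, you want both filters applied in the same resampling iteration. A branch activates only one of its outputs, so po("featureunion") would never receive both feature sets. Drop the inner po("branch", ...) (and the ig_mrmr.selection parameter with it) and feed both filters directly into po("featureunion").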
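A minimal sketch of the graph with the inner branch removed, using the same filters as the question. It assumes, as the question's working plot suggests, that gunion() prefixes PipeOp IDs with the names of a named list, so the inner filters become ig2.information_gain and mrmr2.mrmr and do not clash with the outer filters; check pipe$ids() in your installed version.

```r
library("mlr3verse")

task <- tsk("sonar")

# No inner po("branch"): both filters in the "ig_mrmr" path are always active,
# and po("featureunion") combines the features each of them keeps.
filters <- list(
  "nop" = po("nop"),
  "information_gain" = po("filter", flt("information_gain")),
  "mrmr" = po("filter", flt("mrmr")),
  "ig_mrmr" = gunion(list(
    "ig2" = po("filter", flt("information_gain")),
    "mrmr2" = po("filter", flt("mrmr"))
  )) %>>%
    po("featureunion", id = "union_igmrmr")
)

# The outer branch still selects exactly one of the four paths.
pipe <- po("branch", names(filters), id = "branch1") %>>%
  gunion(unname(filters)) %>>%
  po("unbranch", names(filters), id = "unbranch1") %>>%
  po(lrn("classif.rpart"))

# IDs of all PipeOps in the graph
print(pipe$ids())
```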
The easiest way to see the parameters you can tune is to print the graph's parameter set, pipe$param_set.
From this you would see that some of the parameter names you specified are not the full IDs. For instance:
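A sketch of how to narrow the listing down to the two filters in the combined path. It rebuilds the graph without the inner branch and greps the parameter IDs; the exact prefixes depend on how your mlr3pipelines version names PipeOps that come from a named list passed to gunion().

```r
library("mlr3verse")

# Graph without the inner branch
filters <- list(
  "nop" = po("nop"),
  "information_gain" = po("filter", flt("information_gain")),
  "mrmr" = po("filter", flt("mrmr")),
  "ig_mrmr" = gunion(list(
    "ig2" = po("filter", flt("information_gain")),
    "mrmr2" = po("filter", flt("mrmr"))
  )) %>>%
    po("featureunion", id = "union_igmrmr")
)
pipe <- po("branch", names(filters), id = "branch1") %>>%
  gunion(unname(filters)) %>>%
  po("unbranch", names(filters), id = "unbranch1") %>>%
  po(lrn("classif.rpart"))

ids <- pipe$param_set$ids()

# The short name used in the question is not a parameter of the graph
print("ig2.filter.nfeat" %in% ids)

# Parameters of the two filters in the combined path; assuming gunion()
# prefixed their PipeOp IDs with the list names, these start with "ig2." / "mrmr2."
inner_ids <- grep("^(ig2|mrmr2)\\.", ids, value = TRUE)
print(inner_ids)
```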
Let's specify the correct names for the parameters; with those, everything runs without problems:
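A sketch of the full run, assuming the graph without the inner branch and the gunion() ID prefixes described above (so the inner filters' parameters are ig2.information_gain.* and mrmr2.mrmr.*). It uses the same older ParamSet$new() and TuningInstanceSingleCrit$new() interface as the question, and ig_mrmr.selection is gone because the inner branch no longer exists. If pipe$param_set shows different names in your version, substitute those.

```r
library("mlr3verse")

task <- tsk("sonar")
task$col_roles$stratum <- task$target_names

filters <- list(
  "nop" = po("nop"),
  "information_gain" = po("filter", flt("information_gain")),
  "mrmr" = po("filter", flt("mrmr")),
  "ig_mrmr" = gunion(list(
    "ig2" = po("filter", flt("information_gain")),
    "mrmr2" = po("filter", flt("mrmr"))
  )) %>>%
    po("featureunion", id = "union_igmrmr")
)
pipe <- po("branch", names(filters), id = "branch1") %>>%
  gunion(unname(filters)) %>>%
  po("unbranch", names(filters), id = "unbranch1") %>>%
  po(lrn("classif.rpart"))

# Full parameter IDs (assumed prefixes "ig2.information_gain." and "mrmr2.mrmr.")
ps <- ParamSet$new(list(
  ParamDbl$new("classif.rpart.cp", lower = 0, upper = 0.05),
  ParamInt$new("information_gain.filter.nfeat", lower = 20L, upper = 60L),
  ParamFct$new("information_gain.type", levels = c("infogain", "symuncert")),
  ParamInt$new("ig2.information_gain.filter.nfeat", lower = 20L, upper = 60L),
  ParamFct$new("ig2.information_gain.type", levels = c("infogain", "symuncert")),
  ParamInt$new("mrmr.filter.nfeat", lower = 20L, upper = 60L),
  ParamInt$new("mrmr2.mrmr.filter.nfeat", lower = 20L, upper = 60L),
  ParamFct$new("branch1.selection", levels = names(filters))
))

# Each filter's parameters are only active when the outer branch selects its path
ps$add_dep("information_gain.filter.nfeat", "branch1.selection", CondEqual$new("information_gain"))
ps$add_dep("information_gain.type", "branch1.selection", CondEqual$new("information_gain"))
ps$add_dep("mrmr.filter.nfeat", "branch1.selection", CondEqual$new("mrmr"))
ps$add_dep("ig2.information_gain.filter.nfeat", "branch1.selection", CondEqual$new("ig_mrmr"))
ps$add_dep("ig2.information_gain.type", "branch1.selection", CondEqual$new("ig_mrmr"))
ps$add_dep("mrmr2.mrmr.filter.nfeat", "branch1.selection", CondEqual$new("ig_mrmr"))

glrn <- GraphLearner$new(pipe)
glrn$predict_type <- "prob"

instance <- TuningInstanceSingleCrit$new(
  task = task,
  learner = glrn,
  resampling = rsmp("cv", folds = 5),
  measure = msr("classif.auc"),
  search_space = ps,
  terminator = trm("evals", n_evals = 5)
)

# Random search over the corrected search space
result <- tnr("random_search")$optimize(instance)
print(result)
```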
This gallery post will be useful:
https://mlr3gallery.mlr-org.com/posts/2020-04-23-pipelines-selectors-branches/
as well as the others in the gallery:
https://mlr3gallery.mlr-org.com/
If some aspect of mlr3 is unclear and you cannot find a suitable gallery post or book example, you should request one.
Link to book: https://mlr3book.mlr-org.com/