How can we fix a PipeOp's $state so that its parameters or config are fixed from the beginning and remain the same in both training and prediction?
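library(mlr3)
library(mlr3pipelines)
library(mlr3learners)  # provides classif.xgboost

task = tsk("iris")
pos1 = po("scale", param_vals = list(
  center = TRUE,
  scale = TRUE,
  affect_columns = selector_name("Sepal.Width")))
pos1$state  # NULL before training
# attempt to fix the centering and scaling values by hand
pos1$state$center <- c(Sepal.Width = 0)
pos1$state$scale <- c(Sepal.Width = 2)
graph <- pos1 %>>% lrn("classif.xgboost", eval_metric = "mlogloss")
gl <- GraphLearner$new(graph)
gl$train(task)
gl$state  # center and scale have been re-estimated from the data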
In the code above, the parameters center and scale from po("scale") are recalculated based on the data even when I try to fix them as zero and two, respectively (not sure whether I did this correctly).
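The recalculation can be checked without the learner. The following is only a minimal sketch, assuming that mlr3 and mlr3pipelines are installed and that the trained state of po("scale") exposes center and scale entries as in the code above; the variable name pos is just for illustration.

```r
library(mlr3)
library(mlr3pipelines)

task = tsk("iris")
pos = po("scale", affect_columns = selector_name("Sepal.Width"))

# set the state by hand, as attempted above
pos$state = list(center = c(Sepal.Width = 0), scale = c(Sepal.Width = 2))

# training re-estimates the state from the data and overwrites the manual values
pos$train(list(task))

pos$state$center  # mean of Sepal.Width in the training data, not 0
pos$state$scale   # standard deviation of Sepal.Width, not 2
```

If the sketch behaves as assumed, the values set by hand are gone after training, which matches what gl$state shows.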
A PipeOp's $state should never be changed manually. It is more like a logging slot for you to inspect, and it is where the PipeOp finds all the information it needs to carry out its prediction step after being trained.

With center and scale set to TRUE (as in your code), PipeOpScale will always center the training data to mean 0 and scale them by their root-mean-square (see ?scale). It stores the "learned" parameters (i.e., the mean and root-mean-square of the training data, which are the attributes returned by the scale function) as the $state. During prediction, the data are transformed analogously using these stored values, so the transformed prediction data will probably have a different mean and root-mean-square.

Assuming you want to scale "Sepal.Width" to mean 0 and root-mean-square 2 both during training and prediction (as suggested by your code above; but this may be a bad idea), you can use PipeOpColApply: