In my dataset I have a binary Target (0 or 1) variable, and 8 features: nchar, rtc, Tmean, week_day, hour, ntags, nlinks and nex. week_day is a factor while the others are numeric. I built a decision tree classifier, but my question concerns the feature scaling:
library(caTools)
set.seed(123)
split = sample.split(dataset$Target, SplitRatio = 0.75)
training_set = subset(dataset, split == TRUE)
test_set = subset(dataset, split == FALSE)
# Feature Scaling
training_set[-c(2,4)] = scale(training_set[-c(2,4)])
test_set[-c(2,4)] = scale(test_set[-c(2,4)])
The model returns Tmean = -0.057 and ntags = 2 as two splitting points. How can I recover the original values of these two features, i.e., the values the variables had before the rescaling performed by scale()?
If the data were scaled with scale(), the following function unscale() might help solve the question. The original vector and the unscaled one are all.equal but not identical, due to floating-point precision. The function also works with data.frames, but the class of its output is "matrix", not the original "data.frame"; this is a consequence of scale() returning a matrix. The right way of scaling/unscaling an object with a dim attribute, such as a data.frame, is therefore vector by vector. This can be done with an lapply loop, for instance.
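A minimal sketch of such an unscale() function, relying on the "scaled:center" and "scaled:scale" attributes that scale() attaches to its result (the function name and structure are illustrative):

```r
# Sketch: invert scale() using the attributes it stores on its output.
unscale <- function(x) {
  ctr <- attr(x, "scaled:center")
  scl <- attr(x, "scaled:scale")
  y <- x
  if (!is.null(scl)) y <- sweep(y, 2, scl, `*`)   # undo the division by sd
  if (!is.null(ctr)) y <- sweep(y, 2, ctr, `+`)   # undo the centering
  attr(y, "scaled:center") <- NULL
  attr(y, "scaled:scale") <- NULL
  y
}

x <- c(2.5, 7.1, 4.3, 9.9)
s <- scale(x)                         # scale() returns a one-column matrix
all.equal(as.vector(unscale(s)), x)   # TRUE
identical(as.vector(unscale(s)), x)   # typically FALSE (floating point)
```

Note that unscale() returns a matrix even if the scaled data came from a data.frame, because that is the class scale() produces.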
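The vector-by-vector approach with lapply might look like the sketch below; the data.frame and column names here are illustrative, not the asker's data. It also shows how a split point found on the scaled axis (such as Tmean = -0.057) maps back to original units via original = scaled * sd + mean:

```r
# Sketch: scale and unscale a data.frame column by column with lapply,
# so the result stays a data.frame (toy data, illustrative names).
df <- data.frame(Tmean = c(10.2, 11.5, 9.8, 12.1),
                 ntags = c(1, 3, 2, 5))

centers <- lapply(df, mean)
sds     <- lapply(df, sd)

scaled <- as.data.frame(lapply(names(df), function(v)
  (df[[v]] - centers[[v]]) / sds[[v]]))
names(scaled) <- names(df)

# Map back to the original units: original = scaled * sd + mean
unscaled <- as.data.frame(lapply(names(df), function(v)
  scaled[[v]] * sds[[v]] + centers[[v]]))
names(unscaled) <- names(df)

all.equal(unscaled, df)   # TRUE (up to floating-point precision)
```

The same formula applied to a single split point, e.g. -0.057 * sds[["Tmean"]] + centers[["Tmean"]], recovers the threshold on the original scale, provided centers and sds come from the same data that was passed to scale().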