In my dataset I have a binary Target variable (0 or 1) and 8 features: nchar, rtc, Tmean, week_day, hour, ntags, nlinks and nex. week_day is a factor while the others are numeric. I built a decision tree classifier, but my question concerns the feature scaling:
library(caTools)
set.seed(123)
split = sample.split(dataset$Target, SplitRatio = 0.75)
training_set = subset(dataset, split == TRUE)
test_set = subset(dataset, split == FALSE)
# Feature Scaling
training_set[-c(2,4)] = scale(training_set[-c(2,4)])
test_set[-c(2,4)] = scale(test_set[-c(2,4)])
The model returns Tmean = -0.057 and ntags = 2 as two splitting points. How can I recover the original values of these two features, i.e. the values they had before the rescaling performed by scale()?
If the data were scaled with scale, the following function unscale might help solve the question. The original vector and the unscaled one are all.equal but not identical, due to floating-point precision.

The function also works with data.frames, but the class of its output is "matrix", not the original "data.frame". This is a consequence of what scale outputs. But the right way of scaling/unscaling an object with a dim attribute, such as a data.frame, is vector by vector. This can be done with an lapply loop, for instance.
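A minimal sketch of such an unscale function, under the assumption that the scaled object still carries the "scaled:center" and "scaled:scale" attributes that scale() attaches to its result (the unscale helper itself is illustrative, not part of base R):

```r
# unscale(): a sketch that inverts scale() using the "scaled:center" and
# "scaled:scale" attributes stored on scale()'s result (assumed present).
unscale <- function(x) {
  ctr <- attr(x, "scaled:center")
  scl <- attr(x, "scaled:scale")
  # undo the z-transform column by column: multiply by the sd, add back the mean
  sweep(sweep(x, 2, scl, `*`), 2, ctr, `+`)
}

v <- c(1.5, 2.0, 4.5, 8.0)
s <- scale(v)                        # one-column "matrix" carrying the attributes
all.equal(as.vector(unscale(s)), v)  # TRUE, while identical() may be FALSE

# Vector-by-vector treatment of a data.frame with lapply, which keeps
# the "data.frame" class instead of scale()'s "matrix":
df <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30))
scaled_cols <- lapply(df, scale)     # list of one-column scaled matrices
back <- data.frame(lapply(scaled_cols, function(m) as.numeric(unscale(m))))
all.equal(back, df)                  # TRUE
```

Note that in the question's code the result of scale() is assigned back into a data.frame, which drops these attributes; in that case the per-column centers and standard deviations have to be stored separately before scaling, and a split point such as Tmean = -0.057 is then inverted by hand as original = scaled * sd + mean.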