I am trying to build a time-series model using a random forest. However, I get the same error every time I run the code:
Error in `[.data.frame`(data, , all.vars(Terms), drop = FALSE) :
  undefined columns selected
I know most of the theory behind random forests pretty well, but I haven't really run much code using them.
Here is my code:
library(randomForest)
library(caret)
fitControl <- trainControl(
method = "repeatedcv",
number = 10,
repeats = 1,
classProbs = FALSE,
verboseIter = TRUE,
preProcOptions=list(thresh=0.95,na.remove=TRUE,verbose=TRUE))
set.seed(1234)
rf_grid <- expand.grid(mtry = c(1:6))
fit <- train(df.ts[,1]~.,
data=df.ts[,2:6],
method="rf",
preProcess=c("center","scale"),
tuneGrid = rf_grid,
trControl=fitControl,
ntree = 200,
metric="RMSE")
For a reproducible example, you can run the code on the following dataset:
df.ts <- structure(list(ts.t = c(315246, 219908, 193014, 231970, 248246,
247112, 268218, 263637, 264306, 245730, 256548, 227525, 304468,
229614, 202985), ts1 = c(233913, 315246, 219908, 193014, 231970,
248246, 247112, 268218, 263637, 264306, 245730, 256548, 227525,
304468, 229614), ts2 = c(253534, 233913, 315246, 219908, 193014,
231970, 248246, 247112, 268218, 263637, 264306, 245730, 256548,
227525, 304468), ts3 = c(226650, 253534, 233913, 315246, 219908,
193014, 231970, 248246, 247112, 268218, 263637, 264306, 245730,
256548, 227525), ts6 = c(213268, 242558, 250554, 226650, 253534,
233913, 315246, 219908, 193014, 231970, 248246, 247112, 268218,
263637, 264306), ts12 = c(333842, 210279, 193051, 174262, 216712,
144327, 213268, 242558, 250554, 226650, 253534, 233913, 315246,
219908, 193014)), .Names = c("ts.t", "ts1", "ts2", "ts3", "ts6", "ts12"), row.names = 13:27, class = "data.frame")
I hope someone can spot my error(s).
Thanks,
The formula should correspond to the names of the variables in data. E.g. y ~ . predicts y using all other variables in data. Alternatively you could use y = df.ts[,1], x = df.ts[, -1] instead of formula and data.
Thus the correct syntax would be:
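A minimal sketch of both options described above, not necessarily the exact code the answer had in mind. To keep it self-contained it builds a small random stand-in for df.ts with the same column names (that stand-in is an assumption for illustration; with the dataset from the question you would skip that step). The tuning grid is also cut to mtry = 1:5 here, on the assumption that mtry should not exceed the five available predictors.

```r
library(caret)
library(randomForest)

# Hypothetical stand-in for df.ts: 15 rows, same column names as in the question.
set.seed(1234)
df.ts <- as.data.frame(matrix(runif(15 * 6, min = 190000, max = 320000), ncol = 6))
names(df.ts) <- c("ts.t", "ts1", "ts2", "ts3", "ts6", "ts12")

fitControl <- trainControl(method = "repeatedcv", number = 10, repeats = 1)

# Five predictors (ts1 ... ts12), so mtry is tried from 1 to 5.
rf_grid <- expand.grid(mtry = 1:5)

# Option 1: formula interface. The response is referred to by its column
# name, and data holds the response AND the predictors.
set.seed(1234)
fit <- train(ts.t ~ .,
             data = df.ts,
             method = "rf",
             preProcess = c("center", "scale"),
             tuneGrid = rf_grid,
             trControl = fitControl,
             ntree = 200,
             metric = "RMSE")

# Option 2: x/y interface. No formula, so no column-name lookup is involved.
set.seed(1234)
fit_xy <- train(x = df.ts[, -1],
                y = df.ts[, 1],
                method = "rf",
                preProcess = c("center", "scale"),
                tuneGrid = rf_grid,
                trControl = fitControl,
                ntree = 200,
                metric = "RMSE")

print(fit$bestTune)
```

With only 15 rows, 10-fold cross-validation leaves one or two observations per hold-out fold, so caret may warn about missing values in the resampled performance measures (R-squared is undefined for a single point). RMSE is still computed, but the estimates will be noisy.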