I'm looping over the degree of the approximating polynomial when training with caret:
ds = 1:20
for(i in 1:length(ds)){
  print(i)
  d = ds[i]
  fit = train(y~poly(x,degree=d),data=training,method="lm",trControl=fitCtrl)
  # other operations
}
Running the code gives:
Error in `[.data.frame`(data, 0, cols, drop = FALSE) :
undefined columns selected
Setting d=4 and passing degree=d doesn't work, but hard-coding the degree in the call, i.e. degree=4, works.
Any idea what's going on here?
Thanks!
EDIT:
library(caret)
set.seed(1)
omega = 0.5*pi
xi = 0.5
phi = 0.5*pi
f = function(t)1-exp(-xi*omega*t)*sin(sqrt(1-xi^2)*omega*t+phi)/sin(phi)
sigma = 0.03
train.n = 100
x = seq(0,2*pi,by=2*pi/(train.n-1))
y = f(x)+rnorm(train.n,mean=0,sd=sigma)
training = data.frame(x=x,y=y)
fitCtrl <- trainControl(method = "LOOCV",verboseIter = FALSE)
ds = 1:20
for(i in 1:length(ds)){
  print(i)
  d = 4
  fit = train(y~poly(x,degree=4),data=training,method="lm",trControl=fitCtrl)
}
The problem here is that caret uses all.vars() on your formula under the hood to create the data frame needed for modeling. As a result, d is treated as one of the model variables, and caret looks for a column named d in your data, which is why you get "undefined columns selected".
Typically, one could solve issues like this with I() in the formula or force() around it, but neither helps with all.vars().
The only way to fix this is to not pass d in your formula, but to have it be a number beforehand. Using as.formula(paste0("y ~ poly(x, degree=", d, ")")) in your loop will achieve this (as also suggested by @akrun).
Here is a working example based on your code:
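A minimal sketch of such a loop, reusing the data and fitCtrl setup from the question. The two all.vars() calls only illustrate the behaviour described above. The rmse vector, the form variable, and the shorter degree range 1:6 are illustrative choices, not part of the original code.

```r
library(caret)

set.seed(1)
# Same synthetic data as in the question
omega <- 0.5 * pi
xi <- 0.5
phi <- 0.5 * pi
f <- function(t) 1 - exp(-xi * omega * t) * sin(sqrt(1 - xi^2) * omega * t + phi) / sin(phi)
sigma <- 0.03
train.n <- 100
x <- seq(0, 2 * pi, by = 2 * pi / (train.n - 1))
y <- f(x) + rnorm(train.n, mean = 0, sd = sigma)
training <- data.frame(x = x, y = y)
fitCtrl <- trainControl(method = "LOOCV", verboseIter = FALSE)

# With d as a symbol in the formula, all.vars() reports it as a variable
all.vars(y ~ poly(x, degree = d))
#> [1] "y" "x" "d"

# With the degree pasted in as a literal, only y and x remain
all.vars(as.formula("y ~ poly(x, degree=4)"))
#> [1] "y" "x"

ds <- 1:6                      # assumption: shorter range to keep the run quick
rmse <- numeric(length(ds))    # hypothetical container for the LOOCV errors
for (i in seq_along(ds)) {
  d <- ds[i]
  # Build the formula from text so that d enters as a number, not a symbol
  form <- as.formula(paste0("y ~ poly(x, degree=", d, ")"))
  fit <- train(form, data = training, method = "lm", trControl = fitCtrl)
  rmse[i] <- fit$results$RMSE
}
print(data.frame(degree = ds, RMSE = rmse))
```

Because the degree is pasted into the formula text before train() sees it, no column named d is ever looked up in training.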
Created on 2022-03-20 by the reprex package (v2.0.1)