Error in eval(predvars, data, env) : object ' ' not found with R package pls


I've seen this question come up a lot, but I have yet to find a satisfactory solution, particularly for my case.

I am running partial least squares regression in R using the pls package, and would then like to use RMSEP() to calculate the root mean squared error of prediction of the fitted model on new data. This throws the error above, and I believe it is specifically because I am writing the model formula as follows:

plsr( Y ~ X[whatever , whatever ] ...

where I need to index specific columns of the matrix dataframe$X. Here is an example:

library(pls)
library(caTools)  # provides sample.split()

data(gasoline)  # NIR spectra and octane numbers shipped with pls

#Split dataframe between training and testing data
set.seed(123)
split <- sample.split(gasoline$octane, SplitRatio = 0.70)

gasoline$train <- split

gas.fit <- plsr(octane ~ NIR[ ,1:10] + NIR[ ,20:30],
                ncomp = 10,
                data = gasoline[gasoline$train, ],
                validation = "LOO",
                scale = FALSE,
                center = TRUE,
                method = "simpls")

#I can use RMSEP() on the fitted model
RMSEP(gas.fit)

#I can use the fitted model to predict octane of my test set
predict(gas.fit, newdata = gasoline[!gasoline$train ,])  

#But I cannot get the RMSEP of the test predictions
RMSEP(gas.fit, estimate = "test", newdata = gasoline[!gasoline$train ,])

The last command throws the error:

Error in eval(predvars, data, env) : object 'NIR' not found

What I know: the object 'NIR' should be present in newdata, since I've combined the training and test data into a single data frame.

The RMSEP() function works fine on models of the form plsr(Y ~ X[whatever, whatever]) as long as I don't supply newdata, and predict() works fine in both cases.
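Because predict() works on the test set, one workaround (a sketch that sidesteps the RMSEP() error rather than explaining it) is to compute the test-set RMSEP by hand from the predictions. The split below uses base R's sample() in place of caTools::sample.split(), the object names gas.train, gas.test and test.rmsep are illustrative, and cross-validation is dropped to keep it quick.

```r
library(pls)
data(gasoline)

# Illustrative 70/30 random split with base R (stands in for caTools::sample.split)
set.seed(123)
train.idx <- sample(nrow(gasoline), round(0.7 * nrow(gasoline)))
gas.train <- gasoline[train.idx, ]
gas.test  <- gasoline[-train.idx, ]

# Same indexed-formula model as in the question (no validation, for speed)
gas.fit <- plsr(octane ~ NIR[, 1:10] + NIR[, 20:30],
                ncomp = 10, data = gas.train,
                scale = FALSE, center = TRUE, method = "simpls")

# predict() returns an n x 1 x ncomp array; drop() turns it into n x ncomp
pred <- drop(predict(gas.fit, ncomp = 1:10, newdata = gas.test))

# Test RMSEP for 1..10 components: square root of the mean squared residual
# in each column (the observed octane vector is recycled down each column)
test.rmsep <- sqrt(colMeans((pred - gas.test$octane)^2))
test.rmsep
```

Since RMSEP(..., estimate = "test") is also the square root of the mean squared test residual, these values should match its component columns; the by-hand version simply omits the intercept-only model.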

What I've tried: Mevik & Wehrens (2007) insist we use the format

plsr( octane ~ NIR,
...
data = gasoline
...)

and not

plsr( gasoline$octane ~ gasoline$NIR,

which is more akin to what I am doing in my example, but not exactly the same. Even so, I've tried the following adjustment:

gas.fit <- plsr(octane ~ NIR,
                ncomp = 10,
                data = c(gasoline[gasoline$train, ]$NIR[, 1:10],
                         gasoline[gasoline$train, ]$NIR[, 20:30]),
                validation = "LOO",
                scale = FALSE,
                center = TRUE,
                method = "simpls")

But this fails too (error: 'envir' not of length one). It would also require supplying gasoline$octane separately, which would again violate the length requirement.
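Another approach, assuming the trouble comes from indexing inside the formula (not confirmed here), is to do the column selection before fitting: store the selected wavelengths as a single matrix column wrapped in I() inside a proper data frame, so the formula stays in the plain octane ~ NIR form that Mevik & Wehrens (2007) recommend. This mirrors how the gasoline data itself stores NIR. The names gas.sub, train.idx and gas.rmsep are illustrative, and the split again uses base R's sample().

```r
library(pls)
data(gasoline)

# Illustrative data frame: the matrix column holds only wavelengths 1:10 and 20:30.
# I() keeps the sub-matrix as ONE column named NIR instead of 21 separate columns.
gas.sub <- data.frame(octane = gasoline$octane,
                      NIR = I(gasoline$NIR[, c(1:10, 20:30)]))

# Illustrative 70/30 random split with base R
set.seed(123)
train.idx <- sample(nrow(gas.sub), round(0.7 * nrow(gas.sub)))

# Plain formula: no indexing inside it
gas.fit2 <- plsr(octane ~ NIR, ncomp = 10, data = gas.sub[train.idx, ],
                 validation = "LOO", scale = FALSE, center = TRUE,
                 method = "simpls")

# Test-set RMSEP using newdata
gas.rmsep <- RMSEP(gas.fit2, estimate = "test", newdata = gas.sub[-train.idx, ])
gas.rmsep
```

Row-subsetting a data frame keeps the matrix column intact, so the same gas.sub can serve as both data and newdata.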

I'd really like to make this approach work, because my end goal is to fit the plsr() model inside a for() loop like:

gas.fit <- plsr(octane ~ NIR[ ,i:(i+20)],

as part of a Moving Window PLSR algorithm.
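If the pre-built data frame approach holds up, it could carry over to the moving-window loop by rebuilding a small data frame for each window and calling RMSEP() with newdata as usual. This is a sketch under assumptions: the window of i:(i+20) follows the question, but the step of 20 between window starts, scoring each window by its minimum test RMSEP over components, and the names starts and window.rmsep are all illustrative choices; validation is omitted for speed.

```r
library(pls)
data(gasoline)

# Illustrative 70/30 random split with base R
set.seed(123)
train.idx <- sample(nrow(gasoline), round(0.7 * nrow(gasoline)))

width  <- 20                                            # window is i:(i + width), i.e. 21 columns
starts <- seq(1, ncol(gasoline$NIR) - width, by = 20)   # illustrative step between windows
window.rmsep <- numeric(length(starts))                 # illustrative: best test RMSEP per window

for (k in seq_along(starts)) {
  i <- starts[k]
  # Per-window data frame with the sub-matrix stored as one AsIs column
  win <- data.frame(octane = gasoline$octane,
                    NIR = I(gasoline$NIR[, i:(i + width)]))
  fit <- plsr(octane ~ NIR, ncomp = 10, data = win[train.idx, ],
              scale = FALSE, center = TRUE, method = "simpls")
  r <- RMSEP(fit, estimate = "test", newdata = win[-train.idx, ])
  # r$val is estimate x response x model; [-1] drops the intercept-only model
  window.rmsep[k] <- min(r$val[1, 1, -1])
}

names(window.rmsep) <- paste0(starts, ":", starts + width)
window.rmsep
```

The window with the smallest entry in window.rmsep would be the candidate spectral region. A step of 1 gives a fully overlapping sliding window, at the cost of many more model fits.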
