Why do PLS regression coefficients with R [pls] differ with those from other R packages?

726 Views Asked by At

Out of curiosity, I am trying to figure out why the PLS regression coefficients obtained with pls differ from the coefficients obtained with plsRglm, ropls, or plsdepot which all provide the same results.

Here is some code to start with. I have tried to play with the scale, center, and method arguments of the plsr function... but no success so far.

library(pls)
library(plsRglm)
library(ropls)
library(plsdepot)

data(Cornell)

pls.plsr <- plsr(
  Y~X1+X2+X3+X4+X5+X6+X7, 
  data = Cornell, 
  ncomp = 3, 
  scale = TRUE, 
  center = TRUE
)

plsRglm.plsr <- plsR(
  Y~X1+X2+X3+X4+X5+X6+X7, 
  data = Cornell, 
  nt = 3, 
  scaleX = TRUE
)

ropls.plsr <- opls(
  as.matrix(Cornell[, grep("X", colnames(Cornell))]),
  Cornell[, "Y"], 
  scaleC = "standard"
)

plsdepot.plsr <- plsreg1(
  as.matrix(Cornell[, grep("X", colnames(Cornell))]),
  Cornell[, "Y"],
  comps = 3
)

## extract PLS regression coefficients for the PLS model with three components
coef(pls.plsr) # a
coef(plsRglm.plsr, type = "original") # b
coef(plsRglm.plsr, type = "scaled") # c
coef(ropls.plsr) # c
plsdepot.plsr$std.coefs # c
plsdepot.plsr$reg.coefs # b
1

There are 1 best solutions below

2
On BEST ANSWER

Firstly, just for re-formatting purposes, we write:

library(pls)
library(plsRglm)
library(ropls)
library(plsdepot)

data(Cornell)
pls.plsr <- plsr(Y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7, 
                 data = Cornell, 
                 ncomp = 3, scale = T, center = T)
plsRglm.plsr <- plsR(Y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7, 
                    data = Cornell, 
                    nt = 3, scaleX = TRUE)
ropls.plsr <- opls(as.matrix(Cornell[, grep("X", colnames(Cornell))]),
                   Cornell[, "Y"], scaleC = "standard")
plsdepot.plsr <- plsreg1(as.matrix(Cornell[, grep("X", colnames(Cornell))]),
                         Cornell[, "Y"], comps = 3)

That done, you may extract the coefficients in the original scale:

### ORIGINAL SCALE -  plsRglm, plsdepot
coef(plsRglm.plsr, type = "original")
plsdepot.plsr$reg.coefs

Or you can have them scaled:

### SCALED - plsRglm, ropls, plsdepot
coef(plsRglm.plsr, type = "scaled")
coef(ropls.plsr)
plsdepot.plsr$std.coefs

Therefore all methods now give rise to the same coefficients... Except for pls::plsr. Why? You may ask. The key is in the command. When you run:

coef(pls.plsr) # , , 3 comps

You see ", , 3". That is characteristic of a tensor object. What is this? Coefficients should be simply a vector. The reason is that coef is a generic function and it is not working properly for pls::plsr models. To see what it is actually extracting:

pls.plsr$coefficients
matrix(pls.plsr$coefficients, ncol = 3) # or in matrix form. coef simply extracts the third column (it should not)

But you can see the same fit for all models if you examine the equivalent object in each R-package as in:

matrix(pls.plsr$projection, ncol = 3)    
plsRglm.plsr$wwetoile
plsdepot.plsr$mod.wgs
ropls.plsr@weightStarMN

Therefore, for pls::plsr you were simply not extracting the coefficients.