Why does predict not work for multiple linear regression in R when fewer observations are used for prediction than for training?

I am trying to use multiple linear regression in R. I fit the model on training data loaded from a file, but when I try to predict on new data, I get these warning messages:

"Warning messages:
1: 'newdata' had 45 rows but variables found have 8676 rows
2: In predict.lm(reg, tin) :
  prediction from a rank-deficient fit may be misleading"

My code is simple:

yval = read.table("value_of_y.txt",header = T)
xval = read.table("Rmat.txt",header = T)
reg<-lm(yval$y~xval$x1+xval$x2+xval$x3+xval$x4+xval$x5+xval$x6+xval$x7+xval$x8+xval$x9+xval$x10+xval$x11+xval$x12+xval$x13+xval$x14)
summary(reg)
tin = read.table("Rtest.txt",header = T)
predict(reg,tin)

My training data (Rmat.txt):

x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13 x14
1 -1 -1 1 1 1 1 1 1 1 1 1 1 1
1 -1 -1 1 1 1 1 1 1 1 1 1 1 1
1 -1 -1 1 1 1 1 1 1 1 1 1 1 1
-1 1 -1 1 1 1 1 1 1 1 1 1 1 1
-1 1 -1 1 1 1 1 1 1 1 1 1 1 1
-1 1 -1 1 1 1 1 1 1 1 1 1 1 1
-1 -1 -1 1 1 1 1 1 1 1 1 1 1 1
1 -1 -1 1 1 1 1 1 1 1 1 1 1 1
1 -1 -1 1 1 1 1 1 1 1 1 1 1 1

(value_of_y.txt):
5
-5
5
5
-5
5
5
-5
-5

My test data, which I use for prediction (Rtest.txt):

x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13 x14
-1 1 -1 1 1 1 1 1 -1 1 -1 1 -1 1
-1 1 -1 1 1 1 1 1 1 1 -1 1 -1 1
-1 -1 1 1 1 1 1 1 1 1 -1 1 -1 1
-1 -1 -1 1 1 1 1 1 1 1 -1 1 -1 1

How should I use the predict function instead?

There is 1 solution below

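You need to be more careful when using the formula syntax with lm and predict. predict looks up the variable names from the model formula in the new data.frame, so the names must match exactly. That is not possible when you use the "$" syntax in the formula: the model terms are then named xval$x1, xval$x2, and so on, which never match the columns x1, x2, ... in tin. predict therefore falls back to the original xval (8676 rows) and warns that 'newdata' has a different number of rows. Use bare column names in the formula and pass the data separately. Try something like: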

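# Read the response and the predictors; both files have a header row
yval <- read.table("value_of_y.txt", header = TRUE)
xval <- read.table("Rmat.txt", header = TRUE)
# Use bare column names in the formula and supply the combined data frame as data
reg <- lm(y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10 + x11 + x12 + x13 + x14,
          data = cbind(yval, xval))
summary(reg)
# The columns of tin (x1..x14) now match the variable names in the formula
tin <- read.table("Rtest.txt", header = TRUE)
predict(reg, newdata = tin)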
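
To see the difference without the original files, here is a minimal sketch on made-up data. The data frames train and new and the two-predictor model are assumptions for illustration only and are not the data from the question. It fits the same model both ways and compares how many predictions each one returns.

```r
set.seed(1)

# Hypothetical stand-ins for the training and test files (not the real data)
train <- data.frame(x1 = rnorm(20), x2 = rnorm(20))
train$y <- 1 + 2 * train$x1 - train$x2 + rnorm(20, sd = 0.1)
new <- data.frame(x1 = rnorm(4), x2 = rnorm(4))

# "$" in the formula: the terms are named train$x1 and train$x2, so predict
# cannot find them in 'new' and evaluates them on 'train' instead
bad <- lm(train$y ~ train$x1 + train$x2)
bad_pred <- suppressWarnings(predict(bad, newdata = new))

# Bare names plus a data argument: the terms x1 and x2 are found in 'new'
good <- lm(y ~ x1 + x2, data = train)
good_pred <- predict(good, newdata = new)

length(bad_pred)   # one value per training row
length(good_pred)  # one value per row of 'new'
```

The first model silently returns the fitted values for the training rows, which is what the first warning in the question is reporting. The second model returns one prediction for each row of the new data.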