Predict function on a regression model giving an error


I am trying to predict values of the y variable based on my polynomial model.

lumber.predict.plm=lm(lumber.unemployment.women$lumber.1980.2000 ~ 
                        scale(lumber.unemployment.women$woman.1980.2000) +
                        I(scale(lumber.unemployment.women$woman.1980.2000)^2))

xmin=min(lumber.unemployment.women$woman.1980.2000)
xmax=max(lumber.unemployment.women$woman.1980.2000)
predicted.lumber.whole=data.frame(x=seq(xmin, xmax, length.out=500))
predicted.lumber.whole$lumber=predict(lumber.predict.plm,newdata=predicted.lumber.whole,
                                       interval="confidence")

All of the above commands work fine except the last one, which gives the following error:

predicted.lumber.whole$lumber=predict(lumber.predict.plm,newdata=predicted.lumber.whole,
                                       interval="confidence")

#Error in `$<-.data.frame`(`*tmp*`, "lumber", value = c(134.507238798567,  : 
#  replacement has 252 rows, data has 500
#In addition: Warning message:
#'newdata' had 500 rows but variables found have 252 rows

Structure of the data frame on which the regression is carried out:

str(lumber.unemployment.women)
#'data.frame':  252 obs. of  2 variables:
# $ lumber.1980.2000: num  108.2 109.9 109.6 99.8 97 ...
# $ woman.1980.2000 : num  5.8 5.9 5.7 6.3 6.4 6.5 6.6 6.7 6.3 6.7 ...

Why should the predicted values depend on the number of observations in the data frame?
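A minimal reproduction of the symptom, assuming hypothetical toy data `d` in place of `lumber.unemployment.women`. When the formula names columns through `$`, the model's variables are the expressions `d$x` and `d$y`. predict() evaluates those expressions against `newdata` first, finds no `d` there, falls back to the `d` in the calling environment, and so returns one value per training row (with the same kind of "'newdata' had ... rows" warning). Fitting with `data =` makes the variable plain `x`, which newdata can supply.

```r
set.seed(1)

# Hypothetical toy data standing in for lumber.unemployment.women
d <- data.frame(x = runif(20, 3, 10))
d$y <- 2 + 3 * d$x + rnorm(20)

# Formula written with `$`, as in the question
fit_dollar <- lm(d$y ~ d$x)

# New grid of 50 x values
nd <- data.frame(x = seq(3, 10, length.out = 50))

# `d$x` is not a column of nd, so the original 20-row vector is used
# (R warns that newdata had 50 rows but variables found have 20 rows)
p_dollar <- suppressWarnings(predict(fit_dollar, newdata = nd))
length(p_dollar)   # 20, not 50

# Same model fitted with data =, so the predictor is simply `x`
fit_data <- lm(y ~ x, data = d)
p_data <- predict(fit_data, newdata = nd)
length(p_data)     # 50
```

In the `$` version the "predictions" are just the fitted values for the training rows, which is why their count matches the data frame rather than the new grid.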


I think the following is your problem, although the error message seems a bit obscure to me. Here is a simplified version of your code:

L = data.frame(woman = 1:100, lumber = 1:100 + rnorm(100))
L.lm = lm(lumber ~ woman, data = L)
xmin = -20; xmax = 120

The following gives an error because the new data contains a variable called "x", while the model was fitted on a variable called "woman". Note that lm() above did not automatically rename the predictor to "x".

nd=data.frame(x=seq(xmin, xmax, length.out=500))
predict(L.lm, newdata=nd,interval="confidence")

Error in eval(expr, envir, enclos) : object 'woman' not found

Rather, predict() is looking for "woman". If you ran summary(L.lm), you would see that the coefficient is named "woman", not "x".
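One way to see which names predict() will look up in newdata is to take the variables of the model's terms with the response removed. A sketch using the simplified `L.lm` (regenerated here with a fixed seed so it runs on its own; `nd_bad`, `nd_good` and `needed` are illustrative names):

```r
set.seed(42)
L <- data.frame(woman = 1:100, lumber = 1:100 + rnorm(100))
L.lm <- lm(lumber ~ woman, data = L)

# Variables predict() must find in newdata (response dropped)
needed <- all.vars(delete.response(terms(L.lm)))
print(needed)   # "woman"

nd_bad  <- data.frame(x = seq(-20, 120, length.out = 500))
nd_good <- data.frame(woman = seq(-20, 120, length.out = 500))

# Names the model needs but the new data lacks
missing_bad  <- setdiff(needed, names(nd_bad))    # "woman"
missing_good <- setdiff(needed, names(nd_good))   # character(0)
```

An empty result from `setdiff()` means every predictor the model expects is present in the new data.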

The following works, because the original and the new data contain the same variable:

nd=data.frame(woman=seq(xmin, xmax, length.out=500))
predict(L.lm, newdata=nd,interval="confidence")

        fit       lwr       upr
1 -20.32932 -20.85072 -19.80792
2 -20.04737 -20.56699 -19.52775
3 -19.76542 -20.28327 -19.24757
4 -19.48347 -19.99955 -18.96740
5 -19.20153 -19.71582 -18.68723
6 -18.91958 -19.43210 -18.40705
...
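With `interval = "confidence"`, predict() returns a matrix with columns `fit`, `lwr` and `upr` rather than a vector, so assigning it to a single column with `$<-` (as in the question) stores a matrix column. A sketch of one tidier option, binding the matrix to the prediction grid with cbind() (toy `L` data as above; `ci` and `out` are illustrative names):

```r
set.seed(42)
L <- data.frame(woman = 1:100, lumber = 1:100 + rnorm(100))
L.lm <- lm(lumber ~ woman, data = L)

nd <- data.frame(woman = seq(-20, 120, length.out = 500))

# Matrix with columns fit, lwr, upr (one row per row of nd)
ci <- predict(L.lm, newdata = nd, interval = "confidence")

# Bind the grid and the interval columns into one flat data frame
out <- cbind(nd, ci)
head(out)
```

The result has one ordinary column each for `woman`, `fit`, `lwr` and `upr`, which is convenient for plotting the fitted curve with its band.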

PS: to be clear, this also works with

L.lm= lm(lumber ~ poly(woman,2), data=L)

which is a cleaner way of expressing polynomial fits.
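poly(woman, 2) builds orthogonal polynomial columns and keeps the coefficients needed to rebuild them, so predict() can evaluate the same basis on new data. The fitted curve is the same as the raw quadratic `woman + I(woman^2)`; only the coefficients differ. A sketch checking that on the toy data (`fit_poly` and `fit_raw` are illustrative names):

```r
set.seed(42)
L <- data.frame(woman = 1:100, lumber = 1:100 + rnorm(100))

fit_poly <- lm(lumber ~ poly(woman, 2), data = L)        # orthogonal basis
fit_raw  <- lm(lumber ~ woman + I(woman^2), data = L)    # raw powers

nd <- data.frame(woman = seq(-20, 120, length.out = 500))
p_poly <- predict(fit_poly, newdata = nd)
p_raw  <- predict(fit_raw,  newdata = nd)

# Same column space, so predictions agree up to floating-point error
all.equal(p_poly, p_raw)   # TRUE
```

The orthogonal basis mainly helps numerically and makes the linear and quadratic terms uncorrelated; it does not change the predictions.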


I rewrote the model with a data= argument (instead of referring to the columns with $) and named the column of the new data frame woman.1980.2000, and now it works fine. I don't know the root cause of the earlier error, though; it would be great if someone could explain it. Modified script below.

lumber.predict.plm1=lm(lumber.1980.2000 ~ scale(woman.1980.2000) +
                        I(scale(woman.1980.2000)^2), data=lumber.unemployment.women)
xmin=min(lumber.unemployment.women$woman.1980.2000)
xmax=max(lumber.unemployment.women$woman.1980.2000)
predicted.lumber.all=data.frame(woman.1980.2000=seq(xmin,xmax,length.out=100))
predicted.lumber.all$lumber=predict(lumber.predict.plm1,newdata=predicted.lumber.all)
str(predicted.lumber.all)
#'data.frame':   100 obs. of  2 variables:
# $ woman.1980.2000: num  3.3 3.36 3.42 3.48 3.54 ...
# $ lumber         : num  195 193 192 190 188 ...
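One caution if you keep the scaled quadratic: R records the training center and scale for a plain scale() term (via makepredictcall()), but that mechanism may not reach a scale() nested inside I(), in which case the squared term would be recomputed from newdata's own mean and sd. A sketch that sidesteps the question by scaling once with the training statistics; the toy data `d` and the names `m`, `s`, `z`, `grid` are hypothetical:

```r
set.seed(1)

# Hypothetical toy data standing in for lumber.unemployment.women
d <- data.frame(x = runif(252, 3, 11))
d$y <- 150 - 10 * d$x + 0.8 * d$x^2 + rnorm(252, sd = 5)

# Scale once, using the training mean and sd
m <- mean(d$x)
s <- sd(d$x)
d$z <- (d$x - m) / s

fit_z <- lm(y ~ z + I(z^2), data = d)

# Apply the same m and s to the prediction grid
grid <- data.frame(x = seq(min(d$x), max(d$x), length.out = 100))
grid$z <- (grid$x - m) / s
grid$y_hat <- predict(fit_z, newdata = grid)

# Cross-check against an orthogonal-polynomial fit of the same model
fit_poly <- lm(y ~ poly(x, 2), data = d)
all.equal(unname(grid$y_hat), unname(predict(fit_poly, newdata = grid)))  # TRUE
```

Because z is an affine function of x, the scaled quadratic and poly(x, 2) span the same columns, so matching predictions confirm the grid was scaled consistently.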