I am trying to predict values of the y variable from my polynomial model.
lumber.predict.plm=lm(lumber.unemployment.women$lumber.1980.2000 ~
scale(lumber.unemployment.women$woman.1980.2000) +
I(scale(lumber.unemployment.women$woman.1980.2000)^2))
xmin=min(lumber.unemployment.women$woman.1980.2000)
xmax=max(lumber.unemployment.women$woman.1980.2000)
predicted.lumber.whole=data.frame(x=seq(xmin, xmax, length.out=500))
predicted.lumber.whole$lumber=predict(lumber.predict.plm,newdata=predicted.lumber.whole,
interval="confidence")
All of the above commands work fine except the last one, which gives the following error:
predicted.lumber.whole$lumber=predict(lumber.predict.plm,newdata=predicted.lumber.whole,
+ interval="confidence")
#Error in `$<-.data.frame`(`*tmp*`, "lumber", value = c(134.507238798567, :
# replacement has 252 rows, data has 500
#In addition: Warning message:
#'newdata' had 500 rows but variables found have 252 rows
Properties of the data frame on which the regression is carried out:
str(lumber.unemployment.women)
#'data.frame': 252 obs. of 2 variables:
# $ lumber.1980.2000: num 108.2 109.9 109.6 99.8 97 ...
# $ woman.1980.2000 : num 5.8 5.9 5.7 6.3 6.4 6.5 6.6 6.7 6.3 6.7 ...
Why should the predicted values depend on the number of observations in the data frame?
I think the following is your problem, although the error message seems a bit obscure to me. Here is a simplified version of your code:
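The snippet that originally followed is not preserved here, so this is only a minimal sketch of what such a simplified version might look like. The data frame L, its columns lumber and woman, and all the values are made up for illustration; the scale() calls are dropped to keep it short.

```r
set.seed(1)

# Simulated stand-in for lumber.unemployment.women (values are invented)
L <- data.frame(woman = runif(252, min = 4, max = 10))
L$lumber <- 100 + 5 * L$woman - 0.4 * L$woman^2 + rnorm(252, sd = 3)

# Use bare column names plus data = L, rather than L$woman inside the formula
L.lm <- lm(lumber ~ woman + I(woman^2), data = L)

# The model terms are named after "woman"; there is no "x" anywhere
names(coef(L.lm))
```

Writing the formula with bare column names and a data argument is what lets predict() match the predictor against a column of newdata later on.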
The following gives an error because the new data contains a variable "x" that the original data does not have. Note that the lm() above did not automatically assign the predictor to a variable called "x". Rather, it is looking for "woman". So if you did summary(L.lm) you would find the coefficient was "woman", not "x".
The following works because the original and new data contain the same variables:
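As a hedged illustration of both cases, the sketch below fits a quadratic on simulated data and then calls predict() twice, once with a mismatched column name and once with a matching one. The names L, L.lm, grid, new.L and the data values are assumptions made for this example, not the original code.

```r
set.seed(1)

# Simulated stand-in for the real data (values are invented)
L <- data.frame(woman = runif(252, min = 4, max = 10))
L$lumber <- 100 + 5 * L$woman - 0.4 * L$woman^2 + rnorm(252, sd = 3)
L.lm <- lm(lumber ~ woman + I(woman^2), data = L)

grid <- seq(min(L$woman), max(L$woman), length.out = 500)

# Fails: newdata has a column "x", but the model looks for "woman"
bad <- tryCatch(
  predict(L.lm, newdata = data.frame(x = grid), interval = "confidence"),
  error = function(e) conditionMessage(e)
)

# Works: the newdata column carries the same name as the model's predictor
new.L <- data.frame(woman = grid)
pred <- predict(L.lm, newdata = new.L, interval = "confidence")
new.L <- cbind(new.L, pred)  # adds the columns fit, lwr and upr

dim(new.L)
```

In the question's code the formula refers to lumber.unemployment.women$woman.1980.2000 directly, so predict() can still find that 252-value vector outside newdata. That is why it returns 252 rows with a warning instead of stopping with an error.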
P.S. Just to be clear, this will also work with ... a cleaner way of expressing polynomial fits.
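The expression referred to here is not preserved. One common candidate in R is poly(), and the sketch below assumes that is what was meant. The data frame L and its values are again simulated for illustration only.

```r
set.seed(1)

# Simulated stand-in for the real data (values are invented)
L <- data.frame(woman = runif(252, min = 4, max = 10))
L$lumber <- 100 + 5 * L$woman - 0.4 * L$woman^2 + rnorm(252, sd = 3)

# poly(woman, 2) builds an orthogonal quadratic basis in one term
L.poly <- lm(lumber ~ poly(woman, 2), data = L)

# newdata must still use the column name "woman"
new.L <- data.frame(woman = seq(min(L$woman), max(L$woman), length.out = 500))
pred.poly <- predict(L.poly, newdata = new.L, interval = "confidence")

nrow(pred.poly)
```

poly() records the basis it built on the training data, so predict() applies the same transformation to the new values.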