GLM prediction in R


I split the data set into train and test sets as follows:

splitdata<-split(sb, sample(rep(1:2, length.out=nrow(sb))))
test<-splitdata[[1]]
train<-splitdata[[2]]

sb is the name of the original data set, so this is a 50/50 train/test split.

Then I fitted a glm using the training set.

fitglm<-glm(num_claims~year+vt+va+public+pri_bil+persist+penalty_pts+num_veh+num_drivers+married+gender+driver_age+credit+col_ded+car_den, family=poisson, data=train)

Now I want to predict using this glm, say for the next 10 observations.

I am having trouble specifying the newdata argument in predict().

I tried:

pred<-predict(fitglm,newdata=data.frame(train),type="response", se.fit=T)

This gives a number of predictions equal to the number of observations in the training set.

Finally, how can I plot these predictions with confidence intervals?

Thank you for the help

Best answer

If you are asking how to construct predictions for the next 10 observations in the test set, then:

pred10<-predict(fitglm,newdata=data.frame(test)[1:10, ], type="response", se.fit=T) 
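
As a quick check that newdata controls how many predictions come back, here is a minimal, self-contained sketch. The data frame `sim` and its columns `x` and `y` are made up for illustration and stand in for `sb`; the model is deliberately much smaller than the one in the question.

```r
set.seed(1)

# Simulated stand-in for `sb` (hypothetical columns x and y)
n   <- 200
sim <- data.frame(x = runif(n))
sim$y <- rpois(n, lambda = exp(0.5 + 1.2 * sim$x))

# 50/50 split into train and test
idx   <- sample(rep(1:2, length.out = n))
train <- sim[idx == 2, ]
test  <- sim[idx == 1, ]

fit <- glm(y ~ x, family = poisson, data = train)

# Only the rows passed as newdata are predicted
pred10 <- predict(fit, newdata = test[1:10, ], type = "response", se.fit = TRUE)

length(pred10$fit)     # 10 predicted means
length(pred10$se.fit)  # 10 standard errors
```

Passing `train` as newdata, as in the question, returns one prediction per training row for the same reason: predict() produces one value per row of whatever data frame it is given.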

Edit 9 years later:

@carsten's comment is correct regarding how to construct a confidence interval. If a glm object such as fitglm has a non-linear link function, the following is a reasonably general method: compute the interval on the link scale, then apply the inverse of the link function to get a two-sided 95% CI on the response scale:

# Predict on the link scale (the default type), keeping the standard errors
pred.fit <- predict(fitglm, newdata=newdata, se.fit=TRUE)

# The inverse link function is stored in the model's family object
CI.pred.upper <- family(fitglm)$linkinv(
                        pred.fit$fit + 1.96*pred.fit$se.fit )

CI.pred.lower <- family(fitglm)$linkinv(
                        pred.fit$fit - 1.96*pred.fit$se.fit )
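
The question also asked how to plot the predictions with their intervals. One possible sketch in base graphics follows. It is not the only way to do it, and the simulated data frame `sim`, the single predictor `x`, and the `newdata` grid are assumptions made only so the example runs on its own.

```r
set.seed(42)

# Simulated data (hypothetical): one predictor x, Poisson counts y
n   <- 200
sim <- data.frame(x = runif(n, 0, 2))
sim$y <- rpois(n, lambda = exp(0.3 + 0.8 * sim$x))

fit <- glm(y ~ x, family = poisson, data = sim)

# Ten new observations to predict
newdata <- data.frame(x = seq(0, 2, length.out = 10))

# Predict on the link (log) scale, then back-transform with the inverse link
p       <- predict(fit, newdata = newdata, type = "link", se.fit = TRUE)
linkinv <- family(fit)$linkinv
est     <- linkinv(p$fit)
lower   <- linkinv(p$fit - 1.96 * p$se.fit)
upper   <- linkinv(p$fit + 1.96 * p$se.fit)

# Point predictions with vertical 95% interval bars
plot(newdata$x, est, pch = 19, ylim = range(lower, upper),
     xlab = "x", ylab = "Predicted mean count")
arrows(newdata$x, lower, newdata$x, upper,
       angle = 90, code = 3, length = 0.05)
```

Because the interval is built on the link scale and then transformed, the bars are not symmetric around the point estimates on the response scale. They are confidence intervals for the predicted mean, not prediction intervals for individual new counts.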