I am very new to R and I am having problems to understand the output of my sum contrasted negative binomial regression with and without interaction between two factors (categorical). Maybe somebody can help!
My dependent variable are species counts (observations). I included an offset term to account for varying "sample" effort. My two (categorical) factors are type of habitat (3 levels) and period (2 levels). I used sum contrasts to center my factors to compare against the "grand mean".
I am interested in the question, if period changes/effects the habitat use of the species.
First I ran a model including only the factor habitat:
contrasts(counts$habitat) =contr.sum(3,base=1,contrasts=TRUE)
nb_reg<-glm.nb(formula=counts~habitat+offset(log(obs)),data=speciescounts)
summary(nb_reg)
Call:
glm.nb(formula = counts ~ habitat + offset(log(obs)), data = speciescounts,
init.theta = 0.4115369822, link = log)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.3610 -1.0464 -0.8868 0.1151 4.9992
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.8424 0.1968 -14.440 < 2e-16 ***
habitat1 0.5624 0.2163 2.600 0.00931 **
habitat3 -0.1403 0.2218 -0.633 0.52701
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for Negative Binomial(0.4115) family taken to be 1)
Null deviance: 230.15 on 254 degrees of freedom
Residual deviance: 219.83 on 252 degrees of freedom
AIC: 759.4
Number of Fisher Scoring iterations: 1
Theta: 0.4115
Std. Err.: 0.0641
2 x log-likelihood: -751.3990
To get the "missing" habitat type 2, I changed the base in the contrast code, which got me this output:
Call:
glm.nb(formula = counts ~ habitat + offset(log(obs)), data = speciescounts,
init.theta = 0.4115369822, link = log)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.3610 -1.0464 -0.8868 0.1151 4.9992
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.8424 0.1968 -14.440 <2e-16 ***
habitat2 -0.4221 0.3695 -1.142 0.253
habitat3 -0.1403 0.2218 -0.633 0.527
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for Negative Binomial(0.4115) family taken to be 1)
Null deviance: 230.15 on 254 degrees of freedom
Residual deviance: 219.83 on 252 degrees of freedom
AIC: 759.4
Number of Fisher Scoring iterations: 1
Theta: 0.4115
Std. Err.: 0.0641
2 x log-likelihood: -751.3990
What would the interpretation be here? For me it seems as if habitat1 was the most popular, most frequently used habitat tpye??
Now I added the interaction with period; I used sum contrast on period as well and also changed the base here, to get the "missing" interaction. I will only show the two outputs, as it is the same principle as for habitat above.
nb_reg<-glm.nb(formula=counts~habitat*period+offset(log(obs)),data=speciescounts)
summary(nb_reg)
Call:
glm.nb(formula = counts ~ habitat * period + offset(log(obs)),
data = speciescounts, init.theta = 0.4672185898, link = log)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.4925 -1.0366 -0.7316 0.1038 4.6313
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.93450 0.19567 -14.997 < 2e-16 ***
habitat1 0.64894 0.21399 3.033 0.00243 **
habitat2 -0.36639 0.36510 -1.004 0.31560
period1 0.09005 0.19567 0.460 0.64537
habitat1:period1 -0.49599 0.21399 -2.318 0.02046 *
habitat2:period1 0.07686 0.36510 0.211 0.83325
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for Negative Binomial(0.4672) family taken to be 1)
Null deviance: 247.03 on 254 degrees of freedom
Residual deviance: 221.38 on 249 degrees of freedom
AIC: 751.73
Number of Fisher Scoring iterations: 1
Theta: 0.4672
Std. Err.: 0.0756
2 x log-likelihood: -737.7250
Call:
glm.nb(formula = counts ~ habitat * period + offset(log(obs)),
data = speciescounts, init.theta = 0.4672185898, link = log)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.4925 -1.0366 -0.7316 0.1038 4.6313
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.93450 0.19567 -14.997 <2e-16 ***
habitat2 -0.36639 0.36510 -1.004 0.3156
habitat3 -0.28255 0.22502 -1.256 0.2092
period1 0.09005 0.19567 0.460 0.6454
habitat2:period1 0.07686 0.36510 0.211 0.8333
habitat3:period1 0.41913 0.22502 1.863 0.0625
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for Negative Binomial(0.4672) family taken to be 1)
Null deviance: 247.03 on 254 degrees of freedom
Residual deviance: 221.38 on 249 degrees of freedom
AIC: 751.73
Number of Fisher Scoring iterations: 1
Theta: 0.4672
Std. Err.: 0.0756
2 x log-likelihood: -737.7250
So, there is a significant interaction between habitat1 and period1, but I am again not sure how to interpret it. For me it seems as if there are less animals in habitat1 during period1, but less compared to what?
I would be happy, if somebody could help! Thank you!
For your interaction model, it seems like you have set the second level of period as the reference level (as does SAS). The intercept estimate equates to the second period at habitat 1. the 'main effects' of period refer to the differences in habitat for the base level of period which seems to be period 2.
BTW I'll add a couple of comments. Not sure about the strength of your interaction effect' and I would be looking at other possible parameterisations such as comparing periods within habitat. ... counts ~ habitat/period -1 , data = .... or treating the counts as a bivariate response for the two periods. Another idea would be to make the first count a covariate and just analyse the second count as the response. Model might be .... count2 ~ count1*habitat, ....