Interpretation of Negative Binomial regression with interaction, offset term and sum contrasts

714 Views Asked by At

I am very new to R and I am having problems to understand the output of my sum contrasted negative binomial regression with and without interaction between two factors (categorical). Maybe somebody can help!

My dependent variable are species counts (observations). I included an offset term to account for varying "sample" effort. My two (categorical) factors are type of habitat (3 levels) and period (2 levels). I used sum contrasts to center my factors to compare against the "grand mean".

I am interested in the question, if period changes/effects the habitat use of the species.

First I ran a model including only the factor habitat:

contrasts(counts$habitat) =contr.sum(3,base=1,contrasts=TRUE)
nb_reg<-glm.nb(formula=counts~habitat+offset(log(obs)),data=speciescounts)
summary(nb_reg)

Call:
glm.nb(formula = counts ~ habitat + offset(log(obs)), data = speciescounts, 
    init.theta = 0.4115369822, link = log)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.3610  -1.0464  -0.8868   0.1151   4.9992  

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept)   -2.8424     0.1968 -14.440  < 2e-16 ***
habitat1       0.5624     0.2163   2.600  0.00931 ** 
habitat3      -0.1403     0.2218  -0.633  0.52701    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for Negative Binomial(0.4115) family taken to be 1)

    Null deviance: 230.15  on 254  degrees of freedom
Residual deviance: 219.83  on 252  degrees of freedom
AIC: 759.4

Number of Fisher Scoring iterations: 1


              Theta:  0.4115 
          Std. Err.:  0.0641 

 2 x log-likelihood:  -751.3990

To get the "missing" habitat type 2, I changed the base in the contrast code, which got me this output:

Call:
glm.nb(formula = counts ~ habitat + offset(log(obs)), data = speciescounts, 
    init.theta = 0.4115369822, link = log)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.3610  -1.0464  -0.8868   0.1151   4.9992  

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept)    -2.8424     0.1968 -14.440   <2e-16 ***
habitat2       -0.4221     0.3695  -1.142    0.253    
habitat3       -0.1403     0.2218  -0.633    0.527    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for Negative Binomial(0.4115) family taken to be 1)

    Null deviance: 230.15  on 254  degrees of freedom
Residual deviance: 219.83  on 252  degrees of freedom
AIC: 759.4

Number of Fisher Scoring iterations: 1


              Theta:  0.4115 
          Std. Err.:  0.0641 

 2 x log-likelihood:  -751.3990

What would the interpretation be here? For me it seems as if habitat1 was the most popular, most frequently used habitat tpye??

Now I added the interaction with period; I used sum contrast on period as well and also changed the base here, to get the "missing" interaction. I will only show the two outputs, as it is the same principle as for habitat above.

nb_reg<-glm.nb(formula=counts~habitat*period+offset(log(obs)),data=speciescounts)
summary(nb_reg)

Call:
glm.nb(formula = counts ~ habitat * period + offset(log(obs)), 
    data = speciescounts, init.theta = 0.4672185898, link = log)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.4925  -1.0366  -0.7316   0.1038   4.6313  

Coefficients:
                      Estimate Std. Error z value Pr(>|z|)    
(Intercept)          -2.93450    0.19567 -14.997  < 2e-16 *** 
habitat1              0.64894    0.21399   3.033  0.00243 **   
habitat2             -0.36639    0.36510  -1.004  0.31560      
period1               0.09005    0.19567   0.460  0.64537      
habitat1:period1     -0.49599    0.21399  -2.318  0.02046 *   
habitat2:period1      0.07686    0.36510   0.211  0.83325     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for Negative Binomial(0.4672) family taken to be 1)

    Null deviance: 247.03  on 254  degrees of freedom
Residual deviance: 221.38  on 249  degrees of freedom
AIC: 751.73

Number of Fisher Scoring iterations: 1


              Theta:  0.4672 
          Std. Err.:  0.0756 

 2 x log-likelihood:  -737.7250

Call:
glm.nb(formula = counts ~ habitat * period + offset(log(obs)), 
    data = speciescounts, init.theta = 0.4672185898, link = log)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.4925  -1.0366  -0.7316   0.1038   4.6313  

Coefficients:
                  Estimate Std. Error z value Pr(>|z|)    
(Intercept)       -2.93450    0.19567 -14.997   <2e-16 ***
habitat2          -0.36639    0.36510  -1.004   0.3156    
habitat3          -0.28255    0.22502  -1.256   0.2092    
period1            0.09005    0.19567   0.460   0.6454    
habitat2:period1   0.07686    0.36510   0.211   0.8333    
habitat3:period1   0.41913    0.22502   1.863   0.0625
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for Negative Binomial(0.4672) family taken to be 1)

    Null deviance: 247.03  on 254  degrees of freedom
Residual deviance: 221.38  on 249  degrees of freedom
AIC: 751.73

Number of Fisher Scoring iterations: 1


              Theta:  0.4672 
          Std. Err.:  0.0756 

 2 x log-likelihood:  -737.7250

So, there is a significant interaction between habitat1 and period1, but I am again not sure how to interpret it. For me it seems as if there are less animals in habitat1 during period1, but less compared to what?

I would be happy, if somebody could help! Thank you!

1

There are 1 best solutions below

0
On

For your interaction model, it seems like you have set the second level of period as the reference level (as does SAS). The intercept estimate equates to the second period at habitat 1. the 'main effects' of period refer to the differences in habitat for the base level of period which seems to be period 2.
BTW I'll add a couple of comments. Not sure about the strength of your interaction effect' and I would be looking at other possible parameterisations such as comparing periods within habitat. ... counts ~ habitat/period -1 , data = .... or treating the counts as a bivariate response for the two periods. Another idea would be to make the first count a covariate and just analyse the second count as the response. Model might be .... count2 ~ count1*habitat, ....