Model simplification (two way ANOVA)

I am using ANOVA to analyse results from an experiment to see whether there are any effects of my explanatory variables (Heating and Dungfauna) on my response variable (Biomass). I started by looking at the main effects and interaction:

full.model <- lm(log(Biomass) ~ Heating*Dungfauna, data= df)
anova(full.model)

I understand that I need to simplify the model, removing non-significant interactions or effects until I reach the simplest model that still explains the results. I tried two ways of removing the interaction. However, when I manually remove the interaction (Heating*Dungfauna -> Heating+Dungfauna), the new ANOVA gives a different output from the one I get with this model simplification 'shortcut':

new.model <- update(full.model, .~. -Dungfauna:Heating)
anova(new.model)

Which way is the appropriate way to remove the interaction and simplify the model?

In both cases the data is log transformed -

lm(log(CC_noAcari_EmergencePatSoil)~ Dungfauna*Heating, data= biomass)

ANOVA output from manually changing Heating*Dungfauna to Heating+Dungfauna:

Response: log(CC_noAcari_EmergencePatSoil)

          Df Sum Sq Mean Sq F value    Pr(>F)    
Heating    2  4.806   2.403  5.1799   0.01012 *  
Dungfauna  1 37.734  37.734 81.3432 4.378e-11 ***
Residuals 39 18.091   0.464

ANOVA output from using simplification 'shortcut':

Response: log(CC_noAcari_EmergencePatSoil)
          Df Sum Sq Mean Sq F value    Pr(>F)   
Dungfauna  1 41.790  41.790 90.0872 1.098e-11 ***
Heating    2  0.750   0.375  0.8079    0.4531    
Residuals 39 18.091   0.464                  
Answer:

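R's anova and aov functions compute Type I ("sequential") sums of squares, so the order in which the predictors are specified matters whenever the predictors are not orthogonal (for example, in an unbalanced design). A model specified as y ~ A + B tests A on its own (ignoring B) and then B after adjusting for A, whereas y ~ B + A tests B on its own and then A after adjusting for B. Notice that your full model specifies Dungfauna*Heating, and update() keeps that order when it drops the interaction, while your manually written model uses Heating+Dungfauna. Both are the same fitted model (note the identical residual sum of squares, 18.091 on 39 Df, in your two tables); only the order of the sequential tests differs.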

Consider this simple example using the "mtcars" data set. Here I specify two additive models (no interactions). Both models specify the same predictors, but in different orders:

add.model <- lm(log(mpg) ~ vs + cyl, data = mtcars)
anova(add.model)

          Df  Sum Sq Mean Sq F value    Pr(>F)    
vs         1 1.22434 1.22434  48.272 1.229e-07 ***
cyl        1 0.78887 0.78887  31.103 5.112e-06 ***
Residuals 29 0.73553 0.02536         

add.model2 <- lm(log(mpg) ~ cyl + vs, data = mtcars)
anova(add.model2)

          Df  Sum Sq Mean Sq F value    Pr(>F)    
cyl        1 2.00795 2.00795 79.1680 8.712e-10 ***
vs         1 0.00526 0.00526  0.2073    0.6523    
Residuals 29 0.73553 0.02536 

To get tests that do not depend on the order of the predictors, you could request Type II or Type III sums of squares using car::Anova (from the car package):

car::Anova(add.model, type = 2)
car::Anova(add.model2, type = 2)

Which gives the same sums of squares, F values and p-values for both models (only the row order differs):

           Sum Sq Df F value    Pr(>F)    
vs        0.00526  1  0.2073    0.6523    
cyl       0.78887  1 31.1029 5.112e-06 ***
Residuals 0.73553 29         
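If you would rather not install car, base R's drop1 should give the same order-independent tests for an additive model like this one, because it drops each term in turn from the full model. This is only a minimal sketch on the same mtcars example; the object names d1 and d2 are mine, not part of your code:

```r
# Same two additive models as above, differing only in predictor order
add.model  <- lm(log(mpg) ~ vs + cyl, data = mtcars)
add.model2 <- lm(log(mpg) ~ cyl + vs, data = mtcars)

# drop1() removes each term in turn and compares against the full model,
# so every term is tested after adjusting for all the others
d1 <- drop1(add.model,  test = "F")
d2 <- drop1(add.model2, test = "F")

print(d1)
print(d2)

# The per-term sums of squares agree regardless of the order in the formula
all.equal(d1["vs",  "Sum of Sq"], d2["vs",  "Sum of Sq"])
all.equal(d1["cyl", "Sum of Sq"], d2["cyl", "Sum of Sq"])
```

Note that with an interaction in the model, drop1 only offers the interaction for removal (it respects marginality), which is usually what you want during model simplification.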

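summary also provides equivalent (and consistent) tests regardless of the order of predictors, though it's not quite a formal ANOVA table. For a term with one degree of freedom the squared t value equals the Type II F value (here, (-0.455)^2 ≈ 0.207 for vs). For a factor with more than two levels, such as your three-level Heating, summary reports one t-test per contrast rather than a single test for the whole term: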

summary(add.model)

            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  3.92108    0.20714  18.930  < 2e-16 ***
vs          -0.04414    0.09696  -0.455    0.652    
cyl         -0.15261    0.02736  -5.577 5.11e-06 ***
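To tie this back to your question about which way of removing the interaction is "right": both ways fit the same model, and the update shortcut simply keeps the term order of the full model. Here is a rough sketch with mtcars standing in for your data (the names full, short and manual_swapped are made up for illustration):

```r
# Full model with interaction, then the update() 'shortcut'
full  <- lm(log(mpg) ~ vs * cyl, data = mtcars)
short <- update(full, . ~ . - vs:cyl)

# Manually written additive model with the predictors in the other order
manual_swapped <- lm(log(mpg) ~ cyl + vs, data = mtcars)

# update() keeps the original order of the remaining terms: vs, then cyl
attr(terms(short), "term.labels")

# The two additive fits are the same model: identical fitted values
all.equal(fitted(short), fitted(manual_swapped))

# ...but the sequential (Type I) tables differ because the order differs
anova(short)
anova(manual_swapped)

# An order-independent test of the interaction itself:
# compare the nested models directly with an F test
anova(short, full)
```

So either approach is fine for dropping the interaction; just keep the predictor order the same if you want the sequential tables to match, or use an order-independent test.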