fixed-effect model matrix is rank deficient so dropping 27 columns / coefficients AND boundary (singular)

1.1k Views Asked by At

Study background: I want to see if mean caterpillar abundance for a given year and within a given population, can explain differences in bird(blue tit) density. The blue tits breed in nest boxes, so I calculated density as the total number of occupied nest boxes/the number of unoccupied nest boxes within a given year and population.

Below I show the structure of the data, my model and the error messages.

Model:

model1.1 <-glmer(cbind(data.density$number.nest.boxes.occupied.that.year,
          data.density$number_of_nest.boxes)~population*year*caterpillar+(1|site),
   data = data.density, family=binomial)

The first error is:

fixed-effect model matrix is rank deficient so dropping 27 columns / coefficients

I think this this is due to not having enough combinations of caterpillars with population x year.

The second error is

boundary (singular) fit: see ?isSingular

I'm just not sure how to go about fixing this. I also don't understand what the other error means and how to fix it.

I appreciate any advice.

#loading data density data

data.density<-read.csv ("nest_box_caterpillar_density.csv") 
View(data.density)


str(data.density)
#> 'data.frame':    63 obs. of  8 variables:
#>  $ year                                : int  2011 2012 2013 2014 2015 2016 2017 2018 2019 2011 ...
#>  $ number.nest.boxes.occupied.that.year: int  17 13 12 16 16 16 15 17 12 17 ...
#>  $ number_of_nest.boxes                : int  20 20 20 20 20 20 20 20 20 30 ...
#>  $ proportion_occupied_boxes           : num  0.85 0.65 0.6 0.8 0.8 0.8 0.75 0.85 0.6 0.57 ...
#>  $ site                                : Factor w/ 7 levels "ari","ava","fel",..: 5 5 5 5 5 5 5 5 5 1 ...
#>  $ population                          : Factor w/ 3 levels "D-Muro","E-Muro",..: 2 2 2 2 2 2 2 2 2 2 ...
#>  $ mean_yearly_frass                   : num  295 231 437 263 426 ...
#>  $ site_ID                             : Factor w/ 63 levels "2011_ari_","2011_ava_",..: 5 12 19 26 33 40 47 54 61 1 ...

data.density$year<-factor (data.density$year)# making year a factor (categorical variable)

str(data.density) # now we see year as a factor in the data. 
#> 'data.frame':    63 obs. of  8 variables:
#>  $ year                                : Factor w/ 9 levels "2011","2012",..: 1 2 3 4 5 6 7 8 9 1 ...
#>  $ number.nest.boxes.occupied.that.year: int  17 13 12 16 16 16 15 17 12 17 ...
#>  $ number_of_nest.boxes                : int  20 20 20 20 20 20 20 20 20 30 ...
#>  $ proportion_occupied_boxes           : num  0.85 0.65 0.6 0.8 0.8 0.8 0.75 0.85 0.6 0.57 ...
#>  $ site                                : Factor w/ 7 levels "ari","ava","fel",..: 5 5 5 5 5 5 5 5 5 1 ...
#>  $ population                          : Factor w/ 3 levels "D-Muro","E-Muro",..: 2 2 2 2 2 2 2 2 2 2 ...
#>  $ mean_yearly_frass                   : num  295 231 437 263 426 ...
#>  $ site_ID                             : Factor w/ 63 levels "2011_ari_","2011_ava_",..: 5 12 19 26 33 40 47 54 61 1 ...



density<-data.density$proportion_occupied_boxes # making a new object called density
caterpillar<-data.density$mean_yearly_frass # making new object called caterpillar

model1.1<-glmer(cbind(data.density$number.nest.boxes.occupied.that.year,data.density$number_of_nest.boxes)~population*year*caterpillar+(1|site),data = data.density, family=binomial)  
#> fixed-effect model matrix is rank deficient so dropping 27 columns / coefficients
#> boundary (singular) fit: see ?isSingular

summary(model1.1)
#> Generalized linear mixed model fit by maximum likelihood (Laplace
#>   Approximation) [glmerMod]
#>  Family: binomial  ( logit )
#> Formula: 
#> cbind(data.density$number.nest.boxes.occupied.that.year, data.density$number_of_nest.boxes) ~  
#>     population * year * caterpillar + (1 | site)
#>    Data: data.density
#> 
#>      AIC      BIC   logLik deviance df.resid 
#>    343.7    403.7   -143.8    287.7       35 
#> 
#> Scaled residuals: 
#>     Min      1Q  Median      3Q     Max 
#> -1.1125 -0.1379  0.0000  0.2264  0.6778 
#> 
#> Random effects:
#>  Groups Name        Variance Std.Dev.
#>  site   (Intercept) 0        0       
#> Number of obs: 63, groups:  site, 7
#> 
#> Fixed effects:
#>                              Estimate Std. Error z value Pr(>|z|)  
#> (Intercept)                -0.4054532  0.2454754  -1.652   0.0986 .
#> populationE-Muro           -0.1158123  0.2301030  -0.503   0.6147  
#> populationE-Pirio          -0.4945158  0.2932707  -1.686   0.0918 .
#> year2012                    0.0905137  0.2109513   0.429   0.6679  
#> year2013                   -0.1223076  0.2160367  -0.566   0.5713  
#> year2014                   -0.0703760  0.2304236  -0.305   0.7600  
#> year2015                   -0.0507882  0.2127083  -0.239   0.8113  
#> year2016                   -0.0562139  0.2077616  -0.271   0.7867  
#> year2017                   -0.0994962  0.2070464  -0.481   0.6308  
#> year2018                    0.0977751  0.2192755   0.446   0.6557  
#> year2019                   -0.2312869  0.2133430  -1.084   0.2783  
#> caterpillar                 0.0004598  0.0005432   0.846   0.3973  
#> populationE-Muro:year2012  -0.1217344  0.3294773  -0.369   0.7118  
#> populationE-Pirio:year2012 -0.3121173  0.2912256  -1.072   0.2838  
#> populationE-Muro:year2013  -0.0682892  0.3600992  -0.190   0.8496  
#> populationE-Pirio:year2013 -0.3345701  0.3051039  -1.097   0.2728  
#> populationE-Muro:year2014   0.1604636  0.3383121   0.474   0.6353  
#> populationE-Pirio:year2014 -0.1074231  0.3171972  -0.339   0.7349  
#> populationE-Muro:year2015   0.0838557  0.3491699   0.240   0.8102  
#> populationE-Pirio:year2015 -0.0640988  0.2943189  -0.218   0.8276  
#> populationE-Muro:year2016   0.0679017  0.3333771   0.204   0.8386  
#> populationE-Pirio:year2016 -0.0899343  0.2919975  -0.308   0.7581  
#> populationE-Muro:year2017   0.1643493  0.3300491   0.498   0.6185  
#> populationE-Pirio:year2017  0.0338824  0.2730344   0.124   0.9012  
#> populationE-Muro:year2018   0.0315607  0.3264224   0.097   0.9230  
#> populationE-Pirio:year2018 -0.4196974  0.3180515  -1.320   0.1870  
#> populationE-Muro:year2019  -0.0587593  0.3619408  -0.162   0.8710  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Correlation matrix not shown by default, as p = 27 > 12.
#> Use print(x, correlation=TRUE)  or
#>     vcov(x)        if you need it
#> fit warnings:
#> fixed-effect model matrix is rank deficient so dropping 27 columns / coefficients
#> optimizer (Nelder_Mead) convergence code: 0 (OK)
#> boundary (singular) fit: see ?isSingular

Created on 2022-03-21 by the reprex package (v2.0.1)

I tried removing caterpillars from the model, and the error first error goes away. But the point of my model is to see how caterpillar effects density. I also still get the "boundary (singular) fit: see ?isSingular" error.

1

There are 1 best solutions below

2
On

There are a few problems with your model specification.

  • binomial responses should be specified as cbind(n_success, n_failure), not cbind(n_success, n_total) (in your case cbind(boxes_occupied, total_boxes-boxes_occupied). (I actually find it clearer to use the alternative specification boxes_occupied/total_boxes with the additional argument weights=total_boxes ...)
  • there are few absolutely hard and fast rules, but it probably makes sense to estimate the effects of at least year as a random effect, and possibly population within site (this will depend on how much you are interested in the detailed differences between populations)
  • with a total of 63 observations, you need to be parsimonious with your model; a reasonable rule of thumb is that you shouldn't try to estimate more than at most 6 parameters
  • so I would recommend something like
 glmer(boxes_occupied/total_boxes ~ caterpillar + population + (1|year) + (1|site) + caterpillar, 
    data = data.density, 
    weights = total_boxes,
    family = binomial) 

As for singularity, this is a fundamental issue in mixed models, and has been discussed in lots of places. There's a lot of information provided at ?lme4::isSingular (i.e., look up the help page for the isSingular() function), and at the GLMM FAQ, or see some of the existing answers on Stack Overflow about singular fits ...