Study background: I want to see if mean caterpillar abundance for a given year and within a given population, can explain differences in bird(blue tit) density. The blue tits breed in nest boxes, so I calculated density as the total number of occupied nest boxes/the number of unoccupied nest boxes within a given year and population.
Below I show the structure of the data, my model and the error messages.
Model:
model1.1 <-glmer(cbind(data.density$number.nest.boxes.occupied.that.year,
data.density$number_of_nest.boxes)~population*year*caterpillar+(1|site),
data = data.density, family=binomial)
The first error is:
fixed-effect model matrix is rank deficient so dropping 27 columns / coefficients
I think this this is due to not having enough combinations of caterpillars with population x year
.
The second error is
boundary (singular) fit: see ?isSingular
I'm just not sure how to go about fixing this. I also don't understand what the other error means and how to fix it.
I appreciate any advice.
#loading data density data
data.density<-read.csv ("nest_box_caterpillar_density.csv")
View(data.density)
str(data.density)
#> 'data.frame': 63 obs. of 8 variables:
#> $ year : int 2011 2012 2013 2014 2015 2016 2017 2018 2019 2011 ...
#> $ number.nest.boxes.occupied.that.year: int 17 13 12 16 16 16 15 17 12 17 ...
#> $ number_of_nest.boxes : int 20 20 20 20 20 20 20 20 20 30 ...
#> $ proportion_occupied_boxes : num 0.85 0.65 0.6 0.8 0.8 0.8 0.75 0.85 0.6 0.57 ...
#> $ site : Factor w/ 7 levels "ari","ava","fel",..: 5 5 5 5 5 5 5 5 5 1 ...
#> $ population : Factor w/ 3 levels "D-Muro","E-Muro",..: 2 2 2 2 2 2 2 2 2 2 ...
#> $ mean_yearly_frass : num 295 231 437 263 426 ...
#> $ site_ID : Factor w/ 63 levels "2011_ari_","2011_ava_",..: 5 12 19 26 33 40 47 54 61 1 ...
data.density$year<-factor (data.density$year)# making year a factor (categorical variable)
str(data.density) # now we see year as a factor in the data.
#> 'data.frame': 63 obs. of 8 variables:
#> $ year : Factor w/ 9 levels "2011","2012",..: 1 2 3 4 5 6 7 8 9 1 ...
#> $ number.nest.boxes.occupied.that.year: int 17 13 12 16 16 16 15 17 12 17 ...
#> $ number_of_nest.boxes : int 20 20 20 20 20 20 20 20 20 30 ...
#> $ proportion_occupied_boxes : num 0.85 0.65 0.6 0.8 0.8 0.8 0.75 0.85 0.6 0.57 ...
#> $ site : Factor w/ 7 levels "ari","ava","fel",..: 5 5 5 5 5 5 5 5 5 1 ...
#> $ population : Factor w/ 3 levels "D-Muro","E-Muro",..: 2 2 2 2 2 2 2 2 2 2 ...
#> $ mean_yearly_frass : num 295 231 437 263 426 ...
#> $ site_ID : Factor w/ 63 levels "2011_ari_","2011_ava_",..: 5 12 19 26 33 40 47 54 61 1 ...
density<-data.density$proportion_occupied_boxes # making a new object called density
caterpillar<-data.density$mean_yearly_frass # making new object called caterpillar
model1.1<-glmer(cbind(data.density$number.nest.boxes.occupied.that.year,data.density$number_of_nest.boxes)~population*year*caterpillar+(1|site),data = data.density, family=binomial)
#> fixed-effect model matrix is rank deficient so dropping 27 columns / coefficients
#> boundary (singular) fit: see ?isSingular
summary(model1.1)
#> Generalized linear mixed model fit by maximum likelihood (Laplace
#> Approximation) [glmerMod]
#> Family: binomial ( logit )
#> Formula:
#> cbind(data.density$number.nest.boxes.occupied.that.year, data.density$number_of_nest.boxes) ~
#> population * year * caterpillar + (1 | site)
#> Data: data.density
#>
#> AIC BIC logLik deviance df.resid
#> 343.7 403.7 -143.8 287.7 35
#>
#> Scaled residuals:
#> Min 1Q Median 3Q Max
#> -1.1125 -0.1379 0.0000 0.2264 0.6778
#>
#> Random effects:
#> Groups Name Variance Std.Dev.
#> site (Intercept) 0 0
#> Number of obs: 63, groups: site, 7
#>
#> Fixed effects:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) -0.4054532 0.2454754 -1.652 0.0986 .
#> populationE-Muro -0.1158123 0.2301030 -0.503 0.6147
#> populationE-Pirio -0.4945158 0.2932707 -1.686 0.0918 .
#> year2012 0.0905137 0.2109513 0.429 0.6679
#> year2013 -0.1223076 0.2160367 -0.566 0.5713
#> year2014 -0.0703760 0.2304236 -0.305 0.7600
#> year2015 -0.0507882 0.2127083 -0.239 0.8113
#> year2016 -0.0562139 0.2077616 -0.271 0.7867
#> year2017 -0.0994962 0.2070464 -0.481 0.6308
#> year2018 0.0977751 0.2192755 0.446 0.6557
#> year2019 -0.2312869 0.2133430 -1.084 0.2783
#> caterpillar 0.0004598 0.0005432 0.846 0.3973
#> populationE-Muro:year2012 -0.1217344 0.3294773 -0.369 0.7118
#> populationE-Pirio:year2012 -0.3121173 0.2912256 -1.072 0.2838
#> populationE-Muro:year2013 -0.0682892 0.3600992 -0.190 0.8496
#> populationE-Pirio:year2013 -0.3345701 0.3051039 -1.097 0.2728
#> populationE-Muro:year2014 0.1604636 0.3383121 0.474 0.6353
#> populationE-Pirio:year2014 -0.1074231 0.3171972 -0.339 0.7349
#> populationE-Muro:year2015 0.0838557 0.3491699 0.240 0.8102
#> populationE-Pirio:year2015 -0.0640988 0.2943189 -0.218 0.8276
#> populationE-Muro:year2016 0.0679017 0.3333771 0.204 0.8386
#> populationE-Pirio:year2016 -0.0899343 0.2919975 -0.308 0.7581
#> populationE-Muro:year2017 0.1643493 0.3300491 0.498 0.6185
#> populationE-Pirio:year2017 0.0338824 0.2730344 0.124 0.9012
#> populationE-Muro:year2018 0.0315607 0.3264224 0.097 0.9230
#> populationE-Pirio:year2018 -0.4196974 0.3180515 -1.320 0.1870
#> populationE-Muro:year2019 -0.0587593 0.3619408 -0.162 0.8710
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Correlation matrix not shown by default, as p = 27 > 12.
#> Use print(x, correlation=TRUE) or
#> vcov(x) if you need it
#> fit warnings:
#> fixed-effect model matrix is rank deficient so dropping 27 columns / coefficients
#> optimizer (Nelder_Mead) convergence code: 0 (OK)
#> boundary (singular) fit: see ?isSingular
Created on 2022-03-21 by the reprex package (v2.0.1)
I tried removing caterpillars from the model, and the error first error goes away. But the point of my model is to see how caterpillar effects density. I also still get the "boundary (singular) fit: see ?isSingular" error.
There are a few problems with your model specification.
cbind(n_success, n_failure)
, notcbind(n_success, n_total)
(in your casecbind(boxes_occupied, total_boxes-boxes_occupied)
. (I actually find it clearer to use the alternative specificationboxes_occupied/total_boxes
with the additional argumentweights=total_boxes
...)year
as a random effect, and possiblypopulation
withinsite
(this will depend on how much you are interested in the detailed differences between populations)As for singularity, this is a fundamental issue in mixed models, and has been discussed in lots of places. There's a lot of information provided at
?lme4::isSingular
(i.e., look up the help page for theisSingular()
function), and at the GLMM FAQ, or see some of the existing answers on Stack Overflow about singular fits ...