I'm getting this warning after checking the variance components of my model. I don't have NA's and the warning goes away when I remove this s(CYR.std, fSite, bs = "fs")
term. Is it possibly related to having only 1 observation associated with a fSite
in a year (CYR.std
)? I get the same warning with the full dataset.
gratia::variance_comp(m1)
# A tibble: 4 × 5
component variance std_dev lower_ci upper_ci
<chr> <dbl> <dbl> <dbl> <dbl>
1 s(sal) 4.47e-10 0.0000211 0.00000968 0.0000462
2 s(CYR.std,fSite)1 3.92e- 2 0.198 0.112 0.352
3 s(CYR.std,fSite)2 6.13e- 1 0.783 0 Inf
4 s(CYR.std,fSite)3 3.43e- 7 0.000586 0.000268 0.00128
Warning messages:
1: In lsd - crit * sd.lsd :
longer object length is not a multiple of shorter object length
2: In lsd + crit * sd.lsd :
longer object length is not a multiple of shorter object length
Random subset of data recreates the error:
MyVar <- c("fCYR", "CYR.std", "fSeason", "fSite", "area_sampled", "num", "sal")
x <- shrimp2[MyVar][sample(nrow(shrimp2[MyVar]), 100), ]
na_count <-sapply(x, function(y) sum(length(which(is.na(y)))))
na_count <- data.frame(na_count)
na_count
na_count
fCYR 0
CYR.std 0
fSeason 0
fSite 0
area_sampled 0
num 0
sal 0
library(mgcv)
m1 <- bam(num ~ s(sal) +
CYR.std*fSeason +
s(CYR.std, fSite, bs = "fs"),
offset(log(area_sampled)),
method = "fREML",
discrete = TRUE,
family = nb(link = "log"),
data = x)
UPDATE:
I even get the same error with the most basic of random effects...
m <- bam(num ~ s(sal) +
CYR.std +
fSeason +
s(fSite, bs = "re") +
s(fCYR, bs = "re"),
offset(log(area_sampled)),
method = "fREML",
discrete = TRUE,
family = nb(link = "log"),
data = x)
> mgcv::gam.vcomp(m)
Standard deviations and 0.95 confidence intervals:
std.dev lower upper
s(sal) 0.01623266 0.0105176 0.02505317
s(fSite) 0.83502554 0.3549272 1.96453684
s(fCYR) 0.54846495 0.3553660 0.84649014
Rank: 2/2
Warning messages:
1: In lsd - crit * sd.lsd :
longer object length is not a multiple of shorter object length
2: In lsd + crit * sd.lsd :
longer object length is not a multiple of shorter object length
Full dataset (n=1327):
> table(shrimp2$CYR.std, shrimp2$fSite)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41
0 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 2 2 2 2 2 2 2 2 2 2 2 2
3 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
4 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
5 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 2
6 2 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1
7 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
8 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
9 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 2 2 1 2 2 2 2 2 2 1 2 1 2 2 2 2 2 2 2 2 2 2 2
10 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
11 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
12 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
13 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
14 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
42 43 44 45 46 47
0 2 2 2 1 2 2
1 2 2 2 2 2 2
2 2 1 1 1 1 1
3 2 2 2 2 2 2
4 2 2 2 2 2 2
5 2 2 2 2 2 2
6 1 1 1 1 1 1
7 2 2 2 2 2 2
8 2 2 2 2 2 2
9 2 2 2 2 2 2
10 2 2 2 2 2 2
11 2 2 2 2 2 2
12 2 2 2 2 2 2
13 1 1 1 1 1 1
14 2 2 2 2 2 2
Smaller dataset:
> dput(x)
structure(list(fCYR = structure(c(9L, 9L, 12L, 15L, 11L, 11L,
8L, 11L, 2L, 9L, 1L, 14L, 13L, 10L, 3L, 15L, 14L, 6L, 7L, 1L,
9L, 8L, 6L, 8L, 2L, 4L, 3L, 6L, 7L, 15L, 11L, 1L, 2L, 9L, 5L,
5L, 2L, 6L, 11L, 6L, 13L, 12L, 9L, 6L, 6L, 6L, 7L, 5L, 1L, 12L,
1L, 14L, 14L, 5L, 8L, 2L, 4L, 1L, 15L, 3L, 8L, 12L, 12L, 15L,
4L, 11L, 5L, 1L, 9L, 6L, 4L, 7L, 15L, 7L, 3L, 6L, 11L, 14L, 3L,
10L, 11L, 13L, 7L, 4L, 11L, 3L, 10L, 10L, 9L, 1L, 10L, 10L, 1L,
10L, 9L, 9L, 7L, 15L, 11L, 14L), levels = c("2008", "2009", "2010",
"2011", "2012", "2013", "2014", "2015", "2016", "2017", "2018",
"2019", "2020", "2021", "2022"), class = "factor"), CYR.std = c(8L,
8L, 11L, 14L, 10L, 10L, 7L, 10L, 1L, 8L, 0L, 13L, 12L, 9L, 2L,
14L, 13L, 5L, 6L, 0L, 8L, 7L, 5L, 7L, 1L, 3L, 2L, 5L, 6L, 14L,
10L, 0L, 1L, 8L, 4L, 4L, 1L, 5L, 10L, 5L, 12L, 11L, 8L, 5L, 5L,
5L, 6L, 4L, 0L, 11L, 0L, 13L, 13L, 4L, 7L, 1L, 3L, 0L, 14L, 2L,
7L, 11L, 11L, 14L, 3L, 10L, 4L, 0L, 8L, 5L, 3L, 6L, 14L, 6L,
2L, 5L, 10L, 13L, 2L, 9L, 10L, 12L, 6L, 3L, 10L, 2L, 9L, 9L,
8L, 0L, 9L, 9L, 0L, 9L, 8L, 8L, 6L, 14L, 10L, 13L), fSeason = structure(c(2L,
1L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L,
1L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 2L,
1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 2L,
1L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L,
2L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 1L,
1L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L,
2L, 2L, 2L), levels = c("DRY", "WET"), class = "factor"), fSite = structure(c(18L,
33L, 27L, 15L, 47L, 8L, 3L, 29L, 34L, 6L, 44L, 8L, 11L, 14L,
4L, 43L, 19L, 39L, 15L, 47L, 24L, 36L, 19L, 39L, 39L, 33L, 16L,
35L, 40L, 11L, 42L, 31L, 40L, 25L, 6L, 35L, 38L, 16L, 34L, 42L,
47L, 9L, 27L, 11L, 29L, 35L, 36L, 46L, 11L, 21L, 1L, 31L, 10L,
32L, 6L, 22L, 42L, 25L, 44L, 28L, 15L, 20L, 8L, 14L, 23L, 6L,
8L, 32L, 2L, 45L, 31L, 20L, 7L, 37L, 46L, 39L, 36L, 35L, 4L,
1L, 19L, 16L, 2L, 4L, 23L, 20L, 10L, 27L, 15L, 34L, 40L, 8L,
13L, 26L, 39L, 43L, 34L, 39L, 22L, 46L), levels = c("1", "2",
"3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14",
"15", "16", "17", "18", "19", "20", "21", "22", "23", "24", "25",
"26", "27", "28", "29", "30", "31", "32", "33", "34", "35", "36",
"37", "38", "39", "40", "41", "42", "43", "44", "45", "46", "47"
), class = "factor"), area_sampled = c(3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), num = c(0L,
1L, 0L, 0L, 2L, 0L, 1L, 0L, 1L, 1L, 1L, 0L, 2L, 0L, 0L, 2L, 0L,
0L, 0L, 0L, 0L, 3L, 0L, 19L, 3L, 2L, 0L, 0L, 0L, 0L, 1L, 1L,
2L, 0L, 0L, 0L, 0L, 0L, 1L, 2L, 1L, 2L, 0L, 0L, 1L, 0L, 1L, 0L,
3L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L,
6L, 0L, 0L, 1L, 0L, 0L, 2L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 3L,
1L, 0L, 0L, 0L, 2L, 1L, 0L, 1L, 0L, 1L, 0L, 3L, 0L, 0L, 0L, 1L,
0L, 2L, 2L, 1L), sal = c(4.78, 18.55, 19.72, 25.08, 16.48, 15.47,
24.88, 15.12, 25.69, 23.64, 33.58, 24.37, 31.77, 28.98, 21.14,
25.37, 20, 23.03, 27.01, 27.46, 8.06, 29.7, 27.44, 30.41, 21.13,
22.47, 16.8, 23.54, 37.1, 30.16, 29.26, 22.08, 31.68, 10.56,
19.1, 14.73, 29.34, 17.77, 20.07, 32.71, 30.47, 27.76, 11.26,
31.8, 26.85, 10.79, 17.31, 26.85, 20.62, 26.58, 30.46, 23.9,
23.01, 22.3, 20.49, 18.19, 29.74, 21.27, 23.74, 25.42, 23.46,
26.33, 30.99, 29.99, 29.77, 21, 29, 21.37, 15.83, 32.36, 33.52,
24.76, 28.48, 17.23, 33.79, 19.89, 2.79, 21.75, 29.98, 15.18,
13.48, 29.94, 27.39, 37.13, 4.04, 21.7, 11.29, 6.3, 11.37, 21.44,
19.82, 10.03, 22.73, 21.73, 8.76, 13.49, 18.21, 35.25, 4.15,
25.88)), row.names = c(268512L, 258208L, 329600L, 387560L, 312120L,
309912L, 253424L, 298136L, 109720L, 264096L, 75128L, 381304L,
341008L, 290224L, 128304L, 383880L, 373760L, 198776L, 235208L,
65376L, 258576L, 238336L, 203192L, 245328L, 89664L, 142104L,
134560L, 202088L, 227112L, 393080L, 296112L, 65744L, 110824L,
258760L, 195280L, 187736L, 110456L, 211656L, 299056L, 199328L,
337144L, 335120L, 266304L, 204664L, 200984L, 210552L, 220488L,
172096L, 83040L, 323528L, 62800L, 375968L, 379280L, 173200L,
254160L, 107512L, 152592L, 68872L, 384064L, 132720L, 252136L,
323344L, 317456L, 395840L, 164552L, 309360L, 177248L, 65928L,
271456L, 199880L, 162344L, 231344L, 386088L, 220672L, 138792L,
207792L, 310096L, 376704L, 122048L, 275320L, 299976L, 342296L,
225272L, 166024L, 315248L, 120392L, 275504L, 282864L, 267960L,
66296L, 286544L, 276608L, 83408L, 288200L, 256736L, 264832L,
220120L, 402096L, 315064L, 378912L), class = "data.frame")
I think you'll need to speak to Simon Wood about this one; the warning is raised in his
mgcv::gam.vcomp()
, which gratia calls (allvariance_comp()
does is put some sugar on top of the hard work mgcv does; returns the output as a tibble, adds the variance as a column).I think there is something weird going on. The warnings come from these two lines in gam.vcomp:
and the problem comes from the
sd.lsd
vector. This isn't of length 4:and this is because the hessian from the outer iteration is not of the same rank when fitted with
bam()
as it is withgam()
, and I think the current code either assumes something that isgam()
-specific or fitting withbam()
is highlight a potential issue with your data/model. For example, fitting withdiscrete = FALSE
(on the subset you provided) resulted in a warning that something didn't converge (even though the outer iteration seemingly converged).In this model, fitted with
gam()
, the hessian (of the smoothing parameters) is a 5x5 matrix because the outer iteration is also over $\theta$ parameter for the negative binomial. The code ingam.vcomp()
checks to see if there are any extra parameters outwith the smoothing parameters being estimated and if there are it drops those columns from the hessian. In the case of thebam()
fit, the hessian is actually a 4x4 matrix - which seems odd - even though there should be a $\theta$ being estimated.I think that there's a column missing with
bam()
is actually telling you something about the model; it looks like it is over-fitted to the data. If I look atgam.vcomp()
on the model fitted withgam()
I see:where at least two of the parameters have effectively 0 variance and a confidence interval that goes from (effectively) 0 to (again effectively) infinity, which is a pretty strong indication that you are asking too much from this model and data combination.
The smooths estimated by
s(CYR.std,fSite)
are linear when fitted withgam()
:while with
bam()
withdiscrete
set toTRUE
we seeand with
discrete = FALSE
we seebut I doubt I'd believe the two latter fits, especially in light of the model with
discrete = FALSE
emitting a convergence failure warning, and the problems highlighted withgam.vcomp()
on thegam()
version.While I think this is ultimately a data issue — you're asking too much from a these data — there is a problem with
gam.vcomp()
as it is dropping a column from the hessian for the theta but the hessian is 1 rank lower than it should be. And I think it should be 5x5 because there is $\theta$, 1 smoothing parameter fors(sal)
and three fors(CYR.std, fSite, bs = "fs")
. But I'm not familiar enough withbam()
to know why it is returning a smaller hessian thangam()
. Unless it is the result of the rank-deficiency of the model given the data causing some problem earlier in the chain —bam()
does a lot of fancy matrix operations so something could be getting lost in one of those operations that don't happen when fitting withgam()
.When you contact Simon, feel free to link to this Q&A, but do send him the subset of data as a CSV file and the complete code needed to load the data and fit the model, raise the warning. You want a minimal working example:
Show the pertinent output in your message (that from
gam.vcomp()
and the hessians, plus thebam()
call) but make sure Simon and just run the code himself (so don't interleve output).I will admit to getting very confused about your modelling process with all your questions. It seems like non of your questions fit the same model structure, yet are seemingly being fitted to the same data. I'm a little worried you are torturing these data to the point of death and their death throws are higlighting issues. Do you really need FS smooths in the full data?