How can I ensure convergence of my BAM model with mgcv in R?

42 Views Asked by At

I am trying to fit a binomial GAM using the mgcv library in R. Because I have a high number of data points (~20,000), I am using the bam function. The structure of the model is as follows:

Model <- bam(PresAbs ~ s(X1,bs="ts", k=30) + 
                s(X2,bs="ts", k=30) + 
                s(X3,bs="ts", k=30) + 
                ti(X2, X3, k=30 ) + 
                s(X4,bs="ts", k=30) +   
                s(X5,bs="ts", k=30) +
                s(X6,bs="ts", k=30) + 
                s(Fact, bs = "re"), 
              data = DF,  family = binomial, select = TRUE) 

My problem is for some of my models (depends on what data frame I am using, as I have several), the model would not converge. I would get the following warning: Warning: Possible divergence detected in fast.REML.fit a few times, and my model is not well converged.

The responses are either 0 or 1 (see attached graph), and the response variable curves seem to diverge (oscillate between -10^17 and 10^17). I also attach an example of a response variable, they all look similar.fitted vs. response values a response variable

Family: binomial 
Link function: logit 

Formula:
PresAbs ~ s(X1, bs = "ts", k = 30) + s(X2, bs = "ts", k = 30) + 
    s(X3, bs = "ts", k = 30) + ti(X2, X3, k = 30) + 
    s(X4, bs = "ts", k = 30) + s(X5, bs = "ts", k = 30) + s(X6, 
    bs = "ts", k = 30) + s(Fact, bs = "re")

Parametric coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept) -1.040e+15  1.339e+11   -7769   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
                 edf Ref.df    Chi.sq p-value    
s(X1)             29     29 3.614e+16  <2e-16 ***
s(X2)             28     29 1.135e+16  <2e-16 ***
s(X3)             28     29 5.698e+16  <2e-16 ***
ti(X2,X3)        768    769 5.444e+17  <2e-16 ***
s(x4)             29     29 7.571e+16  <2e-16 ***
s(X5)             28     29 8.756e+15  <2e-16 ***
s(X6)             29     29 9.352e+16  <2e-16 ***
s(Fact)           25     25 8.705e+15  <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =  0.955   Deviance explained = 68.8%
fREML = 4.9531e+15  Scale est. = 1         n = 6445

I tried increasing k, from 10 to 15 to 30, with no change. I am a bit unsure whether I can increase it more because of computation time and because that seems like a lot of different splines to use... I also tried changing the spline basis, from ts to cr (except for Fact, that is a random effect). I also tried adding discrete = TRUE, but that did not solve the issue (and outputed a model with a deviance explained of -176% !) Any input as to what you think might be happening would be greatly appreciated. I am a bit unsure of the way to proceed now (should I increase the spline basis even more, or actually decrease it? Should I try another function so that I can use another method that fastREML? Might this have anything to do with initial conditions?)

Thanks!

0

There are 0 best solutions below