This is my code for fitting a generalized additive model (GAM) with the betar family in mgcv:
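library(mgcv)
# Beta-regression GAM: one unpenalized (fx = TRUE) smooth of stage per comparison_type level
b1 <- gam(ssim_exp_scale ~ s(stage, k = 4, fx = TRUE, by = comparison_type) + comparison_type,
          data = df, family = betar(link = "logit", eps = .Machine$double.eps * 1000))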
Output:

saturated likelihood may be inaccurate

Family: Beta regression(0.434)
Link function: logit

Formula:
ssim_exp_scale ~ s(stage, k = 4, fx = TRUE, by = comparison_type) +
    comparison_type

Parametric coefficients:
                         Estimate Std. Error z value Pr(>|z|)
(Intercept)               -0.5572     0.1607  -3.468 0.000524 ***
comparison_typefunctions   2.0598     0.1988  10.362  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
                                  edf Ref.df Chi.sq  p-value
s(stage):comparison_typecomplete    3      3  19.07 0.000265 ***
s(stage):comparison_typefunctions   3      3   0.88 0.830160
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) = -0.00757   Deviance explained = -16.4%
-REML = -1035.1  Scale est. = 1   n = 171

saturated likelihood may be inaccurate
saturated likelihood may be inaccurate
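For reading the last two summary lines: as I understand the mgcv documentation, summary.gam reports "Deviance explained" as the share of the null deviance D_0 removed by the fitted model's deviance D. It can therefore only go negative when D exceeds D_0. Because deviance is defined through the saturated log-likelihood, an inaccurate saturated likelihood (what the warning flags) feeds directly into this number. The symbols below are my own notation, not mgcv output.

```latex
\text{Deviance explained} = \frac{D_0 - D}{D_0} = 1 - \frac{D}{D_0},
\qquad
D = 2\left[\ell_{\text{sat}} - \ell(\hat{\beta})\right]

% With Deviance explained = -0.164:
1 - \frac{D}{D_0} = -0.164 \;\Longrightarrow\; D = 1.164\, D_0
```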
I tried decreasing eps, but I still get the same warning ("saturated likelihood may be inaccurate") and a negative deviance explained. Any idea why this happens, and how can I fix it?
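To illustrate what the eps argument controls: the mgcv documentation for betar states that response values outside [eps, 1 - eps] are truncated to that interval. The base-R sketch below imitates that truncation on a toy vector. The helper truncate_response and the example values are hypothetical, not mgcv internals. It shows that exact 0s and 1s are moved to positions that depend on eps and that, on the logit scale, these positions move further into the tails as eps gets smaller.

```r
# Hypothetical helper imitating the truncation described in the mgcv betar docs:
# values below eps become eps, values above 1 - eps become 1 - eps.
truncate_response <- function(y, eps) {
  pmin(pmax(y, eps), 1 - eps)
}

y <- c(0, 0.25, 0.5, 1)                 # toy response containing an exact 0 and 1
eps_used <- .Machine$double.eps * 1000  # the eps value passed to betar() in the question

y_trunc <- truncate_response(y, eps_used)
print(y_trunc)  # interior values are unchanged; 0 and 1 are nudged inward

# Logit of the truncated boundary values for two choices of eps:
# a smaller eps pushes them further into the tails.
print(qlogis(truncate_response(c(0, 1), eps_used)))
print(qlogis(truncate_response(c(0, 1), .Machine$double.eps * 10)))
```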
For context: my data contain some exact 0s and 1s. The dependent variable is a percentage from 0 to 100%, rescaled to the interval [0, 1]. It is a similarity measure like Jaccard similarity (https://www.learndatasci.com/glossary/jaccard-similarity/).
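To make this setup concrete, here is a base-R sketch of a Jaccard similarity between two sets: the size of the intersection divided by the size of the union, as on the linked page. The sketch expresses it as a percentage, rescales it to [0, 1], and counts exact 0s and 1s. The function jaccard_pct and the toy sets are hypothetical. They only show that identical sets give exactly 1 and disjoint sets give exactly 0, values that a beta distribution (defined on the open interval (0, 1)) cannot produce.

```r
# Hypothetical helper: Jaccard similarity |intersect(a, b)| / |union(a, b)|, as a percentage.
jaccard_pct <- function(a, b) {
  u <- union(a, b)
  if (length(u) == 0) return(NA_real_)  # undefined for two empty sets
  100 * length(intersect(a, b)) / length(u)
}

# Toy examples: partial overlap, identical sets, disjoint sets
pct <- c(
  jaccard_pct(c("x", "y", "z"), c("y", "z", "w")),  # 2 shared out of 4 distinct
  jaccard_pct(c("x", "y"), c("x", "y")),            # identical sets
  jaccard_pct(c("x"), c("y"))                       # disjoint sets
)

# Rescale from 0-100% to [0, 1]
y <- pct / 100

# Count values on the boundary, which fall outside the open interval (0, 1)
print(c(n_zero = sum(y == 0), n_one = sum(y == 1), n = length(y)))
```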
This is the distribution of the dependent variable in my data: