I have count data of a species which I am modelling using GAMs with negative binomial and Tweedie families. I want to compare the models visually using rootograms. This works fine for the negative binomial model but I am struggling to find a package for models with a Tweedie distribution.
I have attached some code as a reproducible example:
library(dplyr)
library(tweedie)
library(mgcv)
library(gratia)
# Set seed for reproducibility
set.seed(123)
# Number of rows
n_rows <- 50
# Create a dataframe with 50 rows and 3 columns
df <- data.frame(
y = as.integer(rtweedie(n = n_rows, mu = 1, p = 1.5, phi =1)),
x1 = rnorm(n_rows),
x2 = rnorm(n_rows)
)
# Display the first few rows of the dataframe
head(df)
model <- gam(y ~ s(x1)+s(x2), data = df, family = nb())
summary(model)
model2 <- gam(y ~ s(x1)+s(x2), data = df, family = tw())
summary(model2)
# Draw rootogram for negative binomial
rg <- gratia::rootogram(model)
gratia::draw(rg)
and when drawn I get the requested rootogram:
When I do this for the Tweedie model however:
# Draw rootogram for Tweedie
rg2 <- gratia::rootogram(model2)
I get an error message saying that "Only <Poisson, Negative Binomial, Gaussian> models supported." I have received several similar messages for other packages including countreg and topmodels.
I am therefore attempting a manual approach to recreate the rootogram steps based on the information provided in Gavin Simpson's helpful blogpost (https://fromthebottomoftheheap.net/2016/06/07/rootograms/)
The main features of the rootogram:
1) expected counts, given the model, are shown by the thick red line. In order to extract these I have predicted base on the model and the original dataframe.
expected_counts <- model$fitted.values
head(expected_counts)
2) observed counts are shown as bars, which in a hanging rootogram are show hanging from the red line of expected counts. These are just the original counts from the model:
observed_counts <- df$y
3) on the x-axis we have the count bin, 0 count, 1 count, 2 count, etc
To create the bine I have created tables of both counts (rounding the expected counts to integers).
# Tabulised counts
table_expected <- as.data.frame(table(round(expected_counts)))
names(table_expected) <- c("bin","fitted")
table_observed <- as.data.frame(table(observed_counts))
names(table_observed) <- c("bin","observed")
# merge into one dataframe
table_merged <- merge(table_observed, table_expected, by = "bin", all.x =T)
Output of table_merged:
This is very different to the output of rg. See bottom of post:
- on the y-axis we have the square root of the observed or expected count — the square root transformation allows for departures from expectations to be seen even at small frequencies. A reference line is drawn at a height of 0
I have then created a very crude rootogram in base R.
# Plot rootogram
plot(sqrt(table_merged$fitted) ~ as.numeric(table_merged$bin), type = "l",
col = "lightblue", ylim = c(-3,7), xlim = c(-0.3,4), lwd = 4)
# Define colors and width for rectangles
rect_color <- "grey"
rect_width <- 0.3
# Loop through each element in the data and add rectangles
for (i in 1:nrow(table_merged)) {
rect(xleft = table_merged$bin[i] - rect_width,
ybottom = sqrt(table_merged$fitted[i]) - sqrt(table_merged$observed[i]),
xright = table_merged$bin[i] + rect_width,
ytop = sqrt(table_merged$fitted[i]),
col = rect_color)
}
abline(h=0, lwd = 2)
abline(h=1, lwd = 2, lty = 2)
abline(h=-1, lwd = 2, lty = 2)
You will notice that in part 3 I get a considerably different answer to the table produced by the gratia package. What am I doing incorrectly in my step 3? Why do the fitted values created by Gratia and other packages have decimals if they are count data?