Manually creating a rootogram to evaluate a GAM with a Tweedie distribution?

104 Views Asked by At

I have count data of a species which I am modelling using GAMs with negative binomial and Tweedie families. I want to compare the models visually using rootograms. This works fine for the negative binomial model but I am struggling to find a package for models with a Tweedie distribution.

I have attached some code as a reproducible example:

library(dplyr)
library(tweedie)
library(mgcv)
library(gratia)

# Set seed for reproducibility
set.seed(123)

# Number of rows
n_rows <- 50

# Create a dataframe with 50 rows and 3 columns
df <- data.frame(
  y = as.integer(rtweedie(n = n_rows, mu = 1, p = 1.5, phi =1)), 
  x1 = rnorm(n_rows),                   
  x2 = rnorm(n_rows)                    
)

# Display the first few rows of the dataframe
head(df)

model <- gam(y ~ s(x1)+s(x2), data = df, family = nb())
summary(model)

model2 <- gam(y ~ s(x1)+s(x2), data = df, family = tw())
summary(model2)

# Draw rootogram for negative binomial
rg <- gratia::rootogram(model)
gratia::draw(rg)

This produces the following:
Gratia table of observed vs fitted

and when drawn I get the requested rootogram:
Rootogram produced by Gratia

When I do this for the Tweedie model however:

# Draw rootogram for Tweedie
rg2 <- gratia::rootogram(model2)

I get an error message saying that "Only <Poisson, Negative Binomial, Gaussian> models supported." I have received several similar messages for other packages including countreg and topmodels.

I am therefore attempting a manual approach to recreate the rootogram steps based on the information provided in Gavin Simpson's helpful blogpost (https://fromthebottomoftheheap.net/2016/06/07/rootograms/)

The main features of the rootogram:

1) expected counts, given the model, are shown by the thick red line. In order to extract these I have predicted base on the model and the original dataframe.

expected_counts <- model$fitted.values
head(expected_counts)

2) observed counts are shown as bars, which in a hanging rootogram are show hanging from the red line of expected counts. These are just the original counts from the model:

observed_counts <- df$y

3) on the x-axis we have the count bin, 0 count, 1 count, 2 count, etc

To create the bine I have created tables of both counts (rounding the expected counts to integers).

# Tabulised counts
table_expected <- as.data.frame(table(round(expected_counts)))
names(table_expected) <- c("bin","fitted")
table_observed <- as.data.frame(table(observed_counts))
names(table_observed) <- c("bin","observed")

# merge into one dataframe
table_merged <- merge(table_observed, table_expected, by = "bin", all.x =T)

Output of table_merged:

Output of manual creation of observed vs fitted

This is very different to the output of rg. See bottom of post:

  1. on the y-axis we have the square root of the observed or expected count — the square root transformation allows for departures from expectations to be seen even at small frequencies. A reference line is drawn at a height of 0

I have then created a very crude rootogram in base R.

# Plot rootogram
plot(sqrt(table_merged$fitted) ~ as.numeric(table_merged$bin), type = "l",
     col = "lightblue", ylim = c(-3,7), xlim = c(-0.3,4), lwd = 4)

# Define colors and width for rectangles
rect_color <- "grey"
rect_width <- 0.3

# Loop through each element in the data and add rectangles
for (i in 1:nrow(table_merged)) {
  rect(xleft = table_merged$bin[i] - rect_width, 
       ybottom = sqrt(table_merged$fitted[i]) - sqrt(table_merged$observed[i]), 
       xright = table_merged$bin[i] + rect_width, 
       ytop = sqrt(table_merged$fitted[i]), 
       col = rect_color)
}

abline(h=0, lwd = 2)
abline(h=1, lwd = 2, lty = 2)
abline(h=-1, lwd = 2, lty = 2)

You will notice that in part 3 I get a considerably different answer to the table produced by the gratia package. What am I doing incorrectly in my step 3? Why do the fitted values created by Gratia and other packages have decimals if they are count data?

0

There are 0 best solutions below