Does the `by` argument in `avg_comparisons` compute the strata specific marginal effect?

394 Views Asked by At

I'm analyzing data from an AB test we just finished running. Our outcome is binary, y, and we have stratified results by a third variable, g.

Because the intervention could vary by g, I've fit a Poisson regression with robust covariance estimation as follows

library(tidyverse)
library(sandwich)
library(marginaleffects)

fit <- glm(y ~ treatment * g, data=model_data, family=poisson, offset=log(n_users))

From here, I'd like to know the strata specific causal risk ration (which we usually call "lift" in industry). My approach is to use avg_comparisons as follows

avg_comparisons(fit, 
                variables = 'treatment',
                newdata = model_data,
                transform_pre = 'lnratioavg', 
                transform_post = exp, 
                by=c('g'),
                vcov = 'HC')

The result seems to be consistent with calculations of the lift when I filter the data by groups in g.

Question

By passing by=c('g'), am I actually calculating the strata specific risk ratios as I suspect? Is there any hidden "gotchas" or things I have failed to consider?

I can provide data and a minimal working example if need be.

1

There are 1 best solutions below

0
On BEST ANSWER

Here’s a very simple base R example to show what is happening under-the-hood:

library(marginaleffects)
fit <- glm(carb ~ hp * am, data = mtcars, family = poisson)

Unit level estimates of log ratio associated with a change of 1 in hp:

cmp <- comparisons(fit, variables = "hp", transform_pre = "lnratio")
cmp
# 
#  Term Contrast Estimate Std. Error   z Pr(>|z|)   2.5 % 97.5 %
#    hp       +1   0.0056     0.0016 3.6   <0.001 0.00252 0.0086
#    hp       +1   0.0056     0.0016 3.6   <0.001 0.00252 0.0086
#    hp       +1   0.0056     0.0016 3.6   <0.001 0.00252 0.0086
#    hp       +1   0.0054     0.0027 2.0    0.047 0.00007 0.0107
#    hp       +1   0.0054     0.0027 2.0    0.047 0.00007 0.0107
# --- 22 rows omitted. See ?avg_comparisons and ?print.marginaleffects --- 
#    hp       +1   0.0056     0.0016 3.6   <0.001 0.00252 0.0086
#    hp       +1   0.0056     0.0016 3.6   <0.001 0.00252 0.0086
#    hp       +1   0.0056     0.0016 3.6   <0.001 0.00252 0.0086
#    hp       +1   0.0056     0.0016 3.6   <0.001 0.00252 0.0086
#    hp       +1   0.0056     0.0016 3.6   <0.001 0.00252 0.0086
# Prediction type:  response 
# Columns: rowid, type, term, contrast, estimate, std.error, statistic, p.value, conf.low, conf.high, predicted, predicted_hi, predicted_lo, carb, hp, am

This is equivalent to:

# prediction grids with 1 unit difference
lo <- transform(mtcars, hp = hp - .5)
hi <- transform(mtcars, hp = hp + .5)

# predictions on response scale
y_lo <- predict(fit, newdata = lo, type = "response")
y_hi <- predict(fit, newdata = hi, type = "response")

# log ratio
lnratio <- log(y_hi / y_lo)

# equivalent to `comparisons()`
all(cmp$estimate == lnratio)
# [1] TRUE

Now we take the strata specific means, with mean() inside log():

by(data.frame(am = lo$am, y_lo, y_hi),
   mtcars$am,
   FUN = \(x) log(mean(x$y_hi) / mean(x$y_lo)))
# mtcars$am: 0
# [1] 0.005364414
# ------------------------------------------------------------ 
# mtcars$am: 1
# [1] 0.005566092

Same as:

avg_comparisons(fit, variables = "hp", by = "am", transform_pre = "lnratio") |>
    print(digits = 7)
# 
#  Term Contrast am    Estimate  Std. Error        z   Pr(>|z|)        2.5 %
#    hp mean(+1)  0 0.005364414 0.002701531 1.985694 0.04706726 6.951172e-05
#    hp mean(+1)  1 0.005566092 0.001553855 3.582118    < 0.001 2.520592e-03
#       97.5 %
#  0.010659317
#  0.008611592
# 
# Prediction type:  response 
# Columns: type, term, contrast, am, estimate, std.error, statistic, p.value, conf.low, conf.high, predicted, predicted_hi, predicted_lo

See the list of transformation functions here: https://vincentarelbundock.github.io/marginaleffects/reference/comparisons.html#transformations

The only thing is that by applies the function within stratas.