I'm analyzing data from an A/B test we just finished running. Our outcome, y, is binary, and we have stratified results by a third variable, g.
Because the effect of the intervention could vary by g, I've fit a Poisson regression with robust covariance estimation as follows:
library(tidyverse)
library(sandwich)
library(marginaleffects)
fit <- glm(y ~ treatment * g, data=model_data, family=poisson, offset=log(n_users))
From here, I'd like to know the stratum-specific causal risk ratio (which we usually call "lift" in industry). My approach is to use avg_comparisons as follows:
avg_comparisons(fit,
variables = 'treatment',
newdata = model_data,
transform_pre = 'lnratioavg',
transform_post = exp,
by=c('g'),
vcov = 'HC')
The result seems to be consistent with calculations of the lift when I filter the data by groups in g.
Question
By passing by=c('g'), am I actually calculating the stratum-specific risk ratios as I suspect? Are there any hidden "gotchas" or things I have failed to consider?
I can provide data and a minimal working example if need be.
Here’s a very simple base R example to show what is happening under the hood:
Unit-level estimates of the log ratio associated with a change of 1 in hp:
This is equivalent to:
Now we take the stratum-specific means, with mean() inside log():
Same as:
See the list of transformation functions here: https://vincentarelbundock.github.io/marginaleffects/reference/comparisons.html#transformations
The only thing is that by applies the function within strata.
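As a rough sketch of the same idea applied to the setup in the question, the base R code below uses simulated data. The data frame dat, its sample size, and the true rates are all assumptions made for illustration, not the asker's data, and the offset is dropped because the simulated rows are individual users. It sets treatment to 1 and then to 0 for every row, predicts on the response scale, and takes the ratio of the within-stratum means, which is roughly what lnratioavg followed by exp amounts to when by = 'g'.

```r
# Simulated A/B data (hypothetical; stands in for model_data).
set.seed(1)
n <- 4000
dat <- data.frame(
  treatment = rbinom(n, 1, 0.5),
  g = factor(sample(c("a", "b"), n, replace = TRUE))
)
# Assumed true rates: baseline 10% in stratum a, 20% in stratum b, lift of 1.5.
p <- with(dat, ifelse(g == "a", 0.10, 0.20) * ifelse(treatment == 1, 1.5, 1))
dat$y <- rbinom(n, 1, p)

# Log-link model with a treatment-by-stratum interaction.
fit <- glm(y ~ treatment * g, data = dat, family = poisson)

# Counterfactual predictions: everyone treated vs. everyone untreated.
hi <- predict(fit, newdata = transform(dat, treatment = 1), type = "response")
lo <- predict(fit, newdata = transform(dat, treatment = 0), type = "response")

# Average within each stratum first, then take the ratio of the means.
lift_by_g <- tapply(hi, dat$g, mean) / tapply(lo, dat$g, mean)

# Raw lift from filtering the data by stratum, for comparison.
treated <- dat$treatment == 1
raw_lift <- tapply(dat$y[treated], dat$g[treated], mean) /
  tapply(dat$y[!treated], dat$g[!treated], mean)

print(rbind(model = lift_by_g, raw = raw_lift))
```

Because treatment * g is saturated in this sketch, the model-based lift matches the raw lift obtained by filtering on g, which is consistent with what the question reports. With additional covariates in the model, the two need not agree.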