Sample size calculation for AB testing - Non-binomial ratio metric

I'm currently working on a sample size calculation for an upcoming AB test related to our mobile app. Up until now, I've been dealing with binomial metrics, such as the conversion rate, which is calculated as the number of customers who make a purchase divided by the number of customers who visit the page.

To perform these calculations, I've been solving for nobs1 using the tt_ind_solve_power function from statsmodels.stats.power. To get the effect size, I use proportion_effectsize from statsmodels.stats.proportion, comparing the metric's baseline value with the baseline plus the effect I want to detect.
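For reference, the binomial workflow described above can be sketched with the standard library alone. This is a minimal sketch, assuming a two-sided test and equal group sizes; the function names and the 10% baseline are hypothetical. proportion_effectsize uses the same arcsine transform (Cohen's h) by default, but this sketch uses a normal approximation, so it should give a slightly smaller n than tt_ind_solve_power, which is based on the t-distribution.

```python
import math
from statistics import NormalDist


def cohens_h(p1: float, p2: float) -> float:
    """Arcsine-transformed difference between two proportions (Cohen's h)."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))


def binomial_sample_size(p_base: float, relative_mde: float,
                         alpha: float = 0.05, power: float = 0.8) -> int:
    """Per-variant sample size for a conversion-rate test (normal approximation)."""
    p_treat = p_base * (1 + relative_mde)
    h = abs(cohens_h(p_treat, p_base))
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = NormalDist().inv_cdf(power)
    # Var(2*asin(sqrt(p_hat))) is about 1/n per group, so the difference has variance 2/n
    return math.ceil(2 * ((z_alpha + z_power) / h) ** 2)


# Hypothetical example: 10% baseline conversion, 10% relative lift (10% -> 11%)
n_per_variant = binomial_sample_size(0.10, 0.10)
print(n_per_variant)
```

Halving the relative MDE roughly quadruples the required n, which is the usual inverse-square relationship between effect size and sample size.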

However, my new experiment involves a ratio metric that isn't binomial. Specifically, I'm looking at the ratio of the number of daily order issues to the number of daily orders. It's important to note that a single order can be associated with 0, 1, or more order issues.

While researching, I came across a formula for calculating the sample size of a ratio metric (n is the size of each group, so the total is n * 2):

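import math
from scipy.stats import norm
# Delta-method variance of the ratio num/denom per randomization unit
tau = (num_mean**2)/(denom_mean**2)*(num_var/(num_mean**2) + denom_var/(denom_mean**2) - 2*covar/(num_mean*denom_mean))
z_alpha = norm.ppf(1 - alpha/2)  # two-sided critical value
z_power = norm.ppf(power)
baseline_ratio = num_mean/denom_mean
mde = baseline_ratio*relative_mde  # absolute minimum detectable effect
n = math.ceil((2*tau*(z_alpha + z_power)**2)/(mde**2))  # units per group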
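The formula above needs num_mean, denom_mean, their variances and their covariance, all measured per randomization unit. A minimal sketch that estimates them from historical data, assuming each position in the two hypothetical lists is one unit (for example one user over a pre-period) holding that unit's issue count and order count. It uses statistics.NormalDist in place of scipy.stats.norm and keeps the formula's variable names.

```python
import math
from statistics import NormalDist, fmean


def delta_method_sample_size(numerators, denominators, relative_mde,
                             alpha=0.05, power=0.8):
    """Per-group number of units needed to detect a relative change in
    sum(numerators) / sum(denominators), using the delta-method variance."""
    n_units = len(numerators)
    num_mean = fmean(numerators)
    denom_mean = fmean(denominators)
    # Sample variances and covariance with the n - 1 denominator
    num_var = sum((x - num_mean) ** 2 for x in numerators) / (n_units - 1)
    denom_var = sum((y - denom_mean) ** 2 for y in denominators) / (n_units - 1)
    covar = sum((x - num_mean) * (y - denom_mean)
                for x, y in zip(numerators, denominators)) / (n_units - 1)
    # Same tau as in the formula: per-unit variance of the ratio
    tau = (num_mean ** 2) / (denom_mean ** 2) * (
        num_var / num_mean ** 2 + denom_var / denom_mean ** 2
        - 2 * covar / (num_mean * denom_mean))
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    mde = (num_mean / denom_mean) * relative_mde
    return math.ceil(2 * tau * (z_alpha + z_power) ** 2 / mde ** 2)


# Hypothetical units: issue counts per unit, and 10 orders in every unit
issues_per_unit = [1, 2, 3, 4, 5]
orders_per_unit = [10, 10, 10, 10, 10]
n_units = delta_method_sample_size(issues_per_unit, orders_per_unit, relative_mde=0.10)
print(n_units)
```

With a constant denominator the variance and covariance terms for the orders vanish, so tau reduces to num_var / denom_mean**2 = 2.5 / 100 = 0.025, and a 10% relative MDE on the 0.3 baseline ratio requires 437 units per group.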

Now, here's the challenge: the results obtained using this formula are substantially different from what I would expect if I applied the "binomial formula" to this non-binomial case.

However, I suspect that my metric, despite being non-binomial, might behave much like a binomial metric in practice. For instance, most orders generate at most one issue, so an order usually has either 0 or 1 issue.
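The intuition that the metric behaves almost like a binomial one can be checked directly: when every unit is a single order with 0 or 1 issue, the delta-method tau collapses to the binomial variance p(1 - p), up to the n/(n - 1) factor of the sample variance. A minimal sketch with hypothetical data (5% of 1,000 orders have an issue); ratio_tau is an illustrative helper, not a library function.

```python
from statistics import fmean


def ratio_tau(numerators, denominators):
    """Delta-method per-unit variance of sum(numerators) / sum(denominators)."""
    n = len(numerators)
    mx, my = fmean(numerators), fmean(denominators)
    vx = sum((x - mx) ** 2 for x in numerators) / (n - 1)
    vy = sum((y - my) ** 2 for y in denominators) / (n - 1)
    cxy = sum((x - mx) * (y - my) for x, y in zip(numerators, denominators)) / (n - 1)
    return (mx / my) ** 2 * (vx / mx ** 2 + vy / my ** 2 - 2 * cxy / (mx * my))


# Hypothetical: each unit is one order with 0 or 1 issue (Bernoulli data)
issues = [1] * 50 + [0] * 950
orders = [1] * 1000
p = fmean(issues)
tau_bernoulli = ratio_tau(issues, orders)
binomial_var = p * (1 - p)
print(tau_bernoulli, binomial_var)  # equal up to the factor 1000 / 999
```

When a unit contains several orders, or an order can carry more than one issue, the denominator variance and covariance terms no longer drop out, so the two approaches can give different sample sizes.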

So, I'm a bit confused about what the correct approach is here.

Does anyone have any suggestions or insights?

Initially, I experimented with using tt_ind_solve_power for the non-binomial case as well and tried to find the correct effect size formula for this scenario. I've even attempted to use Cohen's d (the mean difference divided by the standard deviation), but the results still appear to be erratic.
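Cohen's d standardizes the mean difference by the standard deviation, so for the ratio metric the natural candidate is d = mde / sqrt(tau). A minimal sketch, assuming hypothetical values tau = 0.025 and mde = 0.03, showing that this d gives the same n as the delta-method formula under a normal approximation, while dividing by the variance instead produces a far too small n. With statsmodels, tt_ind_solve_power(effect_size=d, alpha=0.05, power=0.8) should return a value close to n_via_d, slightly larger because it uses the t-distribution.

```python
import math
from statistics import NormalDist

z_alpha = NormalDist().inv_cdf(1 - 0.05 / 2)  # two-sided, alpha = 0.05
z_power = NormalDist().inv_cdf(0.8)           # power = 0.8


def n_from_cohens_d(d):
    """Per-group n for a two-sample z-test with standardized effect size d."""
    return 2 * (z_alpha + z_power) ** 2 / d ** 2


tau = 0.025  # hypothetical per-unit delta-method variance of the ratio
mde = 0.03   # hypothetical absolute MDE (baseline_ratio * relative_mde)

d = mde / math.sqrt(tau)  # standardize by the standard deviation
n_via_d = math.ceil(n_from_cohens_d(d))
n_delta = math.ceil(2 * tau * (z_alpha + z_power) ** 2 / mde ** 2)

d_wrong = mde / tau  # dividing by the variance instead
n_wrong = math.ceil(n_from_cohens_d(d_wrong))
print(n_via_d, n_delta, n_wrong)
```

Because the two expressions are algebraically identical, passing mde / sqrt(tau) as the effect size is one way to reuse the existing tt_ind_solve_power workflow for the ratio metric.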

Any guidance or advice would be greatly appreciated.
