I am running a binomial test and cannot understand why these two methods give different results: the value of second differs from the value of first. When we calculate the two-tailed p-value, should we just double one of the tails?
from scipy.stats import binom
n, p = 50, 0.4
prob = binom.cdf(2, n, p)
first = 2*prob
from scipy import stats
second = stats.binom_test(2, n, p, alternative='two-sided')
No, because the binomial distribution is not, in general, symmetric. The one case where your calculation would work is p = 0.5.

Here's a visualization for the two-sided binomial test. For this demonstration, I'll use n=14 instead of n=50 to make the plot clearer.

The dashed line is drawn at the height of binom.pmf(2, n, p). The probabilities that contribute to the two-sided binomial test binom_test(2, n, p, alternative='two-sided') are those that are less than or equal to this value. In this example, we can see that the values of k where this is true are [0, 1, 2] (which is the left tail) and [10, 11, 12, 13, 14] (which is the right tail). The p-value of the two-sided binomial test should be the sum of these probabilities. And that is, in fact, what we find:
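A minimal sketch of that check, assuming SciPy 1.7.0 or later so that scipy.stats.binomtest can stand in for binom_test. The variable names and the small relative tolerance on the threshold (a guard against floating-point ties) are my own choices, not part of the original calculation:

```python
import numpy as np
from scipy.stats import binom, binomtest

n, p = 14, 0.4
k = np.arange(n + 1)
pmf = binom.pmf(k, n, p)

# Height of the dashed line: the probability of the observed outcome k = 2.
# The tiny relative tolerance is an assumption to guard against float ties.
threshold = binom.pmf(2, n, p) * (1 + 1e-7)

# Outcomes that are no more likely than the observed one form the two tails.
in_tails = pmf <= threshold
extreme = k[in_tails].tolist()
manual_pvalue = pmf[in_tails].sum()

# Library result and the "double one tail" shortcut, for comparison.
test_pvalue = binomtest(2, n, p, alternative='two-sided').pvalue
doubled = 2 * binom.cdf(2, n, p)

print(extreme)        # values of k counted in the p-value
print(manual_pvalue)  # sum of the tail probabilities
print(test_pvalue)    # should agree with manual_pvalue
print(doubled)        # larger here, because the right tail is lighter

# With p = 0.5 the distribution is symmetric, so doubling agrees with the test.
sym_test = binomtest(2, n, 0.5, alternative='two-sided').pvalue
sym_doubled = 2 * binom.cdf(2, n, 0.5)
print(sym_test, sym_doubled)
```

The manual sum and the library p-value should match to floating-point precision, while the doubled left tail overstates the p-value for p = 0.4.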
scipy.stats.binom_test is deprecated. Users of SciPy 1.7.0 or later should use scipy.stats.binomtest instead, which returns a result object whose pvalue attribute holds the p-value.

Here's the script to generate the plot:
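A minimal sketch of such a script, assuming matplotlib is installed. The colors, labels, and variable names are illustrative guesses rather than the exact styling of the figure described above:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import binom

n, p = 14, 0.4
observed = 2

k = np.arange(n + 1)
pmf = binom.pmf(k, n, p)

# Probability of the observed outcome; bars at or below it count toward the p-value.
threshold = binom.pmf(observed, n, p)
# Small relative tolerance (an assumption) so float ties are not missed.
in_tails = pmf <= threshold * (1 + 1e-7)

fig, ax = plt.subplots()
ax.bar(k[~in_tails], pmf[~in_tails], color='lightgray', label='not counted')
ax.bar(k[in_tails], pmf[in_tails], color='tab:red', label='counted in p-value')
# Dashed line at the height of binom.pmf(2, n, p).
ax.axhline(threshold, linestyle='--', color='black',
           label=f'binom.pmf({observed}, n, p)')
ax.set_xticks(k)
ax.set_xlabel('k')
ax.set_ylabel('P(X = k)')
ax.set_title(f'Two-sided binomial test, n={n}, p={p}')
ax.legend()
plt.show()
```

The highlighted bars are the outcomes whose probabilities are summed for the two-sided p-value; the gap between the two highlighted groups shows why the tails are not mirror images when p is not 0.5.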