Trouble calculating with large binomial coefficients


Intro.

I'm trying to plot a hypergeometric distribution in IPython. The probability mass function of the distribution contains three binomial coefficients.

Since the values I'll be plugging into the coefficients are very large, e.g. 1e28, I've written my own function that computes binomial coefficients using the second-order Stirling approximation.
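As a quick sanity check of the formula itself (this snippet isn't part of the plotting code; `math.lgamma` gives exact log-factorials for comparison), the approximation agrees closely with exact values already at moderate sizes:

```python
import math

def stirling_log_comb(n, k):
    """log of C(n, k) via the second-order Stirling approximation."""
    a = (n + 0.5) * math.log(n) - (k + 0.5) * math.log(k) \
        - (n - k + 0.5) * math.log(n - k)
    b = math.log1p(1 / (12.0 * n)) - math.log1p(1 / (12.0 * k)) \
        - math.log1p(1 / (12.0 * (n - k)))
    return -0.5 * math.log(2 * math.pi) + a + b

def exact_log_comb(n, k):
    """Exact log of C(n, k) via the log-gamma function."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

for n, k in [(100, 3), (1000, 500), (10**6, 10**3)]:
    approx, exact = stirling_log_comb(n, k), exact_log_comb(n, k)
    print(n, k, approx, exact, abs(approx - exact))
```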

As the binomial coefficients are too big to store in a variable and multiply directly, I calculate their logarithms instead and add them together. To obtain the final probability I simply pass the result to the exp function. Since the probability is of relatively "normal" size (its maximum is at 2.7e24), there should be no problem there... except there is.
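At small sizes this log-add-exp route agrees with direct evaluation, so the approach itself seems sound (illustrative numbers here, nothing like the real ones; `math.comb` is exact but can't realistically be used at 1e28 scale):

```python
import math

def log_comb(n, k):
    # Exact log of C(n, k) via log-gamma, fine for this small test.
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

# Small hypergeometric example, same parametrization as in the question:
# population K, k successes in it, N draws, n successes among the draws.
K, k, N = 1000, 100, 50
n = 5

# Sum logs of the three coefficients, then exponentiate once at the end.
log_p = log_comb(N, n) + log_comb(K - N, k - n) - log_comb(K, k)
p_via_logs = math.exp(log_p)

# Direct evaluation with exact integer binomials for comparison.
p_direct = math.comb(N, n) * math.comb(K - N, k - n) / math.comb(K, k)
print(p_via_logs, p_direct)
```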

Problem.

The 'result' I referred to, the log of the probability, ended up far bigger than it should be, with values ranging from -6.2e24 to 1.3e14. For comparison, the log of the mathematical maximum is roughly 56.

Another problem is that the curve on the plot is very jagged when I zoom in. Zoomed out, everything looks fine: the curve is smooth and its maximum is at the mean of the distribution:

Plot over the entire domain.

But when I zoom in on the mean, where most of the peak of the probability function should be since the standard deviation is very small, I get this:

Zoomed in on the mean.

The red line marks the average and the black lines mark the average +/- the standard deviation. While it looks pretty, it's not what I need, which is a smooth curve.

Question.

Can someone explain (1) why the values are so big and (2) why the curve is jagged, and how I can fix both?
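One thing I did check (a sketch using `math.ulp`, Python 3.9+, not part of the code below): at these magnitudes a double can only resolve steps of a few times 1e8 in n itself, and over 1e9 in each term of A, which may or may not be related to the jaggedness:

```python
import math

n = 2.24e24                       # typical value fed into l_cb
term = (n + 0.5) * math.log(n)    # one of the three terms of A, ~1.3e26

# math.ulp(x) is the gap between x and the next representable double,
# i.e. the absolute resolution of float arithmetic near x.
print(math.ulp(n))      # spacing of representable doubles near n
print(math.ulp(term))   # absolute resolution of a single term of A
```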

Code.

import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline

#Returns the log of "n choose k" calculated with 2nd order Stirling's approximation
def l_cb(n, k):
    if (k > n) or (n < 0) or (k < 0):
        print("Invalid values for the binomial coefficient: n =", n, ", k =", k)
        return 0.0
    if (k == n) or (k == 0) or (n == 0):
        return 0.0
    A = (n + 0.5) * np.log(n) - (k + 0.5) * np.log(k) - (n - k + 0.5) * np.log(n - k)
    B = np.log1p(1 / (12.0 * n)) - np.log1p(1 / (12.0 * k)) - np.log1p(1 / (12.0 * (n - k)))
    return -0.5 * np.log(2 * np.pi) + A + B

K = 2.24e28
k = 2.24e27
N = 2.7e25

#Mathematical maximum of P. np.log(MAX) is about 56. l_P is way too big.
MAX = (k + 1) * (N + 1) / (K + 2)
#Mathematical average of n
AVG = N * k / K
#Mathematical standard deviation of P.
SD = np.sqrt(N * k * (K - N) * (K - k) / K ** 2 / (K - 1))

n = np.linspace(AVG - 50e12, AVG + 50e12, 1001)
l_P = np.zeros(len(n))

#Calculating log(P).
for i in range(len(l_P)):
    l_P[i] = l_cb(N, n[i]) + l_cb(K - N, k - n[i]) - l_cb(K, k)

#Marking AVG, AVG - SD, AVG + SD
y = np.linspace(-4e14, 5e14, len(n))
x_AVG = np.ones(len(n))
x_SD_L = np.ones(len(n)) - SD / AVG
x_SD_R = np.ones(len(n)) + SD / AVG

plt.plot(n / AVG, l_P, x_AVG, y, 'r', x_SD_L, y, 'k', x_SD_R, y, 'k')
plt.xlabel('n / AVG')
plt.ylabel('log(P)')