Python: handling large numbers


I need to compute perplexity, and I am trying to do it with:

def get_perplexity(test_set, model):
    perplexity = 1
    n = 0
    for word in test_set:
        n += 1
        perplexity = perplexity * 1 / get_prob(model, word)
    perplexity = pow(perplexity, 1/float(n))
    return perplexity

After some steps, perplexity becomes infinity. I need an actual number, because as the last step I compute pow(perplexity, 1/float(n)).

Is there any way to multiply numbers like these and get a result?

3.887311155784627e+243
8.311806360146177e+250
1.7707049372801292e+263
1.690802669602979e+271
3.843294667766984e+278
5.954424789834101e+290
8.859529887856071e+295
7.649470766862909e+306
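For context: a Python float is an IEEE 754 double, and its largest finite value, sys.float_info.max, is about 1.8e308. The last value above is already within a factor of roughly 24 of that limit, so dividing it by any word probability below about 0.04 overflows. A quick check, using a hypothetical word probability of 0.01 (not taken from the question):

```python
import math
import sys

# Largest finite float (IEEE 754 double), about 1.8e308
print(sys.float_info.max)

running = 7.649470766862909e+306   # last value from the list above
step = running * (1 / 0.01)        # 0.01 is a hypothetical word probability

# Float multiplication overflows silently to inf instead of raising an error.
print(step, math.isinf(step))      # inf True
```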



The repeated multiplication overflows: a Python float is an IEEE 754 double, whose largest finite value is about 1.8e308, so once the running product passes that limit it becomes inf. I propose you translate this into log space and use summation rather than multiplication:

import math

def get_perplexity(test_set, model):
    log_perplexity = 0
    n = 0
    for word in test_set:
        n += 1
        log_perplexity -= math.log(get_prob(model, word))
    log_perplexity /= float(n)
    return math.exp(log_perplexity)
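The log-space version relies on the identity (product of 1/p_i)^(1/N) = exp(-(1/N) * sum of ln p_i). Below is a minimal, self-contained sketch of the same idea, assuming the per-word probabilities have already been collected into a plain list. The names perplexity_from_probs and probs, and the probability values, are hypothetical and only stand in for get_prob(model, word):

```python
import math

def perplexity_from_probs(probs):
    """Perplexity computed in log space: exp(-(1/N) * sum(ln p))."""
    if not probs:
        raise ValueError("need at least one probability")
    # Each -ln(p) term is small, so the sum cannot overflow; fsum adds them accurately.
    neg_log_sum = math.fsum(-math.log(p) for p in probs)
    return math.exp(neg_log_sum / len(probs))

# Hypothetical probabilities: 200 words, each with probability 1e-5.
probs = [1e-5] * 200

# The direct product of 1/p would be about 1e1000, so it overflows...
direct = 1.0
for p in probs:
    direct *= 1 / p
print(direct)                         # inf

# ...while the log-space version stays finite.
print(perplexity_from_probs(probs))   # approximately 100000.0
```

Only the final exp has to fit in a float, and since perplexity is a geometric mean of the 1/p values it is usually far smaller than the raw product.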

This way you add up logarithms, which stay small (ln p is a modest negative number even when the probability p is tiny), so the running sum never overflows. If you also want more precision than a float gives you, you can choose an arbitrary precision with the decimal module:

import decimal

def get_perplexity(test_set, model):
    with decimal.localcontext() as ctx:
        ctx.prec = 100  # set as appropriate
        log_perplexity = decimal.Decimal(0)
        n = 0
        for word in test_set:
            n += 1
            log_perplexity -= decimal.Decimal(get_prob(model, word)).ln()
        log_perplexity /= n  # Decimal / float raises TypeError; Decimal / int works
        return log_perplexity.exp()
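The decimal module also answers the literal question of multiplying numbers like the ones listed in the question: a Decimal's exponent can go far beyond a float's (the default context allows exponents up to 999999), so the product never becomes inf. A minimal sketch that multiplies those values directly and then takes the n-th root with the same ln/exp trick; the function name product_and_nth_root and the 50-digit precision are arbitrary choices for illustration:

```python
import decimal

# The values from the question, as strings so Decimal parses them exactly.
values = [
    "3.887311155784627e+243",
    "8.311806360146177e+250",
    "1.7707049372801292e+263",
    "1.690802669602979e+271",
    "3.843294667766984e+278",
    "5.954424789834101e+290",
    "8.859529887856071e+295",
    "7.649470766862909e+306",
]

def product_and_nth_root(nums, prec=50):
    """Multiply nums as Decimals, then take the n-th root via exp(ln(x) / n)."""
    with decimal.localcontext() as ctx:
        ctx.prec = prec  # significant digits; 50 is an arbitrary choice
        product = decimal.Decimal(1)
        for v in nums:
            product *= decimal.Decimal(v)  # exponent can go far past 308 without overflow
        root = (product.ln() / len(nums)).exp()
    return product, root

product, root = product_and_nth_root(values)
print(product)       # a Decimal with an exponent in the thousands, not inf
print(float(root))   # the 8th root fits back into a float
```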

Since e+306 just means "times 10^306", you can write a class that stores a number in two parts: a float value and a separate power of ten.

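class BigPowerFloat:
    """Stores a number as value * 10**power so that it cannot overflow."""
    POWER_EXP = 100                  # normalize in steps of 10**100
    POWER_STEP = 10.0 ** POWER_EXP

    def __init__(self, input_value, power=0):
        self.value = float(input_value)
        if abs(self.value) == float("inf"):
            raise OverflowError("input is already infinite")
        self.power = power           # decimal exponent, stored as an int
        self._move_to_power()

    def _move_to_power(self):
        # Shift large magnitudes into the exponent.
        while abs(self.value) > self.POWER_STEP:
            self.value /= self.POWER_STEP
            self.power += self.POWER_EXP   # add the exponent (100), not 10**100
        # Same for very small magnitudes (negative powers of ten).
        while self.value != 0 and abs(self.value) < 1 / self.POWER_STEP:
            self.value *= self.POWER_STEP
            self.power -= self.POWER_EXP

    def __mul__(self, other):
        # Normalize the other operand first so the float product stays below ~1e200.
        if not isinstance(other, BigPowerFloat):
            other = BigPowerFloat(other)
        return BigPowerFloat(self.value * other.value, self.power + other.power)

    def __truediv__(self, other):
        if not isinstance(other, BigPowerFloat):
            other = BigPowerFloat(other)
        return BigPowerFloat(self.value / other.value, self.power - other.power)

    # TODO other operators for +, - ...

    def __str__(self):
        # e.g. BigPowerFloat(2.5, 400) prints as "2.5 * 10**400"
        return "%r * 10**%d" % (self.value, self.power)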
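For comparison, the same two-part idea is available without a custom class: math.frexp splits a float into a mantissa in [0.5, 1) and an integer power of two, and Python integers have no upper limit. A minimal sketch using base 2 instead of 10**100, assuming all inputs are positive (the function name product_and_root is made up for illustration):

```python
import math

def product_and_root(nums):
    """Multiply positive floats as mantissa * 2**exponent, then take the n-th root.

    Returns (mantissa, exponent, root); the full product is mantissa * 2**exponent.
    """
    if not nums:
        raise ValueError("need at least one number")
    mantissa, exponent = 1.0, 0
    for x in nums:
        m, e = math.frexp(x)          # x == m * 2**e with 0.5 <= m < 1 for positive x
        mantissa *= m
        exponent += e                 # Python ints never overflow
        mantissa, shift = math.frexp(mantissa)  # renormalize the mantissa to [0.5, 1)
        exponent += shift
    # ln(product) = ln(mantissa) + exponent * ln(2); dividing by n gives the n-th root
    root = math.exp((math.log(mantissa) + exponent * math.log(2)) / len(nums))
    return mantissa, exponent, root

values = [
    3.887311155784627e+243, 8.311806360146177e+250,
    1.7707049372801292e+263, 1.690802669602979e+271,
    3.843294667766984e+278, 5.954424789834101e+290,
    8.859529887856071e+295, 7.649470766862909e+306,
]
m, e, root = product_and_root(values)
print(m, e)   # mantissa in [0.5, 1) and a base-2 exponent far above 1024
print(root)   # the 8th root of the product, which fits in a float
```

For perplexity you would pass the 1/p values; only the final root has to fit in a float.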