I am using the SciPy implementation of KL-divergence ([scipy.stats.entropy](http://docs.scipy.org/doc/scipy-dev/reference/generated/scipy.stats.entropy.html)) for two different numpy arrays.
The first one, let's say "base_freq", has a fixed length of 2000. The second one, "test_freq", has a length that varies from sample to sample; let's say it is 8000.
How can I compute the KL divergence when the two arrays do not have the same length?
My thought was to break the second array ("test_freq") into a number of arrays of length 2000. But how is this done? And what happens when "test_freq" is a sample of length 250?
I should preface by saying that I'm no information theory expert. For the one application in which I used KL-divergence, I was comparing two images pixel-wise to compute the number of bits lost. If the images had different sizes, your proposed approach would require that for each pixel in the smaller image I choose the corresponding pixel in the larger image, not just any pixel. My understanding was that KL-divergence only makes sense if you're comparing two signals sampled the same way (i.e., with the same temporal or spatial sampling interval).
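For context, `scipy.stats.entropy` handles the equal-length case directly; here is a minimal sketch with made-up distributions:

```python
from scipy.stats import entropy

# With two arguments, entropy(pk, qk) normalizes each array to a
# probability distribution and returns the KL divergence D_KL(pk || qk).
p = [0.1, 0.4, 0.5]
q = [0.25, 0.25, 0.5]
print(entropy(p, q))
```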
If you want to do what you propose, you can use `numpy.random.choice`:
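A minimal sketch of that idea, assuming "base_freq" and "test_freq" are 1-D arrays of non-negative frequencies (names taken from the question, filled here with random data for illustration):

```python
import numpy as np
from scipy.stats import entropy

base_freq = np.random.random(2000)
test_freq = np.random.random(8000)

# Randomly draw len(base_freq) entries from test_freq so the two arrays
# match in length. Sampling with replacement also covers the case where
# test_freq is shorter than base_freq (e.g. length 250).
sample = np.random.choice(test_freq, size=len(base_freq),
                          replace=len(test_freq) < len(base_freq))

# entropy normalizes both arrays and returns D_KL(base_freq || sample).
print(entropy(base_freq, sample))
```

Keep in mind that this subsampling discards any ordering in "test_freq", so it only makes sense if the bins of the two arrays don't need to correspond to each other, per the caveat above.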