Finding the 'a' value of a Zipf distribution


I found this Python function that draws samples from a Zipf distribution based on an 'a' value and a 'size' value, where size is the number of samples drawn, i.e. the sum of all the frequencies in a frequency table: https://numpy.org/doc/stable/reference/random/generated/numpy.random.zipf.html

Now, let's say I run this function with 'a' = 1.6 and 'size' = 30. I use Python's dictionary data structure to store the frequency table, and this is what it looks like:

    dictionary = {1:16, 2:5, 3:2, 4:1, 12:1, 13:1, 16:1, 65:1, 152:1, 531:1}

The keys represent the elements 1, 2, 3, 4, 12, 13, 16, 65, 152, 531 and the values represent their respective frequencies.

Is there a way to know an 'a' value based on looking at the dictionary? What I am asking is, say there is a dictionary like the one that I wrote above. It was generated from some a value. I don't know what the a value is but I know what the dictionary contains. Based on the frequencies of the elements of the dictionary, is there a way to calculate an 'a' value? Like a formula, for example?

[EDIT]

Here is something I have tried. Using KL divergence, I calculate a value that is generated using 2 consecutive elements. The formula is

    (frequency of ith element) * (log2(frequency of ith element) / log2(frequency of (i+1)th element))

I apply this formula for any two consecutive elements and find the total sum at the end. I divide this total sum by the total frequency of the dictionary and get an 'a' value. However, this 'a' value never matches the original 'a' value.

Thank you!


There are 1 best solutions below

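Using Tim Robert's hint: in a Zipf distribution the probability of element k is proportional to k^(-a), so the frequency of element 1 divided by the frequency of element 2 is roughly 2^a. Taking the base-2 log of that ratio gives an estimate of a:

    from math import log

    # dictionary is the frequency table from the question
    a = log(dictionary[1] / dictionary[2], 2)  # about 1.68 for the given example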
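The ratio above only looks at elements 1 and 2, so it can be noisy for small samples. If you want to use the whole table, one option is a maximum likelihood fit. This is only a rough sketch, assuming the data really follow the distribution that numpy.random.zipf samples from, p(k) = k^(-a) / zeta(a). The helpers `zeta`, `log_likelihood` and `estimate_a` are names made up for this example, not library functions, and the zeta function is approximated with a truncated sum plus a tail correction so that only the standard library is needed.

```python
from math import log


def zeta(a, cutoff=1000):
    """Approximate the Riemann zeta function for a > 1.

    Sums the first cutoff - 1 terms directly and approximates the
    remaining tail with an integral plus half of the first tail term.
    """
    head = sum(k ** -a for k in range(1, cutoff))
    tail = cutoff ** (1 - a) / (a - 1) + 0.5 * cutoff ** -a
    return head + tail


def log_likelihood(a, freq):
    """Log-likelihood of the frequency table under p(k) = k^(-a) / zeta(a)."""
    n = sum(freq.values())
    weighted_logs = sum(count * log(k) for k, count in freq.items())
    return -a * weighted_logs - n * log(zeta(a))


def estimate_a(freq, lo=1.01, hi=10.0, iterations=100):
    """Ternary search for the 'a' that maximises the log-likelihood.

    The search bounds are an assumption; the log-likelihood is concave
    in 'a', so it has a single peak and ternary search is enough.
    """
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if log_likelihood(m1, freq) < log_likelihood(m2, freq):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2


# Frequency table from the question
dictionary = {1: 16, 2: 5, 3: 2, 4: 1, 12: 1, 13: 1, 16: 1, 65: 1, 152: 1, 531: 1}
a_hat = estimate_a(dictionary)
print(round(a_hat, 2))
```

With only 30 samples, neither estimate should be expected to match the original 'a' exactly; the estimate gets closer as 'size' grows.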