How to programmatically calculate discrete probabilities


I am using EnumeratedIntegerDistribution to generate samples from my set of keys.

How can I programmatically calculate the 'discrete probabilities' array? For example, I might want an approximately normal distribution or a Zipf distribution.

    int[] keys = keyDomain(domainMin, domainMax);
    double[] discreteProbabilities = new double[] { ?, ?, ?, ?, .... };

    EnumeratedIntegerDistribution distribution = new EnumeratedIntegerDistribution(keys, discreteProbabilities);

    int numSamples = 100;
    int[] samples = distribution.sample(numSamples);

There is 1 solution below


As long as your distribution is truly discrete and defined over the integers in your range (e.g. a Poisson distribution), filling the discreteProbabilities[] array is straightforward, provided you have a formula for the probability of each integer value. Evaluate that formula for every integer in your range. Because you are restricting the distribution to that range, you then divide each value by the sum of all of them, so that you get a true distribution over your range, i.e. sum = 1.
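For the Zipf case mentioned in the question, the compute-then-normalize step could look roughly like the sketch below. The class name `ZipfProbabilities`, the method `compute`, and the `exponent` parameter are made up for illustration, and the code assumes that the i-th key in your keys array has rank i + 1 (so the first key is the most likely one).

```java
/** Sketch: Zipf-like probabilities for ranks 1..n, restricted to that range. */
final class ZipfProbabilities {
    private ZipfProbabilities() {}

    /**
     * Returns n probabilities, where entry i belongs to rank i + 1.
     * The exponent is the Zipf exponent (hypothetical parameter name), must be > 0.
     */
    static double[] compute(int n, double exponent) {
        if (n <= 0 || exponent <= 0.0) {
            throw new IllegalArgumentException("n and exponent must be positive");
        }
        double[] p = new double[n];
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            // Unnormalized weight 1 / rank^exponent for rank i + 1.
            p[i] = 1.0 / Math.pow(i + 1, exponent);
            sum += p[i];
        }
        // Divide by the sum over the range so the probabilities add up to 1.
        for (int i = 0; i < n; i++) {
            p[i] /= sum;
        }
        return p;
    }
}
```

With the question's variables, `ZipfProbabilities.compute(keys.length, 1.0)` would produce an array that can be passed as discreteProbabilities.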

However, if your distribution is "continuous", i.e. samples can be any real (floating point) value, whether restricted to a range or not, then things are more complicated. You have to decide how to convert this distribution to a distribution over the integers in your range.

One way is to simply evaluate the probability density function (e.g. essentially exp(-x^2/2) for the standard normal distribution) at your integer values, and then divide by the sum over your integer range.

However, that might not be very realistic if you are assuming, e.g., that a continuous sample is rounded to the nearest integer to get your sampled integer value. If you want to model that, you should calculate the integral of the continuous probability density between n-0.5 and n+0.5 for each integer n in your range (say, with numeric integration if you don't have a formula for the anti-derivative). This integral is your probability value for the integer n, and as before, you divide by the sum over your integer range so that your probabilities add up to 1.
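Both conversions can be sketched in plain Java as follows. This is only an illustration under stated assumptions: the class and method names (`ContinuousToDiscrete`, `fromDensity`, `fromRounding`, `normalDensity`) are invented here, the numeric integration uses composite Simpson's rule as one possible choice, and the keys are assumed to be integers.

```java
import java.util.function.DoubleUnaryOperator;

/** Sketch: turn a continuous density into probabilities over integer keys. */
final class ContinuousToDiscrete {
    private ContinuousToDiscrete() {}

    /** Normal density with the given mean and standard deviation. */
    static DoubleUnaryOperator normalDensity(double mean, double sd) {
        double c = 1.0 / (sd * Math.sqrt(2.0 * Math.PI));
        return x -> c * Math.exp(-0.5 * ((x - mean) / sd) * ((x - mean) / sd));
    }

    /** Approach 1: evaluate the density at each key, then normalize. */
    static double[] fromDensity(int[] keys, DoubleUnaryOperator density) {
        double[] p = new double[keys.length];
        for (int i = 0; i < keys.length; i++) {
            p[i] = density.applyAsDouble(keys[i]);
        }
        return normalize(p);
    }

    /** Approach 2: integrate the density over [n - 0.5, n + 0.5], then normalize. */
    static double[] fromRounding(int[] keys, DoubleUnaryOperator density, int intervals) {
        double[] p = new double[keys.length];
        for (int i = 0; i < keys.length; i++) {
            p[i] = simpson(density, keys[i] - 0.5, keys[i] + 0.5, intervals);
        }
        return normalize(p);
    }

    /** Composite Simpson's rule; intervals must be even and positive. */
    static double simpson(DoubleUnaryOperator f, double a, double b, int intervals) {
        if (intervals <= 0 || intervals % 2 != 0) {
            throw new IllegalArgumentException("intervals must be even and positive");
        }
        double h = (b - a) / intervals;
        double sum = f.applyAsDouble(a) + f.applyAsDouble(b);
        for (int i = 1; i < intervals; i++) {
            sum += f.applyAsDouble(a + i * h) * (i % 2 == 1 ? 4.0 : 2.0);
        }
        return sum * h / 3.0;
    }

    /** Divides every entry by the total so the result sums to 1. */
    private static double[] normalize(double[] p) {
        double sum = 0.0;
        for (double v : p) sum += v;
        for (int i = 0; i < p.length; i++) p[i] /= sum;
        return p;
    }
}
```

For example, `ContinuousToDiscrete.fromRounding(keys, ContinuousToDiscrete.normalDensity(mean, sd), 20)` would give an array usable as discreteProbabilities, where `mean` and `sd` are values you choose to fit your key range. The two approaches give similar results when the density changes slowly across each unit interval and differ more when the standard deviation is small relative to the key spacing.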