How to randomly select a multiset from a list, weighted by the probabilities of its values?

I have a list of multisets, generated from a list of candidate values, where the values in each multiset must sum to a fixed target amount.

For example, given those values:

elem = [4, 16]

And a target amount:

tN = 64

Here are the possible outcomes:

4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
4 4 4 4 4 4 4 4 4 4 4 4 16
4 4 4 4 4 4 4 4 16 16
4 4 4 4 16 16 16
16 16 16 16
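
For reference, a list like the one above can be generated with a short recursive search. This is only a sketch: the function name multisets is made up for illustration, and it assumes every value in elem is a positive integer.

```python
def multisets(elem, target):
    """Yield each multiset of values from elem that sums to target, as a tuple."""
    def extend(start, remaining, chosen):
        if remaining == 0:
            yield tuple(chosen)
            return
        # Only look at elem[start:] so that each multiset is produced exactly once.
        for idx in range(start, len(elem)):
            value = elem[idx]
            if value <= remaining:
                chosen.append(value)
                yield from extend(idx, remaining - value, chosen)
                chosen.pop()

    yield from extend(0, target, [])


for combo in multisets([4, 16], 64):
    print(*combo)
```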

Now, I set a probability on each value:

prob = [0.2, 0.9] // prob of 4 and 16

I need to randomly select a multiset in a way that "tends" to favor multisets containing more of the values with a higher prob.

In the example above, 16 16 16 16 should be "more" likely to be picked than 4 4 4 4 4 4 4 4 4 4 4 4 16 (since 16 has prob 0.9, which is higher than 0.2, and that multiset is more "16"-oriented).

Likewise, 4 4 4 4 4 4 4 4 16 16 should be more likely than 4 4 4 4 4 4 4 4 4 4 4 4 16, and so on (so it should only rarely pick 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4).

What is the right algorithm to randomly pick a multiset according to the prob of its values?

It should of course work with any elem and tN (for example, elem = [1, 4, 8, 16] and prob = [0.15, 0.56, 0.9, 0.7]).

The only idea I have is: for each multiset, group and count each item, then multiply each count by that item's prob, ceil the result, and add the multiset to a list that many times (for each value).

So:

16 16 16 16: 4 (4 times 16) * 0.9 and 0 (0 times 4) * 0.2
4 4 4 4 16 16 16: 3 (3 times 16) * 0.9 and 4 (4 times 4) * 0.2

But it does not seem very "proportional" when there are lots of repeated items...
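
One possible reading of that idea, sketched in Python to show where it goes wrong. The helper name heuristic_weight is hypothetical, and treating "add to a list" as a weight for random.choices is an assumption about what was meant.

```python
import math
import random
from collections import Counter


def heuristic_weight(multiset, elem, prob):
    """Sum of ceil(count * prob) over the candidate values."""
    counts = Counter(multiset)
    return sum(math.ceil(counts[value] * p) for value, p in zip(elem, prob))


elem = [4, 16]
prob = [0.2, 0.9]
candidates = [
    [4] * 16,
    [4] * 12 + [16],
    [4] * 8 + [16] * 2,
    [4] * 4 + [16] * 3,
    [16] * 4,
]
weights = [heuristic_weight(m, elem, prob) for m in candidates]
print(weights)

# Drawing with these weights is equivalent to adding each multiset to a list
# `weight` times and then picking uniformly from that list.
picked = random.choices(candidates, weights=weights, k=1)[0]
print(*picked)
```

For this example every one of the five multisets ends up with a weight of 4, so the pick is uniform. That is the lack of proportionality in concrete form: many copies of a low-prob value add up to as much weight as a few copies of a high-prob value.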

Answer by btilly:

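So to be clear, the problem is that we are given a list of elements elem and a target tN, and want to pick a multiset of the elements summing to tN such that, on average, the value elem[i] makes up a proportion prob[i] of the output. (Proportions sum to 1, so this reading needs prob to be normalized first; the question's [0.2, 0.9] does not sum to 1.)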

The random procedure is to repeatedly pick elements from elem, where the ith element has probability q[i] of being chosen, until the running sum reaches tN. If we wind up exceeding the target, we throw our attempt away and try again. We'd like to pick q such that the proportions in the final answer come out to prob.
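
A minimal sketch of that pick-and-retry procedure, assuming q is already known and all elements are positive integers. The function name sample_multiset and the max_attempts guard are illustrative additions, not part of the description above.

```python
import random


def sample_multiset(elem, q, target, rng=random, max_attempts=100_000):
    """Pick elements with weights q until the sum reaches target; retry on overshoot."""
    for _ in range(max_attempts):
        total, picked = 0, []
        while total < target:
            value = rng.choices(elem, weights=q, k=1)[0]
            picked.append(value)
            total += value
        if total == target:
            return sorted(picked)
        # total > target: throw this attempt away and start over.
    raise RuntimeError("no attempt hit the target; it may be unreachable")


rng = random.Random(0)  # seeded only to make the demo repeatable
print(*sample_multiset([4, 16], [0.2, 0.8], 64, rng))
```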

Given q, we can figure out the proportions in the final answer with dynamic programming. If q is [0.2, 0.8] and elem = [2, 5], the table you'd need to fill in starts like this:

table = [
    {'i': 0, 'p': 1, 'expected': [0.0, 0.0]},
    {'i': 1, 'p': 0, 'expected': [0.0, 0.0]},
    {'i': 2, 'p': 0.2, 'expected': [0.2, 0.0]},
    {'i': 3, 'p': 0, 'expected': [0.0, 0.0]},
    {'i': 4, 'p': 0.04, 'expected': [0.08, 0.0]},
    {'i': 5, 'p': 0.8, 'expected': [0.0, 0.8]},
    {'i': 6, 'p': 0.008, 'expected': [0.024, 0.0]},
    {'i': 7, 'p': 0.32, 'expected': [0.32, 0.32]},
    ...
]

where each row gives the probability p of having arrived at a running sum of i during an attempt, and expected gives, for each element, the expected number of times it was used in arriving at that point, weighted by the probability of arriving there (it is not divided by p). So, for example, one of the rows is:

    {'i': 4, 'p': 0.04, 'expected': [0.08, 0.0]},

Which means:

  • We might at some point have a sum of 4.
  • The probability that we do is 0.04.
  • The expected number of 2s used in arriving at 4 is 0.08. (Probability 0.04 of using two 2s comes out to 0.08.)
  • The expected number of 5s used in arriving at 4 is 0.0. (Probability 0.04 of using zero 5s comes out to 0.0.)

The rule is that for the kth row, you try each index i of elem with elem[i] <= k, and say, "If I arrived at k, I must have previously been at k-elem[i]."

The contribution to table[k]['p'] is now table[k-elem[i]]['p']*q[i]. (Probability of arriving at k-elem[i] times probability of choosing elem[i] next.)

For j not equal to i the contribution to table[k]['expected'][j] is table[k-elem[i]]['expected'][j] * q[i].

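The contribution to table[k]['expected'][i] is (table[k-elem[i]]['p'] + table[k-elem[i]]['expected'][i]) * q[i]. (Every path through k-elem[i] uses one more elem[i], so the extra use is weighted by the probability of having been at k-elem[i]. This is what gives 0.08 rather than 0.24 in the row for 4.)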

Summing these contributions over every i is sufficient to fill in the table.
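
The fill rules can be sketched as follows. The function name build_table is made up for illustration. Note that the extra use of element i is counted with weight table[k-elem[i]]['p'] * q[i], which is what reproduces the rows shown earlier (for example 0.08 at i = 4), up to floating-point rounding.

```python
def build_table(elem, q, target):
    """Fill in arrival probabilities and probability-weighted element counts."""
    n = len(elem)
    table = [{'i': k, 'p': 0.0, 'expected': [0.0] * n} for k in range(target + 1)]
    table[0]['p'] = 1.0  # every attempt starts at a sum of 0
    for k in range(1, target + 1):
        row = table[k]
        for i, value in enumerate(elem):
            if value > k:
                continue  # cannot have arrived at k by adding elem[i]
            prev = table[k - value]
            row['p'] += prev['p'] * q[i]
            for j in range(n):
                # Element i gets one extra use on every path that came through prev.
                extra = prev['p'] if j == i else 0.0
                row['expected'][j] += (prev['expected'][j] + extra) * q[i]
    return table


table = build_table([2, 5], [0.2, 0.8], 9)
for row in table[:8]:
    print(row)
```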


OK, now how do we find q?

Well, we want the following to be true:

sum of q = 1.0
for each i:
    prob[i] = table[tN]['expected'][i] / (sum of table[tN]['expected'])

Let's let f(q) be the sum of squares of the errors in these equations. We can minimize f (subject to the elements of q being between 0 and 1). If we get f close to 0, then we've got a pretty good answer to your original problem.

We can minimize f with gradient descent. If you're not familiar with it, I described it pretty well in a previous answer.
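
To make this concrete, here is a rough sketch of f and a basic descent loop. Everything in it is an assumption made for illustration rather than the method from the previous answer: the names shares_at_target and minimize_q, the finite-difference gradient, the step-halving rule, the clipping bounds, and the normalization of prob so that it sums to 1.

```python
def shares_at_target(elem, q, target):
    """Proportion of each element among all attempts that land exactly on target."""
    n = len(elem)
    p = [0.0] * (target + 1)
    expected = [[0.0] * n for _ in range(target + 1)]
    p[0] = 1.0
    for k in range(1, target + 1):
        for i, value in enumerate(elem):
            if value > k:
                continue
            p[k] += p[k - value] * q[i]
            for j in range(n):
                extra = p[k - value] if j == i else 0.0
                expected[k][j] += (expected[k - value][j] + extra) * q[i]
    total = sum(expected[target])
    return [x / total for x in expected[target]] if total > 0.0 else None


def f(q, elem, prob, target):
    """Sum of squared errors in: sum(q) = 1 and prob[i] = share of element i."""
    shares = shares_at_target(elem, q, target)
    if shares is None:
        return float('inf')  # target cannot be reached at all
    return (sum(q) - 1.0) ** 2 + sum((w - s) ** 2 for w, s in zip(prob, shares))


def minimize_q(elem, prob, target, lr=0.1, iterations=500, h=1e-6):
    """Gradient descent on f with a numerical gradient and step halving."""
    n = len(elem)
    q = [1.0 / n] * n
    if f(q, elem, prob, target) == float('inf'):
        raise ValueError("target is unreachable with these elements")
    for _ in range(iterations):
        current = f(q, elem, prob, target)
        grad = []
        for i in range(n):
            up, down = q[:], q[:]
            up[i] += h
            down[i] -= h
            grad.append((f(up, elem, prob, target) - f(down, elem, prob, target)) / (2 * h))
        step = lr
        while step > 1e-12:
            # Keep every q[i] inside (0, 1] after the move.
            candidate = [min(1.0, max(1e-4, x - step * g)) for x, g in zip(q, grad)]
            if f(candidate, elem, prob, target) < current:
                q = candidate
                break
            step /= 2
        else:
            break  # no step improved f, so stop
    return q


raw = [0.2, 0.9]
prob = [x / sum(raw) for x in raw]  # normalized so the proportions sum to 1
q = minimize_q([4, 16], prob, 64)
print(q, f(q, [4, 16], prob, 64))
```

If the final value of f is not close to 0, the requested proportions are probably not reachable with any q.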

This will not always work. For example, with the elem = [2, 5] I used above, every way to arrive at tN = 9 needs two 2s and one 5. You can't get any other ratio.

But if it IS possible to get the desired ratios, this approach should find a q that does.