I have a list of multisets, generated from a list of candidate values, each of which must sum to a fixed amount.
For example, given these values:
elem = [4, 16]
And a target amount:
tN = 64
Here are the possible outcomes:
4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
4 4 4 4 4 4 4 4 4 4 4 4 16
4 4 4 4 4 4 4 4 16 16
4 4 4 4 16 16 16
16 16 16 16
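As a minimal sketch, the list above could be generated with a small recursive enumeration in Python. The function name multisets_summing_to is hypothetical, and the sketch assumes that all values in elem are distinct positive integers:

```python
def multisets_summing_to(elem, target):
    """Return every multiset of values from elem whose sum equals target.

    Assumes the values in elem are distinct positive integers
    (an assumption of this sketch, not stated in the question).
    """
    results = []

    def extend(start, remaining, current):
        if remaining == 0:
            results.append(list(current))
            return
        # Only reuse elem[start:] so each multiset is produced once,
        # regardless of the order in which its values are picked.
        for idx in range(start, len(elem)):
            value = elem[idx]
            if value <= remaining:
                current.append(value)
                extend(idx, remaining - value, current)
                current.pop()

    extend(0, target, [])
    return results


for multiset in multisets_summing_to([4, 16], 64):
    print(*multiset)
```

For elem = [4, 16] and tN = 64 this yields the five multisets listed above.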
Now, I assign a probability to each value:
prob = [0.2, 0.9] // prob of 4 and 16
I need to randomly select a multiset in a way that favors the values with a higher prob.
In the example above, 16 16 16 16 should be more likely to be picked than 4 4 4 4 4 4 4 4 4 4 4 4 16 (since 16 has prob 0.9, which is higher than 0.2, and that multiset is more "16"-oriented).
Likewise, 4 4 4 4 4 4 4 4 16 16 should be more likely than 4 4 4 4 4 4 4 4 4 4 4 4 16, and so on (so it should rarely pick 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4).
What is the right algorithm to pick a multiset at random according to the prob of its values?
It should of course work with any elem and tN (for example elem = [1, 4, 8, 16] and prob = [0.15, 0.56, 0.9, 0.7]).
My only idea is: for each multiset, group and count each item, then multiply each count by its prob, ceil the result, and add the multiset to a list that many times (for each value).
So:
16 16 16 16: 4 (4 times 16) * 0.9 and 0 (0 times 4) * 0.2
4 4 4 4 16 16 16: 3 (3 times 16) * 0.9 and 4 (4 times 4) * 0.2
But this does not seem very "proportional" when items are repeated many times...
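A minimal Python sketch of this counting idea, under one reading of it: each multiset gets a weight equal to the sum of ceil(count * prob) over the values, and a multiset is then drawn with random.choices using those weights (equivalent to adding it to a list that many times). The helper name ceil_weight is hypothetical.

```python
import math
import random
from collections import Counter


def ceil_weight(multiset, elem, prob):
    """Sum of ceil(count * prob) over the values, as described above."""
    counts = Counter(multiset)
    return sum(math.ceil(counts[value] * p) for value, p in zip(elem, prob))


elem = [4, 16]
prob = [0.2, 0.9]
candidates = [
    [4] * 16,
    [4] * 12 + [16],
    [4] * 8 + [16] * 2,
    [4] * 4 + [16] * 3,
    [16] * 4,
]

weights = [ceil_weight(m, elem, prob) for m in candidates]
print(weights)  # [4, 4, 4, 4, 4]

# Weighted draw; same effect as listing each multiset `weight` times.
picked = random.choices(candidates, weights=weights, k=1)[0]
print(*picked)
```

In this example every multiset ends up with the same weight, so the draw is uniform and 16 16 16 16 is not favored at all, which illustrates the lack of proportionality.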
So to be clear, the problem is that we are given a list of elements elem, a target tN, and want to pick a multiset out of the elements such that a random value elem[i] shows up in the output at proportion prob[i] of the time.
The random procedure will look like randomly picking elements from elem where the ith element has probability q[i] of being chosen. If we wind up exceeding the target, we'll throw our attempt away and try again. We'd like to pick q such that the proportions in the final answer come out to prob.
Given q, we can figure out the proportions in the final answer with dynamic programming. If q is [0.2, 0.8] and elem = [2, 5], the table you'd need to fill in has one row for each running total i from 0 up to tN, where p is the probability of having arrived at i during an attempt, and expected gives the expected number of times each element is used in arriving at that point. So, for example, the row for 4 has p = 0.04 and expected = [0.08, 0.0]. Which means:
The probability of arriving at 4 is 0.04.
The expected number of 2's used in arriving at 4 is 0.08. (Probability 0.04 of using 2 2s, comes out to 0.08.)
The expected number of 5's used in arriving at 4 is 0.0. (Probability 0.04 of using 0 5s, comes out to 0.0.)
The table starts at row 0, which has p = 1 and every entry of expected equal to 0. The rule is that for the kth row, you can try each value of elem, and say, "If I arrived at k, I must have previously been at k-elem[i]."
The contribution to table[k]['p'] is now table[k-elem[i]]['p'] * q[i]. (Probability of arriving at k-elem[i] times probability of choosing elem[i] next.)
For j not equal to i the contribution to table[k]['expected'][j] is table[k-elem[i]]['expected'][j] * q[i].
The contribution to table[k]['expected'][i] is (table[k-elem[i]]['p'] + table[k-elem[i]]['expected'][i]) * q[i]. (Every path into k-elem[i] gains one more elem[i], so the extra term is the probability of having arrived there. For the row for 4 this gives (0.2 + 0.2) * 0.2 = 0.08.)
This is sufficient to fill in the table.
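The table rules can be sketched in Python as follows. The function name build_table is hypothetical. The sketch assumes that expected is weighted by the probability of arriving (so that the row for 4 holds 0.08 for the 2's, as in the example), which means the element just chosen adds table[k-elem[i]]['p'] rather than a flat 1. It also assumes positive integer elements.

```python
def build_table(elem, q, target):
    """Fill table[k] = {'p': ..., 'expected': [...]} for k = 0..target.

    'p' is the probability that an attempt passes through total k;
    'expected'[j] is the expected number of elem[j] used on the way to k,
    weighted by the probability of arriving at k (sketch assumption).
    """
    n = len(elem)
    table = [{'p': 0.0, 'expected': [0.0] * n} for _ in range(target + 1)]
    table[0]['p'] = 1.0  # every attempt starts at total 0

    for k in range(1, target + 1):
        row = table[k]
        for i, step in enumerate(elem):
            if step > k:
                continue  # cannot have arrived at k by adding elem[i]
            prev = table[k - step]
            row['p'] += prev['p'] * q[i]
            for j in range(n):
                # The element chosen last (j == i) is used once more on
                # every path through prev, hence the extra prev['p'].
                extra = prev['p'] if j == i else 0.0
                row['expected'][j] += (prev['expected'][j] + extra) * q[i]
    return table


table = build_table([2, 5], [0.2, 0.8], 9)
row = table[4]
print(round(row['p'], 6), [round(x, 6) for x in row['expected']])
# approximately p = 0.04 and expected = [0.08, 0.0]
```

Dividing each entry of table[tN]['expected'] by their sum gives the proportions in the final answer for that q.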
OK, now how do we find q?
Well, we want the proportions given by table[tN]['expected'] to match prob, with one equation per element of elem.
Let's let f(q) be the sum of squares of the errors in these equations. We can minimize f (subject to the elements of q being between 0 and 1). If we get f close to 0, then we've got a pretty good answer to your original problem.
We can minimize f with gradient descent. If you're not familiar with it, I described it pretty well in a previous answer.
This will not always work. For example in the list I used, every way to arrive at tN = 9 needs 2 2s and a 5. You can't get any other ratio.
But if it IS possible to get the desired ratios, we should figure out how to.
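Once a q has been chosen, the random procedure itself (pick elements with probability q[i], throw the attempt away if it exceeds the target) might look like the following Python sketch. The function name sample_multiset and the max_attempts safety limit are hypothetical, and the sketch assumes positive integer elements so that every attempt terminates.

```python
import random


def sample_multiset(elem, q, target, rng=random, max_attempts=100_000):
    """Draw one multiset summing to target by rejection sampling.

    elem[i] is picked with weight q[i]; random.choices normalizes the
    weights, so q does not have to sum to 1. max_attempts is a safety
    limit added for this sketch, not part of the described procedure.
    """
    for _ in range(max_attempts):
        total = 0
        picked = []
        while total < target:
            value = rng.choices(elem, weights=q, k=1)[0]
            picked.append(value)
            total += value
        if total == target:
            return sorted(picked)
        # Overshot the target: discard this attempt and try again.
    raise RuntimeError("no attempt hit the target; is it reachable?")


print(*sample_multiset([2, 5], [0.2, 0.8], 9))  # 2 2 5
```

With elem = [2, 5] and tN = 9 the result is always 2 2 5, whatever q is, which is the fixed ratio mentioned above.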