I'm working on a problem for one of my classes. The problem is this: a person starts with $0 and rolls an N-sided dice (N could range from 1 to 30) and wins money according to the dice side they roll. X sides (ones) of the N-sided die result in the person losing all their money (current balance) and the game ending; for instance, if the die is [0,0,0,1,1,1,1], a person would receive $1 if they roll 1, $2 if they roll 2, or $3 if they roll 3, but they would lose everything if they rolled 4,5,6, OR 7.
What is the expected value for this N-sided dice problem? I tried value iteration but can't seem to get it right.
So for this dice [1,1,1,0,0,0,0], our first state (1 roll) expected value is 1/7*(4)+1/7*(5)+1/7*(6)+1/7*(7) = 3.1428
For the value iteration, next we have to calculate the value of state 4 (balance=$4), state 5 (balance=$5), state 6 (balance=$6), state 7 (balance=$7)
V(s) = Max_actions [Sum_probabilities[R(s)+V(s']]
V(4) = Max($4 {quit the game}, 1/7*(4+4)+1/7*(4+5)+1/7*(4+6)+1/7*(4+7) {keep playing}) -> 5.428
V(5) = Max($5 {quit the game}, 1/7*(5+4)+1/7*(5+5)+1/7*(5+6)+1/7*(5+7){keep playing}) -> 6
V(6) = Max($6 {quit the game}, 1/7*(6+4)+1/7*(6+5)+1/7*(6+6)+1/7*(6+7){keep playing}) -> 6.57
V(7) = Max($7 {quit the game}, 1/7*(7+4)+1/7*(7+5)+1/7*(7+6)+1/7*(7+7){keep playing}) -> 7.14
Now these V(4), V(5), V(6), and V(7)'s will branch out to their next states. So V(4) will become V(8), V(9), V(10), V(11), so on and so forth.
V(8) ($8 current< $7.74 expected), V(9) ($9 current <$8.28 expected), V(10)($10 current < $8.85 expected), V(11)($11 current<$9.42 expected), V(12)($12 current<$10 expected), V(13)($11 current<$10.57 expected), V(14)($14 current <$11.14 expected).
So, that suggests that V(8), V(9), V(10), V(11), V(12), V(13), V(14) are terminal states --> V(4), V(5), V(6), V(7) do not need to be changed.
Finally, we re-calculate the value of V(0) because the values of V(4), V(5), V(6), and V(7) were changed --> V(0) = 1/7* V(4)+1/7* V(5)+1/7* V(6)+1/7* V(7) => 3.59 ... This is the final expected reward for this game.
Does this make sense? I'm not looking for code to solve the problem, just some advice on whether this approach is correct.
Thanks
Edited based on comments below to make the post more concise.
Yes, your approach is complex, but essentially correct. The expected winnings are
3 + 29/49 = 3.591836734693877551...
In general keep rolling if expected winnings exceed expected losses.
Expected losses if you have
y
money arey * X / N
.Expected winnings are
avg(value of dice roll)
.I would suggest using a dynamic programming approach for efficiency.