In MCTS, is choosing the node with most visits equivalent to choosing the node with the highest expected value?


The last step in MCTS (after doing the rollouts) is to pick the 'best' node. As I see it, there are two options:

  1. Pick the child node that was visited most often
  2. Pick the child node with the highest average reward
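To make the two options concrete, here is a minimal sketch of both selection rules. It assumes each child of the root stores a visit count and a summed reward; the `Child` class, the function names, and the numbers are all hypothetical and hand-picked, not taken from an actual search. Whether a real MCTS run ever ends with statistics like these is part of what I'm asking.

```python
from dataclasses import dataclass


@dataclass
class Child:
    """Hypothetical root-child statistics after the search has finished."""
    name: str
    visits: int
    total_reward: float

    @property
    def mean_reward(self) -> float:
        # Average reward per visit; unvisited children are scored 0.0.
        return self.total_reward / self.visits if self.visits else 0.0


def most_visited(children):
    """Option 1: pick the child with the largest visit count."""
    return max(children, key=lambda c: c.visits)


def highest_mean(children):
    """Option 2: pick the child with the largest average reward."""
    return max(children, key=lambda c: c.mean_reward)


# Made-up statistics: "a" was visited often, "b" only a few times.
children = [
    Child("a", visits=90, total_reward=54.0),
    Child("b", visits=10, total_reward=8.0),
]

print(most_visited(children).name)
print(highest_mean(children).name)
```

With these made-up numbers the two rules pick different children, but only because the average of the rarely visited child rests on a handful of rollouts.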

My intuition was to do the latter, but all the resources I've read use the first option.

I understand that a high visit count reflects that the algorithm has found the node in question to be 'promising'. So are the two options completely equivalent then? Could you give me an example where the two options would lead to a different choice?

I've tried: thinking and drawing trees.
I expected: to come to the conclusion that one of the options was superior.
What happened: I actually think they're equivalent and I don't understand why the (arguably) less intuitive option is always chosen.
