The last step in MCTS (after the rollouts are done) is to pick the 'best' child of the root node. As I see it, there are two options:
- Pick the child node that was visited most often
- Pick the child node with the highest average reward
My intuition was to do the latter, but all the resources I've read use the former.
I understand that a high visit count reflects that the algorithm has found the node in question to be 'promising'. So are the two options completely equivalent then? Could you give me an example where the two options would lead to a different choice?
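To be concrete about what I mean by the two rules, here is a minimal sketch. The `Child` class, its field names, and the statistics are all made up for illustration; the numbers are hand-picked, not produced by an actual search.

```python
from dataclasses import dataclass


@dataclass
class Child:
    """Hypothetical per-child statistics kept at the root after the search."""
    name: str
    visits: int          # number of times this child was visited
    total_reward: float  # sum of rollout rewards backed up through it

    @property
    def mean_reward(self) -> float:
        # Average reward; an unvisited child is treated as 0.0 by assumption.
        return self.total_reward / self.visits if self.visits else 0.0


def most_visited(children):
    """Option 1: pick the child with the highest visit count."""
    return max(children, key=lambda c: c.visits)


def highest_mean(children):
    """Option 2: pick the child with the highest average reward."""
    return max(children, key=lambda c: c.mean_reward)


# Hand-picked example statistics (an assumption, not real search output).
children = [
    Child("A", visits=90, total_reward=54.0),  # mean 0.6 over many visits
    Child("B", visits=3, total_reward=3.0),    # mean 1.0 over very few visits
]

print(most_visited(children).name)
print(highest_mean(children).name)
```

With these invented numbers the two rules would disagree, but I can't tell whether a real search ever ends in such a state once enough rollouts have been done.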
I've tried: thinking and drawing trees.
I expected: to come to the conclusion that one of the options was superior.
What happened: I actually think they're equivalent and I don't understand why the (arguably) less intuitive option is always chosen.