So I am familiar with more basic tree search algorithms like game search w/ minimax, but I've been trying to learn more about the Monte-Carlo Tree Search algorithm, and was wondering how it deals with 'precise lines.'
In the context of chess, where you might be in a position where you have 30 losing moves but 1 winning line, how would the MTCS Algorithm, more specifically the UCB1 function deal with this? The way I understand UCB1 is that it essentially does a sort of average over its child nodes so the UCB1 value of a line of chess where you have 30 losing moves and one winning one should be deceptively low?
I'm still learning about MCTS but I've always had this question and was hoping someone could explain how MCTS still converges to minimax even if a UCB1 value might be very low.
Any knowledge would be appreciated! Thanks
From the UCT formula w_i/n_i + c*sqrt(ln(N)/n_i) we can see that the exploration term is proportional to the inverse of the square root of child visits, n_i. What this means in that the child node with the best win rate will be heavily favored and therefore will have far more visits. So the UCT score of the parent will be an average heavily weighted toward the win rate of best child node.
This effect will propagate back up the tree, leading to the best line having the most visits and an accurate win rate at each node. In this way MCTS converges to the minimax result as the number of simulations increases.
For a more theoretical discussion, see the main result of Bandit based Monte-Carlo Planning: