When should a monto carlo tree search be reset?

106 Views Asked by At

I am trying to build/understand exactly how the monte carlo tree search algorithm (mcts) works in conjunction with a neural network to learn how to play games (like chess). But I'm having trouble understanding when to reset the tree.

I have looked at https://github.com/suragnair/alpha-zero-general as an example. But one thing that doesn't make sense to me about this implementation is that it resets the tree after each individual game which doesn't seem right to me. (So a new tree is created everytime a new game is made). I thought the idea of mcts was to accumulate knowledge over a lot of games and only reset the tree once you have trained your network to predict new probabilities for each boardstate?

Is this me misunderstanding mcts or is this a bug in that particular implementation?

1

There are 1 best solutions below

2
On

I'll assume you're talking about AlphaZero since that's what the repo you linked to does, but MCTS is a more general term.

Keeping old search trees around forever isn't really practical, it would take too much memory and most of them wouldn't end up being used ever again (since games generated later in the training process hopefully look very different from the early ones). The whole point is that we distill the knowledge we gained through tree search into a neural network, which we then use in future games to do an even better tree search.

As a small optimization it still makes makes sense to either keep the tree around for the next move, or to instead temporarily cache neural network outputs. I assume this is what you saw in the repo you linked.