I've created a small AI program that plays Othello. The algorithm I use (MCTS with UCT) has a parameter that tunes the exploration-vs-exploitation trade-off. It is a single float ranging from 0 to 10 (larger values, up to infinity, are possible but don't make much sense).
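For reference, the standard UCT selection rule (UCB1 applied to tree search) picks the child that maximizes w_i/n_i + c * sqrt(ln N / n_i), where w_i and n_i are the child's wins and visits, N is the parent's visit count, and c is the exploration constant being tuned here. With c = 0 the search is pure exploitation; c = sqrt(2) is the classic UCB1 choice for rewards in [0, 1]. The sketch below assumes the program uses this standard form and summarizes each child as a (wins, visits) pair; `uct_score` and `select_child` are hypothetical names, not taken from the original program.

```python
import math


def uct_score(child_wins: float, child_visits: int, parent_visits: int, c: float) -> float:
    """UCT value of one child; c is the exploration constant being tuned."""
    if child_visits == 0:
        return math.inf  # unvisited children are always tried first
    exploitation = child_wins / child_visits
    exploration = c * math.sqrt(math.log(parent_visits) / child_visits)
    return exploitation + exploration


def select_child(children: list[tuple[float, int]], parent_visits: int, c: float) -> int:
    """Return the index of the child with the highest UCT value.

    children is a hypothetical list of (wins, visits) pairs.
    """
    return max(range(len(children)),
               key=lambda i: uct_score(children[i][0], children[i][1], parent_visits, c))
```

With `children = [(3, 5), (1, 2)]` and 7 parent visits, `c = 0` picks index 0 (win rate 0.6 beats 0.5), while `c = 10` picks index 1, the less-visited child, showing how larger c shifts the search toward exploration.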
I can easily let the algorithm play against itself with different values of this parameter, which gives me an idea of which of two values is better.
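Because each game is noisy, a short match can easily rank two values the wrong way. A minimal sketch of a fairer comparison, assuming a hypothetical `play_game(c_black, c_white, rng)` hook into the engine: it alternates colors so neither value gets the first move more often, and reports a rough 95% normal-approximation interval on the score. The `play_game` below is a toy stand-in (it simply assumes playing strength peaks at c = 1.0), not Othello.

```python
import math
import random


def play_game(c_black: float, c_white: float, rng: random.Random) -> float:
    """HYPOTHETICAL stand-in for one self-play Othello game.

    Returns 1.0 if black wins, 0.0 if white wins, 0.5 for a draw
    (this toy model never draws). Replace with a call into your engine;
    here strength is simply assumed to peak at c = 1.0.
    """
    def strength(c: float) -> float:
        return -(c - 1.0) ** 2

    p_black = 1.0 / (1.0 + math.exp(-(strength(c_black) - strength(c_white))))
    return 1.0 if rng.random() < p_black else 0.0


def match(c_a: float, c_b: float, n_games: int, rng: random.Random) -> tuple[float, float]:
    """Play n_games between c_a and c_b, swapping colors every game.

    Returns (score of c_a as a fraction of points, 95% CI half-width).
    """
    score = 0.0
    for g in range(n_games):
        if g % 2 == 0:
            score += play_game(c_a, c_b, rng)        # c_a plays black
        else:
            score += 1.0 - play_game(c_b, c_a, rng)  # c_a plays white
    p = score / n_games
    half_width = 1.96 * math.sqrt(p * (1.0 - p) / n_games)
    return p, half_width
```

If the interval `p ± half_width` still contains 0.5, the match has not separated the two values yet and more games are needed.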
What is a good algorithm to optimize this parameter?
(I'd prefer an algorithm backed by research or publications that go in depth into why or when it works best.)
Consider something along the lines of a genetic algorithm: the program plays against itself, the winner's value is kept and varied a little, and you keep track of the values. Over time it may converge to a 'best' balance.
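The simplest form of this idea is a (1+1) evolution strategy: keep one champion value, mutate it with Gaussian noise, let the mutant play a short match against it, and keep whichever wins. A minimal sketch, assuming a hypothetical `play_game` hook into the engine (here a toy stand-in whose strength peaks at c = 1.0, not Othello); `evolve_c`, the step size `sigma` and the match length `n_games` are illustrative choices, not tuned values.

```python
import math
import random


def play_game(c_black: float, c_white: float, rng: random.Random) -> float:
    """HYPOTHETICAL stand-in for one self-play game (1.0 = black wins, 0.0 = white wins).

    Toy model only: assumes strength peaks at c = 1.0. Swap in your engine.
    """
    gap = -(c_black - 1.0) ** 2 + (c_white - 1.0) ** 2
    return 1.0 if rng.random() < 1.0 / (1.0 + math.exp(-gap)) else 0.0


def challenger_score(c_new: float, c_old: float, n_games: int, rng: random.Random) -> float:
    """Fraction of points the challenger scores, alternating colors each game."""
    score = 0.0
    for g in range(n_games):
        if g % 2 == 0:
            score += play_game(c_new, c_old, rng)
        else:
            score += 1.0 - play_game(c_old, c_new, rng)
    return score / n_games


def evolve_c(c_start: float = 5.0, generations: int = 200, sigma: float = 0.5,
             n_games: int = 20, lo: float = 0.0, hi: float = 10.0,
             seed: int = 0) -> tuple[float, list[float]]:
    """(1+1) evolution strategy: mutate the champion, keep the winner, log every step."""
    rng = random.Random(seed)
    champion = c_start
    history = [champion]
    for _ in range(generations):
        # Gaussian mutation, clipped to the allowed parameter range
        challenger = min(hi, max(lo, champion + rng.gauss(0.0, sigma)))
        # The challenger must outscore the champion to replace it
        if challenger_score(challenger, champion, n_games, rng) > 0.5:
            champion = challenger
        history.append(champion)
    return champion, history
```

`best, history = evolve_c()` returns the final champion and every value it held. Because short matches are noisy, the champion keeps wandering near the optimum; averaging the later part of `history` gives a steadier estimate than the final value alone.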