A Rollout-Based Search Algorithm Unifying MCTS and Alpha-Beta
Monte Carlo Tree Search (MCTS) has been found to be a weaker player than minimax in some tactical domains, partly because of its highly selective focus on only the most promising moves. To combine the strategic strength of MCTS with the tactical strength of minimax, prior work has introduced MCTS-minimax hybrids that embed shallow minimax searches into the MCTS framework. This paper continues this line of research by integrating MCTS and minimax even more tightly into one rollout-based hybrid search algorithm, MCTS-\(\alpha\beta\). The hybrid can execute two types of rollouts: MCTS rollouts and alpha-beta rollouts, i.e. rollouts implementing minimax with alpha-beta pruning and iterative deepening. During the search, all nodes accumulate both MCTS value estimates and alpha-beta value bounds. The two types of information are combined in a given tree node whenever alpha-beta completes a deepening iteration rooted in that node, by increasing the MCTS value estimates for the best move found by alpha-beta. A single parameter, the probability of executing MCTS rollouts vs. alpha-beta rollouts, lets the hybrid subsume both MCTS and alpha-beta search as extreme cases, while allowing for a spectrum of new search algorithms in between.
Preliminary results in the game of Breakthrough show the proposed hybrid to outperform its special cases of alpha-beta and MCTS. These results are promising for the further development of rollout-based algorithms that unify MCTS and minimax approaches.
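The role of the single mixing parameter and of the per-node statistics can be illustrated with a minimal Python sketch. It is not the paper's implementation: the names `Node`, `choose_rollout_type`, `count_rollouts`, `boost_best_move`, and `bonus_visits` are hypothetical, and the abstract does not say how the MCTS value estimate of alpha-beta's best move is increased, so the virtual-win bonus used here is only an assumed stand-in.

```python
import math
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node holding both kinds of statistics."""
    visits: int = 0                  # MCTS visit count
    total_reward: float = 0.0        # sum of MCTS rollout rewards in [0, 1]
    lower: float = -math.inf         # alpha-beta lower value bound
    upper: float = math.inf          # alpha-beta upper value bound
    children: dict = field(default_factory=dict)

    def mean(self) -> float:
        """MCTS value estimate: average reward, 0.0 if unvisited."""
        return self.total_reward / self.visits if self.visits else 0.0


def choose_rollout_type(p_mcts: float, rng: random.Random) -> str:
    """Pick 'mcts' with probability p_mcts, otherwise 'alphabeta'."""
    if not 0.0 <= p_mcts <= 1.0:
        raise ValueError("p_mcts must lie in [0, 1]")
    # rng.random() is in [0, 1), so p_mcts = 1 always yields 'mcts'
    # and p_mcts = 0 always yields 'alphabeta' (the two extreme cases).
    return "mcts" if rng.random() < p_mcts else "alphabeta"


def count_rollouts(p_mcts: float, n: int, seed: int = 0) -> dict:
    """Tally which rollout type would be chosen over n rollouts."""
    rng = random.Random(seed)
    counts = {"mcts": 0, "alphabeta": 0}
    for _ in range(n):
        counts[choose_rollout_type(p_mcts, rng)] += 1
    return counts


def boost_best_move(node: Node, best_move, bonus_visits: int = 10) -> None:
    """ASSUMPTION: raise the child's estimate by adding virtual wins.

    Called when alpha-beta finishes a deepening iteration rooted in `node`.
    """
    child = node.children[best_move]
    child.visits += bonus_visits
    child.total_reward += float(bonus_visits)  # each virtual win has reward 1
```

With `p_mcts = 1.0` every rollout is an MCTS rollout and with `p_mcts = 0.0` every rollout is an alpha-beta rollout, which mirrors the two special cases named in the abstract. Intermediate values interleave the two rollout types on the same tree.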
Keywords: Selection Policy, Heuristic Evaluation, Simulation Phase, Search Depth, Default Policy
The author thanks the Games and AI group, Department of Data Science and Knowledge Engineering, Maastricht University, for computational support.