
A Rollout-Based Search Algorithm Unifying MCTS and Alpha-Beta

Hendrik Baier
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 705)

Abstract

Monte Carlo Tree Search (MCTS) has been found to be a weaker player than minimax in some tactical domains, partly due to its highly selective focus on only the most promising moves. In order to combine the strategic strength of MCTS with the tactical strength of minimax, MCTS-minimax hybrids have been introduced in prior work, embedding shallow minimax searches into the MCTS framework. This paper continues this line of research by integrating MCTS and minimax even more tightly into one rollout-based hybrid search algorithm, MCTS-\(\alpha\beta\). The hybrid can execute two types of rollouts: MCTS rollouts and alpha-beta rollouts, i.e., rollouts implementing minimax with alpha-beta pruning and iterative deepening. During the search, all nodes accumulate both MCTS value estimates and alpha-beta value bounds. The two types of information are combined in a given tree node whenever alpha-beta completes a deepening iteration rooted in that node, by increasing the MCTS value estimates for the best move found by alpha-beta. A single parameter, the probability of executing MCTS rollouts versus alpha-beta rollouts, allows the hybrid to subsume both MCTS and alpha-beta search as extreme cases, while opening up a spectrum of new search algorithms in between.

Preliminary results in the game of Breakthrough show the proposed hybrid to outperform its special cases of alpha-beta and MCTS. These results are promising for the further development of rollout-based algorithms that unify MCTS and minimax approaches.
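The mixing mechanism described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: all names (`Node`, `mcts_rollout`, `alphabeta`, `hybrid_search`) are hypothetical, the game tree is a toy, the MCTS rollout uses plain random descent with average backup rather than UCT selection, and the value-combination step at deepening-iteration boundaries is omitted. It shows only how a single probability parameter `p_mcts` recovers pure MCTS at one extreme and pure alpha-beta at the other.

```python
import math
import random

class Node:
    """Toy game-tree node; an empty child list marks a leaf."""
    def __init__(self, children=None, value=0):
        self.children = children or []
        self.value = value    # heuristic value, used at leaves
        self.visits = 0
        self.total = 0.0      # sum of MCTS rollout returns

def mcts_rollout(node):
    """One simplified MCTS rollout: random descent to a leaf, average backup.
    (A real implementation would use UCT selection and alternate players.)"""
    path = [node]
    while path[-1].children:
        path.append(random.choice(path[-1].children))
    result = path[-1].value
    for n in path:
        n.visits += 1
        n.total += result
    return result

def alphabeta(node, depth, alpha=-math.inf, beta=math.inf, maximize=True):
    """Depth-limited alpha-beta; stands in for one deepening iteration
    of an alpha-beta rollout."""
    if depth == 0 or not node.children:
        return node.value
    if maximize:
        best = -math.inf
        for c in node.children:
            best = max(best, alphabeta(c, depth - 1, alpha, beta, False))
            alpha = max(alpha, best)
            if alpha >= beta:   # beta cutoff
                break
        return best
    best = math.inf
    for c in node.children:
        best = min(best, alphabeta(c, depth - 1, alpha, beta, True))
        beta = min(beta, best)
        if alpha >= beta:       # alpha cutoff
            break
    return best

def hybrid_search(root, iterations, p_mcts, depth=2):
    """Each iteration runs an MCTS rollout with probability p_mcts,
    else an alpha-beta rollout. p_mcts = 1.0 recovers pure MCTS,
    p_mcts = 0.0 pure alpha-beta."""
    ab_value = None
    for _ in range(iterations):
        if random.random() < p_mcts:
            mcts_rollout(root)
        else:
            ab_value = alphabeta(root, depth)
    mean = root.total / root.visits if root.visits else None
    return ab_value, mean
```

On a small tree, `hybrid_search(root, n, 0.0)` returns the exact minimax value, while `hybrid_search(root, n, 1.0)` accumulates only sampled value estimates; intermediate values of `p_mcts` interleave the two rollout types, which is the spectrum the paper explores.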

Keywords

Selection policy, heuristic evaluation, simulation phase, search depth, default policy

Notes

Acknowledgment

The author thanks the Games and AI group, Department of Data Science and Knowledge Engineering, Maastricht University, for computational support.


Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Advanced Concepts Team, European Space Agency, Noordwijk, The Netherlands
