Nested Rollout Policy Adaptation with Selective Policies

  • Tristan Cazenave
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 705)


Monte Carlo Tree Search (MCTS) is a general search algorithm that has improved the state of the art for multiple games and optimization problems. Nested Rollout Policy Adaptation (NRPA) is an MCTS variant that has found record-breaking solutions for puzzles and optimization problems. It learns a playout policy online that dynamically adapts the playouts to the problem at hand. We propose to enhance NRPA using more selectivity in the playouts. The idea is applied to three different problems: Bus regulation, SameGame and Weak Schur numbers. We improve on standard NRPA for all three problems.
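
As an illustration of the mechanism summarized above, the following is a minimal sketch of NRPA on a toy sequence-building problem. It is not the paper's implementation. The toy objective, the `(position, move)` policy keys, the constants, and every function name are assumptions made for this example. The `selective` flag stands in for the idea of more selective playouts by pruning candidate moves before sampling; the pruning rule shown is a hypothetical placeholder, not one of the rules used for Bus regulation, SameGame or Weak Schur numbers.

```python
import math
import random

LENGTH = 8          # toy horizon (assumption for this sketch)
MOVES = (0, 1, 2)   # toy move set (assumption for this sketch)

def score(seq):
    """Toy objective: number of adjacent positions holding different moves."""
    return sum(1 for a, b in zip(seq, seq[1:]) if a != b)

def candidate_moves(seq, selective):
    """Moves considered at this step; the selective variant prunes repeats."""
    if selective and seq:
        return [m for m in MOVES if m != seq[-1]]  # hypothetical pruning rule
    return list(MOVES)

def code(seq, move):
    """Key under which a move's policy weight is stored: (position, move)."""
    return (len(seq), move)

def playout(policy, rng, selective):
    """Sample a full sequence, choosing moves with softmax probabilities."""
    seq = []
    while len(seq) < LENGTH:
        moves = candidate_moves(seq, selective)
        weights = [math.exp(policy.get(code(seq, m), 0.0)) for m in moves]
        seq.append(rng.choices(moves, weights=weights)[0])
    return score(seq), seq

def adapt(policy, best_seq, selective, alpha=1.0):
    """Return a copy of the policy shifted towards the best sequence."""
    new = dict(policy)
    seq = []
    for move in best_seq:
        moves = candidate_moves(seq, selective)
        weights = [math.exp(policy.get(code(seq, m), 0.0)) for m in moves]
        z = sum(weights)
        for m, w in zip(moves, weights):
            k = code(seq, m)
            new[k] = new.get(k, 0.0) - alpha * w / z  # lower every candidate
        k = code(seq, move)
        new[k] = new.get(k, 0.0) + alpha              # raise the chosen move
        seq.append(move)
    return new

def nrpa(level, policy, rng, selective=False, iterations=10):
    """Nested search: level 0 is a playout, higher levels adapt the policy."""
    if level == 0:
        return playout(policy, rng, selective)
    best_score, best_seq = -math.inf, []
    for _ in range(iterations):
        s, seq = nrpa(level - 1, policy, rng, selective, iterations)
        if s >= best_score:
            best_score, best_seq = s, seq
        policy = adapt(policy, best_seq, selective)   # local copy per level
    return best_score, best_seq
```

Calling `nrpa(2, {}, random.Random(0), selective=True)` runs a two-level search from an empty policy. In this toy the pruning rule alone already guarantees a maximal score, which is only meant to show where a selectivity filter plugs into the playout and the adaptation step, not to suggest how large the gain is on real problems.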


Selective Policy; Monte Carlo Tree Search (MCTS); Playout Policy; MCTS Variant; Nested Monte-Carlo Search (NMCS)
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. PSL-Université Paris-Dauphine, LAMSADE CNRS UMR 7243, Paris, France
