Memorizing the Playout Policy

  • Tristan Cazenave
  • Eustache Diemert
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 818)

Abstract

Monte Carlo Tree Search (MCTS) is the state-of-the-art algorithm for General Game Playing (GGP). Playout Policy Adaptation with move Features (PPAF) is a state-of-the-art MCTS algorithm that learns a playout policy online. We propose a simple modification to PPAF: memorizing the learned policy from one move to the next rather than restarting it for every search. We test PPAF with memorization (PPAFM) against PPAF and UCT on Atarigo, Breakthrough, Misere Breakthrough, Domineering, Misere Domineering, Knightthrough, Misere Knightthrough, and Nogo.
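As a concrete illustration of the idea, the sketch below shows a PPAF-style softmax playout policy over move features and the single change PPAFM makes: keeping the adapted weights from one root move to the next instead of re-initializing them before every search. This is a minimal sketch, not the authors' code; the GameState interface (legal_moves, features, player, play) and the mcts_search driver are hypothetical placeholders.

    import math
    import random
    from collections import defaultdict

    class PlayoutPolicy:
        """PPAF-style softmax playout policy over move features (sketch)."""

        def __init__(self, alpha=1.0):
            self.weights = defaultdict(float)  # one weight per (player, feature)
            self.alpha = alpha                 # adaptation step size (tunable)

        def score(self, state, move):
            # Sum of the feature weights of `move` for the player to move.
            return sum(self.weights[(state.player, f)] for f in state.features(move))

        def choose_move(self, state):
            # Sample a move with probability proportional to exp(score).
            moves = state.legal_moves()
            exps = [math.exp(self.score(state, m)) for m in moves]
            r = random.uniform(0.0, sum(exps))
            for move, e in zip(moves, exps):
                r -= e
                if r <= 0.0:
                    return move
            return moves[-1]

        def adapt(self, winner, playout):
            # Shift the winner's policy toward the moves it actually played
            # (a policy-gradient step on the log-softmax, as in PPAF).
            for state, played in playout:          # (state, move) pairs
                if state.player != winner:
                    continue
                moves = state.legal_moves()
                exps = [math.exp(self.score(state, m)) for m in moves]
                z = sum(exps)
                for move, e in zip(moves, exps):
                    target = 1.0 if move == played else 0.0
                    for f in state.features(move):
                        self.weights[(state.player, f)] += self.alpha * (target - e / z)

    def play_game(memorize=True):
        # PPAFM's single change: keep the adapted policy between root moves.
        policy = PlayoutPolicy()
        state = GameState.initial()               # hypothetical game interface
        while not state.terminal():
            if not memorize:
                policy = PlayoutPolicy()          # PPAF: fresh policy each move
            move = mcts_search(state, policy)     # hypothetical MCTS driver that
            state = state.play(move)              # runs playouts and calls adapt()
        return state.winner()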

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Université Paris-Dauphine, PSL Research University, CNRS, LAMSADE, Paris, France
  2. CRITEO, Grenoble, France