On-Line Parameter Tuning for Monte-Carlo Tree Search in General Game Playing

Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 818)


Many enhancements have been proposed for Monte-Carlo Tree Search (MCTS), and some of them have been applied successfully in General Game Playing (GGP). MCTS and its enhancements are usually controlled by multiple parameters, and tuning these in advance requires extensive, time-consuming computation. Moreover, in GGP the optimal parameter values may vary from game to game. This paper proposes a method to automatically tune search-control parameters on-line for GGP. The method treats the tuning problem as a Combinatorial Multi-Armed Bandit (CMAB). Four strategies designed to deal with CMABs are evaluated for this particular problem. Experiments show that on-line tuning in GGP reaches almost the same performance as off-line tuning. It can therefore be considered a valid alternative for domains where off-line parameter tuning is costly or infeasible.
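To make the bandit view of parameter tuning concrete, the following is a minimal sketch and not the method evaluated in the paper: the abstract does not name its four strategies, so this example assumes one simple way to handle a combinatorial arm, namely an independent UCB1 bandit per parameter. The names `UCB1Bandit`, `tune_online` and `toy_simulation`, the parameters `C` and `K`, their candidate values, and the reward model are all hypothetical.

```python
import math
import random


class UCB1Bandit:
    """UCB1 bandit over the candidate values of a single parameter."""

    def __init__(self, values, exploration=0.7):
        self.values = list(values)
        self.exploration = exploration
        self.counts = [0] * len(self.values)
        self.totals = [0.0] * len(self.values)

    def select(self):
        # Sample every candidate value once before using the UCB1 formula.
        for i, n in enumerate(self.counts):
            if n == 0:
                return i
        total = sum(self.counts)

        def ucb(i):
            mean = self.totals[i] / self.counts[i]
            bonus = self.exploration * math.sqrt(math.log(total) / self.counts[i])
            return mean + bonus

        return max(range(len(self.values)), key=ucb)

    def update(self, index, reward):
        self.counts[index] += 1
        self.totals[index] += reward

    def best(self):
        # Report the most frequently sampled value.
        top = max(range(len(self.values)), key=lambda i: self.counts[i])
        return self.values[top]


def tune_online(search_space, simulate, iterations, seed=0):
    """Pick one value per parameter before each simulation and learn from its reward.

    Assumption: each parameter is tuned by its own bandit, so one
    combinatorial arm is the tuple of the individual choices.
    """
    rng = random.Random(seed)
    bandits = {name: UCB1Bandit(vals) for name, vals in search_space.items()}
    for _ in range(iterations):
        choice = {name: b.select() for name, b in bandits.items()}
        params = {name: bandits[name].values[i] for name, i in choice.items()}
        reward = simulate(params, rng)
        # Every per-parameter bandit receives the same simulation reward.
        for name, i in choice.items():
            bandits[name].update(i, reward)
    return {name: b.best() for name, b in bandits.items()}


def toy_simulation(params, rng):
    # Hypothetical stand-in for one MCTS simulation: a win/loss outcome
    # whose win probability is highest at C = 0.7 and K = 250.
    p = 0.8 - 0.5 * abs(params["C"] - 0.7) - 0.001 * abs(params["K"] - 250)
    return 1.0 if rng.random() < p else 0.0


if __name__ == "__main__":
    space = {"C": [0.1, 0.7, 1.5], "K": [10, 250, 1000]}
    print(tune_online(space, toy_simulation, iterations=5000))
```

Treating the parameters independently keeps the number of arms equal to the sum of the candidate values rather than their product, at the cost of ignoring interactions between parameters. In a real agent, `simulate` would run one search simulation with the chosen values and return its outcome.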


Keywords: Monte Carlo Tree Search (MCTS); General Game Playing (GGP); Search Control Parameters; Multi-armed Bandit (MAB); MCTS Simulation



This work is funded by the Netherlands Organisation for Scientific Research (NWO) in the framework of the project GoGeneral, grant number 612.001.121.



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Games and AI Group, Department of Data Science and Knowledge Engineering, Maastricht University, Maastricht, The Netherlands
