
Can Monte-Carlo Tree Search learn to sacrifice?

Journal of Heuristics

Abstract

One of the most basic activities performed by an intelligent agent is deciding what to do next. The decision is usually between selecting the move with the highest expectation and exploring new scenarios. Monte-Carlo Tree Search (MCTS), originally developed as a game-playing agent, handles this exploration–exploitation ‘dilemma’ using a multi-armed bandit strategy. The success of MCTS in a wide range of problems, such as combinatorial optimisation, reinforcement learning, and games, is due to its ability to rapidly evaluate problem states without requiring domain-specific knowledge. However, it has been acknowledged that the trade-off between exploration and exploitation is crucial to the performance of the algorithm, and affects the efficiency with which the agent learns deceptive states. One type of deception is a state that gives an immediate reward but leads to a suboptimal solution in the long run. Such states are known as trap states, and have been thoroughly investigated in previous research. In this work, we study the opposite of trap states, known as sacrifice states: deceptive moves that result in a local loss but are globally optimal. We investigate the efficiency of MCTS enhancements in identifying this type of move.
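The multi-armed bandit strategy the abstract refers to is, in the widely used UCT variant of MCTS, the UCB1 selection rule: each child's empirical win rate (exploitation) is augmented by a bonus that grows for rarely visited children (exploration). A minimal sketch, assuming per-child (wins, visits) statistics and an exploration constant c (the common default √2 is an assumption, not a value fixed by the paper):

```python
import math

def uct_select(children, total_visits, c=math.sqrt(2)):
    """Return the index of the child maximising the UCB1 score.

    `children` is a list of (wins, visits) pairs for one node's children;
    `total_visits` is the visit count of the parent node. Unvisited
    children score infinity, so they are always tried first.
    """
    def score(child):
        wins, visits = child
        if visits == 0:
            return float("inf")  # expand unvisited children before exploiting
        exploit = wins / visits                                   # win rate
        explore = c * math.sqrt(math.log(total_visits) / visits)  # UCB1 bonus
        return exploit + explore

    return max(range(len(children)), key=lambda i: score(children[i]))
```

The tension the paper studies lives in this formula: a sacrifice state has a poor win rate early in search (low exploitation term), so the agent only discovers its long-term value if the exploration bonus keeps pulling simulations back into it.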

[Figures 1–9 appear in the full article.]


Notes

  1. http://www.bbc.com/news/technology-35420579.


Acknowledgments

This research was supported under Australian Research Council’s Discovery Projects funding scheme, Project Number DE 140100017.

Author information

Correspondence to Aldeida Aleti.


Cite this article

Companez, N., Aleti, A.: Can Monte-Carlo Tree Search learn to sacrifice? J Heuristics 22, 783–813 (2016). https://doi.org/10.1007/s10732-016-9320-y
