
Can Monte-Carlo Tree Search learn to sacrifice?

Journal of Heuristics

Abstract

One of the most basic activities performed by an intelligent agent is deciding what to do next. The decision is usually between selecting the move with the highest expectation and exploring new scenarios. Monte-Carlo Tree Search (MCTS), originally developed as a game-playing agent, handles this exploration–exploitation ‘dilemma’ using a multi-armed bandit strategy. The success of MCTS in a wide range of problems, such as combinatorial optimisation, reinforcement learning, and games, is due to its ability to rapidly evaluate problem states without requiring domain-specific knowledge. However, it has been acknowledged that the trade-off between exploration and exploitation is crucial to the performance of the algorithm, and affects the efficiency with which the agent learns deceptive states. One type of deception is a state that gives an immediate reward but leads to a suboptimal solution in the long run. Such states are known as trap states, and have been thoroughly investigated in previous research. In this work, we study the opposite of trap states, known as sacrifice states: deceptive moves that result in a local loss but are globally optimal. We investigate the efficiency of MCTS enhancements in identifying this type of move.
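The multi-armed bandit strategy the abstract refers to is, in the widely used UCT variant of MCTS, the UCB1 selection rule: each child's empirical win rate (exploitation) is augmented by a bonus that grows for rarely visited children (exploration). A minimal sketch, assuming per-child (wins, visits) statistics and an exploration constant c (the common default √2 is an assumption, not a value fixed by the paper):

```python
import math

def uct_select(children, total_visits, c=math.sqrt(2)):
    """Return the index of the child maximising the UCB1 score.

    `children` is a list of (wins, visits) pairs for one node's children;
    `total_visits` is the visit count of the parent node. Unvisited
    children score infinity, so they are always tried first.
    """
    def score(child):
        wins, visits = child
        if visits == 0:
            return float("inf")  # expand unvisited children before exploiting
        exploit = wins / visits                                   # win rate
        explore = c * math.sqrt(math.log(total_visits) / visits)  # UCB1 bonus
        return exploit + explore

    return max(range(len(children)), key=lambda i: score(children[i]))
```

The tension the paper studies lives in this formula: a sacrifice state has a poor win rate early in search (low exploitation term), so the agent only discovers its long-term value if the exploration bonus keeps pulling simulations back into it.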

[Figures 1–9 appear in the full article.]


Notes

  1. http://www.bbc.com/news/technology-35420579.


Acknowledgments

This research was supported under Australian Research Council’s Discovery Projects funding scheme, Project Number DE 140100017.

Author information

Correspondence to Aldeida Aleti.


Cite this article

Companez, N., Aleti, A.: Can Monte-Carlo Tree Search learn to sacrifice? J Heuristics 22, 783–813 (2016). https://doi.org/10.1007/s10732-016-9320-y
