Adapting Improved Upper Confidence Bounds for Monte-Carlo Tree Search

Liu, Yun-Ching; Tsuruoka, Yoshimasa

doi:10.1007/978-3-319-27992-3_6


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9525)

Included in the following conference series:

Advances in Computer Games


Abstract

The UCT algorithm, which combines the UCB algorithm with Monte-Carlo Tree Search (MCTS), is currently the most widely used variant of MCTS. Recently, a number of studies applying other bandit algorithms to MCTS have produced interesting results. In this research, we investigate the possibility of combining the improved UCB algorithm, proposed by Auer and Ortner [2], with MCTS. However, several characteristics and properties of the improved UCB algorithm may make it ill-suited to direct application in MCTS. Therefore, some modifications were made to the improved UCB algorithm to make it more suitable for game-tree search. The Mi-UCT algorithm is the result of applying this modified improved UCB algorithm to trees. The performance of Mi-UCT is demonstrated on the games of \(9\times 9\) Go and \(9\times 9\) NoGo: it outperforms the plain UCT algorithm when only a small number of playouts is given, and performs roughly on the same level when more playouts are available.
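For readers unfamiliar with the improved UCB algorithm, the following is a minimal Python sketch of the phase-based arm-elimination scheme described by Auer and Ortner [2] for a plain K-armed bandit with rewards assumed to lie in [0, 1]. It is not the authors' Mi-UCT algorithm, whose modifications are not reproduced here. The function name `improved_ucb`, the reward callback `pull`, and the rule for spending leftover budget after the final round are illustrative assumptions. In each round m the sketch plays every surviving arm up to a quota \(n_m = \lceil 2\log(T\tilde\Delta_m^2)/\tilde\Delta_m^2 \rceil\), removes arms whose upper confidence bound falls below the best empirical lower bound, and halves the gap estimate \(\tilde\Delta_m\).

```python
import math
from typing import Callable, List


def improved_ucb(pull: Callable[[int], float], n_arms: int, horizon: int) -> List[int]:
    """Sketch of improved UCB (Auer & Ortner, 2010) on a K-armed bandit.

    `pull(i)` is a hypothetical callback returning a reward in [0, 1] for arm i.
    Returns how many times each arm was played within the horizon.
    """
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    active = list(range(n_arms))  # B_m: arms not yet eliminated
    delta = 1.0                   # current gap estimate, halved every round
    t = 0                         # total number of plays so far

    def play(i: int) -> None:
        nonlocal t
        sums[i] += pull(i)
        counts[i] += 1
        t += 1

    last_round = math.floor(0.5 * math.log2(horizon / math.e))
    for _ in range(last_round + 1):
        if len(active) == 1:
            break
        # (a) Play each surviving arm until it has been chosen n_m times in total.
        n_m = math.ceil(2 * math.log(horizon * delta**2) / delta**2)
        for i in active:
            while counts[i] < n_m and t < horizon:
                play(i)
        if t >= horizon:
            break
        # (b) Drop arms whose upper bound is below the empirically best lower bound.
        width = math.sqrt(math.log(horizon * delta**2) / (2 * n_m))
        means = {i: sums[i] / counts[i] for i in active}
        best = max(means.values())
        active = [i for i in active if means[i] + width >= best - width]
        # (c) Halve the gap estimate for the next round.
        delta /= 2

    # Assumption: any remaining budget goes to the empirically best surviving arm.
    best_arm = max(active, key=lambda i: sums[i] / counts[i] if counts[i] else 0.0)
    while t < horizon:
        play(best_arm)
    return counts


if __name__ == "__main__":
    # Toy bandit: arm 0 always pays 1, arms 1 and 2 always pay 0.
    print(improved_ucb(lambda i: 1.0 if i == 0 else 0.0, n_arms=3, horizon=10_000))
    # → [9962, 19, 19]
```

Unlike UCB1, which UCT evaluates at every visit to choose a child, this scheme works in rounds with fixed per-arm quotas and takes the horizon \(T\) as an input.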


References

1. Lai, T.L., Robbins, H.: Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 6(1), 4–22 (1985)
2. Auer, P., Ortner, R.: UCB revisited: improved regret bounds for the stochastic multi-armed bandit problem. Periodica Math. Hung. 61(1–2), 55–65 (2010)
3. Kocsis, L., Szepesvári, C.: Bandit based Monte-Carlo planning. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) ECML 2006. LNCS (LNAI), vol. 4212, pp. 282–293. Springer, Heidelberg (2006)
4. Auer, P., Cesa-Bianchi, N., Fischer, P.: Finite-time analysis of the multiarmed bandit problem. Mach. Learn. 47(2–3), 235–256 (2002)
5. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA (1998)
6. Browne, C.B., Powley, E., Whitehouse, D., Lucas, S.M., Cowling, P.I., Rohlfshagen, P., Tavener, S., Perez, D., Samothrakis, S., Colton, S.: A survey of Monte Carlo tree search methods. IEEE Trans. Comput. Intell. AI Games 4(1), 1–43 (2012)
7. Tolpin, D., Shimony, S.E.: MCTS based on simple regret. In: Proceedings of the 26th AAAI Conference on Artificial Intelligence, pp. 570–576 (2012)
8. Cazenave, T.: Sequential halving applied to trees. IEEE Trans. Comput. Intell. AI Games 7, 102–105 (2014)
9. Pepels, T., Cazenave, T., Winands, M.H.M., Lanctot, M.: Minimizing simple and cumulative regret in Monte-Carlo tree search. In: Cazenave, T., Winands, M.H.M., Björnsson, Y. (eds.) CGW 2014. CCIS, vol. 504, pp. 1–15. Springer, Heidelberg (2014)
10. Imagawa, T., Kaneko, T.: Applying multi armed bandit algorithms to MCTS and those analysis. In: Proceedings of the 19th Game Programming Workshop (GPW-14), pp. 145–150 (2014)
11. Karnin, Z., Koren, T., Somekh, O.: Almost optimal exploration in multi-armed bandits. In: Proceedings of the 30th International Conference on Machine Learning (ICML'13), pp. 1238–1246 (2013)
12. Garivier, A., Cappé, O.: The KL-UCB algorithm for bounded stochastic bandits and beyond. In: Proceedings of the 24th Annual Conference on Learning Theory (COLT'11), pp. 359–376 (2011)
13. Kaufmann, E., Korda, N., Munos, R.: Thompson sampling: an asymptotically optimal finite-time analysis. In: Bshouty, N.H., Stoltz, G., Vayatis, N., Zeugmann, T. (eds.) ALT 2012. LNCS, vol. 7568, pp. 199–213. Springer, Heidelberg (2012)


Author information

Authors and Affiliations

Department of Electrical Engineering and Information Systems, University of Tokyo, Tokyo, Japan
Yun-Ching Liu & Yoshimasa Tsuruoka


Corresponding author

Correspondence to Yun-Ching Liu.

Editor information

Editors and Affiliations

Leiden University, Leiden, The Netherlands
Aske Plaat
Leiden University, Leiden, The Netherlands
Jaap van den Herik
Leiden University, Leiden, The Netherlands
Walter Kosters


About this paper

Cite this paper

Liu, YC., Tsuruoka, Y. (2015). Adapting Improved Upper Confidence Bounds for Monte-Carlo Tree Search. In: Plaat, A., van den Herik, J., Kosters, W. (eds) Advances in Computer Games. ACG 2015. Lecture Notes in Computer Science, vol 9525. Springer, Cham. https://doi.org/10.1007/978-3-319-27992-3_6


DOI: https://doi.org/10.1007/978-3-319-27992-3_6
Published: 25 December 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-27991-6
Online ISBN: 978-3-319-27992-3
eBook Packages: Computer Science, Computer Science (R0)
