Abstract
The UCT algorithm, which combines the UCB algorithm with Monte-Carlo Tree Search (MCTS), is currently the most widely used variant of MCTS. Recently, a number of studies applying other bandit algorithms to MCTS have produced interesting results. In this research, we investigate the possibility of combining the improved UCB algorithm, proposed by Auer and Ortner [2], with MCTS. However, several characteristics of the improved UCB algorithm may not be ideal for a direct application to MCTS. We therefore modified the improved UCB algorithm to make it more suitable for game-tree search. The resulting algorithm, Mi-UCT, applies this modified UCB algorithm to trees. We evaluate Mi-UCT on the games of \(9\times 9\) Go and \(9\times 9\) NoGo: it outperforms the plain UCT algorithm when only a small number of playouts is available, and performs roughly on par with it when more playouts are available.
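For readers unfamiliar with the improved UCB algorithm of Auer and Ortner [2], here is a minimal Python sketch of its original stochastic-bandit form. It is not Mi-UCT: the abstract does not describe the authors' modifications, so none are attempted here. Assumptions: rewards lie in [0, 1], the horizon T is known in advance, `pull` is a hypothetical reward callback, and any budget left after the last round is spent on the empirically best surviving arm. The algorithm works in rounds. Each round samples every surviving arm up to a common count n_m, eliminates clearly inferior arms, and halves a gap estimate.

```python
import math
import random


def improved_ucb(pull, n_arms, horizon):
    """Sketch of the improved UCB algorithm of Auer and Ortner (2010).

    pull(i) is a hypothetical callback returning a reward in [0, 1] for arm i.
    The horizon T must be known in advance. Returns (pull counts, surviving arms).
    """
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    active = set(range(n_arms))  # B_m: arms not yet eliminated
    delta = 1.0                  # current gap estimate, halved every round
    t = 0                        # total number of pulls so far

    def play(i):
        nonlocal t
        sums[i] += pull(i)
        counts[i] += 1
        t += 1

    last_round = math.floor(0.5 * math.log2(horizon / math.e))
    for _ in range(last_round + 1):
        if len(active) == 1:
            break
        log_term = math.log(horizon * delta ** 2)  # >= 1 for every round played
        n_m = math.ceil(2 * log_term / delta ** 2)
        # (a) Bring every surviving arm up to n_m pulls.
        for i in sorted(active):
            while counts[i] < n_m and t < horizon:
                play(i)
        if t >= horizon:
            break
        # (b) Drop arms whose upper bound falls below the best lower bound.
        radius = math.sqrt(log_term / (2 * n_m))
        means = {i: sums[i] / counts[i] for i in active}
        best_lower = max(means[j] - radius for j in active)
        active = {i for i in active if means[i] + radius >= best_lower}
        # (c) Halve the gap estimate.
        delta /= 2

    # Assumption: spend any remaining budget on the empirically best survivor.
    while t < horizon:
        best = max(active, key=lambda i: sums[i] / counts[i] if counts[i] else 0.0)
        play(best)
    return counts, active


# Usage: three Bernoulli arms; arm 2 has the highest mean.
random.seed(0)
probs = [0.3, 0.5, 0.7]
counts, survivors = improved_ucb(
    lambda i: 1.0 if random.random() < probs[i] else 0.0, len(probs), 20000
)
print(counts, survivors)
```

With n_m chosen this way, the confidence radius is roughly half the current gap estimate. An arm is therefore dropped once its empirical mean trails the leader by more than about that estimate. Unlike UCB1, which underlies UCT and needs no horizon, this scheme requires T up front and plays in fixed-size rounds.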
References
Lai, T.L., Robbins, H.: Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 6(1), 4–22 (1985)
Auer, P., Ortner, R.: UCB revisited: improved regret bounds for the stochastic multi-armed bandit problem. Periodica Math. Hung. 61(1–2), 55–65 (2010)
Kocsis, L., Szepesvári, C.: Bandit based Monte-Carlo planning. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) ECML 2006. LNCS (LNAI), vol. 4212, pp. 282–293. Springer, Heidelberg (2006)
Auer, P., Cesa-Bianchi, N., Fischer, P.: Finite-time analysis of the multiarmed bandit problem. Mach. Learn. 47(2–3), 235–256 (2002)
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA (1998)
Browne, C.B., Powley, E., Whitehouse, D., Lucas, S.M., Cowling, P.I., Rohlfshagen, P., Tavener, S., Perez, D., Samothrakis, S., Colton, S.: A survey of Monte Carlo tree search methods. IEEE Trans. Comput. Intell. AI Games 4(1), 1–43 (2012)
Tolpin, D., Shimony, S.E.: MCTS based on simple regret. In: Proceedings of the 26th AAAI Conference on Artificial Intelligence, pp. 570–576 (2012)
Cazenave, T.: Sequential halving applied to trees. IEEE Trans. Comput. Intell. AI Games 7, 102–105 (2014)
Pepels, T., Cazenave, T., Winands, M.H.M., Lanctot, M.: Minimizing simple and cumulative regret in Monte-Carlo tree search. In: Cazenave, T., Winands, M.H.M., Björnsson, Y. (eds.) CGW 2014. CCIS, vol. 504, pp. 1–15. Springer, Heidelberg (2014)
Imagawa, T., Kaneko, T.: Applying multi armed bandit algorithms to MCTS and those analysis. In: Proceedings of the 19th Game Programming Workshop (GPW-14), pp. 145–150 (2014)
Karnin, Z., Koren, T., Somekh, O.: Almost optimal exploration in multi-armed bandits. In: Proceedings of the 30th International Conference on Machine Learning (ICML’13), pp. 1238–1246 (2013)
Garivier, A., Cappé, O.: The KL-UCB algorithm for bounded stochastic bandits and beyond. In: Proceedings of the 24th Annual Conference on Learning Theory (COLT ’11), pp. 359–376 (2011)
Kaufmann, E., Korda, N., Munos, R.: Thompson sampling: an asymptotically optimal finite-time analysis. In: Bshouty, N.H., Stoltz, G., Vayatis, N., Zeugmann, T. (eds.) ALT 2012. LNCS, vol. 7568, pp. 199–213. Springer, Heidelberg (2012)
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Liu, YC., Tsuruoka, Y. (2015). Adapting Improved Upper Confidence Bounds for Monte-Carlo Tree Search. In: Plaat, A., van den Herik, J., Kosters, W. (eds) Advances in Computer Games. ACG 2015. Lecture Notes in Computer Science, vol 9525. Springer, Cham. https://doi.org/10.1007/978-3-319-27992-3_6
DOI: https://doi.org/10.1007/978-3-319-27992-3_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-27991-6
Online ISBN: 978-3-319-27992-3
eBook Packages: Computer Science, Computer Science (R0)