Multiple Overlapping Tiles for Contextual Monte Carlo Tree Search

  • Arpad Rimmel
  • Fabien Teytaud
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6024)

Abstract

Monte Carlo Tree Search is a recent algorithm that has achieved growing success in a variety of domains. We propose an improvement of the Monte Carlo part of the algorithm that modifies the simulations depending on the context. The modification is based on a reward function learned on a tiling of the space of Monte Carlo simulations; the tiling is obtained by grouping together the simulations in which two given moves have been played by the same player. We show experimentally on the game of Havannah that this modification is very effective.
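
To make the idea concrete, here is a minimal Python sketch of this kind of contextual playout bias. It is not the authors' implementation: the game interface (`legal_moves`, `play`, `is_over`, `to_play`, `result`) and the `bias_prob` parameter are hypothetical, and the tiling is reduced to its simplest form, one tile per pair of moves played by the same player, whose average reward is used to bias later simulations.

```python
import random
from collections import defaultdict

class ContextualPlayoutPolicy:
    """One-tile-per-move-pair sketch of a contextual Monte Carlo playout.

    The game interface used here (legal_moves, play, is_over, to_play,
    result) is hypothetical; players are assumed to be 0 and 1, and
    result(player) is assumed to return 1 for a win and 0 for a loss.
    """

    def __init__(self, bias_prob=0.3):
        self.pair_sum = defaultdict(float)  # cumulative reward per (move, move) tile
        self.pair_count = defaultdict(int)  # simulations covered by each tile
        self.bias_prob = bias_prob          # chance of using the learned bias

    def pair_value(self, a, b):
        """Average reward of the tile grouping the simulations in which
        the same player played both move a and move b."""
        n = self.pair_count[(a, b)]
        return self.pair_sum[(a, b)] / n if n else 0.0

    def choose_move(self, state, own_moves):
        moves = state.legal_moves()
        # Occasionally prefer the move forming the best-valued pair with a
        # move this player already made in the current simulation.
        if own_moves and random.random() < self.bias_prob:
            return max(moves, key=lambda m: max(self.pair_value(h, m) for h in own_moves))
        return random.choice(moves)  # otherwise, a plain Monte Carlo move

    def simulate(self, state, player):
        """Run one playout, then update every tile touched by `player`."""
        histories = {0: [], 1: []}
        while not state.is_over():
            m = self.choose_move(state, histories[state.to_play()])
            histories[state.to_play()].append(m)
            state.play(m)
        reward = state.result(player)
        own = histories[player]
        for i in range(len(own)):
            for j in range(i + 1, len(own)):
                self.pair_sum[(own[i], own[j])] += reward
                self.pair_count[(own[i], own[j])] += 1
        return reward
```

In the paper the learned reward steers the simulations inside a full MCTS engine; the sketch above isolates only the tiling and the biased playout. Note that the tiles deliberately overlap: every simulation updates one tile per pair of moves it contains, which is the "multiple overlapping tiles" of the title.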

Keywords

Monte Carlo Simulation · Reinforcement Learning · Reward Function · Average Reward · Reinforcement Learning Algorithm

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Arpad Rimmel (1)
  • Fabien Teytaud (1)
  1. TAO (Inria), LRI, UMR 8623 (CNRS - Univ. Paris-Sud), Orsay, France