
NeuroHex: A Deep Q-learning Hex Agent

  • Kenny Young
  • Gautham Vasan
  • Ryan Hayward
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 705)

Abstract

DeepMind's recent spectacular success in using deep convolutional neural nets and machine learning to build superhuman-level agents, e.g. for Atari games via deep Q-learning (DQL) and for the game of Go via other deep reinforcement learning methods, raises many questions, including to what extent these methods will succeed in other domains. In this paper we consider DQL for the game of Hex: after supervised initialization, we use self-play to train NeuroHex, an 11-layer convolutional neural network that plays Hex on the 13 × 13 board. Hex is the classic two-player alternate-turn stone-placement game played on a rhombus of hexagonal cells, in which the winner is whoever connects their two opposing sides. Despite the large action and state space, our system trains a Q-network capable of strong play with no search. After two weeks of Q-learning, NeuroHex achieves respective win rates of 20.4% as first player and 2.1% as second player against a 1 s/move version of MoHex, the current ICGA Olympiad Hex champion. Our data suggests further improvement might be possible with more training time.
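The abstract only sketches the training setup. As a rough illustration of what a one-step Q-learning target can look like in an alternate-turn game such as Hex, the Python sketch below assumes a hypothetical q_net(state) that returns an action-value vector for the player to move, a win/loss reward of +1/-1, and a negamax-style sign flip when the turn passes to the opponent; these are illustrative assumptions for a generic self-play setting, not the authors' exact formulation.

    import numpy as np

    BOARD = 13  # 13 x 13 Hex board, as in the paper

    def q_learning_target(reward, done, next_state, next_legal_moves, q_net, gamma=1.0):
        """One-step Q-learning target for alternating-turn self-play (illustrative sketch).

        `q_net(state)` is assumed to return a length-169 vector of action values
        from the point of view of the player to move. Because the opponent moves
        next, the best opponent value is negated (negamax convention).
        """
        if done:
            return reward  # terminal position: assumed reward of +1 for a win, -1 for a loss
        q_next = q_net(next_state)                     # opponent's action values
        best_opponent = np.max(q_next[next_legal_moves])
        return reward - gamma * best_opponent          # sign flip for the change of perspective

    # Toy usage with a dummy network that returns zeros for every move.
    dummy_net = lambda state: np.zeros(BOARD * BOARD)
    target = q_learning_target(0.0, False, np.zeros((BOARD, BOARD)), [0, 1, 2], dummy_net)
    print(target)  # 0.0 for the all-zero dummy network

Under these assumptions, the target for a non-terminal position is simply the negated best value available to the opponent, which is the standard way self-play value updates handle the alternation of perspective between the two players.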

Keywords

Optimal Policy · Reinforcement Learning · Gradient Descent · Convolutional Neural Network · Policy Network

References

  1. Anshelevich, V.V.: The game of Hex: an automatic theorem proving approach to game programming. In: AAAI/IAAI, pp. 189–194 (2000)
  2. Arneson, B., Hayward, R., Henderson, P.: Wolve wins Hex tournament. ICGA J. 32, 49–53 (2008)
  3. Arneson, B., Hayward, R.B., Henderson, P.: Monte Carlo tree search in Hex. IEEE Trans. Comput. Intell. AI Games 2(4), 251–258 (2010)
  4. Bastien, F., Lamblin, P., Pascanu, R., Bergstra, J., Goodfellow, I.J., Bergeron, A., Bouchard, N., Bengio, Y.: Theano: new features and speed improvements. In: NIPS 2012 Deep Learning and Unsupervised Feature Learning Workshop (2012)
  5. Baudiš, P., Gailly, J.: PACHI: state of the art open source Go program. In: Herik, H.J., Plaat, A. (eds.) ACG 2011. LNCS, vol. 7168, pp. 24–38. Springer, Heidelberg (2012). doi:10.1007/978-3-642-31866-5_3
  6. Bergstra, J., Breuleux, O., Bastien, F., Lamblin, P., Pascanu, R., Desjardins, G., Turian, J., Warde-Farley, D., Bengio, Y.: Theano: a CPU and GPU math expression compiler. In: Proceedings of the Python for Scientific Computing Conference (SciPy) (2010). Oral presentation
  7. Gardner, M.: Mathematical games. Sci. Am. 197(1), 145–150 (1957)
  8. Hayward, R.B.: MoHex wins Hex tournament. ICGA J. 36(3), 180–183 (2013)
  9. Huang, S.-C., Arneson, B., Hayward, R.B., Müller, M., Pawlewicz, J.: MoHex 2.0: a pattern-based MCTS Hex player. In: Herik, H.J., Iida, H., Plaat, A. (eds.) CG 2013. LNCS, vol. 8427, pp. 60–71. Springer, Cham (2014). doi:10.1007/978-3-319-09165-5_6
  10. Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A.A., Veness, J., Bellemare, M.G., Graves, A., Riedmiller, M., Fidjeland, A.K., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S., Hassabis, D.: Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015)
  11. Reisch, S.: Hex ist PSPACE-vollständig. Acta Informatica 15, 167–191 (1981)
  12. Shannon, C.E.: Computers and automata. Proc. Inst. Radio Eng. 41, 1234–1241 (1953)
  13. Silver, D., Huang, A., Maddison, C.J., Guez, A., Sifre, L., van den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., Dieleman, S., Grewe, D., Nham, J., Kalchbrenner, N., Sutskever, I., Lillicrap, T., Leach, M., Kavukcuoglu, K., Graepel, T., Hassabis, D.: Mastering the game of Go with deep neural networks and tree search. Nature 529(7587), 484–489 (2016)
  14. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)
  15. Tesauro, G.: Temporal difference learning and TD-Gammon. Commun. ACM 38(3), 58–68 (1995)
  16. Tieleman, T., Hinton, G.: Lecture 6.5 - RMSProp: divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning (2012)

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Department of Computing Science, University of Alberta, Edmonton, Canada
