Deep or Wide? Learning Policy and Value Neural Networks for Combinatorial Games

  • Stefan Edelkamp
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 705)


The success in learning to play Go at a professional level rests on training a deep neural network on a wide selection of human expert games. This raises the question of the availability, limits, and possibilities of this technique for other combinatorial games, especially when no larger body of additional expert knowledge is accessible.

As a step in this direction, we trained a value network for TicTacToe, providing perfect winning information obtained by retrograde analysis. Next, we trained a policy network for SameGame, a challenging combinatorial puzzle. Here, we discuss the interplay of deep learning with nested rollout policy adaptation (NRPA), a randomized algorithm for optimizing the outcome of single-player games.
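To illustrate what "perfect winning information" for TicTacToe can look like as training labels, here is a minimal sketch in Python. It is not the paper's implementation: the paper obtains the values by retrograde analysis (working backward from terminal positions), whereas this sketch uses forward memoized minimax, which yields the same game-theoretic values for this small game. The board encoding (a 9-character string over `X`, `O`, `.`) and all function names are assumptions made for the example.

```python
# Hypothetical sketch: exact values for every reachable TicTacToe position.
# Forward memoized minimax stands in for the paper's retrograde analysis.

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

TABLE = {}  # board string -> value from X's perspective (+1, 0, -1)


def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != '.' and board[a] == board[b] == board[c]:
            return board[a]
    return None


def to_move(board):
    """X moves first, so X is to move whenever the piece counts are equal."""
    return 'X' if board.count('X') == board.count('O') else 'O'


def value(board):
    """Game-theoretic value: +1 if X wins, 0 for a draw, -1 if O wins."""
    if board in TABLE:
        return TABLE[board]
    w = winner(board)
    if w is not None:
        v = 1 if w == 'X' else -1
    elif '.' not in board:
        v = 0
    else:
        player = to_move(board)
        children = [value(board[:i] + player + board[i + 1:])
                    for i, cell in enumerate(board) if cell == '.']
        v = max(children) if player == 'X' else min(children)
    TABLE[board] = v
    return v


EMPTY = '.' * 9
ROOT_VALUE = value(EMPTY)  # fills TABLE with all reachable positions
```

After the call on the empty board, `TABLE` maps each reachable position to its exact value, so its items could serve as (input, target) pairs for a value network; how the paper encodes positions for the network is not stated in this abstract.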

In both cases we observed that ordinary feed-forward neural networks can outperform convolutional ones in accuracy as well as efficiency.


Partial State · Deep Learning · Convolutional Neural Network · Stochastic Gradient · Deep Neural Network
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Faculty of Mathematics and Computer Science, University of Bremen, Bremen, Germany
