Deep or Wide? Learning Policy and Value Neural Networks for Combinatorial Games
The success in learning to play Go at a professional level rests on training a deep neural network on a wide selection of human expert games. It raises the question of the availability, the limits, and the possibilities of this technique for other combinatorial games, especially when access to a large body of additional expert knowledge is lacking.
As a step in this direction, we trained a value network for TicTacToe, using perfect win/loss/draw information obtained by retrograde analysis as training labels. Next, we trained a policy network for SameGame, a challenging combinatorial puzzle. Here, we discuss the interplay of deep learning with nested rollout policy adaptation (NRPA), a randomized algorithm for optimizing the outcome of single-player games.
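The perfect training labels mentioned above can be computed exhaustively for a game as small as TicTacToe. The following minimal sketch labels every reachable position with its game-theoretic value; it uses memoized negamax (forward search with caching) rather than true backward induction, which yields the same perfect values on this state space. All names here are illustrative, not from the paper.

```python
from functools import lru_cache

# The eight winning lines of a 3x3 board, indexed 0..8 row by row.
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 'X' or 'O' if that player completed a line, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != '.' and board[a] == board[b] == board[c]:
            return board[a]
    return None

@lru_cache(maxsize=None)
def value(board):
    """Perfect value from the side to move's view: +1 win, 0 draw, -1 loss."""
    if winner(board) is not None:
        return -1  # the previous player just completed a line
    if '.' not in board:
        return 0   # full board, no line: draw
    player = 'X' if board.count('X') == board.count('O') else 'O'
    # Negamax: pick the move that is worst for the opponent.
    return max(-value(board[:i] + player + board[i + 1:])
               for i, cell in enumerate(board) if cell == '.')
```

For example, `value('.' * 9)` confirms that perfect play from the empty board is a draw, and the cached `(board, value)` pairs serve directly as a supervised training set for the value network.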
In both cases we observed that ordinary feed-forward neural networks can outperform convolutional ones in both accuracy and efficiency.
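NRPA, the randomized optimizer discussed above, nests levels of search: level 0 is a softmax-guided rollout, and each higher level repeatedly calls the level below while adapting its policy weights toward the best sequence found so far (Rosin's adaptation rule). The sketch below follows that recursive scheme on a hypothetical toy puzzle (match a hidden digit sequence), not on SameGame; the domain, constants, and function names are illustrative assumptions.

```python
import math
import random

# Hypothetical toy puzzle: at each of 5 steps choose a digit in {0,1,2};
# the score is the number of positions matching a fixed target sequence.
TARGET = [2, 0, 1, 2, 1]
STEPS, MOVES = len(TARGET), 3
ALPHA = 1.0  # adaptation step size

def code(step, move):
    """Unique policy index for a (state, move) pair."""
    return step * MOVES + move

def playout(policy):
    """Level 0: sample each move from the softmax of the policy weights."""
    seq, score = [], 0
    for step in range(STEPS):
        weights = [math.exp(policy[code(step, m)]) for m in range(MOVES)]
        move = random.choices(range(MOVES), weights=weights)[0]
        seq.append(move)
        score += (move == TARGET[step])
    return score, seq

def adapt(policy, seq):
    """Shift weights toward the best sequence (a policy-gradient-style step)."""
    policy = policy.copy()
    for step, move in enumerate(seq):
        z = sum(math.exp(policy[code(step, m)]) for m in range(MOVES))
        for m in range(MOVES):
            policy[code(step, m)] -= ALPHA * math.exp(policy[code(step, m)]) / z
        policy[code(step, move)] += ALPHA
    return policy

def nrpa(level, policy, iterations=20):
    """Nested rollout policy adaptation: recurse, track and reinforce the best."""
    if level == 0:
        return playout(policy)
    best_score, best_seq = -1, []
    for _ in range(iterations):
        score, seq = nrpa(level - 1, policy, iterations)
        if score >= best_score:
            best_score, best_seq = score, seq
        policy = adapt(policy, best_seq)
    return best_score, best_seq

random.seed(0)
score, seq = nrpa(2, [0.0] * (STEPS * MOVES))
```

In the paper's setting, the learned policy network replaces or initializes the uniform weight vector passed to the top-level call, biasing the rollouts toward moves the network prefers.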
Keywords: Partial state · Deep learning · Convolutional neural network · Stochastic gradient · Deep neural network