Deep Preference Neural Network for Move Prediction in Board Games

Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 818)

Abstract

The training of deep neural networks for move prediction in board games using comparison training is studied. Specifically, the aim is to predict moves for the game Othello from championship tournament game data. A general deep preference neural network is presented, based on a twenty-year-old model by Tesauro. Over-fitting becomes an immediate concern when training deep preference neural networks, and it is shown how dropout can combat this problem to a certain extent. It is also illustrated that classification test accuracy does not necessarily correspond to move prediction accuracy, and the key difference between preference training and single-label classification is discussed. The careful use of dropout, coupled with richer game data, produces an evaluation function that is a better move predictor but does not necessarily produce a stronger game player.
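To make the comparison-training setup concrete, the following is a minimal sketch of pairwise preference training for move prediction in TensorFlow/Keras (TensorFlow is cited in the reference list). The specifics are illustrative assumptions rather than the paper's architecture: the 64-unit board encoding, the two 128-unit hidden layers, the dropout rate of 0.5, and the Adam learning rate are placeholders. A single shared evaluation network scores the successor position produced by the expert's move and an alternative successor, and a logistic loss on the score difference pushes the expert's position to score higher.

```python
# Minimal sketch of comparison (preference) training for Othello move
# prediction, in the spirit of Tesauro-style comparison training. Board
# encoding, layer sizes, dropout rate and learning rate are assumed
# placeholders, not the architecture used in the paper.
import tensorflow as tf

BOARD_SIZE = 64  # one input per square of the 8x8 Othello board (assumption)

def build_evaluator(dropout_rate=0.5):
    """Shared evaluation network mapping a board position to a scalar score."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(BOARD_SIZE,)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dropout(dropout_rate),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dropout(dropout_rate),
        tf.keras.layers.Dense(1),
    ])

evaluator = build_evaluator()

# Each training pair holds the successor position chosen by the expert and
# one alternative legal successor from the same game state.
expert_pos = tf.keras.Input(shape=(BOARD_SIZE,), name="expert_position")
other_pos = tf.keras.Input(shape=(BOARD_SIZE,), name="alternative_position")
score_diff = tf.keras.layers.Subtract()([evaluator(expert_pos),
                                         evaluator(other_pos)])

model = tf.keras.Model(inputs=[expert_pos, other_pos], outputs=score_diff)
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
    # Every pair is labelled 1 ("expert position preferred"); the sigmoid of
    # the score difference is the modelled preference probability.
    loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
)

# Training: model.fit([expert_boards, alternative_boards], tf.ones(n_pairs), ...)
# Play: score every legal successor with `evaluator` alone and pick the argmax.
```

Because both branches share the same evaluator, only one network is trained; at play time it is applied to each legal successor position independently, which is what distinguishes this pairwise setup from a single-label move classifier.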

References

  1. Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G.S., Davis, A., Dean, J., Devin, M., et al.: TensorFlow: large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467 (2016)
  2. Binkley, K.J., Seehart, K., Hagiwara, M.: A study of artificial neural network architectures for Othello evaluation functions. Inf. Media Technol. 2(4), 1129–1139 (2007)
  3. Buro, M.: Logistello: a strong learning Othello program. In: 19th Annual Conference Gesellschaft für Klassifikation eV, vol. 2 (1995)
  4. Burrow, P.: Hybridising evolution and temporal difference learning. Ph.D. thesis, University of Essex, UK (2011)
  5. Foullon-Perez, A., Lucas, S.M.: Orientational features with the SNT-grid. In: 2009 International Joint Conference on Neural Networks, pp. 877–881 (2009)
  6. Fürnkranz, J., Hüllermeier, E.: Preference learning: an introduction. In: Fürnkranz, J., Hüllermeier, E. (eds.) Preference Learning, pp. 1–17. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-14125-6_1
  7. Kingma, D., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
  8. Lagoudakis, M., Parr, R.: Reinforcement learning as classification: leveraging modern classifiers. In: ICML, vol. 20, pp. 424–431 (2003)
  9. Lazaric, A., Ghavamzadeh, M., Munos, R.: Analysis of a classification-based policy iteration algorithm. In: Proceedings of the 27th International Conference on Machine Learning, pp. 607–614 (2010)
  10. Li, L., Bulitko, V., Greiner, R.: Focus of attention in reinforcement learning. J. Univ. Comput. Sci. 13(9), 1246–1269 (2007)
  11. Rigutini, L., Papini, T., Maggini, M., Scarselli, F.: SortNet: learning to rank by a neural preference function. IEEE Trans. Neural Netw. 22(9), 1368–1380 (2011)
  12. Rimmel, A., Teytaud, O., Lee, C.S., Yen, S.J., Wang, M.H., Tsai, S.R.: Current frontiers in computer Go. IEEE Trans. Comput. Intell. AI Games 2(4), 229–238 (2010)
  13. Runarsson, T.P., Lucas, S.M.: Preference learning for move prediction and evaluation function approximation in Othello. IEEE Trans. Comput. Intell. AI Games 6(3), 300–313 (2014)
  14. Runarsson, T., Lucas, S.: Imitating play from game trajectories: temporal difference learning versus preference learning. In: IEEE Conference on Computational Intelligence and Games, pp. 79–82 (2012)
  15. Silver, D., Huang, A., Maddison, C.J., Guez, A., Sifre, L., van den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., Dieleman, S., Grewe, D., Nham, J., Kalchbrenner, N., Sutskever, I., Lillicrap, T., Leach, M., Kavukcuoglu, K., Graepel, T., Hassabis, D.: Mastering the game of Go with deep neural networks and tree search. Nature 529(7587), 484–489 (2016)
  16. Srivastava, N., Hinton, G.E., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15(1), 1929–1958 (2014)
  17. Tesauro, G.: Practical issues in temporal difference learning. Mach. Learn. 8, 257–277 (1992)
  18. Tesauro, G.: Connectionist learning of expert preferences by comparison training. In: NIPS, vol. 1, pp. 99–106 (1988)
  19. Tesauro, G.: Neurogammon wins computer olympiad. Neural Comput. 1(3), 321–323 (1989)

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. School of Engineering and Natural Sciences, University of Iceland, Reykjavik, Iceland