
Learning from the Memory of Atari 2600

  • Jakub Sygnowski
  • Henryk Michalewski
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 705)

Abstract

We train a number of neural networks to play the games Bowling, Breakout and Seaquest using information stored in the memory of the Atari 2600 video game console. We consider four neural network models which differ in size and architecture: two networks that use only the information contained in the RAM and two mixed networks that use both the RAM and the screen.
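
The RAM-only models take the console's 128 bytes of memory as a flat input vector. As a rough illustration, the sketch below builds such a Q-network with Lasagne [5]; the number of hidden layers, their sizes and the action count are assumptions made for the example, not the exact architectures evaluated here.

    # Minimal sketch of a RAM-only Q-network in Lasagne/Theano.
    # Layer depth and sizes are illustrative assumptions.
    from lasagne.layers import InputLayer, DenseLayer
    from lasagne.nonlinearities import rectify, linear

    def build_ram_q_network(ram_size=128, n_actions=18, hidden=128):
        # The Atari 2600 exposes 128 bytes of RAM; each byte is one input feature.
        net = InputLayer(shape=(None, ram_size))
        net = DenseLayer(net, num_units=hidden, nonlinearity=rectify)
        net = DenseLayer(net, num_units=hidden, nonlinearity=rectify)
        # One linear output per action gives the estimated Q-values.
        return DenseLayer(net, num_units=n_actions, nonlinearity=linear)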

As a benchmark we used the convolutional model proposed in [17] and obtained comparable results in all of the games considered. Quite surprisingly, in the case of Seaquest we were able to train RAM-only agents that perform better than the benchmark screen-only agent. Mixing screen and RAM input did not lead to improved performance compared with the screen-only and RAM-only agents.
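
For the mixed agents, one natural construction is to process the screen with the convolutional stack of [17], process the RAM with a dense layer, and concatenate the two feature vectors before the Q-value head. The sketch below, again in Lasagne, follows that idea; the filter counts, hidden sizes and the single merge layer are illustrative assumptions rather than the exact mixed networks studied here.

    # Minimal sketch of a mixed screen+RAM Q-network (illustrative sizes).
    from lasagne.layers import (InputLayer, DenseLayer, Conv2DLayer,
                                FlattenLayer, ConcatLayer)
    from lasagne.nonlinearities import rectify, linear

    def build_mixed_q_network(n_actions=18):
        # Screen branch: a stack of 4 preprocessed 84x84 frames, as in [17].
        screen = InputLayer(shape=(None, 4, 84, 84))
        conv = Conv2DLayer(screen, num_filters=16, filter_size=8, stride=4,
                           nonlinearity=rectify)
        conv = Conv2DLayer(conv, num_filters=32, filter_size=4, stride=2,
                           nonlinearity=rectify)
        # RAM branch: the 128 bytes of console memory as a flat vector.
        ram = InputLayer(shape=(None, 128))
        ram_feat = DenseLayer(ram, num_units=128, nonlinearity=rectify)
        # Merge both feature vectors and predict one Q-value per action.
        merged = ConcatLayer([FlattenLayer(conv), ram_feat], axis=1)
        hidden = DenseLayer(merged, num_units=256, nonlinearity=rectify)
        return DenseLayer(hidden, num_units=n_actions, nonlinearity=linear)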

Notes

Acknowledgements

This research was carried out with the support of grant GG63-11 awarded by the Interdisciplinary Centre for Mathematical and Computational Modelling (ICM), University of Warsaw. We would like to express our thanks to Marc G. Bellemare for suggesting this research topic.

References

  1.
  2.
  3.
  4.
  5. Lasagne - lightweight library to build and train neural networks in Theano. https://github.com/lasagne/lasagne
  6. Nathan Sprague’s implementation of DQN. https://github.com/spragunr/deep_q_rl
  7. The repository of our code. https://github.com/sygi/deep_q_rl
  8.
  9.
  10. Angluin, D.: Learning regular sets from queries and counterexamples. Inf. Comput. 75(2), 87–106 (1987)
  11. Bellemare, M.G., Naddaf, Y., Veness, J., Bowling, M.: The arcade learning environment: an evaluation platform for general agents. J. Artif. Intell. Res. 47, 253–279 (2013)
  12. Bergstra, J., Breuleux, O., Bastien, F., Lamblin, P., Pascanu, R., Desjardins, G., Turian, J., Warde-Farley, D., Bengio, Y.: Theano: a CPU and GPU math expression compiler. In: Proceedings of the Python for Scientific Computing Conference (SciPy), Oral Presentation (2010)
  13. Braylan, A., Hollenbeck, M., Meyerson, E., Miikkulainen, R.: Frame skip is a powerful parameter for learning to play Atari. In: AAAI-15 Workshop on Learning for General Competency in Video Games (2015)
  14. Defazio, A., Graepel, T.: A comparison of learning algorithms on the Arcade Learning Environment. CoRR abs/1410.8620 (2014). http://arxiv.org/abs/1410.8620
  15. Liang, Y., Machado, M.C., Talvitie, E., Bowling, M.: State of the art control of Atari games using shallow reinforcement learning. arXiv preprint arXiv:1512.01563 (2015)
  16. Lipovetzky, N., Ramirez, M., Geffner, H.: Classical planning with simulators: results on the Atari video games. In: International Joint Conference on Artificial Intelligence (IJCAI), pp. 1610–1616 (2015)
  17. Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., Riedmiller, M.: Playing Atari with deep reinforcement learning. In: NIPS Deep Learning Workshop (2013)
  18. Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A.A., Veness, J., Bellemare, M.G., Graves, A., Riedmiller, M., Fidjeland, A.K., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S., Hassabis, D.: Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015)
  19. Russell, S.J., Norvig, P.: Artificial Intelligence - A Modern Approach, 3rd edn. Pearson Education, Englewood Cliffs (2010)
  20. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014)
  21. Van Hasselt, H., Guez, A., Silver, D.: Deep reinforcement learning with double Q-learning. arXiv preprint arXiv:1509.06461 (2015)
  22. Vincent, P., Larochelle, H., Bengio, Y., Manzagol, P.A.: Extracting and composing robust features with denoising autoencoders. In: Proceedings of the 25th International Conference on Machine Learning, ICML 2008, pp. 1096–1103. ACM, New York (2008)
  23. Wang, Z., Schaul, T., Hessel, M., van Hasselt, H., Lanctot, M., de Freitas, N.: Dueling network architectures for deep reinforcement learning. arXiv preprint arXiv:1511.06581 (2015)
  24. Warde-Farley, D., Goodfellow, I.J., Courville, A., Bengio, Y.: An empirical analysis of dropout in piecewise linear networks. In: ICLR 2014 (2014)
  25. Watkins, C.J.C.H., Dayan, P.: Technical note: Q-learning. Mach. Learn. 8, 279–292 (1992)

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Department of Mathematics, Informatics, and Mechanics, University of Warsaw, Warsaw, Poland
