Abstract
Attention models have had a significant positive impact on deep learning across a range of tasks. However, previous attempts at integrating attention with reinforcement learning have failed to produce significant improvements. Unlike the selective attention models used in previous attempts, which constrain the attention via preconceived notions of importance, our implementation utilises the Markovian properties inherent in the state input. We propose the first combination of self-attention and reinforcement learning that is capable of producing significant improvements, including new state-of-the-art results in the Arcade Learning Environment.
Acknowledgments
We would like to thank Michele Sasdelli for his helpful discussions, and Damien Teney for his feedback and advice on writing this paper.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Manchin, A., Abbasnejad, E., van den Hengel, A. (2019). Reinforcement Learning with Attention that Works: A Self-Supervised Approach. In: Gedeon, T., Wong, K., Lee, M. (eds) Neural Information Processing. ICONIP 2019. Communications in Computer and Information Science, vol 1143. Springer, Cham. https://doi.org/10.1007/978-3-030-36802-9_25
DOI: https://doi.org/10.1007/978-3-030-36802-9_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-36801-2
Online ISBN: 978-3-030-36802-9
eBook Packages: Computer Science, Computer Science (R0)