Reinforcement Learning with Attention that Works: A Self-Supervised Approach

Manchin, Anthony; Abbasnejad, Ehsan; van den Hengel, Anton

doi:10.1007/978-3-030-36802-9_25

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1143)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

Attention models have had a significant positive impact on deep learning across a range of tasks. However, previous attempts at integrating attention with reinforcement learning have failed to produce significant improvements. Unlike the selective attention models used in previous attempts, which constrain the attention via preconceived notions of importance, our implementation utilises the Markovian properties inherent in the state input. We propose the first combination of self-attention and reinforcement learning that is capable of producing significant improvements, including new state-of-the-art results in the Arcade Learning Environment.
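
The abstract does not describe the architecture, so the following is only a generic illustration of what self-attention over an image-like state representation can look like; it is not the authors' model. The feature-map shape, the projection matrices (w_q, w_k, w_v), the scaled dot-product scoring and the residual connection are all assumptions made for this sketch.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def self_attention(features, w_q, w_k, w_v):
    """Scaled dot-product self-attention over spatial positions.

    features: (H, W, C) feature map, e.g. the output of a conv encoder
              applied to a game frame (an assumption of this sketch).
    w_q, w_k: (C, d) query/key projections (hypothetical parameters).
    w_v:      (C, C) value projection (hypothetical parameter).
    Returns the attended feature map (H, W, C) and the (N, N) attention
    weights, where N = H * W.
    """
    h, w, c = features.shape
    x = features.reshape(h * w, c)             # one row per spatial position
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])    # pairwise similarity, (N, N)
    weights = softmax(scores, axis=-1)         # each row sums to 1
    attended = weights @ v                     # weighted mix of values
    out = x + attended                         # residual connection
    return out.reshape(h, w, c), weights


rng = np.random.default_rng(0)
feats = rng.standard_normal((7, 7, 32))        # placeholder feature map
w_q = rng.standard_normal((32, 16)) * 0.1
w_k = rng.standard_normal((32, 16)) * 0.1
w_v = rng.standard_normal((32, 32)) * 0.1
out, weights = self_attention(feats, w_q, w_k, w_v)
```

In this sketch the attention weights are computed from the state features themselves rather than from an external notion of saliency. In an agent, the attended map would then feed the policy and value heads in place of the raw features.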

Acknowledgments

We would like to thank Michele Sasdelli for his helpful discussions, and Damien Teney for his feedback and advice on writing this paper.

Author information

Authors and Affiliations

The Australian Institute for Machine Learning, The University of Adelaide, Adelaide, Australia
Anthony Manchin, Ehsan Abbasnejad & Anton van den Hengel

Corresponding author

Correspondence to Anthony Manchin.

Editor information

Editors and Affiliations

Australian National University, Canberra, ACT, Australia
Tom Gedeon
Murdoch University, Murdoch, WA, Australia
Kok Wai Wong
Kyungpook National University, Daegu, Korea (Republic of)
Minho Lee

About this paper

Cite this paper

Manchin, A., Abbasnejad, E., van den Hengel, A. (2019). Reinforcement Learning with Attention that Works: A Self-Supervised Approach. In: Gedeon, T., Wong, K., Lee, M. (eds) Neural Information Processing. ICONIP 2019. Communications in Computer and Information Science, vol 1143. Springer, Cham. https://doi.org/10.1007/978-3-030-36802-9_25

DOI: https://doi.org/10.1007/978-3-030-36802-9_25
Published: 05 December 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-36801-2
Online ISBN: 978-3-030-36802-9
eBook Packages: Computer Science, Computer Science (R0)