Visual Rationalizations in Deep Reinforcement Learning for Atari Games

  • Conference paper

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 1021))

Abstract

Because deep learning performs well in high-dimensional problems, deep reinforcement learning agents achieve strong results in challenging tasks such as Atari 2600 games. However, clearly explaining why the agent takes a certain action can be as important as the decision itself. Deep reinforcement learning models, like other deep learning models, tend to be opaque in their decision-making process. In this work, we propose to make deep reinforcement learning more transparent by visualizing the evidence on which the agent bases its decision, and we emphasize the importance of producing a justification for an observed action, one that can be applied to a black-box decision agent.

Notes

  1. Here we assume problems where partial observability can be addressed by representing a state as a small number of past observations.

  2. k is usually chosen to be the last convolutional layer in the CNN.

  3. The Exponential Linear Unit was chosen in favor of the ReLU used in the original Grad-CAM paper due to the dying-ReLU effect described in [22]; see the sketch after these notes.

  4. Evaluated in this case means having forwarded each manually sampled state through the model.
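
A minimal sketch of how such a rationalization could be computed, assuming a PyTorch agent whose network maps a stack of past observation frames to action scores; the names grad_cam_elu, agent, state, action and conv_layer are illustrative, not the authors' code. Following notes 2 and 3, the gradients of the selected action's score are pooled over the feature maps of the last convolutional layer, and the weighted sum is passed through an ELU instead of the ReLU used in the original Grad-CAM paper.

    # Grad-CAM-style saliency for a chosen action (illustrative sketch).
    import torch
    import torch.nn.functional as F

    def grad_cam_elu(agent, state, action, conv_layer):
        # agent:      torch.nn.Module mapping a (1, C, H, W) state to action scores
        # state:      stack of past observation frames, shape (1, C, H, W)
        # action:     index of the action to be explained
        # conv_layer: last convolutional layer of the agent's CNN (note 2)
        activations, gradients = [], []
        h1 = conv_layer.register_forward_hook(
            lambda m, i, o: activations.append(o))
        h2 = conv_layer.register_full_backward_hook(
            lambda m, gi, go: gradients.append(go[0]))

        scores = agent(state)              # forward pass records the feature maps
        agent.zero_grad()
        scores[0, action].backward()       # gradient of the chosen action's score
        h1.remove(); h2.remove()

        feats = activations[0]                       # (1, K, h, w) feature maps
        alpha = gradients[0].mean(dim=(2, 3))        # per-map weights, (1, K)
        cam = (alpha[:, :, None, None] * feats).sum(dim=1)
        cam = F.elu(cam)                             # ELU instead of ReLU (note 3, [22])
        cam = F.interpolate(cam.unsqueeze(1), size=state.shape[-2:],
                            mode='bilinear', align_corners=False)
        return cam.squeeze().detach()

Upsampled to the input resolution, the resulting map can be overlaid on the game frames to show which regions of the screen supported the selected action.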

References

  1. Andreas, J., Rohrbach, M., Darrell, T., Klein, D.: Neural module networks. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)

  2. Bellemare, M.G., Naddaf, Y., Veness, J., Bowling, M.: The arcade learning environment: an evaluation platform for general agents. CoRR abs/1207.4708 (2012). http://arxiv.org/abs/1207.4708

  3. Biran, O., McKeown, K.: Justification narratives for individual classifications. In: Proceedings of the AutoML Workshop at ICML 2014 (2014)

  4. Greydanus, S., Koul, A., Dodge, J., Fern, A.: Visualizing and understanding Atari agents. CoRR abs/1711.00138 (2017). http://arxiv.org/abs/1711.00138

  5. Hausknecht, M., Stone, P.: Deep recurrent Q-learning for partially observable MDPs. CoRR abs/1507.06527 (2015)

  6. Heess, N., et al.: Emergence of locomotion behaviours in rich environments. arXiv preprint arXiv:1707.02286 (2017)

  7. Henderson, P., Islam, R., Bachman, P., Pineau, J., Precup, D., Meger, D.: Deep reinforcement learning that matters. arXiv preprint arXiv:1709.06560 (2017)

  8. Hendricks, L.A., Akata, Z., Rohrbach, M., Donahue, J., Schiele, B., Darrell, T.: Generating visual explanations. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9908, pp. 3–19. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46493-0_1

  9. Hessel, M., et al.: Rainbow: combining improvements in deep reinforcement learning. arXiv preprint arXiv:1710.02298 (2017)

  10. Mnih, V., et al.: Asynchronous methods for deep reinforcement learning. In: International Conference on Machine Learning, pp. 1928–1937 (2016)

  11. Mnih, V., et al.: Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015)

  12. Park, D.H., et al.: Multimodal explanations: justifying decisions and pointing to the evidence. In: IEEE CVPR (2018)

  13. Schulman, J., Levine, S., Abbeel, P., Jordan, M., Moritz, P.: Trust region policy optimization. In: International Conference on Machine Learning, pp. 1889–1897 (2015)

  14. Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal policy optimization algorithms. CoRR abs/1707.06347 (2017). http://arxiv.org/abs/1707.06347

  15. Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D.: Grad-CAM: visual explanations from deep networks via gradient-based localization. In: IEEE ICCV (2017)

  16. Sharma, S., Lakshminarayanan, A.S., Ravindran, B.: Learning to repeat: fine grained action repetition for deep reinforcement learning. CoRR abs/1702.06054 (2017). http://arxiv.org/abs/1702.06054

  17. Silver, D., et al.: Mastering the game of Go with deep neural networks and tree search. Nature 529(7587), 484–489 (2016)

  18. Teach, R.L., Shortliffe, E.H.: An analysis of physician attitudes regarding computer-based clinical consultation systems. In: Anderson, J.G., Jay, S.J. (eds.) Use and Impact of Computers in Clinical Medicine. Computers and Medicine, pp. 68–85. Springer, New York (1981). https://doi.org/10.1007/978-1-4613-8674-2_6

  19. Van Hasselt, H., Guez, A., Silver, D.: Deep reinforcement learning with double Q-learning. In: AAAI, vol. 16, pp. 2094–2100 (2016)

  20. Williams, R.J.: Simple statistical gradient-following algorithms for connectionist reinforcement learning. In: Sutton, R.S. (ed.) Reinforcement Learning. The Springer International Series in Engineering and Computer Science (Knowledge Representation, Learning and Expert Systems), vol. 173, pp. 5–32. Springer, Boston (1992). https://doi.org/10.1007/978-1-4615-3618-5_2

  21. Williams, R.J.: Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach. Learn. 8(3), 229–256 (1992). https://doi.org/10.1007/BF00992696

  22. Xu, B., Wang, N., Chen, T., Li, M.: Empirical evaluation of rectified activations in convolutional network. CoRR abs/1505.00853 (2015). http://arxiv.org/abs/1505.00853

  23. Zintgraf, L.M., Cohen, T.S., Adel, T., Welling, M.: Visualizing deep neural network decisions: prediction difference analysis. In: ICLR (2017)

Author information

Corresponding author

Correspondence to Laurens Weitkamp.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Weitkamp, L., van der Pol, E., Akata, Z. (2019). Visual Rationalizations in Deep Reinforcement Learning for Atari Games. In: Atzmueller, M., Duivesteijn, W. (eds) Artificial Intelligence. BNAIC 2018. Communications in Computer and Information Science, vol 1021. Springer, Cham. https://doi.org/10.1007/978-3-030-31978-6_12

  • DOI: https://doi.org/10.1007/978-3-030-31978-6_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-31977-9

  • Online ISBN: 978-3-030-31978-6

  • eBook Packages: Computer Science (R0)
