Deep Reinforcement Learning

Abstract

“The reward of suffering is experience.”—Harry S. Truman

Keywords

  • Reinforcement Learning
  • Monte Carlo Tree Search
  • Policy Gradient
  • Convolutional Neural Network
  • Zero Alpha

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this chapter

Cite this chapter

Aggarwal, C.C. (2018). Deep Reinforcement Learning. In: Neural Networks and Deep Learning. Springer, Cham. https://doi.org/10.1007/978-3-319-94463-0_9

  • DOI: https://doi.org/10.1007/978-3-319-94463-0_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-94462-3

  • Online ISBN: 978-3-319-94463-0

  • eBook Packages: Computer Science, Computer Science (R0)