Deep Reinforcement Learning

Aggarwal, Charu C.

doi:10.1007/978-3-319-94463-0_9

Abstract

“The reward of suffering is experience.”—Harry S. Truman

Bibliography

D. Amodei et al. Concrete problems in AI safety. arXiv:1606.06565, 2016. https://arxiv.org/abs/1606.06565
B. Baker, O. Gupta, N. Naik, and R. Raskar. Designing neural network architectures using reinforcement learning. arXiv:1611.02167, 2016. https://arxiv.org/abs/1611.02167
J. Baxter, A. Tridgell, and L. Weaver. Knightcap: a chess program that learns by combining td (lambda) with game-tree search. arXiv cs/9901002, 1999.
M. Bellemare, Y. Naddaf, J. Veness, and M. Bowling. The arcade learning environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research, 47, pp. 253–279, 2013.
R. E. Bellman. Dynamic Programming. Princeton University Press, 1957.
M. Bojarski et al. End to end learning for self-driving cars. arXiv:1604.07316, 2016. https://arxiv.org/abs/1604.07316
M. Bojarski et al. Explaining How a Deep Neural Network Trained with End-to-End Learning Steers a Car. arXiv:1704.07911, 2017. https://arxiv.org/abs/1704.07911
C. Browne et al. A survey of monte carlo tree search methods. IEEE Transactions on Computational Intelligence and AI in Games, 4(1), pp. 1–43, 2012.
C. Clark and A. Storkey. Training deep convolutional neural networks to play go. ICML Conference, pp. 1766–1774, 2015.
S. Gelly et al. The grand challenge of computer Go: Monte Carlo tree search and extensions. Communications of the ACM, 55, pp. 106–113, 2012.
P. Glynn. Likelihood ratio gradient estimation: an overview, Proceedings of the 1987 Winter Simulation Conference, pp. 366–375, 1987.
I. Grondman, L. Busoniu, G. A. Lopes, and R. Babuska. A survey of actor-critic reinforcement learning: Standard and natural policy gradients. IEEE Transactions on Systems, Man, and Cybernetics, 42(6), pp. 1291–1307, 2012.
X. Guo, S. Singh, H. Lee, R. Lewis, and X. Wang. Deep learning for real-time Atari game play using offline Monte-Carlo tree search planning. Advances in NIPS Conference, pp. 3338–3346, 2014.
H. van Hasselt, A. Guez, and D. Silver. Deep Reinforcement Learning with Double Q-Learning. AAAI Conference, 2016.
N. Heess et al. Emergence of Locomotion Behaviours in Rich Environments. arXiv:1707.02286, 2017. https://arxiv.org/abs/1707.02286 Video 1 at: https://www.youtube.com/watch?v=hx_bgoTF7bs Video 2 at: https://www.youtube.com/watch?v=gn4nRCC9TwQ&feature=youtu.be
S. Kakade. A natural policy gradient. NIPS Conference, pp. 1057–1063, 2002.
L. Kocsis and C. Szepesvari. Bandit based monte-carlo planning. ECML Conference, pp. 282–293, 2006.
M. Lai. Giraffe: Using deep reinforcement learning to play chess. arXiv:1509.01549, 2015.
S. Levine, C. Finn, T. Darrell, and P. Abbeel. End-to-end training of deep visuomotor policies. Journal of Machine Learning Research, 17(39), pp. 1–40, 2016. Video at: https://sites.google.com/site/visuomotorpolicy/
M. Lewis, D. Yarats, Y. Dauphin, D. Parikh, and D. Batra. Deal or No Deal? End-to-End Learning for Negotiation Dialogues. arXiv:1706.05125, 2017. https://arxiv.org/abs/1706.05125
J. Li, W. Monroe, A. Ritter, M. Galley, J. Gao, and D. Jurafsky. Deep reinforcement learning for dialogue generation. arXiv:1606.01541, 2016. https://arxiv.org/abs/1606.01541
Y. Li. Deep reinforcement learning: An overview. arXiv:1701.07274, 2017. https://arxiv.org/abs/1701.07274
L.-J. Lin. Reinforcement learning for robots using neural networks. Technical Report, DTIC Document, 1993.
C. Maddison, A. Huang, I. Sutskever, and D. Silver. Move evaluation in Go using deep convolutional neural networks. International Conference on Learning Representations, 2015.
V. Mnih et al. Human-level control through deep reinforcement learning. Nature, 518 (7540), pp. 529–533, 2015.
V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller. Playing Atari with deep reinforcement learning. arXiv:1312.5602, 2013. https://arxiv.org/abs/1312.5602
V. Mnih et al. Asynchronous methods for deep reinforcement learning. ICML Conference, pp. 1928–1937, 2016.
V. Mnih, N. Heess, and A. Graves. Recurrent models of visual attention. NIPS Conference, pp. 2204–2212, 2014.
A. Moore and C. Atkeson. Prioritized sweeping: Reinforcement learning with less data and less time. Machine Learning, 13(1), pp. 103–130, 1993.
M. Müller, M. Enzenberger, B. Arneson, and R. Segal. Fuego - an open-source framework for board games and Go engine based on Monte-Carlo tree search. IEEE Transactions on Computational Intelligence and AI in Games, 2, pp. 259–270, 2010.
K. S. Narendra and K. Parthasarathy. Identification and control of dynamical systems using neural networks. IEEE Transactions on Neural Networks, 1(1), pp. 4–27, 1990.
A. Ng and M. Jordan. PEGASUS: A policy search method for large MDPs and POMDPs. Uncertainty in Artificial Intelligence, pp. 406–415, 2000.
J. Peters and S. Schaal. Reinforcement learning of motor skills with policy gradients. Neural Networks, 21(4), pp. 682–697, 2008.
D. Pomerleau. ALVINN, an autonomous land vehicle in a neural network. Technical Report, Carnegie Mellon University, 1989.
G. Rummery and M. Niranjan. Online Q-learning using connectionist systems (Vol. 37). University of Cambridge, Department of Engineering, 1994.
A. Samuel. Some studies in machine learning using the game of checkers. IBM Journal of Research and Development, 3, pp. 210–229, 1959.
W. Saunders, G. Sastry, A. Stuhlmueller, and O. Evans. Trial without Error: Towards Safe Reinforcement Learning via Human Intervention. arXiv:1707.05173, 2017. https://arxiv.org/abs/1707.05173
S. Schaal. Is imitation learning the route to humanoid robots? Trends in Cognitive Sciences, 3(6), pp. 233–242, 1999.
T. Schaul, J. Quan, I. Antonoglou, and D. Silver. Prioritized experience replay. arXiv:1511.05952, 2015. https://arxiv.org/abs/1511.05952
J. Schulman, S. Levine, P. Abbeel, M. Jordan, and P. Moritz. Trust region policy optimization. ICML Conference, 2015.
J. Schulman, P. Moritz, S. Levine, M. Jordan, and P. Abbeel. High-dimensional continuous control using generalized advantage estimation. ICLR Conference, 2016.
I. Serban, A. Sordoni, Y. Bengio, A. Courville, and J. Pineau. Building end-to-end dialogue systems using generative hierarchical neural network models. AAAI Conference, pp. 3776–3784, 2016.
D. Silver et al. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), pp. 484–489, 2016.
D. Silver et al. Mastering the game of Go without human knowledge. Nature, 550(7676), pp. 354–359, 2017.
D. Silver et al. Mastering chess and shogi by self-play with a general reinforcement learning algorithm. arXiv:1712.01815, 2017. https://arxiv.org/abs/1712.01815
H. Simon. The Sciences of the Artificial. MIT Press, 1996.
I. Sutskever and V. Nair. Mimicking Go experts with convolutional neural networks. International Conference on Artificial Neural Networks, pp. 101–110, 2008.
R. Sutton. Learning to Predict by the Method of Temporal Differences. Machine Learning, 3, pp. 9–44, 1988.
R. Sutton and A. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
R. Sutton, D. McAllester, S. Singh, and Y. Mansour. Policy gradient methods for reinforcement learning with function approximation. NIPS Conference, pp. 1057–1063, 2000.
G. Tesauro. Practical issues in temporal difference learning. Advances in NIPS Conference, pp. 259–266, 1992.
G. Tesauro. TD-Gammon: A self-teaching backgammon program. Applications of Neural Networks, Springer, pp. 267–285, 1992.
G. Tesauro. Temporal difference learning and TD-Gammon. Communications of the ACM, 38(3), pp. 58–68, 1995.
S. Thrun. Learning to play the game of chess. NIPS Conference, pp. 1069–1076, 1995.
Y. Tian, Q. Gong, W. Shang, Y. Wu, and L. Zitnick. ELF: An extensive, lightweight and flexible research platform for real-time strategy games. arXiv:1707.01067, 2017. https://arxiv.org/abs/1707.01067
O. Vinyals and Q. Le. A Neural Conversational Model. arXiv:1506.05869, 2015. https://arxiv.org/abs/1506.05869
C. J. H. Watkins. Learning from delayed rewards. PhD Thesis, King’s College, Cambridge, 1989.
C. J. H. Watkins and P. Dayan. Q-learning. Machine Learning, 8(3–4), pp. 279–292, 1992.
R. J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3–4), pp. 229–256, 1992.
K. Xu et al. Show, attend, and tell: Neural image caption generation with visual attention. ICML Conference, 2015.
V. Zhong, C. Xiong, and R. Socher. Seq2SQL: Generating structured queries from natural language using reinforcement learning. arXiv:1709.00103, 2017. https://arxiv.org/abs/1709.00103
B. Zoph and Q. V. Le. Neural architecture search with reinforcement learning. arXiv:1611.01578, 2016. https://arxiv.org/abs/1611.01578
https://www.cs.toronto.edu/~kriz/cifar.html
http://www.bbc.com/news/technology-35785875
https://deepmind.com/blog/exploring-mysteries-alphago/
http://selfdrivingcars.mit.edu/
http://karpathy.github.io/2016/05/31/rl/
https://github.com/hughperkins/kgsgo-dataset-preprocessor
https://www.wired.com/2016/03/two-moves-alphago-lee-sedol-redefined-future/
https://qz.com/639952/googles-ai-won-the-game-go-by-defying-millennia-of-basic-human-instinct/
http://www.mujoco.org/
https://sites.google.com/site/gaepapersupp/home
https://drive.google.com/file/d/0B9raQzOpizn1TkRIa241ZnBEcjQ/view
https://www.youtube.com/watch?v=1L0TKZQcUtA&list=PLrAXtmErZgOeiKm4sgNOknGvNjby9efdf
https://openai.com/
https://www.youtube.com/watch?v=2pWv7GOvuf0
https://gym.openai.com
https://universe.openai.com
https://github.com/facebookresearch/ParlAI
https://github.com/openai/baselines
https://github.com/carpedm20/deep-rl-tensorflow
https://github.com/matthiasplappert/keras-rl
http://apollo.auto/

Author information

Authors and Affiliations

IBM T. J. Watson Research Center, International Business Machines, Yorktown Heights, NY, USA
Charu C. Aggarwal

About this chapter

Cite this chapter

Aggarwal, C.C. (2018). Deep Reinforcement Learning. In: Neural Networks and Deep Learning. Springer, Cham. https://doi.org/10.1007/978-3-319-94463-0_9

DOI: https://doi.org/10.1007/978-3-319-94463-0_9
Published: 26 August 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-94462-3
Online ISBN: 978-3-319-94463-0
eBook Packages: Computer Science, Computer Science (R0)