Abstract
Reinforcement learning enables an agent to learn optimal behavior in tasks that require selecting sequential actions. Learning is driven by interaction: through repeated exchanges with its environment, and the rewards it receives, the agent discovers which actions are associated with the greatest cumulative reward.
This work describes the computational implementation of reinforcement learning. Specifically, we present reinforcement learning using a neural network to represent the agent's value function, together with the temporal difference algorithm used to train that network. Our aim is to present the bare essentials needed to understand how to apply reinforcement learning with a neural network. We also describe two example implementations based on the board games Tic-Tac-Toe and Chung Toi, a challenging extension of Tic-Tac-Toe.
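The combination described above, a neural network as the value function and temporal difference learning as the training rule, can be sketched in a few lines. The network size, learning rate, and random "episodes" below are illustrative assumptions for a Tic-Tac-Toe-like state encoding, not the chapter's actual implementation:

```python
# Minimal sketch of TD(0) value-function learning with a small neural
# network, in the spirit of a Tic-Tac-Toe value function.  All names
# and hyperparameters are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Network: 9 board cells (-1, 0, +1) -> 10 hidden tanh units -> 1 value
W1 = rng.normal(scale=0.1, size=(10, 9))
w2 = rng.normal(scale=0.1, size=10)

def value(s):
    """Estimated value of board state s (shape (9,)); returns (v, hidden)."""
    h = np.tanh(W1 @ s)
    return np.tanh(w2 @ h), h

def td_update(s, target, alpha=0.05):
    """One gradient step moving V(s) toward the TD target."""
    global W1, w2
    v, h = value(s)
    delta = target - v                         # TD error
    dv = 1.0 - v ** 2                          # derivative of output tanh
    grad_w2 = dv * h                           # dV/dw2
    grad_W1 = np.outer(dv * w2 * (1.0 - h ** 2), s)  # dV/dW1
    w2 = w2 + alpha * delta * grad_w2
    W1 = W1 + alpha * delta * grad_W1

# Exercise the update rule on random episodes: a sequence of states
# ending in a terminal reward (+1 win, -1 loss).  Real training would
# generate states by actually playing the game.
for _ in range(200):
    states = [rng.choice([-1.0, 0.0, 1.0], size=9) for _ in range(5)]
    reward = rng.choice([-1.0, 1.0])
    for s, s_next in zip(states[:-1], states[1:]):
        td_update(s, value(s_next)[0])  # bootstrap from the next state
    td_update(states[-1], reward)       # terminal state: target = reward
```

The key design choice is that non-terminal states are trained toward the network's own estimate of the successor state (bootstrapping), while terminal states are trained toward the actual game outcome; the reward signal thus propagates backward through the state sequence over repeated games.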
© 2013 Springer Berlin Heidelberg
Cite this chapter
Gatti, C.J., Embrechts, M.J. (2013). Reinforcement Learning with Neural Networks: Tricks of the Trade. In: Georgieva, P., Mihaylova, L., Jain, L. (eds) Advances in Intelligent Signal Processing and Data Mining. Studies in Computational Intelligence, vol 410. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28696-4_11
DOI: https://doi.org/10.1007/978-3-642-28696-4_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28695-7
Online ISBN: 978-3-642-28696-4
eBook Packages: Engineering (R0)