
Reinforcement Learning with Neural Networks: Tricks of the Trade

  • Chapter
Advances in Intelligent Signal Processing and Data Mining

Part of the book series: Studies in Computational Intelligence (SCI, volume 410)

Abstract

Reinforcement learning enables the learning of optimal behavior in tasks that require the selection of sequential actions. This method of learning is based on interactions between an agent and its environment. Through repeated interactions with the environment, and the receipt of rewards, the agent learns which actions are associated with the greatest cumulative reward.

This work describes the computational implementation of reinforcement learning. Specifically, we present reinforcement learning using a neural network to represent the value function of the agent, together with the temporal difference algorithm used to train the network. The purpose of this work is to present the bare essentials needed to understand how to apply reinforcement learning using a neural network. Additionally, we describe two example implementations of reinforcement learning using the board games Tic-Tac-Toe and Chung Toi, a challenging extension of Tic-Tac-Toe.
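The scheme the abstract describes can be made concrete with a minimal sketch (not the chapter's code): a small feed-forward network estimates the value of a Tic-Tac-Toe position, and a TD(0) update moves that estimate toward the value of the successor position. The board encoding, layer sizes, and learning rate here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Board encoding: 9 cells; +1 for the agent's marks, -1 for the opponent's, 0 empty.
N_IN, N_HID = 9, 16
W1 = rng.normal(0.0, 0.1, (N_HID, N_IN))  # input-to-hidden weights
W2 = rng.normal(0.0, 0.1, N_HID)          # hidden-to-output weights

def value(board):
    """Estimated chance of winning from this position (sigmoid output in (0, 1))."""
    h = np.tanh(W1 @ board)
    return 1.0 / (1.0 + np.exp(-(W2 @ h))), h

def td_update(board, next_board, reward, terminal, alpha=0.1, gamma=1.0):
    """One TD(0) step: nudge V(board) toward reward + gamma * V(next_board)."""
    global W1, W2
    v, h = value(board)
    target = reward if terminal else reward + gamma * value(next_board)[0]
    delta = target - v                      # the temporal difference error
    grad_out = v * (1.0 - v)                # derivative of the sigmoid output
    # Gradients of the output with respect to each weight layer, computed
    # before either layer is modified, then scaled by the TD error.
    dW2 = delta * grad_out * h
    dW1 = delta * grad_out * np.outer(W2 * (1.0 - h ** 2), board)
    W2 += alpha * dW2
    W1 += alpha * dW1
    return delta
```

Applying `td_update` along self-play games, with a reward of 1 for a win and 0 otherwise, gradually shapes the network's position evaluations; this is the same TD-learning-with-backpropagation combination used in Tesauro's TD-Gammon.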



Author information

Correspondence to Christopher J. Gatti.


Copyright information

© 2013 Springer Berlin Heidelberg

About this chapter

Cite this chapter

Gatti, C.J., Embrechts, M.J. (2013). Reinforcement Learning with Neural Networks: Tricks of the Trade. In: Georgieva, P., Mihaylova, L., Jain, L. (eds) Advances in Intelligent Signal Processing and Data Mining. Studies in Computational Intelligence, vol 410. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28696-4_11


  • DOI: https://doi.org/10.1007/978-3-642-28696-4_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-28695-7

  • Online ISBN: 978-3-642-28696-4

  • eBook Packages: Engineering (R0)
