Abstract
Reinforcement learning enables an agent to learn optimal behavior in tasks that require selecting sequential actions. Learning is driven by interaction: through repeated exchanges with its environment, and the rewards it receives, the agent discovers which actions are associated with the greatest cumulative reward.
This work describes the computational implementation of reinforcement learning. Specifically, we present reinforcement learning using a neural network to represent the agent's value function, together with the temporal difference algorithm used to train that network. Our aim is to present the bare essentials needed to understand how to apply reinforcement learning with a neural network. We also describe two example implementations based on the board games Tic-Tac-Toe and Chung Toi, a challenging extension of Tic-Tac-Toe.
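The combination described above, a neural network as the value function and temporal difference learning as the training rule, can be sketched in a few lines. The network size, learning rate, and random "episodes" below are illustrative assumptions for a Tic-Tac-Toe-like state encoding, not the chapter's actual implementation:

```python
# Minimal sketch of TD(0) value-function learning with a small neural
# network, in the spirit of a Tic-Tac-Toe value function.  All names
# and hyperparameters are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Network: 9 board cells (-1, 0, +1) -> 10 hidden tanh units -> 1 value
W1 = rng.normal(scale=0.1, size=(10, 9))
w2 = rng.normal(scale=0.1, size=10)

def value(s):
    """Estimated value of board state s (shape (9,)); returns (v, hidden)."""
    h = np.tanh(W1 @ s)
    return np.tanh(w2 @ h), h

def td_update(s, target, alpha=0.05):
    """One gradient step moving V(s) toward the TD target."""
    global W1, w2
    v, h = value(s)
    delta = target - v                         # TD error
    dv = 1.0 - v ** 2                          # derivative of output tanh
    grad_w2 = dv * h                           # dV/dw2
    grad_W1 = np.outer(dv * w2 * (1.0 - h ** 2), s)  # dV/dW1
    w2 = w2 + alpha * delta * grad_w2
    W1 = W1 + alpha * delta * grad_W1

# Exercise the update rule on random episodes: a sequence of states
# ending in a terminal reward (+1 win, -1 loss).  Real training would
# generate states by actually playing the game.
for _ in range(200):
    states = [rng.choice([-1.0, 0.0, 1.0], size=9) for _ in range(5)]
    reward = rng.choice([-1.0, 1.0])
    for s, s_next in zip(states[:-1], states[1:]):
        td_update(s, value(s_next)[0])  # bootstrap from the next state
    td_update(states[-1], reward)       # terminal state: target = reward
```

The key design choice is that non-terminal states are trained toward the network's own estimate of the successor state (bootstrapping), while terminal states are trained toward the actual game outcome; the reward signal thus propagates backward through the state sequence over repeated games.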
© 2013 Springer Berlin Heidelberg
Cite this chapter
Gatti, C.J., Embrechts, M.J. (2013). Reinforcement Learning with Neural Networks: Tricks of the Trade. In: Georgieva, P., Mihaylova, L., Jain, L. (eds) Advances in Intelligent Signal Processing and Data Mining. Studies in Computational Intelligence, vol 410. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28696-4_11
DOI: https://doi.org/10.1007/978-3-642-28696-4_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28695-7
Online ISBN: 978-3-642-28696-4
eBook Packages: Engineering (R0)