Abstract
To scale to large state spaces, reinforcement learning (RL) algorithms must apply function approximation techniques. Research on function approximation for RL has so far focused either on global methods with a static structure or on constructive architectures using locally responsive units. The former, whilst achieving some notable successes, has also failed on some relatively simple tasks. The locally constructive approach is more stable, but may scale poorly to higher-dimensional inputs. This paper examines two globally constructive algorithms based on the Cascor supervised-learning algorithm. These algorithms are applied within the Sarsa RL algorithm, and their performance is compared against a multi-layer perceptron and a locally constructive algorithm, the Resource Allocating Network (RAN). It is shown that the globally constructive algorithms are less stable, but that on some tasks they achieve similar performance to the RAN whilst generating more compact solutions.
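The combination the abstract describes, Sarsa with a parametric function approximator, can be illustrated with a minimal sketch. The update below uses linear function approximation for simplicity (the paper itself studies neural-network approximators such as the MLP, Cascor variants, and the RAN); all names and parameter values here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def sarsa_update(w, phi, a, r, phi_next, a_next, alpha=0.1, gamma=0.99):
    """One Sarsa update with linear function approximation.

    Q(s, a) is approximated as w[a] . phi(s). The temporal-difference
    error r + gamma * Q(s', a') - Q(s, a) drives a gradient step on
    the weights of the action that was taken.
    """
    q = w[a] @ phi                    # current estimate Q(s, a)
    q_next = w[a_next] @ phi_next     # bootstrap target uses the *next* action (on-policy)
    td_error = r + gamma * q_next - q
    w = w.copy()                      # return updated weights without mutating the input
    w[a] += alpha * td_error * phi    # gradient of Q w.r.t. w[a] is phi
    return w, td_error
```

With a non-linear approximator, the `alpha * td_error * phi` step would be replaced by backpropagating the TD error through the network; the on-policy structure of the update is unchanged.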
© 2005 Springer-Verlag Berlin Heidelberg
Vamplew, P., Ollington, R. (2005). Global Versus Local Constructive Function Approximation for On-Line Reinforcement Learning. In: Zhang, S., Jarvis, R. (eds) AI 2005: Advances in Artificial Intelligence. AI 2005. Lecture Notes in Computer Science(), vol 3809. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11589990_14
Print ISBN: 978-3-540-30462-3
Online ISBN: 978-3-540-31652-7